Current devel interface
We have good IDE support and typing throughout, made possible by lambda functions. The cost is the second
“scope” argument on every lambda; there’s really no way to get rid of it.
Example 1: Annotate number of heterozygotes
vds = vds.annotate_variants(
    nCaseHets = vds.gs.filter(lambda g, scope: g.isHet() & scope.sa.isCase).count())
Example 2: Have a key table with schema a: String, b: Int32, c: Int32; want to group by a and compute the sum of 1 / b where c is larger than 5.
grp = kt.group_by(kt.a)
grp.aggregate(sum = grp.b.filter(lambda b, scope: scope.c > 5.0).map(lambda b, _: 1 / b).sum())
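To make the role of the second argument concrete, here is a minimal pure-Python sketch of how a two-argument lambda interface could work. It is not the actual implementation: the Aggregable class, the (element, scope) pairing, and the sample rows are all assumptions made for illustration.

```python
from types import SimpleNamespace


class Aggregable:
    """Toy stand-in (hypothetical): a sequence of (element, scope) pairs."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def filter(self, pred):
        # pred receives the element and the enclosing row as "scope"
        return Aggregable((x, s) for x, s in self._pairs if pred(x, s))

    def map(self, f):
        # f also receives the scope, even when it has no use for it
        return Aggregable((f(x, s), s) for x, s in self._pairs)

    def sum(self):
        return sum(x for x, _ in self._pairs)


# Made-up rows with schema a: String, b: Int32, c: Int32
rows = [
    SimpleNamespace(a="x", b=2, c=6),
    SimpleNamespace(a="x", b=4, c=3),
    SimpleNamespace(a="x", b=4, c=9),
]

b_col = Aggregable((r.b, r) for r in rows)
total = b_col.filter(lambda b, scope: scope.c > 5.0).map(lambda b, _: 1 / b).sum()
```

In this sketch the scope has to be threaded through every call because a plain lambda has no other way to see sibling fields such as c.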
Proposed interface 1
In this interface, we use an anonymous indicator “X” to hold everything in the current scope. You lose
IDE support / tab completion because X is untyped until runtime.
Example 1: Annotate number of heterozygotes
vds = vds.annotate_variants(
    nCaseHets = vds.gs.filter(lambda g: g.isHet() & X.sa.isCase).count())
Example 2: Have a key table with schema a: String, b: Int32, c: Int32; want to group by a and compute the sum of 1 / b where c is larger than 5.
grp = kt.group_by(kt.a)
grp.aggregate(sum = grp.b.filter(lambda b: X.c > 5.0).map(lambda b: 1 / b).sum())
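One way such an “X” could behave is sketched below in pure Python. This is a sketch under stated assumptions, not the proposed implementation: Placeholder, resolve, and Aggregable are hypothetical names. The point is that X accepts any attribute name when the expression is built, so a misspelled field only fails once the expression is resolved against a real scope.

```python
from types import SimpleNamespace


class Placeholder:
    """Untyped stand-in (hypothetical) for everything in the current scope."""

    def __init__(self, fn=lambda scope: scope):
        self._fn = fn

    def __getattr__(self, name):
        # Any field name is accepted here, so an IDE has nothing to
        # complete against; a bad name only fails when resolved.
        if name.startswith("_"):
            raise AttributeError(name)
        return Placeholder(lambda scope: getattr(self._fn(scope), name))

    def __gt__(self, other):
        return Placeholder(lambda scope: self._fn(scope) > other)


X = Placeholder()


def resolve(value, scope):
    """Evaluate a Placeholder against a concrete scope; pass other values through."""
    return value._fn(scope) if isinstance(value, Placeholder) else value


class Aggregable:
    """Toy aggregable that fills in X from the row each element came from."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def filter(self, pred):
        return Aggregable((x, s) for x, s in self._pairs if resolve(pred(x), s))

    def map(self, f):
        return Aggregable((resolve(f(x), s), s) for x, s in self._pairs)

    def sum(self):
        return sum(x for x, _ in self._pairs)


# Made-up rows with schema a: String, b: Int32, c: Int32
rows = [
    SimpleNamespace(a="x", b=2, c=6),
    SimpleNamespace(a="x", b=4, c=3),
    SimpleNamespace(a="x", b=4, c=9),
]

b_col = Aggregable((r.b, r) for r in rows)
total = b_col.filter(lambda b: X.c > 5.0).map(lambda b: 1 / b).sum()

bad = X.no_such_field  # builds without complaint; the error is deferred
```

Calling resolve(bad, rows[0]) raises AttributeError in this sketch, which is the runtime-only failure mode described above.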
Proposed interface 2
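In this model, aggregables need to be explicitly constructed by the agg method (either apply, as in agg(expr), or an instance method, as in expr.agg(), would work). Because every expression hangs off a typed object such as vds or kt, this keeps the typing and IDE support.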
Example 1: Annotate number of heterozygotes
vds = vds.annotate_variants(
    nCaseHets = vds.g.agg().filter(vds.g.isHet() & vds.sa.isCase).count())
Example 2: Have a key table with schema a: String, b: Int32, c: Int32; want to group by a and compute the sum of 1 / b where c is larger than 5.
kt.group_by(kt.a)\
    .aggregate(sum = agg(1 / kt.b).filter(kt.c > 5.0).sum())
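A minimal pure-Python sketch of this model follows, assuming deferred row-level expressions and a free agg function. Expr, Agg, KeyTable, and Grouped are hypothetical names, and the dict-of-dicts result is just a convenient shape for the toy. Fields are declared attributes on the table, which is what would let an IDE complete kt.a, kt.b, and kt.c.

```python
from collections import defaultdict


class Expr:
    """A deferred, row-level expression (hypothetical)."""

    def __init__(self, fn):
        self.fn = fn

    def __gt__(self, other):
        return Expr(lambda row: self.fn(row) > other)

    def __rtruediv__(self, other):
        # Supports "1 / kt.b" with a plain number on the left
        return Expr(lambda row: other / self.fn(row))


class Agg:
    """An aggregable built explicitly from a row-level expression."""

    def __init__(self, expr, preds=()):
        self.expr = expr
        self.preds = tuple(preds)

    def filter(self, pred):
        return Agg(self.expr, self.preds + (pred,))

    def sum(self):
        # Returns a function from a group of rows to the aggregated value
        def run(rows):
            return sum(self.expr.fn(r) for r in rows
                       if all(p.fn(r) for p in self.preds))
        return run


def agg(expr):
    return Agg(expr)


class Grouped:
    def __init__(self, groups):
        self.groups = groups

    def aggregate(self, **aggs):
        return {key: {name: run(rows) for name, run in aggs.items()}
                for key, rows in self.groups.items()}


class KeyTable:
    """Toy table with the fixed schema a: String, b: Int32, c: Int32."""

    def __init__(self, rows):
        self.rows = rows
        # Declared attributes, so the field names are known statically
        self.a = Expr(lambda row: row["a"])
        self.b = Expr(lambda row: row["b"])
        self.c = Expr(lambda row: row["c"])

    def group_by(self, key):
        groups = defaultdict(list)
        for row in self.rows:
            groups[key.fn(row)].append(row)
        return Grouped(groups)


# Made-up rows for illustration
kt = KeyTable([
    {"a": "x", "b": 2, "c": 6},
    {"a": "x", "b": 4, "c": 3},
    {"a": "y", "b": 4, "c": 9},
])
result = kt.group_by(kt.a).aggregate(sum=agg(1 / kt.b).filter(kt.c > 5.0).sum())
```

No lambdas and no scope argument appear at the call site in this sketch; the trade-off is that every operator used in an expression has to be overloaded on the expression type.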