The original aggregator interface, which assumed one aggregable input per aggregator, has broken down with multiple-input aggregators (linreg, correlation, group_by) and, I would argue, with count (which takes zero or one input, depending on how it is called). The problem is the semantics of filter and explode. Right now, correlation supports neither; linreg supports them on its first argument (where they are implicitly applied to the second as well); and group_by … has unexpected semantics, but I don't remember the details. (@jigold?)
I propose the following: filter and explode go outside the aggregation expression and wrap it:
hl.agg.filter(hl.len(x.alleles) == 2, hl.agg.count())
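The sketch below makes the proposed filter semantics concrete. It is a minimal pure-Python model, not Hail code. It assumes an aggregator can be represented as a fold with `zero`, `seq`, and `result` steps. The names `Agg`, `count`, `filter_`, and `aggregate` are hypothetical and exist only for this sketch. Because `filter_` wraps the whole aggregation, rows that fail the condition never reach the inner aggregator. This works even for count, which has no input expression to attach a filter to.

```python
# Hypothetical pure-Python model of the proposed filter semantics; not Hail's API.
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Agg:
    zero: Callable[[], Any]            # initial state
    seq: Callable[[Any, dict], Any]    # (state, row) -> new state
    result: Callable[[Any], Any]       # state -> final value


def aggregate(rows, agg: Agg):
    # Fold every row into the aggregator's state, then extract the result.
    state = agg.zero()
    for row in rows:
        state = agg.seq(state, row)
    return agg.result(state)


def count() -> Agg:
    # Counts every row that reaches this aggregator; it takes no input expression.
    return Agg(lambda: 0, lambda s, row: s + 1, lambda s: s)


def filter_(cond: Callable[[dict], bool], inner: Agg) -> Agg:
    # Wraps an entire aggregation: rows failing cond are never passed to inner.
    def seq(s, row):
        return inner.seq(s, row) if cond(row) else s
    return Agg(inner.zero, seq, inner.result)


rows = [
    {"alleles": ["A", "T"]},
    {"alleles": ["A", "T", "G"]},
    {"alleles": ["C", "G"]},
]

# Analogue of hl.agg.filter(hl.len(x.alleles) == 2, hl.agg.count())
n_biallelic = aggregate(rows, filter_(lambda x: len(x["alleles"]) == 2, count()))
print(n_biallelic)  # 2
```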
explode would take a lambda whose argument is the exploded value:
hl.agg.explode(x.genes, lambda gene: hl.agg.counter(hl.tuple([gene, x.GT])))
where the lambda argument ranges over the individual elements of the collection. In particular, inside the counter, x.genes still refers to the whole (constant) array for that row, while gene takes each of its elements in turn.
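The same kind of hypothetical model (plain Python, not Hail's API) can sketch the lambda-based explode. It calls the lambda once with a symbolic accessor for the element to build the inner aggregator. It then feeds the inner aggregator once per element, with that element bound under a fresh name alongside the unchanged row. `explode`, `counter`, and the fresh-name scheme are assumptions of this sketch. The second aggregation checks that `x["genes"]` inside the counter still sees the whole array.

```python
# Hypothetical pure-Python model of the proposed explode semantics; not Hail's API.
import itertools
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Agg:
    zero: Callable[[], Any]            # initial state
    seq: Callable[[Any, dict], Any]    # (state, env) -> new state
    result: Callable[[Any], Any]       # state -> final value


def aggregate(rows, agg: Agg):
    state = agg.zero()
    for row in rows:
        state = agg.seq(state, row)
    return agg.result(state)


def counter(expr: Callable[[dict], Any]) -> Agg:
    # Counts occurrences of expr(env) over every env that reaches this aggregator.
    def seq(s, env):
        s[expr(env)] += 1
        return s
    return Agg(Counter, seq, dict)


_fresh = itertools.count()


def explode(collection: Callable[[dict], Any],
            body: Callable[[Callable[[dict], Any]], Agg]) -> Agg:
    # The lambda is called once, with an accessor for the element, to build
    # a single inner aggregator whose state is shared across all elements.
    name = f"__elem{next(_fresh)}"
    inner = body(lambda env: env[name])

    def seq(s, env):
        # One inner update per element; the rest of the row is left unchanged.
        for elem in collection(env):
            s = inner.seq(s, {**env, name: elem})
        return s
    return Agg(inner.zero, seq, inner.result)


rows = [
    {"genes": ["BRCA1", "TP53"], "GT": "0/1"},
    {"genes": ["TP53"], "GT": "1/1"},
]

# Analogue of hl.agg.explode(x.genes, lambda gene: hl.agg.counter(hl.tuple([gene, x.GT])))
by_gene_gt = aggregate(rows, explode(
    lambda x: x["genes"],
    lambda gene: counter(lambda x: (gene(x), x["GT"]))))

# Inside the lambda, x["genes"] is still the row's whole array, not one element.
gene_and_len = aggregate(rows, explode(
    lambda x: x["genes"],
    lambda gene: counter(lambda x: (gene(x), len(x["genes"])))))
```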
filter and explode should likewise apply to the second argument of group_by, in the obvious way.
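Under the same hypothetical fold model (pure Python, not Hail's API), group_by can take a filtered aggregation as its second argument. `group_by` here keeps a separate inner state per key. `Agg`, `count`, `filter_`, and `group_by` are names invented for this sketch. The example uses filter only.

```python
# Hypothetical pure-Python model of group_by with a filtered second argument; not Hail's API.
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Agg:
    zero: Callable[[], Any]            # initial state
    seq: Callable[[Any, dict], Any]    # (state, row) -> new state
    result: Callable[[Any], Any]       # state -> final value


def aggregate(rows, agg: Agg):
    state = agg.zero()
    for row in rows:
        state = agg.seq(state, row)
    return agg.result(state)


def count() -> Agg:
    return Agg(lambda: 0, lambda s, row: s + 1, lambda s: s)


def filter_(cond: Callable[[dict], bool], inner: Agg) -> Agg:
    # Rows failing cond are never passed to inner.
    def seq(s, row):
        return inner.seq(s, row) if cond(row) else s
    return Agg(inner.zero, seq, inner.result)


def group_by(key: Callable[[dict], Any], inner: Agg) -> Agg:
    # The key is evaluated on every row, outside any filter inside `inner`;
    # each distinct key gets its own inner state.
    def seq(s, row):
        k = key(row)
        state = s[k] if k in s else inner.zero()
        s[k] = inner.seq(state, row)
        return s

    def result(s):
        return {k: inner.result(v) for k, v in s.items()}
    return Agg(dict, seq, result)


rows = [
    {"pop": "EUR", "alleles": ["A", "T"]},
    {"pop": "EUR", "alleles": ["A", "T", "G"]},
    {"pop": "AFR", "alleles": ["C", "G"]},
    {"pop": "AFR", "alleles": ["C", "G", "T"]},
    {"pop": "EAS", "alleles": ["G", "A", "C"]},
]

# Count biallelic sites per population, with filter applied to group_by's second argument.
biallelic_per_pop = aggregate(rows, group_by(
    lambda x: x["pop"],
    filter_(lambda x: len(x["alleles"]) == 2, count())))
print(biallelic_per_pop)  # {'EUR': 1, 'AFR': 1, 'EAS': 0}
```

In this sketch the group key is computed before the filter runs. A group whose rows are all filtered out (EAS here) therefore still appears, with count 0. Whether that is the intended "obvious" behavior is a design choice this model does not settle.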