Proposal for aggregators


#1

Aggregators are implemented in terms of standard operations after continuation passing (conditionals, looping over arrays, variable binding), but at the level of the IR they have a completely different (and duplicate) representation. I propose to unify these representations. The advantage is that optimizations that improve standard expressions will improve aggregator expressions as well. For example, once we have common subexpression elimination, it will immediately apply to the (collectively) extracted aggregators.

My proposal is as follows:

  • There are still two scopes, the normal scope and the aggregable scope. TAggregable still exists (for now) and describes the aggregable scope, but it no longer has an elementType.

  • (ApplyAggOp a op args) becomes (ApplyAggOp x op args) where x has type Void.

  • AggMap, AggFilter and AggFlatMap are all gone.

  • (SeqOp a) exists in the IR before ExtractAggregators runs, has type Void, and represents the (mutating) seqOp on the RegionValueAggregator of the containing ApplyAggOp.

Thus, the new IR represents the current aggregators after doing continuation passing.

So:

(ApplyAggOp Sum
  (AggMap __uid_3
      (AggFilter __uid_4
        (AggIn)
        pred)
      value))

becomes:

(ApplyAggOp Sum
  (If
    pred
    (SeqOp value)
    (Begin)))

which, after extraction, becomes:

(If
  pred
  (SeqOp value i agg)
  (Begin))

where i is the index of the aggregator corresponding to the (ApplyAggOp Sum …) and agg is the corresponding CodeAggregator[_].
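The extraction step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: IR nodes are encoded as plain tuples (not Hail's IR classes), `tag_seq_ops` and `extract_seq_bodies` are hypothetical names, and the string `agg_<i>` merely stands in for the corresponding CodeAggregator[_]. Nested ApplyAggOp nodes are not handled.

```python
# Toy IR: every node is a tuple (tag, *children); leaves are plain strings.
# This encoding is an assumption for illustration, not Hail's IR.

def tag_seq_ops(stmt, i, agg):
    """Return `stmt` with every (SeqOp v) rewritten to (SeqOp v i agg)."""
    if not isinstance(stmt, tuple):
        return stmt  # tags and leaf names are left untouched
    if stmt[0] == "SeqOp" and len(stmt) == 2:
        return ("SeqOp", stmt[1], i, agg)
    return tuple(tag_seq_ops(child, i, agg) for child in stmt)


def extract_seq_bodies(ir, out=None):
    """Collect the tagged body of each (ApplyAggOp op body), in traversal order."""
    if out is None:
        out = []
    if isinstance(ir, tuple):
        if ir[0] == "ApplyAggOp":
            _, _op, body = ir
            i = len(out)  # index of this aggregator
            # "agg_<i>" is a placeholder for the real aggregator object.
            out.append(tag_seq_ops(body, i, f"agg_{i}"))
        else:
            for child in ir[1:]:
                extract_seq_bodies(child, out)
    return out


pre = ("ApplyAggOp", "Sum",
       ("If", "pred", ("SeqOp", "value"), ("Begin",)))
bodies = extract_seq_bodies(pre)
print(bodies[0])
```

Each extracted body is an ordinary Void statement, so any pass that works on standard expressions can be run over it unchanged.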

I guess the rewrite to the new representation can be done either in toIR or as a first step in ExtractAggregators.
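As a rough sketch of what such a rewrite pass might look like, here is a continuation-passing lowering of the old aggregable nodes, written in Python over a toy tuple encoding. Everything here is an assumption for illustration: the tuple layout `(tag, name, child, body)`, the function names `lower` and `lower_apply_agg_op`, and the `("Ref", "elt")` node standing for the current element of AggIn. Unlike the hand-written example above, the sketch emits an explicit Let for each bound name rather than eliding it.

```python
# Toy IR: every node is a tuple (tag, *children). This encoding is an
# assumption for illustration; it is not Hail's IR.
BEGIN = ("Begin",)


def lower(agg, k):
    """Lower an old-style aggregable `agg` in continuation-passing style.

    `k` takes the IR for one element and returns the Void statement
    that should run for that element.
    """
    tag = agg[0]
    if tag == "AggIn":
        # Root of the aggregable: pass the (hypothetical) current element to k.
        return k(("Ref", "elt"))
    if tag == "AggMap":
        _, name, child, body = agg
        # Bind the element to `name`, then continue with the mapped value.
        return lower(child, lambda e: ("Let", name, e, k(body)))
    if tag == "AggFilter":
        _, name, child, pred = agg
        # Continue only when the predicate holds; otherwise do nothing.
        return lower(
            child,
            lambda e: ("Let", name, e, ("If", pred, k(("Ref", name)), BEGIN)),
        )
    raise ValueError(f"unknown aggregable node: {tag}")


def lower_apply_agg_op(ir):
    """Rewrite (ApplyAggOp op agg) so its child is a Void statement ending in SeqOp."""
    _, op, agg = ir
    return ("ApplyAggOp", op, lower(agg, lambda e: ("SeqOp", e)))


old = ("ApplyAggOp", "Sum",
       ("AggMap", "__uid_3",
        ("AggFilter", "__uid_4", ("AggIn",), ("Ref", "pred")),
        ("Ref", "value")))
new = lower_apply_agg_op(old)
print(new)
```

The continuation is what turns the inside-out nesting of AggMap and AggFilter into top-down control flow: the outermost old node contributes the innermost new statement.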


#2

How do we do explode / filter?


#3

If by explode you mean flatMap, an expression like:

(ApplyAggOp Sum
    (AggFlatMap a name body))

will get compiled as:

(ApplyAggOp Sum
    (ArrayFor a x
        (SeqOp x)))

For more on ArrayFor, see my recent PR: https://github.com/hail-is/hail/pull/3422
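To make the intended semantics of the compiled form concrete, here is a toy interpreter in Python, assuming a tuple encoding of the IR and a Sum aggregator held in a one-element list. The names `run` and `evaluate` are hypothetical, expressions are reduced to variable lookups, and none of this reflects the actual ArrayFor implementation in the PR.

```python
def evaluate(expr, env):
    """Expressions in this toy are just variable names looked up in `env`."""
    return env[expr]


def run(stmt, env, state):
    """Execute a Void statement, mutating `state[0]` (a running Sum)."""
    tag = stmt[0]
    if tag == "Begin":
        # Run each child in order; an empty Begin is a no-op.
        for s in stmt[1:]:
            run(s, env, state)
    elif tag == "SeqOp":
        # The mutating seqOp of the containing Sum aggregator.
        state[0] += evaluate(stmt[1], env)
    elif tag == "If":
        _, cond, then, otherwise = stmt
        run(then if evaluate(cond, env) else otherwise, env, state)
    elif tag == "ArrayFor":
        _, array, name, body = stmt
        # Bind each array element to `name` and run the body for its effect.
        for x in evaluate(array, env):
            run(body, {**env, name: x}, state)
    else:
        raise ValueError(f"unknown statement: {tag}")


state = [0]
run(("ArrayFor", "a", "x", ("SeqOp", "x")), {"a": [1, 2, 3]}, state)
print(state[0])  # prints 6
```

Filtering needs no extra node in this scheme: an If with an empty Begin in the else branch skips the SeqOp for elements that fail the predicate.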