LowerTableIR and nested CDAs

The Problem

Consider the following Python script:

ht1 = ...
ht2 = ...

ht2.aggregate(hl.agg.count() + ht1.aggregate(hl.agg.count(), _localize=False))

This will generate an IR that looks like:

(TableAggregate
  ht2
  (ApplyBinOp +
    (ApplyAggOp count)
    (TableAggregate
      ht1
      (ApplyAggOp count))))

My initial design for lowering TableAggregate was roughly the following (a rough code sketch follows the list):

  1. Compute requiredness over the whole IR to be lowered
  2. Recursively lower with the RequirednessAnalysis as auxiliary information
  3. When hitting a TableAggregate, lower the child by extracting with the computed requiredness.
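
In rough code, the plan looked like this (toy node class and made-up helper names, not the real compiler passes):

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

def compute_requiredness(ir: Node) -> Dict[int, str]:
    # Step 1: a single analysis pass over the whole IR to be lowered.
    result = {id(ir): "<requiredness of this node>"}
    for child in ir.children:
        result.update(compute_requiredness(child))
    return result

def lower(ir: Node, requiredness: Dict[int, str]) -> Node:
    # Step 2: recursive lowering, carrying the analysis as auxiliary information.
    lowered_children = [lower(c, requiredness) for c in ir.children]
    if ir.name == "TableAggregate":
        # Step 3: lower the aggregation over the child table, consulting the
        # requiredness computed up front. Note that this recursion also walks
        # the aggregation query itself, which is exactly where a nested
        # TableAggregate causes trouble.
        return Node("CollectDistributedArray", lowered_children)
    return Node(ir.name, lowered_children)

# Usage: lower(pipeline, compute_requiredness(pipeline)) for the IR shown above.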

I expected that we’d only need to compute requiredness once at the beginning. However, Arcturus pointed out that this strategy is insufficient for the IR above, which has a nested TableAggregate inside the aggregation query.

I investigated further, and none of the current IRs support a pattern that has a relational IR inside a value IR child (the newRow for TableMapRows, the filter for TableFilter, etc.).

Proposal 1: restructure IR before running LowerToCDA

One option to fix this is to introduce a rewriting pass that runs before LowerToCDA, which extracts these nested patterns as RelationalLet IRs near the root of the computation.

This has an additional benefit: we gain the same property that InterpretNonCompilable guarantees in the TableIR.execute runtime, namely that a TableAggregate IR appearing at nesting depth > 0 will still only be run once.

The complexity here is figuring out how these RelationalLet nodes are lowered to normal lets, with bindings inserted as broadcast values in CDAs where necessary.
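
To make the extraction concrete, here is a minimal sketch of such a rewrite over a toy IR. The node class, the RELATIONAL set, and the table of which child positions count as value children are all illustrative simplifications, not the real classes:

import itertools
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

RELATIONAL = {"TableAggregate", "TableCollect", "TableCount"}
# Child positions that are value IRs (the query, predicate, newRow, ...).
VALUE_CHILDREN = {"TableAggregate": {1}, "TableFilter": {1}, "TableMapRows": {1}}
_ids = itertools.count()

def lift_nested_relational(ir: Node) -> Node:
    # Pull relational nodes that sit underneath a value IR out into
    # RelationalLet bindings near the root, replacing each with a RelationalRef.
    bindings = []

    def visit(node, under_value_ir):
        if under_value_ir and node.name in RELATIONAL:
            name = f"__lifted_{next(_ids)}"
            # Process the lifted subtree first, so subqueries nested inside it
            # get lifted (and bound) as well.
            bindings.append((name, visit(node, False)))
            return Node(f'RelationalRef "{name}"')
        value_positions = VALUE_CHILDREN.get(node.name, set())
        return Node(node.name, [
            visit(child, under_value_ir or i in value_positions)
            for i, child in enumerate(node.children)])

    # A relational root stays where it is; a value-IR root means everything
    # below it already sits in value-IR context.
    body = visit(ir, ir.name not in RELATIONAL)
    for name, value in reversed(bindings):
        body = Node(f"RelationalLet {name}", [value, body])
    return body

# The nested-aggregate example from the top, in this toy representation:
query = Node("TableAggregate", [
    Node("ht2"),
    Node("ApplyBinOp +", [
        Node("ApplyAggOp count"),
        Node("TableAggregate", [Node("ht1"), Node("ApplyAggOp count")])])])
lifted = lift_nested_relational(query)
# lifted is now, roughly:
# (RelationalLet __lifted_0
#   (TableAggregate ht1 (ApplyAggOp count))
#   (TableAggregate ht2
#     (ApplyBinOp + (ApplyAggOp count) (RelationalRef __lifted_0))))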

Proposal 2: recursively lower value IR children

Another option is to recursively lower value IR children of nodes like TableFilter/TableAggregate in LowerTableIR. This is simpler, but has a few negative properties:

  1. We need to recompute requiredness for each of these nodes. This scales quadratically with the size of an IR.
  2. We can emit CDAs inside of loops, and we currently don’t have a code motion pass to clean this up.

After some more thought and a discussion with Patrick (thanks Patrick!), I’m leaning toward proposal 1, with the following mechanism for handling relational lets (sketched in code after the list):

  1. Lowering functions take an additional parameter, an Env[Type] of the relational let bindings available at that IR.
  2. Any time we generate a CDA node, we include all bindings in the aforementioned Env in the broadcast vals struct.
  3. This will insert broadcast vals into every CDA, not just the ones that use them, but the pruner will clean them up where they’re not needed.
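
Here is a minimal sketch of that threading with the same kind of toy nodes. The real code would use the actual Env[Type] and the real lowering of TableAggregate; the per-partition pieces below are just placeholders:

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

def lower(ir: Node, relational_bindings: Dict[str, str]) -> Node:
    # relational_bindings: name -> type for every RelationalLet in scope here (step 1).
    if ir.name.startswith("RelationalLet "):
        name = ir.name.split()[1]
        value, body = ir.children
        lowered_value = lower(value, relational_bindings)
        # The body sees one more relational binding than we did, and the
        # RelationalLet itself becomes a normal Let in the lowered IR.
        lowered_body = lower(body, {**relational_bindings, name: "<type of value>"})
        return Node(f"Let {name}", [lowered_value, lowered_body])
    if ir.name == "TableAggregate":
        # Step 2: every CDA we generate carries *all* in-scope relational
        # bindings in its broadcast-values struct ...
        broadcast_vals = Node("MakeStruct", [Node(f"Ref {n}") for n in relational_bindings])
        contexts = Node("<per-partition contexts>")
        per_partition = Node("<per-partition aggregation>")
        # ... and (step 3) a later pruning pass drops the broadcast vals that
        # no partition body actually reads.
        return Node("CollectDistributedArray", [contexts, broadcast_vals, per_partition])
    return Node(ir.name, [lower(c, relational_bindings) for c in ir.children])

# Usage: lower(lifted_ir, {}) -- start with no relational bindings in scope.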

Here is another perspective:

I’d call that a subquery (although it is a trivial one), and I’d say it should error because we don’t support subqueries. The user should explicitly use rbind to write a subquery-free pipeline:

hl.rbind(ht1.aggregate(hl.agg.count(), _localize=False),
  lambda ht1count: ht2.aggregate(hl.agg.count() + ht1count))

extracts these nested patterns as RelationalLet IRs

I’m confused about why these will be relational lets. I thought relational lets are only needed when pulling out values in a “freestanding” matrix or table IR. In lowering, the top-level IR is always a value IR. Therefore, subqueries can be pulled up to the value level and bound in a normal Let.

Proposal 2

We need to recompute requiredness for each of these nodes. This scales quadratically with the size of an IR.

I don’t understand why this is necessary. Just pass in the requiredness analysis recursively.

We can emit CDAs inside of loops, and we currently don’t have a code motion pass to clean this up.

I’m also confused by this. If you have a relational node in a loop, you need to run it on each iteration. Or do you mean in a loop inside e.g. TableFilter? That has to get pulled out.

To implement proposal 2, you need lowering to keep track of and return the set of relational nodes that have been pulled out so they can be placed in globals when you hit the value level again.

These proposals essentially seem the same to me: either you pull things out beforehand, or you pull them out while you do the lowering. I’d probably do proposal 2 since LowerTableIR already has all the infrastructure to do this. You can just throw the lowered subqueries in the globals, right?
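
For what it’s worth, a rough sketch of that bookkeeping (toy nodes again; InsertGlobal is a made-up stand-in for however the pulled-out values actually get attached to the lowered table’s globals, and the recursion into the table pipeline is heavily simplified):

import itertools
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

_names = itertools.count()

def lower_value_ir(ir: Node) -> Tuple[Node, List[Tuple[str, Node]]]:
    # Lower a value IR, returning it plus the relational subqueries pulled out
    # of it, each bound to a fresh name.
    if ir.name in {"TableAggregate", "TableCollect"}:
        name = f"__subquery_{next(_names)}"
        return Node(f"Ref {name}"), [(name, lower_table_ir(ir))]
    children, pulled = [], []
    for child in ir.children:
        lowered, sub = lower_value_ir(child)
        children.append(lowered)
        pulled.extend(sub)
    return Node(ir.name, children), pulled

def lower_table_ir(ir: Node) -> Node:
    if ir.name == "TableFilter":
        table, predicate = ir.children
        lowered_predicate, pulled = lower_value_ir(predicate)
        lowered_table = lower_table_ir(table)
        # Back at the value level: stash each pulled-out subquery in the
        # lowered table's globals so the predicate can refer to it by name.
        for name, subquery in pulled:
            lowered_table = Node(f"InsertGlobal {name}", [lowered_table, subquery])
        return Node("TableFilter", [lowered_table, lowered_predicate])
    return Node(ir.name, [lower_table_ir(c) for c in ir.children])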

This doesn’t work right now. How do I embed this in a TableFilter, for instance, without writing the entire table query in a single Python expression? Python would also need to have different binding/scope rules.

We’ll have patterns like:

RelationalLet 
  foo
  TableCollect ...
  TableAggregate
    TableFilter
       some table
       some expression of RelationalRef("foo")
    some query

A normal Let won’t put “foo” into the environment where it’s needed, which is inside a relational IR within the body.

I’m referring to IR that might be generated by a user like:

StreamMap
  StreamRange 0 N 1
  idx
  ArrayRef
    TableCollect ...
    idx

We obviously want to run the TableCollect only once. Proposal #1 has the nice property that this will be lifted out. I think it’s hard to use the _localize=False stuff to do horrible things in this model, which I like.
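
Concretely, the lifted form we’d want out of Proposal #1 for that IR is roughly the following (the binding name is made up):

RelationalLet __collected
  TableCollect ...
  StreamMap
    StreamRange 0 N 1
    idx
    ArrayRef
      RelationalRef "__collected"
      idx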

This doesn’t work right now. How do I embed this in a TableFilter, for instance, without writing the entire table query in a single Python expression?

You could build it modularly but you’d have to pass the aggregated value in. Without subqueries, that’s what you need to do.

Python would also need to have different binding/scope rules.
A normal Let won’t put “foo” into the environment where it’s needed, which is inside a relational IR within the body.

I see. I’ve never really understood relational lets.

I’d argue relational lets should go away and let bindings should be visible in relational nodes. I see why this was hard to implement in the Table.execute world, but this is easy to do in lowering.

StreamMap …

What is that StreamMap inside of?

Proposal #1 has the nice property …

I’m saying the output of Proposal #1 and #2 should be the same. We need to understand what IR we want to produce, and implement that. There is no reason to pull something out in #1 and not pull it out in #2. Whether the code is a separate pass or done inside of lowering seems incidental to me.

Proposal #2 was the super-simple but wrong strategy of recursively running the entire (analyze + transform) pass; it’s mostly here as a record of the design process and as a contrast to the right solution.

Something like this:

hl.eval(hl.range(0, 10).map(lambda i: ht.collect(_localize=False)[i]))

We’ve discussed this in the past and I’m on board with this being the right design, but it’s a fair bit of work, and probably not worth doing right now until some of our urgent goals are met. RelationalLet/RelationalRef do not exist in lowered IR.