Optimization ideas

cseed · April 24, 2018, 3:10am

This thread is to keep a list of ideas for the IR optimizer. Compare: Ideas for improving generated code.

After dead field elimination (which @tpoterba is working on), I think common subexpression elimination will have the next highest impact. @konradjk’s pipelines regularly slow massive amounts of duplicated code.
Push filters down. (AggFilter through AggMap: PRed in: https://github.com/hail-is/hail/pull/3429) Relational filters through maps and esp. joins.
Downward code motion. If an expresison is bound in a let in an outer context but only referenced in one part of a conditional, it can be moved into the conditional. It might be natural to do this as part of CSE.
Hoist loop invariants. Expressions inside Array* operations that don’t depend on the elements should be bound by a let outside.
Code hoisting. If you’re doing a per-element operation (that is, indexed by row and col axis), but there is a subexpression that only depends on the row, it should be compute first per-row. Same for global (invariant) expressions in table operations. As a simple case, we saw this as a problem with string constants (although that can be fixed in Python … by doing this to begin with).
InsertFields(MakeStruct(…), …) can be turned into a MakeStruct. (@tpoterba did you have this somewhere?)
replace simple regular expressions with String operations like startsWith, endsWith, contains, etc.
ArrayMap(MakeArray(…), …) should be simplified if the individual terms are constant-foldable. Same for ArrayFilter.
fuse Matrix/Table filter and maps if aggregators don’t get in the way. (Should filters even have aggregators?) Esp. important for MatrixMapCols.

cseed · April 24, 2018, 3:13am

I will create a separate thread to discuss opportunities to improve the generated code.

cseed · April 25, 2018, 5:18pm

From @konradjk: mt = mt.drop_cols().select_rows(*mt.row_key) is 2-3 orders of magnitude faster than mt = mt.select_rows(*mt.row_key).drop_cols()

tpoterba · April 25, 2018, 8:24pm

is this the col annotations copy for each row?

Topic		Replies	Views
Ideas for improving generated code	0	689	April 24, 2018
Expression compiler use cases	0	636	October 24, 2017
Blocked MT Woes	0	586	May 22, 2020
Path to get rid of AST	11	810	April 23, 2018
Proposed changes to the python select/annotate/drop interface for keys	0	616	April 2, 2018

Optimization ideas

Related topics