Path to get rid of AST

cseed · April 10, 2018, 4:17pm

I think we’re getting close to getting rid of AST. Things I see left:

Support for strings. Actually, I think these will work now if you do registerCode and take Longs pointing to (string) region values, and convert between region and JVM strings.
We need a way to capture types (for str, json). We could also emit code, but I think this will be easier and these operations aren’t particularly performance sensitive. Basically, I want to generate the following sort of code in Emit for types:
```
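  // Cached parsed type; null until first use.
  var _t: Type = null
  // Parse the type string on first access, then reuse the cached value.
  def t: Type = { if (_t == null) _t = Type.parse("Struct { f: Float64, ... }"); _t }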
```
The same thing for the reference genome, as JSON.
We need to generate comparison code for types in Emit.
We need support for dict and set. With deforestation, I think this is easy: dicts and sets are just (sorted) array region values. We need ToSet and ToDict, which take containers and sort them. Operations ArrayMap, ArrayFilter and ArrayFold can just be ContainerMap, ContainerFilter and ContainerFold. We need to implement SetContains and DictGet. I think that’s it.
We need to move everything out of FunctionRegistry. Most stuff can go as-is with the existing IRFunctionRegistry support. This includes region value aggregators that haven’t been implemented yet.
The remaining methods on MatrixTable and Table that use AST and don’t have IR alternatives need IR alternatives. In many cases, these can be simplified by pushing functionality into Python and not evaluating expressions at all. These include:
- ~~annotateColsTable~~
- ~~selectCols~~
- ~~queryEntries, queryGlobal, queryCols, queryRows~~
- ~~maximalIndependentSet~~
- ~~the regression methods~~
- ~~filterEntries~~
- ~~annotateGlobals variants~~
- ~~Table.aggregate~~
- MatrixTable.aggregateRowsByKey
- ExportPlink (wip @jigold)
- MatrixTable.groupColsBy
- ~~MatrixTable.makeKT~~
- ibd maf
- FilterAlleles
- Table.aggregate (aggregate is overloaded, this is the group_by variant)
- ExportGen (uses queryVA)

Finally, we need a way to build the IR from Python without going through the Parser/AST. I’ve already started that.
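
To make the dict and set item above concrete, here is a rough sketch of the idea in plain Scala, not Emit code. It assumes sets are sorted, de-duplicated sequences and dicts are key-sorted sequences of pairs, so SetContains and DictGet reduce to a binary search. The object and method names (SortedContainers, toSet, toDict, setContains, dictGet) are made up for illustration, the real thing would operate on region values rather than IndexedSeq, and keeping the last value for a duplicate key is an assumption.

```scala
object SortedContainers {
  // Binary search over xs, which must be sorted by keyOf.
  // Returns the index of the element whose key equals `key`, or -1 if absent.
  @annotation.tailrec
  private def search[A, K](xs: IndexedSeq[A], key: K, lo: Int, hi: Int)(
      keyOf: A => K)(implicit ord: Ordering[K]): Int =
    if (lo > hi) -1
    else {
      val mid = lo + (hi - lo) / 2
      val c = ord.compare(keyOf(xs(mid)), key)
      if (c == 0) mid
      else if (c < 0) search(xs, key, mid + 1, hi)(keyOf)
      else search(xs, key, lo, mid - 1)(keyOf)
    }

  // ToSet: drop duplicates and sort.
  def toSet[K](xs: Seq[K])(implicit ord: Ordering[K]): IndexedSeq[K] =
    xs.distinct.sorted.toIndexedSeq

  // ToDict: sort pairs by key. For duplicate keys the last value wins (assumption).
  def toDict[K, V](kvs: Seq[(K, V)])(implicit ord: Ordering[K]): IndexedSeq[(K, V)] =
    kvs.toMap.toIndexedSeq.sortBy(_._1)

  // SetContains: binary search on the sorted elements.
  def setContains[K](set: IndexedSeq[K], key: K)(implicit ord: Ordering[K]): Boolean =
    search(set, key, 0, set.length - 1)((k: K) => k) >= 0

  // DictGet: binary search on the keys, returning the value if the key is present.
  def dictGet[K, V](dict: IndexedSeq[(K, V)], key: K)(implicit ord: Ordering[K]): Option[V] = {
    val i = search(dict, key, 0, dict.length - 1)(_._1)
    if (i >= 0) Some(dict(i)._2) else None
  }
}
```

Since both containers are just sorted sequences here, map, filter and fold over them need nothing set- or dict-specific, which is the point of reusing ContainerMap, ContainerFilter and ContainerFold.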

cseed · April 10, 2018, 8:44pm

Just so we don’t duplicate work:

@wang is doing comparisons, infrastructure for capturing types and the genome reference, dict and set, and overseeing the FunctionRegistry conversion
@jigold is doing the MT.query* functions

If you take something on, please note it here.

jigold · April 10, 2018, 9:14pm

I’m doing Table.annotateGlobals and MatrixTable.annotateGlobals.

tpoterba · April 11, 2018, 9:46am

I’ll do filterEntries. I want to learn how to implement it as an annotate node.

cseed · April 11, 2018, 6:47pm

@tpoterba My bad, I did filterEntries this morning before seeing this: https://github.com/hail-is/hail/pull/3354 . I should have posted/checked here first. Sorry!

jigold · April 12, 2018, 9:44pm

I’ll do export_gen and export_plink.

tpoterba · April 13, 2018, 12:25am

I’ll do regression. I may regret this.

tpoterba · April 16, 2018, 6:53pm

I already did Table.aggregate with the interpreter stuff.

tpoterba · April 17, 2018, 10:56am

Doing annotateColsTable

jigold · April 18, 2018, 4:24pm

In the process of working on ExportPlink, I put MatrixTable.colsTable into the IR as well as Table.export.

jigold · April 20, 2018, 9:57pm

I am working on selectCols.

jigold · April 23, 2018, 5:53pm

I’m working on MaximalIndependentSet.
