Hail2 Python to-do items

tpoterba · December 16, 2017, 7:50pm

Necessary code work:

Add the rest of the core methods from VDS/KT to api2. #2591 does most of this for KT; order_by is the only outstanding KT method that hasn’t been moved to table there. The same needs to be done for VDS, which isn’t too hard.
Add the non-core methods to hail.methods / hail.genetics.methods
- some stuff here is much harder than the rest, like filter_alleles
- This is mostly just labor, but some require more thought than others, like moving TDT to use hail2 expr
Support intervals in the index_* methods. It’s possible now to join by locus, but not using the annotateLociTable fast path.
Move to Python 3 so argument order is preserved
Test the hail2 api much more rigorously than we do now (at the very least, call each parameter branch for each method!)
Typecheck the expression language. This isn’t super trivial, and making a nice system to integrate our typecheck module and expressions will require some thoughtful design work.
Some more organization around the package: monkey patching with import hail.genetics is an idea I like, but I want to think about the edge cases first.
Implement history for hail2
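On the Python 3 item above: as of Python 3.6 (PEP 468), `**kwargs` keeps the order in which the caller wrote the keyword arguments, which is what keyword-driven methods need in order to produce fields in a predictable order. A minimal sketch; the `annotate` function here is a hypothetical stand-in, not the api2 method:

```python
def annotate(**named_exprs):
    """Hypothetical stand-in for a keyword-driven method; not the real api2 code."""
    # On Python 3.6+ (PEP 468), **kwargs is an ordered mapping that keeps
    # the order in which the caller wrote the keyword arguments.
    return list(named_exprs)


# The field names come back in call order, not sorted or hashed order.
fields = annotate(z=1, a=2, m=3)
print(fields)
```

On Python 2 the same call could return the names in an arbitrary order, so the resulting schema would not be guaranteed to match what the user typed.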

Documentation:

Document the index_* methods / joins
Translate the Hail Overview tutorial
Make new tutorials to replace the 2 expr ones we have
Fill in docs on api2 methods (they’re not all there yet)
Fill in docs on expression language (things like __mul__ on NumericExpression haven’t been documented)
Write “integrative docs” that provide how-tos for common types of workflows. Show the power of annotate / select / group_by/aggregate, etc.

Longer term QoL:

Move over tests to Python as much as possible. I looked at the linear regression suite and it can be moved entirely into Python without many problems.
Write a type parser in Python. The nested calls into the JVM for Type._from_java make the library feel extremely sluggish on teensy data.
Integrate RV with C/C++, so we can transmit data much more efficiently between Python and Java.
Rethink the expr language function registry, because many functions there can be implemented in terms of others in Python.
Add de novo back in
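To make the type parser item concrete, here is a rough sketch of what a pure-Python recursive-descent parser could look like, so that parsing a schema needs no JVM round trips. The grammar (`Name`, `Name[T]`, `Struct{field: T, ...}`) and the tuple output are assumptions made up for illustration; they are not Hail's actual type string format or type classes.

```python
import re

# Hypothetical grammar, for illustration only:
#   type  := NAME | NAME '[' type ']' | 'Struct' '{' field (',' field)* '}'
#   field := NAME ':' type
_TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[\[\]{}:,])")


def tokenize(s):
    """Split a type string into names and punctuation tokens."""
    tokens, pos = [], 0
    s = s.rstrip()
    while pos < len(s):
        m = _TOKEN.match(s, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}: {s[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def _is_name(tok):
    return tok[0].isalpha() or tok[0] == "_"


def _expect(tokens, pos, tok):
    if pos >= len(tokens) or tokens[pos] != tok:
        raise ValueError(f"expected {tok!r} at token {pos}")
    return pos + 1


def _parse(tokens, pos):
    """Parse one type starting at tokens[pos]; return (parsed, next_pos)."""
    if pos >= len(tokens) or not _is_name(tokens[pos]):
        raise ValueError(f"expected a type name at token {pos}")
    name = tokens[pos]
    pos += 1
    # Parameterized type, e.g. Array[Int32]
    if pos < len(tokens) and tokens[pos] == "[":
        elem, pos = _parse(tokens, pos + 1)
        pos = _expect(tokens, pos, "]")
        return (name, elem), pos
    # Struct with one or more named fields
    if name == "Struct" and pos < len(tokens) and tokens[pos] == "{":
        pos += 1
        fields = []
        while True:
            if pos >= len(tokens) or not _is_name(tokens[pos]):
                raise ValueError(f"expected a field name at token {pos}")
            field = tokens[pos]
            pos = _expect(tokens, pos + 1, ":")
            ftype, pos = _parse(tokens, pos)
            fields.append((field, ftype))
            if pos < len(tokens) and tokens[pos] == ",":
                pos += 1
                continue
            pos = _expect(tokens, pos, "}")
            return ("Struct", tuple(fields)), pos
    # Plain primitive name
    return name, pos


def parse_type(s):
    """Parse a whole type string into nested tuples."""
    tokens = tokenize(s)
    parsed, pos = _parse(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"trailing tokens: {tokens[pos:]}")
    return parsed


print(parse_type("Struct{a: Int32, b: Array[String]}"))
```

A real version would build the Python type objects directly instead of tuples, but the shape of the parser would likely be similar.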

tpoterba · January 16, 2018, 8:45pm

also:

add infoScore aggregator to Python
improve PCA docs and add mean_center=True parameter

tpoterba · January 16, 2018, 8:53pm

fix names of annotations generated everywhere to be Python-compliant (underscores, not camel case)
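The renaming itself is mechanical. A small helper along these lines could drive it; `camel_to_snake` is a hypothetical name, not something in the codebase, and the handling of runs of capitals is just one reasonable choice:

```python
import re


def camel_to_snake(name):
    """Convert a camelCase name to snake_case (hypothetical helper)."""
    # Put an underscore between a lowercase letter or digit and the
    # capital that follows it: "infoScore" -> "info_Score".
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Split a run of capitals from a following capitalized word:
    # "ABCDef" -> "ABC_Def".
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)
    return s.lower()


print(camel_to_snake("infoScore"))
```

Names that are already snake_case pass through unchanged, so the helper can be applied to every generated annotation name without special cases.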

tpoterba · January 17, 2018, 8:57pm

Sphinx style:

Use :class: and :meth: rather than :py:class: and :py:meth:.
Use short, dot-prefixed identifiers where possible: :meth:`.linreg` instead of :meth:`hail.methods.linreg`
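A docstring following both rules might look like the sketch below. The function and the `:class:`.Table`` target are hypothetical and only there to show the role syntax:

```python
def example_method(dataset):
    """Hypothetical method, used only to illustrate the docstring style.

    Returns a :class:`.Table`. See also :meth:`.linreg`.
    """
    # No real work here; the docstring is the point of the example.
    return dataset


print(example_method.__doc__)
```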
