Necessary code work:
- Add the rest of the core methods from VDS/KT to api2. #2591 does most of this for KT; order_by is the only outstanding KT method that’s not moved to table there. The same needs to be done for VDS, which isn’t too hard.
- Add the non-core methods to hail.methods / hail.genetics.methods
  - Some stuff here is much harder than the rest, like filter_alleles
  - This is mostly just labor, but some require more thought than others, like moving TDT to use hail2 expr
- Support intervals in the index_* methods. It’s possible now to join by locus, but not using the annotateLociTable fast path.
- Move to Python 3 so argument order is preserved
- Test the hail2 api much more rigorously than we do now (at the very least, call each parameter branch for each method!)
- Typecheck the expression language. This isn’t super trivial, and making a nice system to integrate our typecheck module and expressions will require some thoughtful design work.
- Some more organization around the package: monkey patching with import hail.genetics is an idea I like, but want to think about the edge cases first.
- Implement history for hail2
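
To illustrate the Python 3 item above: since Python 3.6 (PEP 468), **kwargs keeps keyword arguments in the order the caller wrote them, which is what a keyword-based method needs in order to keep fields in call order. The annotate function below is a hypothetical stand-in written for this sketch, not the hail2 method itself.

```python
def annotate(**named_exprs):
    """Hypothetical stand-in for a keyword-based annotate method.

    Returns the field names in the order the caller passed them.
    """
    # On Python 3.6+, iterating over **kwargs follows call order (PEP 468).
    # On Python 2 the order of this dict was arbitrary.
    return list(named_exprs)


fields = annotate(z=1, a=2, m=3)
print(fields)  # ['z', 'a', 'm']
```

On Python 2 the same call could return the names in any order, so new fields could not be reliably placed in the order the user wrote them.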
Documentation
- Document the index_* methods / joins
- Translate the Hail Overview tutorial
- Make new tutorials to replace the 2 expr ones we have
- Fill in docs on api2 methods (they’re not all there yet)
- Fill in docs on expression language (things like __mul__ on NumericExpression haven’t been documented)
- Write “integrative docs” that provide how-tos for common types of workflows. Show the power of annotate / select / group_by/aggregate, etc.
Longer term QoL:
- Move over tests to Python as much as possible. I looked at the linear regression suite and it can be moved entirely into Python without many problems.
- Write a type parser in Python. The nested calls into the JVM for Type._from_java make the library feel extremely sluggish on teensy data.
- Integrate RV with C/C++, so we can transmit data much more efficiently between Python and Java.
- Rethink the expr language function registry, because many functions there can be implemented in terms of others in Python.
- Add back in de novo
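
As a rough sketch of the "type parser in Python" idea above, a small recursive-descent parser can turn a type string into a nested Python structure without any calls into the JVM. The type-string syntax used here (Name, Name[args], Name{field:Type,...}) and every name in the code are assumptions made for illustration; they are not taken from Hail's actual type format or from Type._from_java.

```python
import re

# Tokens of the ASSUMED syntax: identifiers and the punctuation [ ] { } : ,
_TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[\[\]{}:,])")


def tokenize(text):
    """Split a type string into identifier and punctuation tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class TypeParser(object):
    """Recursive-descent parser producing (name, children) tuples."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError("expected %r, found %r" % (expected, tok))
        self.pos += 1
        return tok

    def identifier(self):
        tok = self.take()
        if not (tok[0].isalpha() or tok[0] == "_"):
            raise ValueError("expected a name, found %r" % tok)
        return tok

    def separated(self, parse_item, closer):
        """Parse zero or more comma-separated items up to `closer`."""
        items = []
        if self.peek() != closer:
            items.append(parse_item())
            while self.peek() == ",":
                self.take(",")
                items.append(parse_item())
        self.take(closer)
        return items

    def field(self):
        name = self.identifier()
        self.take(":")
        return (name, self.type())

    def type(self):
        name = self.identifier()
        if self.peek() == "[":      # parameterized type: Array[Int32]
            self.take("[")
            return (name, self.separated(self.type, "]"))
        if self.peek() == "{":      # struct-like type: Struct{a:Int32}
            self.take("{")
            return (name, self.separated(self.field, "}"))
        return (name, [])           # primitive type: Int32


def parse_type_string(text):
    parser = TypeParser(tokenize(text))
    result = parser.type()
    if parser.peek() is not None:
        raise ValueError("trailing input: %r" % parser.peek())
    return result
```

For example, parse_type_string("Array[Struct{a:Int32,b:Set[String]}]") returns a nested tuple rooted at "Array", built entirely in Python. A real version would construct the library's own type objects instead of tuples.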