Lifting over methods from Hail 0.1 to Hail 0.2
One of the high-priority agenda items right now is completing the Hail 0.2 API. We’ve all been working super hard over the last few months, and we have a chance to get all of that work into the hands of some beta-testers very soon. There are already some (brave) users pushing on devel, and as soon as the API is complete and stable we can start pushing more people in.
The generic interface is getting pretty complete, but the majority of the statistical and genetics methods are conspicuously absent. We need to change that! To that end, I’m farming out the method liftover task to the group. I think this post has what you’ll need to do that, but let me know if something is unclear so I can add more info.
Organization
In Hail 0.2, the Table and MatrixTable APIs are totally generic. That means no genetics-specific stuff goes there, which has an added benefit of giving us an excuse not to have the über-object we had in 0.1.
Instead, genetics methods will go in a separate subpackage called methods. The user will not write something like:
vds = vds.variant_qc().sample_qc()
but instead
ds = methods.variant_qc(ds)
ds = methods.sample_qc(ds)
Although this is certainly more verbose, other design decisions have already guaranteed that the big pipelines of chained method calls are dead, though in some cases smaller pipelines will still be possible.
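To make the contrast concrete, here is a minimal sketch of the function-style layout that runs without Hail. The Dataset class, its row_fields / col_fields attributes, and the bodies of variant_qc and sample_qc are hypothetical stand-ins. Only the calling convention (a function takes a dataset and returns a new one) mirrors what is described above.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a generic, MatrixTable-like object.

    It deliberately has no genetics-specific methods of its own.
    """
    row_fields: Tuple[str, ...] = ()
    col_fields: Tuple[str, ...] = ()


# In the real layout these functions would live in the methods subpackage.
def variant_qc(ds: Dataset) -> Dataset:
    """Return a new dataset with a placeholder row field added."""
    return replace(ds, row_fields=ds.row_fields + ("variant_qc",))


def sample_qc(ds: Dataset) -> Dataset:
    """Return a new dataset with a placeholder column field added."""
    return replace(ds, col_fields=ds.col_fields + ("sample_qc",))


# Function-style calls: each step takes a dataset and returns a new one.
ds = Dataset()
ds = variant_qc(ds)
ds = sample_qc(ds)
print(ds)
```

The dataset class stays generic here. Adding another genetics method means adding a function, not growing the class.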
Numpy style docs
As part of the liftover to Hail 0.2, we’re rewriting our Python docs in Numpy style. This docstring style is much more readable in plaintext format (like when someone uses the built-in IPython help methods), and can be rendered into the same HTML docs using the Sphinx plugin napoleon. You’ll need to pip install sphinxcontrib-napoleon if you want to build the docs locally.
Good examples of Numpy style docs are available here. You can also see examples by looking at the Hail code already lifted over.
Part of readability is keeping lines short enough (<= 80 char) to print in reasonably-sized terminals. I wrote a javascript tool to make it easier to format doc paragraphs. PRs with long doc lines will be rejected!
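As a rough illustration, the sketch below pairs a Numpy style docstring with a small line-length check. The call_rate function and the long_doc_lines helper are hypothetical examples written for this post. Neither is part of Hail, and the check is not a replacement for the formatting tool mentioned above.

```python
def call_rate(n_called, n_total):
    """Compute the fraction of non-missing entries.

    Parameters
    ----------
    n_called : int
        Number of entries with a non-missing value.
    n_total : int
        Total number of entries. Must be positive.

    Returns
    -------
    float
        ``n_called / n_total``.

    Examples
    --------
    >>> call_rate(3, 4)
    0.75
    """
    return n_called / n_total


def long_doc_lines(obj, limit=80):
    """Return (line_number, length) pairs for doc lines over `limit`.

    Lengths are measured on the raw docstring, so the indentation
    inside the source file counts toward the limit.
    """
    doc = obj.__doc__ or ""
    return [(i, len(line))
            for i, line in enumerate(doc.splitlines(), start=1)
            if len(line) > limit]


# An empty list means every docstring line fits within the limit.
print(long_doc_lines(call_rate))
```

Running something like this over a module before opening a PR is a cheap way to catch overlong doc lines early.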
Other documentation liftover tasks
Our numeric types were renamed (Double=>Float64, etc), and few of the 0.1 methods were updated in devel. You’ll probably need to lift over types if your method includes annotation docs.
Ensure that the language is as generic as possible: use row / column / entry instead of variant / sample / genotype where possible. This may be a bit confusing for users in the short term, but having everything documented consistently will probably be better long-term.
In general, we’re using ‘field’ instead of ‘annotation’ now. A 0.1 “variant annotation” is now a “row field”, and so on.
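For a first pass over old docstrings, a small search-and-replace helper can flag the obvious renames. This is only a sketch: the RENAMES table contains just the two renames stated explicitly above (Double to Float64, "variant annotation" to "row field"), and the lift_over_doc helper is hypothetical. Other renames would need to be added by hand, and the output still needs a human read, since a blind regex cannot tell a type name from ordinary prose.

```python
import re

# Only renames stated explicitly in this post; extend by hand as needed.
RENAMES = {
    r"\bDouble\b": "Float64",
    r"\bvariant annotation(s?)\b": r"row field\1",
}


def lift_over_doc(text):
    """Apply each rename pattern in turn and return the rewritten text."""
    for pattern, replacement in RENAMES.items():
        text = re.sub(pattern, replacement, text)
    return text


old_doc = "Adds a variant annotation of type Double."
print(lift_over_doc(old_doc))  # prints "Adds a row field of type Float64."
```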
Steps
- Decide where the method should live in hail.methods. There currently exist files qc.py, statgen.py, family_methods.py. Organization within methods doesn’t really matter; feel free to add a new file if these don’t fit.
- Update the method signature and code. Instead of returning a VariantDataset or KeyTable, the method should return a MatrixTable or Table.
- Add the @require_biallelic decorator if necessary.
- Update docs. This is where I expect most of the time to be spent. You’ll need to:
  a. Translate the doc to Numpy style
  b. Follow the doc liftover tasks above
- Update tests in hail.methods.tests.py. Look at the Scala test suite for your method – is it easy to rewrite in Python using the Table/MatrixTable primitives? If so, do it! If not, port over any tests from hail.api1.tests.py, and maybe add a few trivial examples.
- Add the method to __all__ in hail.methods.__init__.py (don’t forget to change the import too).
- Add the method to docs/methods/index.rst. Alternatively, if someone wants to figure out whether sphinx can do this automatically (a few hours spent trying to get .. automodule:: to work turned up nothing), that would be awesome!
- Important: make a judgment call about whether the method is sufficiently generic, or if it is a candidate for more work. Amanda’s work to make PCA generic by having it take an expression for the numeric matrix entry was a huge win – can we do this to your method too? If yes, please communicate that to the group but feel free to leave it for a separate PR (unless it’s easy or you’re excited).
- Also important: communicate any feedback about interface pain points, learning barriers, or inflexibilities to the group!