Grand plan for Python typechecking and type annotations

Grand plan for Python type annotations

I’ve been experimenting with Python 3 type annotations in a few recent PRs, and I really like them. Here’s the PEP. With some work, moving our codebase to use type annotations will provide us with the following:

  • Better IDE support for us and our users
  • Automatic documentation
  • Integrated runtime typechecking
  • Static typechecking with mypy

I’ll address these point by point.

Better IDE support

This one is low-hanging fruit. If we add type annotations to our public interfaces, everybody wins. See here for an example. However, for the immediate future, we’ll exist in a world where we have 3 types of types:

  • documented types in docstrings
  • typechecked types declared in decorators
  • unchecked type annotations
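To make the three kinds concrete, here is a rough sketch of a single function that carries all of them at once. The `typecheck` decorator below is a simplified, hypothetical stand-in written for this example (not our actual implementation), and `repeat` is an invented function.

```python
import functools
import inspect


def typecheck(**expected):
    """Hypothetical stand-in for a keyword-based typecheck decorator."""
    def decorate(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments to parameter names.
            bound = sig.bind(*args, **kwargs)
            for name, value in bound.arguments.items():
                if name in expected and not isinstance(value, expected[name]):
                    raise TypeError(
                        f"{func.__name__}: parameter {name!r} expected "
                        f"{expected[name].__name__}, got {type(value).__name__}"
                    )
            return func(*args, **kwargs)

        return wrapper

    return decorate


# Kind 2: types declared in the decorator, checked at runtime.
@typecheck(a=str, b=int)
# Kind 3: annotations, visible to IDEs but not checked by anything here.
def repeat(a: str, b: int) -> str:
    """Repeat a string.

    Kind 1: types documented by hand in the docstring.

    :param str a: text to repeat
    :param int b: number of repetitions
    :rtype: str
    """
    return a * b
```

All three declarations have to be kept in sync by hand, and only the decorator's copy is enforced: `repeat("ab", "2")` raises a `TypeError` because of the decorator, not because of the annotations.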

This is obviously not ideal. So…

Automatic documentation + Integrated runtime typechecking

Some cursory experiments have shown that it’s not possible to generate documentation for any method with a decorator. Since pretty much every public Hail method is decorated, we can’t fold the “documented types” and the “typechecked types” into the type annotations one at a time; we have to go from a 3-type world to a 1-type world in one go.

Here’s an example of one of a few Python 3 modules that do something similar: https://github.com/agronholm/typeguard
I’ve poked around there (and did a bit last year as well) for ideas. I don’t think it’ll be too hard to transform our current system to use a similar approach.

I imagine that instead of methods looking like:

@typecheck(a=str, b=int)
def foo(a, b):
    ...  # body

they will look like:

def foo(a: str, b: int) -> Foo:
    typecheck()
    # body
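As a rough sketch of how a decorator-free `typecheck()` could work, the version below inspects the caller's frame and compares its arguments against its annotations. This is only an illustration under some assumptions: it is not typeguard's API or our existing code, the helper names `_matches` and `_function_for` are invented here, it only understands plain classes and `Union`s of them, and the `gc.get_referrers` lookup is a CPython-specific trick that would be too slow to use as-is.

```python
import gc
import inspect
from typing import Union, get_args, get_origin, get_type_hints


def _matches(value, expected) -> bool:
    """Check a value against a plain class or a Union of plain classes."""
    if get_origin(expected) is Union:
        return any(_matches(value, arg) for arg in get_args(expected))
    return isinstance(value, expected)


def _function_for(code):
    """Find the function object that owns a code object (CPython-specific)."""
    for ref in gc.get_referrers(code):
        if inspect.isfunction(ref) and ref.__code__ is code:
            return ref
    raise RuntimeError(f"no function found for {code.co_name}")


def typecheck() -> None:
    """Validate the caller's arguments against the caller's own annotations."""
    frame = inspect.currentframe().f_back  # frame of the function being checked
    func = _function_for(frame.f_code)
    for name, expected in get_type_hints(func).items():
        if name == "return":
            continue  # only parameters are checked in this sketch
        value = frame.f_locals[name]
        if not _matches(value, expected):
            raise TypeError(
                f"{func.__name__}: parameter {name!r} expected {expected}, "
                f"got {type(value).__name__}"
            )


def foo(a: str, b: int) -> str:
    typecheck()
    return a * b


def label(x: Union[int, str]) -> str:
    typecheck()
    return str(x)
```

With this, `foo("ab", 2)` runs normally and `foo("ab", "2")` raises a `TypeError`. Because nothing wraps `foo`, its signature and annotations stay exactly as written, which is the property the documentation tooling needs.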

There’s one thing we lose here: the “transformer” typecheckers (which currently convert values to expr, and convert strs to types and ReferenceGenomes). This seems like an OK price to pay, and will probably make our code easier to maintain. A method that takes a ReferenceGenome will probably need to look something like:

from typing import Union

RG = Union[ReferenceGenome, str]

def locus(contig: str, position: int, rg: RG) -> Locus:
    rg = get_rg(rg)  # convert a str to a ReferenceGenome if needed
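A minimal sketch of what `get_rg` might look like, to show how much boilerplate the explicit conversion costs. Everything here is an assumption made for illustration: the `ReferenceGenome` class is a bare stand-in, the `_REFERENCES` registry is hypothetical, and `locus` returns a plain string instead of a real `Locus`.

```python
from typing import Dict, Union


class ReferenceGenome:
    """Stand-in for the real class; it only carries a name here."""

    def __init__(self, name: str) -> None:
        self.name = name


# Hypothetical registry of known reference genomes, keyed by name.
_REFERENCES: Dict[str, ReferenceGenome] = {
    name: ReferenceGenome(name) for name in ("GRCh37", "GRCh38")
}

RG = Union[ReferenceGenome, str]


def get_rg(rg: RG) -> ReferenceGenome:
    """Return a ReferenceGenome, looking it up by name if given a str."""
    if isinstance(rg, ReferenceGenome):
        return rg
    if isinstance(rg, str):
        try:
            return _REFERENCES[rg]
        except KeyError:
            raise ValueError(f"unknown reference genome: {rg!r}") from None
    raise TypeError(f"expected ReferenceGenome or str, got {type(rg).__name__}")


def locus(contig: str, position: int, rg: RG) -> str:
    rg = get_rg(rg)  # the one explicit line that replaces the transformer
    return f"{rg.name}:{contig}:{position}"
```

The cost is one extra line per method that accepts a reference genome, and the annotation `RG` tells both the IDE and the reader that a name or an object is acceptable.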

Static typechecking with mypy

We looked into this when our Python interface first came out, and concluded it wasn’t very mature. Patrick brought up the idea again before the holidays, and I told him that it wasn’t very mature – I could have been wrong this time around. I don’t have much to say here, but it’s certainly worth looking into after we’ve sorted out the other items here.

I must admit I will miss the auto-conversion. Seems to add a lot of boilerplate. Dunno if that’s worse than the documentation boilerplate.

yeah, I agree. There may be a way to do it still.

I think I’ve convinced myself that the IDE benefits and simplification of docs + code is worth the additional boilerplate. Happy to discuss tomorrow/later this week.