User-defined functions

User-defined functions, especially with the hail2 interface, are now straightforward. Existing functions should be split into two classes: intrinsic functions, which must be implemented in the JVM (primitive type operations, say, or calls out to third-party libraries like Apache Commons Math), and functions that could be user-defined in Python. This will greatly improve the value of hail-contrib, since it will let people define expression language extensions as well as operations on tables and matrices.

Here is a proposed syntax:

add = Function("add", TInt32(), x = TInt32(), y = TInt32())
add.set_body(add.x + add.y)

vds.annotate_variants(totalNonRef = add(vds.va.nHet, vds.va.nHomVar))
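To make the proposal a bit more concrete, here is a minimal, self-contained sketch of how such a Function object might behave. Everything in it is a toy stand-in: the Expr class, the string type names ("Int32" in place of TInt32()), and the rendering of a call as text are assumptions for illustration, not the hail2 API.

```python
# Toy model only: Expr, Function, and the string type names are hypothetical
# stand-ins, not part of hail2.
class Expr:
    def __init__(self, typ, text):
        self.typ = typ    # type name, e.g. "Int32"
        self.text = text  # rendered expression-language source

    def __add__(self, other):
        if self.typ != other.typ:
            raise TypeError(f"cannot add {self.typ} and {other.typ}")
        return Expr(self.typ, f"({self.text} + {other.text})")


class Function:
    def __init__(self, name, ret_type, **params):
        self.name = name
        self.ret_type = ret_type
        self.params = params  # parameter name -> declared type, in order
        self.body = None

    def __getattr__(self, attr):
        # Only reached when normal lookup fails: add.x becomes a typed
        # reference to the parameter x.
        params = self.__dict__.get("params", {})
        if attr in params:
            return Expr(params[attr], attr)
        raise AttributeError(attr)

    def set_body(self, body):
        if body.typ != self.ret_type:
            raise TypeError(f"{self.name}: body has type {body.typ}, "
                            f"declared {self.ret_type}")
        self.body = body

    def __call__(self, *args):
        # Arguments are checked against the declared signature here,
        # at the call site, without looking at the body.
        if len(args) != len(self.params):
            raise TypeError(f"{self.name}: expected {len(self.params)} "
                            f"arguments, got {len(args)}")
        for (pname, ptype), arg in zip(self.params.items(), args):
            if arg.typ != ptype:
                raise TypeError(f"{self.name}: argument {pname!r} expects "
                                f"{ptype}, got {arg.typ}")
        rendered = ", ".join(a.text for a in args)
        return Expr(self.ret_type, f"{self.name}({rendered})")


add = Function("add", "Int32", x="Int32", y="Int32")
add.set_body(add.x + add.y)

n_het = Expr("Int32", "va.nHet")
n_hom_var = Expr("Int32", "va.nHomVar")
call = add(n_het, n_hom_var)
print(call.text)  # add(va.nHet, va.nHomVar)
```

In this sketch the call stays a named node rather than being expanded, and passing an argument of the wrong type (say, a "String" expression for y) raises a TypeError that names the function and the parameter.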

Now, to play devil’s advocate against my own proposal: with the advent of hail2, can’t we just define expr language functions directly in Python? Yes! The above example can also be written as:

def add(x, y):
    return x + y

vds.annotate_variants(totalNonRef = add(vds.va.nHet, vds.va.nHomVar))
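The plain-Python version works because the function runs on expression objects and simply returns a bigger expression. A rough sketch of that effect, using a hypothetical Expr class that only records source text (an assumption for illustration, not the hail2 expression type):

```python
# Toy model only: this Expr just records the text of the expression it
# represents. It is a hypothetical stand-in, not the hail2 API.
class Expr:
    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        return Expr(f"({self.text} + {other.text})")


def add(x, y):
    return x + y


def double(x):
    return add(x, x)


n_het = Expr("va.nHet")
n_hom_var = Expr("va.nHomVar")

# The call to add leaves no trace: its body is pasted in at the call site.
total = add(n_het, n_hom_var)
print(total.text)  # (va.nHet + va.nHomVar)

# Each level of nesting duplicates its argument in the resulting expression.
nested = double(double(double(n_het)))
print(nested.text.count("va.nHet"))  # 8
```

The resulting expression contains no reference to add or double at all, and the nested case shows how quickly an inlined expression can grow.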

So why even bother? A few thoughts:

  • Defining functions in Python amounts to inlining. We might not want to inline large functions, and a Function object gives us an abstraction that lets us make that decision.

  • This gives us an abstraction barrier for type checking. We get a type error at the site of the function call instead of in the bowels of the function body, which the caller might not be familiar with.

  • Finally, and most importantly, we can allow functions to be recursive:

    fact = RecursiveFunction("fact", TInt32(), x = TInt32())
    fact.set_body(
        where(fact.x <= 1,
              1,
              fact.x * fact(fact.x - 1)))
    

This would immediately make the expression language significantly more powerful.
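A plain Python def cannot express this, because the condition is a symbolic expression rather than a Python bool, so inlining the recursive call would never stop. Below is a minimal sketch of how a named function with a deferred body could work. The AST classes, the lift helper, the apply method, and the string type names are all hypothetical; the declared types are recorded but not checked here; and the base case is written as x <= 1 so that fact(0) also terminates.

```python
import operator

# Toy model only: every class and helper here is a hypothetical stand-in,
# not part of hail2.
class Expr:
    def __mul__(self, other):
        return BinOp("*", self, lift(other))

    def __sub__(self, other):
        return BinOp("-", self, lift(other))

    def __le__(self, other):
        return BinOp("<=", self, lift(other))


class Lit(Expr):
    def __init__(self, value):
        self.value = value

    def eval(self, env):
        return self.value


class Param(Expr):
    def __init__(self, name):
        self.name = name

    def eval(self, env):
        return env[self.name]


class BinOp(Expr):
    OPS = {"*": operator.mul, "-": operator.sub, "<=": operator.le}

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def eval(self, env):
        return self.OPS[self.op](self.left.eval(env), self.right.eval(env))


class Where(Expr):
    def __init__(self, cond, then, other):
        self.cond, self.then, self.other = cond, then, other

    def eval(self, env):
        # Only the selected branch is evaluated, so recursion can terminate.
        branch = self.then if self.cond.eval(env) else self.other
        return branch.eval(env)


class Call(Expr):
    def __init__(self, fn, args):
        self.fn, self.args = fn, args

    def eval(self, env):
        # The callee's body is looked up at evaluation time, not inlined.
        return self.fn.apply(*[a.eval(env) for a in self.args])


def lift(x):
    return x if isinstance(x, Expr) else Lit(x)


def where(cond, then, other):
    return Where(cond, lift(then), lift(other))


class RecursiveFunction:
    def __init__(self, name, ret_type, **params):
        self.name = name
        self.ret_type = ret_type  # recorded only; not checked in this sketch
        self.param_names = list(params)
        self.body = None

    def __getattr__(self, attr):
        if attr in self.__dict__.get("param_names", ()):
            return Param(attr)
        raise AttributeError(attr)

    def set_body(self, body):
        self.body = body

    def __call__(self, *args):
        # Building a call creates a node that refers to the function by
        # identity, so the body may mention the function itself.
        return Call(self, [lift(a) for a in args])

    def apply(self, *values):
        return self.body.eval(dict(zip(self.param_names, values)))


fact = RecursiveFunction("fact", "Int32", x="Int32")
fact.set_body(where(fact.x <= 1, 1, fact.x * fact(fact.x - 1)))
print(fact.apply(5))  # 120
```

The key design point in this sketch is that fact(...) produces a Call node instead of expanding the body, which is what allows the body to refer to fact before it is fully defined.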

Thoughts?