User-defined functions, esp. with the hail2 interface, are now straightforward. Existing functions should be broken into two classes: intrinsic functions, which must be implemented in the JVM (primitive type operations, say, or calls out to third-party libraries like Apache Commons Math), and functions that could be user-defined in Python. This will greatly improve the value of hail-contrib, since it will allow people to define expression language extensions as well as operations on tables and matrices.
Here is a proposed syntax:
add = Function("add", TInt32(), x = TInt32(), y = TInt32())
add.set_body(add.x + add.y)
vds.annotate_variants(totalNonRef = add(vds.va.nHet, vds.va.nHomVar))
Now, to play devil’s advocate against my own proposal: with the advent of hail2, can’t we just define expr language functions directly in Python? Yes! The above example can also be written as:
def add(x, y):
    return x + y
vds.annotate_variants(totalNonRef = add(vds.va.nHet, vds.va.nHomVar))
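To make concrete what the plain-Python version does, here is a toy sketch. `Expr`, `Ref`, and `BinOp` are hypothetical stand-ins I made up for illustration, not Hail's actual expression classes. The point is only that `add` runs while the expression is being built, so the resulting tree contains a bare `+` node and no record that `add` was ever involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Base class for toy expression nodes (hypothetical, not Hail's classes)."""

    def __add__(self, other):
        # Building an expression, not computing a value.
        return BinOp("+", self, other)


@dataclass(frozen=True)
class Ref(Expr):
    """Reference to a named field, e.g. an annotation path."""
    name: str


@dataclass(frozen=True)
class BinOp(Expr):
    """Binary operation over two sub-expressions."""
    op: str
    left: Expr
    right: Expr


def add(x, y):
    # Runs in Python at expression-construction time and leaves no node behind.
    return x + y


tree = add(Ref("va.nHet"), Ref("va.nHomVar"))
print(tree)
```

The printed tree is the same one you would get by writing `Ref("va.nHet") + Ref("va.nHomVar")` by hand.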
So why even bother? A few thoughts:
- Defining functions in Python amounts to inlining. We might not want to inline large functions, and this gives us an abstraction that allows us to make that decision.
- This gives us an abstraction barrier for type checking. This way, we get a type error at the site of the function call instead of in the bowels of the function body, with which the caller might not be familiar.
- Finally, and most importantly, we can allow functions to be recursive:
fact = RecursiveFunction("fact", TInt32(), x = TInt32())
fact.set_body(
    where(fact.x == 1, 1, fact.x * fact(fact.x - 1)))
This would immediately make the expression language significantly more powerful.
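As a rough illustration of the last two points, here is a self-contained toy model of a function object. Everything in it is an assumption for the sake of the sketch: `Expr`, `lift`, `where`, the string type tags `INT32`/`BOOL`, and this `Function` class are simplified stand-ins, not Hail's implementation, and a single class plays the role of both `Function` and `RecursiveFunction`. It shows a call being type checked against the declared signature at the call site, and a call node that refers to the function body lazily instead of inlining it, which is what lets the body mention itself.

```python
import operator

INT32, BOOL = "int32", "bool"  # toy type tags, stand-ins for TInt32() etc.


class Expr:
    """Toy deferred expression: a static type tag plus a function of the environment."""

    def __init__(self, typ, run):
        self.typ, self.run = typ, run

    def _binop(self, other, fn, typ):
        other = lift(other)
        return Expr(typ, lambda env: fn(self.run(env), other.run(env)))

    def __add__(self, other):
        return self._binop(other, operator.add, INT32)

    def __sub__(self, other):
        return self._binop(other, operator.sub, INT32)

    def __mul__(self, other):
        return self._binop(other, operator.mul, INT32)

    def __eq__(self, other):
        return self._binop(other, operator.eq, BOOL)


def lift(value):
    """Wrap a plain Python literal as an expression; pass expressions through."""
    if isinstance(value, Expr):
        return value
    return Expr(BOOL if isinstance(value, bool) else INT32, lambda env: value)


def where(cond, then, otherwise):
    """Conditional that evaluates only the selected branch, so recursion can stop."""
    then, otherwise = lift(then), lift(otherwise)
    return Expr(then.typ, lambda env: (then if cond.run(env) else otherwise).run(env))


class Function:
    """Typed function object; calls refer to the body instead of inlining it."""

    def __init__(self, name, ret_type, **params):
        self.name, self.ret_type, self.params = name, ret_type, params
        self.body = None
        for pname, ptyp in params.items():
            # Expose each parameter as an attribute, e.g. fact.x
            setattr(self, pname, Expr(ptyp, lambda env, n=pname: env[n]))

    def set_body(self, body):
        self.body = lift(body)

    def __call__(self, *args):
        args = [lift(a) for a in args]
        expected, actual = list(self.params.values()), [a.typ for a in args]
        if actual != expected:  # checked at the call site, before any evaluation
            raise TypeError(f"{self.name} expects {expected}, got {actual}")

        def run(env):
            # The body is looked up only now, so it may be set after the call is built.
            frame = {p: a.run(env) for p, a in zip(self.params, args)}
            return self.body.run(frame)

        return Expr(self.ret_type, run)


fact = Function("fact", INT32, x=INT32)
fact.set_body(where(fact.x == 1, 1, fact.x * fact(fact.x - 1)))
print(fact(5).run({}))  # prints 120
```

In this toy, `fact(True)` raises a `TypeError` naming `fact` and its expected signature as soon as the call expression is built, without ever looking inside the body. A plain Python `def fact(x)` over expression objects could not recurse this way: the recursive call would run at construction time and never bottom out, because the condition is not known until evaluation.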
Thoughts?