Required schemas in 0.2

Here’s a proposed interface and spec for how a method like split_multi_hts should check its required fields in 0.2:

@handle_py4j
@typecheck(...)
def split_multi_hts(dataset, *args):  # other arguments elided
    """docstring"""
    dataset._require_fields('split_multi_hts',
                            GT=TCall(),
                            AD=TArray(TInt32()),
                            DP=TInt32(),
                            GQ=TInt32(),
                            PL=TArray(TInt32()))
    # ...rest of the method...

If you pass a dataset without the GT field, you’d get an error like:

ValueError: 'split_multi_hts': required field 'GT' is missing
    Expected a dataset with entry fields:
        'GT': Call
        'AD': Array[Int32]
        'DP': Int32
        'GQ': Int32
        'PL': Array[Int32]
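Below is a minimal sketch of the missing-field check and its "Expected a dataset with entry fields" footer. It assumes a hypothetical schema model: a plain dict that maps each field name to an (index, type name) pair. It also compares types by their display names. require_entry_fields and format_expected are illustrative names, not Hail's actual implementation. Since Python 3.6 (PEP 468), **kwargs preserves call order, so the footer lists fields in the order the method declared them.

```python
from typing import Dict, Tuple

# Hypothetical schema model (not Hail's API): field name -> (index, type name),
# where index is one of "entry", "row", "col", "global".
Schema = Dict[str, Tuple[str, str]]


def format_expected(required: Dict[str, str]) -> str:
    """Render the footer listing every required entry field, in declaration order."""
    lines = ["    Expected a dataset with entry fields:"]
    lines += [f"        '{name}': {typ}" for name, typ in required.items()]
    return "\n".join(lines)


def require_entry_fields(method: str, schema: Schema, **required: str) -> None:
    """Hypothetical helper: raise ValueError for the first required field that is absent."""
    for name in required:
        if name not in schema:
            raise ValueError(
                f"'{method}': required field '{name}' is missing\n"
                + format_expected(required)
            )


try:
    require_entry_fields(
        "split_multi_hts",
        {"AD": ("entry", "Array[Int32]")},  # GT is absent
        GT="Call", AD="Array[Int32]", DP="Int32", GQ="Int32", PL="Array[Int32]",
    )
except ValueError as e:
    print(f"ValueError: {e}")
```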

If you pass a dataset with a GT field that is row-indexed, you’d get:

ValueError: 'split_multi_hts': field 'GT': expected entry field, found row field
    Expected a dataset with entry fields:
        'GT': Call
        'AD': Array[Int32]
        'DP': Int32
        'GQ': Int32
        'PL': Array[Int32]
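To report "expected entry field, found row field", the check must know which index a field actually lives in, not only whether the entry schema contains it. Here is a minimal sketch using the same hypothetical name-to-(index, type) schema dict. check_entry_index is an illustrative helper, and the shared footer is left out so the index check stays in focus.

```python
from typing import Dict, Optional, Tuple

# Hypothetical schema model (not Hail's API): field name -> (index, type name).
Schema = Dict[str, Tuple[str, str]]


def check_entry_index(method: str, schema: Schema, name: str) -> Optional[str]:
    """Hypothetical helper: return an error line if `name` exists but is not entry-indexed."""
    if name not in schema:
        return None  # absence is handled by the missing-field check
    index, _ = schema[name]
    if index != "entry":
        return f"'{method}': field '{name}': expected entry field, found {index} field"
    return None


print(check_entry_index("split_multi_hts", {"GT": ("row", "Call")}, "GT"))
# 'split_multi_hts': field 'GT': expected entry field, found row field
```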

If you pass a dataset whose GT field has type TInt32, you’d get:

ValueError: 'split_multi_hts': field 'GT': expected type Call, found type Int32
    Expected a dataset with entry fields:
        'GT': Call
        'AD': Array[Int32]
        'DP': Int32
        'GQ': Int32
        'PL': Array[Int32]
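The type check is the last of the three cases. This sketch compares type display names as strings, which stands in for comparing real type objects such as TInt32() and TCall(). check_entry_type and the dict schema model are hypothetical, and the footer is again omitted.

```python
from typing import Dict, Optional, Tuple

# Hypothetical schema model (not Hail's API): field name -> (index, type name).
Schema = Dict[str, Tuple[str, str]]


def check_entry_type(method: str, schema: Schema, name: str, expected: str) -> Optional[str]:
    """Hypothetical helper: return an error line if field `name` has the wrong type."""
    if name not in schema:
        return None  # absence is handled by the missing-field check
    _, found = schema[name]
    if found != expected:
        return f"'{method}': field '{name}': expected type {expected}, found type {found}"
    return None


print(check_entry_type("split_multi_hts", {"GT": ("entry", "Int32")}, "GT", "Call"))
# 'split_multi_hts': field 'GT': expected type Call, found type Int32
```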

I’m mildly concerned that the six lines listing every expected field will distract from the actual error.

What is the argument for including them?

I would like the interface to check all the fields at once, so if you passed the schema:

{ GT: Call, AD: Int32 }

you’d receive an error like:

ValueError: 'split_multi_hts': Found four issues:
  1) field 'AD': expected type Array[Int32], found type Int32
  2) required field 'DP' is missing (should have type Int32)
  3) required field 'GQ' is missing (should have type Int32)
  4) required field 'PL' is missing (should have type Array[Int32])
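Collecting every problem before raising, instead of stopping at the first one, could look like the following sketch. It uses the same hypothetical name-to-(index, type) schema dict. collect_field_issues and require_all_fields are illustrative names, not an existing API. The small number-word table exists only to reproduce the "Found four issues" wording, and it falls back to digits above ten.

```python
from typing import Dict, List, Tuple

# Hypothetical schema model (not Hail's API): field name -> (index, type name).
Schema = Dict[str, Tuple[str, str]]

_WORDS = ["zero", "one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine", "ten"]


def collect_field_issues(schema: Schema, required: Dict[str, str]) -> List[str]:
    """Check every required entry field and return all problems, in declaration order."""
    issues = []
    for name, expected in required.items():
        if name not in schema:
            issues.append(f"required field '{name}' is missing (should have type {expected})")
            continue
        index, found = schema[name]
        if index != "entry":
            issues.append(f"field '{name}': expected entry field, found {index} field")
        elif found != expected:
            issues.append(f"field '{name}': expected type {expected}, found type {found}")
    return issues


def require_all_fields(method: str, schema: Schema, **required: str) -> None:
    """Hypothetical helper: raise one ValueError listing every schema problem at once."""
    issues = collect_field_issues(schema, required)
    if not issues:
        return
    n = len(issues)
    count = _WORDS[n] if n < len(_WORDS) else str(n)
    noun = "issue" if n == 1 else "issues"
    body = "\n".join(f"  {i}) {msg}" for i, msg in enumerate(issues, start=1))
    raise ValueError(f"'{method}': Found {count} {noun}:\n{body}")


try:
    require_all_fields(
        "split_multi_hts",
        {"GT": ("entry", "Call"), "AD": ("entry", "Int32")},
        GT="Call", AD="Array[Int32]", DP="Int32", GQ="Int32", PL="Array[Int32]",
    )
except ValueError as e:
    print(f"ValueError: {e}")
```

Each field produces at most one line. A field that is already flagged as non-entry skips the type comparison, so one misplaced field is not counted twice.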

Yeah, that’s much better. Let’s do that!