Required schemas in 0.2

tpoterba · January 16, 2018, 2:16pm

Here’s a proposed interface and spec for how something like split_multi_hts should be checked in 0.2:

@handle_py4j 
@typecheck(...)
def split_multi_hts(dataset, ...args):
    """docstring"""
    dataset._require_fields('split_multi_hts',
                            GT=TCall(),
                            AD=TArray(TInt32()),
                            DP=TInt32(),
                            GQ=TInt32(),
                            PL=TArray(TInt32()))
    ...rest of the method...

If you pass a dataset without the GT field, you’d get an error like:

ValueError: 'split_multi_hts': required field 'GT' is missing
    Expected a dataset with entry fields:
        'GT': Call
        'AD': Array[Int32]
        'DP': Int32
        'GQ': Int32
        'PL': Array[Int32]

If you pass a dataset with a GT field, but it’s row-indexed, you’d get:

ValueError: 'split_multi_hts': field 'GT': expected entry field, found row field
    Expected a dataset with entry fields:
        'GT': Call
        'AD': Array[Int32]
        'DP': Int32
        'GQ': Int32
        'PL': Array[Int32]

If you pass a dataset with a GT of type TInt32, you’d get:

ValueError: 'split_multi_hts': field 'GT': expected type Call, found type Int32
    Expected a dataset with entry fields:
        'GT': Call
        'AD': Array[Int32]
        'DP': Int32
        'GQ': Int32
        'PL': Array[Int32]
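
For illustration, here is a minimal sketch of the fail-fast check described above. It is not Hail's implementation: `require_fields` is a hypothetical free function, the schema is modeled as a plain dict mapping field name to an `(index kind, type name)` pair, and types are plain strings rather than `TCall()` and friends.

```python
# Hypothetical stand-in for dataset._require_fields. The schema is a plain dict:
# field name -> (index kind, type name), e.g. {"GT": ("entry", "Call")}.
def require_fields(method, schema, **required):
    """Raise ValueError for the first required entry field that is missing,
    indexed as something other than an entry field, or of the wrong type."""
    # Trailer listing the full expected schema, appended to every error.
    expected = "\n    Expected a dataset with entry fields:" + "".join(
        f"\n        '{name}': {typ}" for name, typ in required.items())
    for name, typ in required.items():
        if name not in schema:
            raise ValueError(
                f"'{method}': required field '{name}' is missing" + expected)
        kind, found = schema[name]
        if kind != "entry":
            raise ValueError(
                f"'{method}': field '{name}': expected entry field, "
                f"found {kind} field" + expected)
        if found != typ:
            raise ValueError(
                f"'{method}': field '{name}': expected type {typ}, "
                f"found type {found}" + expected)


# Example: GT is present but row-indexed, so the second check fires.
try:
    require_fields("split_multi_hts", {"GT": ("row", "Call")},
                   GT="Call", AD="Array[Int32]", DP="Int32",
                   GQ="Int32", PL="Array[Int32]")
except ValueError as err:
    print(err)
```

Because the check raises on the first problem, a dataset with several bad fields reports only one of them per call.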

dking · January 16, 2018, 2:23pm

I feel mildly concerned that the six lines detailing all the expected fields will distract from the actual issue.

What is the argument for including them?

I would like the interface to check all the fields at once, so if you passed the schema:

{ GT: Call, AD: Int32 }

you’d receive an error like:

ValueError: 'split_multi_hts': Found four issues:
  1) field 'AD': expected type Array[Int32], found type Int32
  2) required field 'DP' is missing (should have type Int32)
  3) required field 'GQ' is missing (should have type Int32)
  4) required field 'PL' is missing (should have type Array[Int32])
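
A rough sketch of the check-everything-at-once behavior, under some assumptions: `check_fields` is a hypothetical name, the schema is a plain dict from field name to type name, and types are strings rather than Hail type objects. It also prints the issue count as a digit rather than a word.

```python
# Hypothetical helper: collects every schema problem before raising.
# The schema is modeled as a plain dict: field name -> type name.
def check_fields(method, schema, **required):
    """Raise one ValueError that lists all missing or mistyped required fields."""
    issues = []
    for name, typ in required.items():
        if name not in schema:
            issues.append(
                f"required field '{name}' is missing (should have type {typ})")
        elif schema[name] != typ:
            issues.append(
                f"field '{name}': expected type {typ}, found type {schema[name]}")
    if issues:
        noun = "issue" if len(issues) == 1 else "issues"
        numbered = "\n".join(f"  {i}) {msg}" for i, msg in enumerate(issues, 1))
        raise ValueError(f"'{method}': Found {len(issues)} {noun}:\n{numbered}")


# Example: the two-field schema from above, checked against all five requirements.
try:
    check_fields("split_multi_hts", {"GT": "Call", "AD": "Int32"},
                 GT="Call", AD="Array[Int32]", DP="Int32",
                 GQ="Int32", PL="Array[Int32]")
except ValueError as err:
    print(err)
```

Each message carries the expected type for the field it names, so the full expected-schema listing is no longer needed.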

tpoterba · January 16, 2018, 2:24pm

yeah, that’s much better. Let’s do that!
