Remove attributes from Type

I’d like to remove attributes from Type. Now that generic is going in #2480, we’re manipulating the entry (genotype) schema more by directly constructing structs, and we’re seeing errors caused by types that differ only by their attributes. (This leads to extremely unhelpful error messages, like: type on branches of if must the same, found: T and T.)
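To illustrate how this produces "found: T and T", here is a minimal Python model. It is not Hail's actual type classes. Attributes are stored on a struct field's type, so they take part in equality, but the printed form leaves them out. `Field`, `TStruct` and the attribute layout are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Field:
    name: str
    typ: str
    # Hypothetical: attributes live on the type, so dataclass equality compares them too.
    attrs: Tuple[Tuple[str, str], ...] = ()


@dataclass(frozen=True)
class TStruct:
    fields: Tuple[Field, ...]

    def __str__(self) -> str:
        # The printed form omits attributes, much like a type shown in an error message.
        inner = ", ".join(f"{f.name}: {f.typ}" for f in self.fields)
        return f"Struct{{{inner}}}"


# A schema as it might come out of import_vcf (with attributes) ...
imported = TStruct((Field("AD", "Array[Int]", (("Number", "R"),)),))
# ... and the same schema built by hand (without attributes).
constructed = TStruct((Field("AD", "Array[Int]"),))

print(imported == constructed)            # False: the attributes differ
print(str(imported) == str(constructed))  # True: both print as Struct{AD: Array[Int]}
```

Dropping `attrs` from the type, so that only names and element types are compared, makes the two structs equal.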

They are used in two places (besides routines that manipulate them): in import_vcf, where they are set, and in export_vcf, where they are used to build the relevant header lines. I suggest dropping the extra header information in import_vcf (or returning it as a Python dict) and giving export_vcf optional arguments that override header details, like:

  vds.export_vcf('output.vcf',
                 format_attributes={'AD': {'Number': 'R', 'Description': 'Allelic Depth'}},
                 info_attributes={'AF': {'Number': 'A', 'Description': '...'}})

Attributes are optional and override the defaults that come from the Hail types.
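A rough sketch of how export_vcf could merge type-derived defaults with the proposed overrides when writing header lines. The function name `header_lines`, the Hail-type-to-VCF-Type table and the default `Number` rules are all assumptions for illustration, not existing Hail code.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from Hail element type names to VCF header Type values.
VCF_TYPES = {"Int": "Integer", "Long": "Integer", "Double": "Float",
             "Float": "Float", "String": "String", "Boolean": "Flag"}


def header_lines(kind: str, schema: Dict[str, str],
                 overrides: Optional[Dict[str, Dict[str, str]]] = None) -> List[str]:
    """Build ##INFO or ##FORMAT lines; entries in `overrides` win over type-derived defaults."""
    overrides = overrides or {}
    lines = []
    for name, hail_type in schema.items():
        is_array = hail_type.startswith("Array[")
        elem = hail_type[len("Array["):-1] if is_array else hail_type
        # Defaults come only from the Hail type: arrays get Number=., flags Number=0.
        attrs = {
            "Number": "." if is_array else ("0" if elem == "Boolean" else "1"),
            "Type": VCF_TYPES.get(elem, "String"),
            "Description": "",
        }
        # Optional user-supplied attributes replace the defaults field by field.
        attrs.update(overrides.get(name, {}))
        lines.append(
            f'##{kind}=<ID={name},Number={attrs["Number"]},'
            f'Type={attrs["Type"]},Description="{attrs["Description"]}">'
        )
    return lines


format_schema = {"AD": "Array[Int]", "DP": "Int"}
for line in header_lines("FORMAT", format_schema,
                         {"AD": {"Number": "R", "Description": "Allelic Depth"}}):
    print(line)
```

With the AD override, this prints `##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic Depth">` followed by `##FORMAT=<ID=DP,Number=1,Type=Integer,Description="">`. DP has no override, so it keeps the defaults derived from its type.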


That’s effectively what @lfrancioli and I do anyway (we override a number of fields we know with helper functions), so I think that’d be fine by me (assuming that code block meant to say vds.export_vcf()?). An optional dict return from import_vcf would be useful (we’ll probably just save it as JSON and then modify what we need).
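The save-as-JSON workflow might look like this. The `header` dict layout (separate `format` and `info` maps keyed by field ID) is a hypothetical shape for what an import_vcf dict return could carry; the standard-library `json` calls are real.

```python
import json
import os
import tempfile

# Hypothetical shape of the header information returned by import_vcf.
header = {
    "format": {"AD": {"Number": "R", "Description": "Allelic depths"}},
    "info": {"AF": {"Number": "A", "Description": "Allele frequency"}},
}

# Save the header details to a JSON file next to the data.
path = os.path.join(tempfile.mkdtemp(), "vcf_header.json")
with open(path, "w") as f:
    json.dump(header, f, indent=2)

# Later: reload, change only what needs changing ...
with open(path) as f:
    edited = json.load(f)
edited["info"]["AF"]["Description"] = "Alternate allele frequency"

# ... and hand the pieces to the proposed export_vcf override arguments.
format_attributes = edited["format"]
info_attributes = edited["info"]
```

Because `json.load` returns a fresh dict, editing `edited` leaves the original `header` untouched.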

Cotton may not like this, but we could save this information as global fields.

I'm happy to get rid of attributes as long as there is a mechanism to write VCF INFO header lines. That said, I have personally found the INFO header useful when reading someone else’s VCF to interpret fields I wasn’t familiar with, so it's something to keep in mind as VDS becomes more widespread and people start sharing more data.

I think we should have something equivalent to SAS’s PROC LABEL. See here for examples:

@jigold made the change; you can see the new interface here: