I think we misunderstand one another. For every type, we have a way to convert annotations of that type to values of a “Spark type” (`SparkAnnotationImpex`). We currently define this mapping for every type. The question/suggestion I made: the correspondence between values (whether realized as region values or annotations) should only be defined on `FundamentalType`. A `ComplexType` can be imported to or exported from Spark by using the `ComplexType`’s realization. I suppose this is irrelevant if we are imminently removing the ability to convert a `KeyTable` to/from a Spark `DataFrame`.
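The suggestion could be sketched roughly as follows. All names here are illustrative, not Hail's actual class hierarchy: Spark conversion is defined per fundamental type, and a complex type inherits its conversions through its realization.

```scala
// Illustrative sketch only; not Hail's actual SparkAnnotationImpex API.
sealed trait HailType {
  def exportToSpark(a: Any): Any   // annotation -> Spark value
  def importFromSpark(v: Any): Any // Spark value -> annotation
}

// The conversions are defined individually on each fundamental type.
trait FundamentalType extends HailType

trait ComplexType extends HailType {
  // Every complex type has a fundamental realization...
  def realization: FundamentalType
  // ...and gets its Spark conversions for free by delegating to it.
  def exportToSpark(a: Any): Any = realization.exportToSpark(a)
  def importFromSpark(v: Any): Any = realization.importFromSpark(v)
}

case object TInt32 extends FundamentalType {
  def exportToSpark(a: Any): Any = a   // ints pass through unchanged
  def importFromSpark(v: Any): Any = v
}

// A hypothetical complex "call" type realized as an int32:
case object TCall extends ComplexType {
  def realization: FundamentalType = TInt32
}
```

The point of the sketch is that only the `FundamentalType` leaves carry conversion logic; `TCall` needs nothing beyond naming its realization.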
I agree that this is about physical types. If physical types will be added contemporaneously with the above proposed changes, then these operations belong there. If not, I think it simplifies both the code and the conception to treat `FundamentalType`s as the only types that have `alignment` and `byteSize` (i.e. until further notice, `FundamentalType`s are one-to-one with the physical types).
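Under that reading, layout information would live only on the fundamental types; a minimal sketch (again with illustrative names):

```scala
// Sketch: only fundamental types carry physical layout.
sealed trait Type
sealed trait FundamentalType extends Type {
  def byteSize: Long
  def alignment: Long
}
case object TInt32 extends FundamentalType {
  val byteSize = 4L; val alignment = 4L
}
case object TFloat64 extends FundamentalType {
  val byteSize = 8L; val alignment = 8L
}
// A complex type has no layout of its own; asking for its size or
// alignment means asking its realization.
final case class ComplexType(realization: FundamentalType) extends Type
```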
I envision a `Variant`’s JSON representation including the genome reference, à la:
{ "gr": "GRCh38"
, "contig": 1
, "pos": 1
, "ref": "A"
, "alt": "B"
}
I suppose there are situations wherein we want to define the genome reference once for an array of `Variant`s, but that behavior seems somewhat custom and would require considerably more complicated logic than the current `JSONAnnotationImpex` contains.
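For completeness, the “define it once” shape would presumably look something like this (purely illustrative):

```json
{ "gr": "GRCh38"
, "variants":
  [ { "contig": 1, "pos": 1, "ref": "A", "alt": "B" }
  , { "contig": 1, "pos": 2, "ref": "C", "alt": "T" }
  ]
}
```

Supporting this means the importer must thread the hoisted `gr` down into each element, which is the extra logic referred to above.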
Ah, I misunderstood your comment on my PR to rv-ify `export_plink`. I thought you wanted to create a `VariantView` so that we could treat anything with the right fields as a `Variant`.
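For the record, the `VariantView` idea I had in mind was something like the following. This is a hypothetical sketch: `StructVariantView` and its `Map`-backed row are stand-ins for a real region-value-backed implementation.

```scala
// Hypothetical: anything exposing the right fields acts as a Variant.
trait VariantView {
  def contig: String
  def pos: Int
  def ref: String
  def alt: String
}

// A view backed by a struct-like row; a real implementation would read
// directly out of a region value rather than a Map.
final class StructVariantView(row: Map[String, Any]) extends VariantView {
  def contig: String = row("contig").asInstanceOf[String]
  def pos: Int       = row("pos").asInstanceOf[Int]
  def ref: String    = row("ref").asInstanceOf[String]
  def alt: String    = row("alt").asInstanceOf[String]
}

// Code like export_plink could then be written against the view:
def variantKey(v: VariantView): String =
  s"${v.contig}:${v.pos}:${v.ref}:${v.alt}"
```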
Agreed. I’m referring to user-defined functions, wherein the user must explicitly annotate types on functions. It seems reasonable that I might write a function on structs with a `GT` and a `PL` field. The user can rewrite their function to take two arguments, `gt` and `pl`. This is fine, but certainly not the nicest possible programming interface. This is not an urgent matter (UDFs don’t currently exist), but it comes to mind when I think about how our users can best interact with Hail.