This document is intended to give a more complete overview of the current state
and future development of encoded types, or ETypes. I describe the current API,
which I believe to be mostly complete, and sketch out proposed and in-progress
work on new ETypes.
What is an EType?
ETypes are our way of interpreting bytes received from or written to some
stream. There probably needs to be a distinction between the EType classes and
our notion of an encoded type, because our EType subclasses currently delegate
some elements of encoding to other parts of Hail’s reader/writer stack. For
example, LEB integer compression and uniform compression (like lz4) are not
handled by the ETypes themselves.
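We assume the LEB integer compression mentioned above is a variant of LEB128. The following minimal Python sketch of unsigned LEB128 illustrates the idea. It is not Hail's reader/writer code, and the function names leb128_encode and leb128_decode are hypothetical.

```python
def leb128_encode(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 requires a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F  # low 7 bits of what remains
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: this is the last byte
            return bytes(out)


def leb128_decode(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one unsigned LEB128 integer starting at `pos`.

    Returns (value, position just past the encoded integer)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # append the next 7-bit group
        if not byte & 0x80:
            return result, pos
        shift += 7
```

Each byte carries 7 bits of payload, and its high bit marks whether another byte follows. Values below 128 therefore occupy a single byte, and leb128_encode(300) returns b"\xac\x02".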
From a programming perspective, ETypes are the new way to build encoders and
decoders. To create an EType, we use either the parser from … or
defaultFromPType. When creating encoders or decoders, use the object methods
rather than the instance methods of the same name. We cache the results of the
object methods, so we don’t need to compile the same method over and over again.
The EType objects contain the methods that generate the code needed to
encode/decode the bytes involved. The abstract EType class contains common
functionality, while the subclasses contain internal methods that return the
code specific to their own encoding.
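The split between common functionality in the abstract class and type-specific internal methods can be illustrated with a small Python analogue. In this version, build_decoder returns a closure instead of generating code. The method names build_decoder and _decode are assumptions for illustration, and so are all of the byte layouts (little-endian int32 values and length prefixes). None of this is Hail's actual encoding.

```python
import struct
from abc import ABC, abstractmethod


class EType(ABC):
    """Abstract base class: functionality shared by all ETypes."""

    def build_decoder(self):
        # Common step for every EType: wrap the type-specific _decode in a
        # function that decodes a whole buffer and rejects trailing bytes.
        def decode(data: bytes):
            value, pos = self._decode(data, 0)
            if pos != len(data):
                raise ValueError("trailing bytes after encoded value")
            return value

        return decode

    @abstractmethod
    def _decode(self, data: bytes, pos: int):
        """Type-specific internal step: return (value, next position)."""


def _read_int32(data: bytes, pos: int):
    # Assumed layout: 4-byte little-endian signed integer.
    (value,) = struct.unpack_from("<i", data, pos)
    return value, pos + 4


class EInt32(EType):
    def _decode(self, data, pos):
        return _read_int32(data, pos)


class EBinary(EType):
    def _decode(self, data, pos):
        # Assumed layout: int32 length, then that many raw bytes.
        n, pos = _read_int32(data, pos)
        if n < 0 or pos + n > len(data):
            raise ValueError("invalid binary length")
        return bytes(data[pos:pos + n]), pos + n


class EArray(EType):
    def __init__(self, element: EType):
        self.element = element

    def _decode(self, data, pos):
        # Assumed layout: int32 element count, then each encoded element.
        n, pos = _read_int32(data, pos)
        items = []
        for _ in range(n):
            item, pos = self.element._decode(data, pos)
            items.append(item)
        return items, pos
```

EArray delegates to its element's _decode, so nested types such as EArray(EBinary()) compose without any extra code in the base class.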
One feature of the implementation is that every invocation of buildDecoder for
an EType creates a new method. Because each decoder shows up as its own method,
profiling gives us excellent visibility into where Hail is spending its time.
Our current ETypes map very closely onto the underlying memory representations
(PTypes) of our types. There is no EType notion of a string; we use binary
instead. There are no encoded sets; we use arrays. We would like to extend the
system with ETypes that currently have no PType representing them. For example:
- Packed integer encoding like Stream VByte.
- Transposition of arrays of structs to structs of arrays (local column store)
- Occupancy list encoding for sparse data.
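The following minimal, unoptimized Python sketch illustrates the first bullet with the Stream VByte layout for 32-bit unsigned integers. Each value gets a 2-bit length code, the codes for four values share one control byte, and control bytes are kept in a stream separate from the data bytes. Real implementations decode a group of four with a single SIMD shuffle selected by the control byte. This sketch only shows the format, and stream_vbyte_encode and stream_vbyte_decode are hypothetical names, not an existing Hail API.

```python
def stream_vbyte_encode(values: list[int]) -> tuple[bytes, bytes]:
    """Encode 32-bit unsigned integers as (control bytes, data bytes)."""
    control = bytearray()
    data = bytearray()
    for i, v in enumerate(values):
        if not 0 <= v < 1 << 32:
            raise ValueError("Stream VByte encodes 32-bit unsigned integers")
        n = max(1, (v.bit_length() + 7) // 8)  # 1 to 4 bytes
        if i % 4 == 0:
            control.append(0)  # one control byte per group of four values
        # Store n - 1 in 2 bits; the first value of a group uses the low bits.
        control[-1] |= (n - 1) << (2 * (i % 4))
        data += v.to_bytes(n, "little")
    return bytes(control), bytes(data)


def stream_vbyte_decode(count: int, control: bytes, data: bytes) -> list[int]:
    """Decode `count` integers; the count itself must be stored separately."""
    values = []
    pos = 0
    for i in range(count):
        n = ((control[i // 4] >> (2 * (i % 4))) & 0b11) + 1
        values.append(int.from_bytes(data[pos:pos + n], "little"))
        pos += n
    return values
```

Because the lengths live in their own stream, a decoder learns the sizes of four values from a single byte, without scanning continuation bits the way LEB128 does.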