PValues proposal

This is a proposal to add a physical value class hierarchy, PValues, for representing values of given physical types in the backend.

What problem is this solving? The backend currently assumes that each value is represented by a single primitive JVM value whose type is determined by the virtual type (for example, an array is a Code[Long] pointing to the array contents).

There is interest in value representations that use non-standard JVM types, or that use several JVM values (possibly of different types) to represent a single value.

The motivating example is a Let of a MakeStruct. Currently this allocates the struct in memory and populates it, but it would be better to generate a local variable for each struct field and initialize it with the corresponding MakeStruct argument. A GetField on the bound struct then becomes a local variable reference.
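To make the motivating example concrete, here is a toy sketch in Scala. It is not Hail's emitter: `Code` is modeled as the text of a JVM-like expression, and the IR nodes `I32`, `Ref`, `MakeStruct`, `GetField`, and `Let`, along with `Emitter`, `bind`, and `newLocal`, are hypothetical stand-ins. It only shows that a struct bound by a Let can live as one local per field, so GetField resolves to a local reference instead of a memory load.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for generated code: the text of a JVM-like expression.
final case class Code[T](text: String)

// Simplified IR (hypothetical names; not Hail's actual IR classes).
sealed trait IR
final case class I32(v: Int) extends IR
final case class Ref(name: String) extends IR
final case class MakeStruct(fields: List[(String, IR)]) extends IR
final case class GetField(struct: IR, name: String) extends IR
final case class Let(name: String, value: IR, body: IR) extends IR

// A physical value: a single Int, or a struct held as one physical value per field.
sealed trait PValue
final case class PIntValue(code: Code[Int]) extends PValue
final case class PStructValue(fields: Map[String, PValue]) extends PValue

final class Emitter {
  // Local-variable stores emitted so far, e.g. "l0 = 1".
  val locals: ListBuffer[String] = ListBuffer.empty[String]
  private var counter = 0

  // Store an Int expression in a fresh local and return a reference to it.
  private def newLocal(value: Code[Int]): PIntValue = {
    val name = s"l$counter"
    counter += 1
    locals += s"$name = ${value.text}"
    PIntValue(Code[Int](name))
  }

  // Give every (possibly nested) field its own local; nothing is written to memory.
  private def bind(v: PValue): PValue = v match {
    case PIntValue(c)         => newLocal(c)
    case PStructValue(fields) => PStructValue(fields.map { case (f, fv) => f -> bind(fv) })
  }

  def emit(ir: IR, env: Map[String, PValue]): PValue = ir match {
    case I32(v)         => PIntValue(Code[Int](v.toString))
    case Ref(n)         => env(n)
    case MakeStruct(fs) => PStructValue(fs.map { case (f, x) => f -> emit(x, env) }.toMap)
    case GetField(s, f) =>
      emit(s, env) match {
        case PStructValue(fields) => fields(f) // a local reference, not a load from memory
        case other => throw new IllegalArgumentException(s"GetField on non-struct value $other")
      }
    case Let(n, v, body) => emit(body, env + (n -> bind(emit(v, env))))
  }
}
```

After emitting `Let("s", MakeStruct(List("a" -> I32(1), "b" -> I32(2))), GetField(Ref("s"), "b"))`, the emitter has stored each field in its own local, and the result of GetField is simply a reference to the local holding `2`; no struct is ever allocated.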

What’s the proposal? There is a new abstract base class, PValue. For each virtual type there is a corresponding abstract PValue subclass whose interface consists of the operations that can be performed on values of that type. Finally, each concrete physical type has a corresponding concrete physical value class, which is what that type's constructor returns.

Here’s the rough picture for arrays. I imagine a new PMissingArrayValue, which represents arrays whose elements are all missing. In this case the only state needed to represent the array is an Int (its length), which is not the canonical JVM type for an array.

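abstract class PValue {
  // The physical type of this value.
  val t: PType
}

abstract class PArrayValue extends PValue {
  def loadLength(): Code[Int]
  def elementIsMissing(i: Code[Int]): Code[Boolean]
  def getElement(i: Code[Int]): PValue
}

// Canonical representation: a pointer to the array contents in memory.
class PCanonicalArrayValue(val t: PType, val p: Code[Long]) extends PArrayValue {
  // ...
}

// All elements missing: the only state is the length.
class PMissingArrayValue(val t: PType, val length: Code[Int]) extends PArrayValue {
  // ...
}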
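The hierarchy above can be exercised with a small self-contained model. This is a sketch under loud assumptions: Hail's `Code[T]` and `PType` are replaced by hypothetical stand-ins (a `Code` is just the text of an expression, and `PType` is a tiny sealed hierarchy with hypothetical `PCanonicalArray` and `PMissingArray` types), and the canonical memory layout (length at `p`, missing bits starting at `p + 4`) and the `loadInt`, `loadBit`, and `elementAddress` helpers named in the generated text are illustrative, not Hail's actual layout or API.

```scala
// Hypothetical stand-ins for Hail's Code[T] and PType, so the sketch is self-contained.
final case class Code[T](text: String)

sealed trait PType
case object PInt32 extends PType
final case class PCanonicalArray(elementType: PType) extends PType
final case class PMissingArray(elementType: PType) extends PType

abstract class PValue {
  val t: PType
}

abstract class PArrayValue extends PValue {
  def loadLength(): Code[Int]
  def elementIsMissing(i: Code[Int]): Code[Boolean]
  def getElement(i: Code[Int]): PValue
}

final class PInt32Value(val code: Code[Int]) extends PValue {
  val t: PType = PInt32
}

// Canonical arrays hold a pointer; every operation emits a memory access.
// The layout (length at p, missing bits at p + 4) is an illustrative assumption.
final class PCanonicalArrayValue(val t: PCanonicalArray, val p: Code[Long]) extends PArrayValue {
  def loadLength(): Code[Int] = Code(s"loadInt(${p.text})")
  def elementIsMissing(i: Code[Int]): Code[Boolean] = Code(s"loadBit(${p.text} + 4L, ${i.text})")
  def getElement(i: Code[Int]): PValue = t.elementType match {
    case PInt32 => new PInt32Value(Code(s"loadInt(elementAddress(${p.text}, ${i.text}))"))
    case other  => throw new UnsupportedOperationException(s"element type $other not modeled")
  }
}

// All-missing arrays: the only state is the length, a single Int; no memory is touched.
final class PMissingArrayValue(val t: PMissingArray, val length: Code[Int]) extends PArrayValue {
  def loadLength(): Code[Int] = length
  def elementIsMissing(i: Code[Int]): Code[Boolean] = Code("true")
  def getElement(i: Code[Int]): PValue =
    throw new IllegalStateException("every element of a PMissingArrayValue is missing")
}
```

Emitter code written only against PArrayValue works for both representations; for the all-missing array, `loadLength()` returns the stored Int directly and `elementIsMissing` is the constant `true`.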

EmitTriplet will then be changed to have v: PValue instead of v: Code[_]. To make it easy to work with primitives in the emitter, I think PValue needs checked conversion routines to primitive Code[T] types, which fail if the value's physical type does not match.
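As a sketch of what checked conversions might look like: `Code` and `PType` are hypothetical stand-ins, the triplet shape `EmitTriplet(setup, m, v)` is an assumption, and `PInt32Value`, `PFloat64Value`, `asInt32`, `asFloat64`, and `emitAdd` are hypothetical names. Each primitive PValue subclass overrides exactly the conversion matching its physical type; the others throw at code-generation time.

```scala
// Hypothetical stand-ins for Hail's Code[T] and PType.
final case class Code[T](text: String)

sealed trait PType
case object PInt32 extends PType
case object PFloat64 extends PType

abstract class PValue {
  def t: PType

  // Checked conversions: by default they fail at code-generation time;
  // each primitive subclass overrides the one that matches its physical type.
  def asInt32: Code[Int] = throw new IllegalStateException(s"expected PInt32, got $t")
  def asFloat64: Code[Double] = throw new IllegalStateException(s"expected PFloat64, got $t")
}

final class PInt32Value(val code: Code[Int]) extends PValue {
  def t: PType = PInt32
  override def asInt32: Code[Int] = code
}

final class PFloat64Value(val code: Code[Double]) extends PValue {
  def t: PType = PFloat64
  override def asFloat64: Code[Double] = code
}

// Assumed shape of the revised triplet: setup code, missingness, and a PValue.
final case class EmitTriplet(setup: Code[Unit], m: Code[Boolean], v: PValue)

object Ops {
  // An emitter case for integer addition only needs the primitive views of its operands.
  def emitAdd(l: EmitTriplet, r: EmitTriplet): EmitTriplet =
    EmitTriplet(
      Code(s"${l.setup.text}; ${r.setup.text}"),
      Code(s"${l.m.text} || ${r.m.text}"),
      new PInt32Value(Code(s"${l.v.asInt32.text} + ${r.v.asInt32.text}")))
}
```

Passing a PFloat64Value to `emitAdd` fails while the code is being generated rather than producing ill-typed bytecode, which is the point of making the conversions checked.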

I think this interacts with streams in two ways (see New stream design proposal).

First, a stream should be a stream of PValues. As in the motivating example, this will allow us to use destructured structs inside streams, for example in table lowering. This will be an enormous win.

Second, we have stream types, so streams should themselves have physical values, PStreamValues. Unlike other physical values, which carry JVM values, a PStreamValue should carry the logic to emit code that efficiently iterates over the stream's values. PStreamValues should essentially package up the object returned by EmitStream. I think this means EmitStream can have the same signature as Emit, and the two can be unified.
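A minimal sketch of both stream points, under stated assumptions: `Code` and `PType` are hypothetical stand-ins, the loop is rendered as Java-like text, and `rangeWithSquares` and `sumField` are hypothetical helpers. The PStreamValue holds no JVM value, only a function that, given the code to run on one element, emits the whole loop. Because elements are PValues, each element here is a struct destructured into two field values that the consumer reads without touching memory.

```scala
// Hypothetical stand-ins for Hail's Code[T] and PType.
final case class Code[T](text: String)

sealed trait PType
case object PInt32 extends PType
final case class PStruct(fields: List[(String, PType)]) extends PType
final case class PStream(elementType: PType) extends PType

abstract class PValue { def t: PType }

final class PInt32Value(val code: Code[Int]) extends PValue { def t: PType = PInt32 }

// Struct held as one value per field (nothing allocated in memory).
final class PStructValue(val t: PStruct, val fields: Map[String, PValue]) extends PValue {
  def field(name: String): PValue = fields(name)
}

// A stream carries no JVM value, only loop-emitting logic: given code to run on
// each element, it produces the code that iterates over the stream.
final class PStreamValue(val t: PStream, val emitLoop: (PValue => Code[Unit]) => Code[Unit])
  extends PValue

object Streams {
  // Hypothetical stream over [0, n) whose elements are destructured structs {i, sq}.
  def rangeWithSquares(n: Code[Int]): PStreamValue = {
    val eltType = PStruct(List("i" -> PInt32, "sq" -> PInt32))
    new PStreamValue(PStream(eltType), (body: PValue => Code[Unit]) => {
      val elt = new PStructValue(eltType, Map(
        "i" -> new PInt32Value(Code("i")),
        "sq" -> new PInt32Value(Code("i * i"))))
      Code(s"for (int i = 0; i < ${n.text}; i++) { ${body(elt).text} }")
    })
  }

  // Consumer: accumulate one Int field of each struct element.
  def sumField(s: PStreamValue, name: String): Code[Unit] =
    s.emitLoop((elt: PValue) => {
      val v = elt.asInstanceOf[PStructValue].field(name).asInstanceOf[PInt32Value]
      Code(s"acc += ${v.code.text};")
    })
}
```

Because PStreamValue is itself a PValue, an emitter whose result is a PValue could return it for stream-typed IR as well; that is the sense in which Emit and EmitStream might share one signature.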