NDArray loops are a severe performance pain point in lowered methods like linear_regression_rows_nd. John and I propose a new kind of NDArray emitter that should relieve some of this performance pressure.
Current design
The current NDArrayEmitter intermediate looks like this:
abstract class NDArrayEmitter(val outputShape: IndexedSeq[Value[Long]], val elementType: SType) {
  val nDims = outputShape.length

  def outputElement(cb: EmitCodeBuilder, idxVars: IndexedSeq[Value[Long]]): PCode
}
While simple, this design forces implementations to do a lot of work in the inner loop, since the code generated by outputElement must locate each element from the full index vector idxVars on every iteration.
Proposed design
Instead, we propose the following design:
abstract class NDArrayProducer {
  def elementType: SType
  val shape: IndexedSeq[Value[Long]]

  // global initialization
  val init: EmitCodeBuilder => Unit

  // initialize or reset an axis
  val initAxis: IndexedSeq[(EmitCodeBuilder) => Unit]

  // step an axis by some number of elements
  val stepAxis: IndexedSeq[(EmitCodeBuilder, Value[Int]) => Unit]

  // load the element at the current address
  def loadElementAtCurrentAddr(cb: EmitCodeBuilder): SCode
}
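For intuition, here is a runnable sketch of the same contract over plain JVM values rather than emitted bytecode; the class and field names below (ToyProducer, RowMajorProducer) are illustrative stand-ins, not Hail's actual types. A producer reading a row-major flat buffer can keep one running offset per axis, so resetting an inner axis leaves the outer axes' progress intact:

```scala
// Toy analogue of NDArrayProducer: the real interface threads an
// EmitCodeBuilder and returns SCode; this sketch acts on values directly.
abstract class ToyProducer {
  val shape: IndexedSeq[Long]
  def init(): Unit                       // global initialization
  def initAxis(axis: Int): Unit          // initialize or reset one axis
  def stepAxis(axis: Int, n: Int): Unit  // step an axis by n elements
  def loadElementAtCurrentAddr(): Double
}

// Reads a row-major buffer; the current address is the sum of per-axis
// offsets, each advanced by that axis's stride.
class RowMajorProducer(data: Array[Double], val shape: IndexedSeq[Long])
    extends ToyProducer {
  private val strides: IndexedSeq[Long] =
    shape.scanRight(1L)(_ * _).tail      // e.g. shape (2, 3) -> strides (3, 1)
  private val offsets = Array.fill(shape.length)(0L)
  def init(): Unit = java.util.Arrays.fill(offsets, 0L)
  def initAxis(axis: Int): Unit = offsets(axis) = 0L
  def stepAxis(axis: Int, n: Int): Unit = offsets(axis) += n * strides(axis)
  def loadElementAtCurrentAddr(): Double = data(offsets.sum.toInt)
}
```

Because each axis owns its offset, a consumer can step the column axis twice, reset it with initAxis, then step the row axis once, and the address lands on the second row's first element; no per-element index arithmetic over a full index vector is needed.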
This producer is “consumed” through the interface above. For example, to visit every element, a consumer generates one loop per axis, calls initAxis(axisIndex)(cb) before entering each loop, advances the element pointer with stepAxis(axisIndex)(cb, const(1)) at the bottom of each loop body, and reads the current element with loadElementAtCurrentAddr in the innermost loop.
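Concretely, that nested-loop pattern can be sketched over plain JVM values (closures here stand in for emitted code, and recursion stands in for the unrolled loops the consumer would generate; all names are illustrative):

```scala
// Toy stand-ins for the producer operations, closing over a flat row-major
// 2x3 buffer: one running offset per axis, summed to form the address.
val shape = IndexedSeq(2L, 3L)
val data = Array(0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
val strides = IndexedSeq(3L, 1L)       // row-major strides for a 2x3 array
val offsets = Array.fill(shape.length)(0L)

val init: () => Unit = () => java.util.Arrays.fill(offsets, 0L)
val initAxis: IndexedSeq[() => Unit] =
  shape.indices.map(axis => () => offsets(axis) = 0L)
val stepAxis: IndexedSeq[Int => Unit] =
  shape.indices.map(axis => (n: Int) => offsets(axis) += n * strides(axis))
def loadElementAtCurrentAddr(): Double = data(offsets.sum.toInt)

// One loop per axis: initAxis before entering the loop, stepAxis(axis, 1)
// at the bottom of its body, and a load in the innermost loop.
val out = collection.mutable.ArrayBuffer[Double]()
init()
def loop(axis: Int): Unit =
  if (axis == shape.length) out += loadElementAtCurrentAddr()
  else {
    initAxis(axis)()
    var i = 0L
    while (i < shape(axis)) { loop(axis + 1); stepAxis(axis)(1); i += 1 }
  }
loop(0)
// out now holds the elements in row-major order
```

Note that the element address is only ever adjusted incrementally, one axis at a time, rather than recomputed from a full index vector inside the innermost loop.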
What this solution is not:
This solution is not an end-all solution to NDArray performance problems in the JVM.
What this solution is:
This design is an easy (< 1 week of work) way to drastically improve the generated bytecode, and hopefully get NDArray performance to a place where we can start ripping out Spark implementations.