NDArray Memory Management

The impetus for this is to support NDArray arguments to recursive functions, but the design should be more generally useful for being able to free NDArrays when they’re no longer needed, e.g. the two inputs to a block matrix multiply.

The gist of it is:

With a new NDArray PType, we can consider an NDArray value to be a pointer reference-counted object. The actual reference-counted object is owned by the region pool, which frees it when references decrement to 0, as with regular regions. Storing this value in a region has the additional step of adding a reference to it from that region. When a region is cleared, the reference count is also decremented; if that region held the only reference(s), then the object is freed/released back to the pool. As an example, in pseudocode:

NDArray {
    pool: RegionPool
    id: Long
    refs: 0
    def decrement() {
        refs -= 1
        if (refs == 0) pool.freeNDArray(this)
    }
}
RegionPool {
   … 
   ndArrays: ArrayBuilder[NDArray]
}
RegionMemory {
    …
    ndArrayRefs: ArrayBuilder[NDArray]
    def free() {
        …
        ndArrayRefs.result().foreach { nd => nd.decrement() }
    }
}

The NDArray object itself could just build on the Region reference-counting infrastructure, or could be its own separate thing.

Using this, if TailLoop copies arguments to Recur into a fresh region and uses that region for every iteration through the function, any NDArrays that are constructed within the body of the recursive function but unnecessary for the next iteration will be freed when the region is, but any references in the new args for Recur will preserve necessary NDArrays (without having to do a full copy of potentially-large NDArray data).


We could imagine extending this further so that in an NDArrayMultiply, for example, we can decrement references to the inputs so that they are freed if not used elsewhere, preventing us from keeping large intermediates in a linear algebra operation.

Also: in something like NDArraySum where the result has the same shape as the inputs, if one of the inputs has a refcount of 1 before we multiply, modify it in place!

1 Like