Last night while feverishly thinking of the world of opportunities opened up by

DNDArray, I realized that 1:1 partitioners are far too restrictive! I can

achieve good key-change performance with a general class of partitioners that I

call Rectangular Partitioners (they’re similar to but more general than

BlockMatrix Grid Partitioners).

What DNDArrays need is cheap permutation and prefixation of keys. When is this

cheap?

Define a partitioner as a key type and a sequence of intervals. An interval’s

end-points are structs of type the partitioner’s key type.

Define “partitioner, p, restricted to field f” as a new partitioner whose

sequence of intervals is p’s intervals where each end-point has been projected

to just the field f. For example:

```
Partitioner(
Interval(hl.struct(x=0, y=0, z=0), hl.struct(x=0, y=3, z=5)),
Interval(hl.struct(x=0, y=0, z=6), hl.struct(x=0, y=3, z=10)),
Interval(hl.struct(x=0, y=3, z=0), hl.struct(x=0, y=10, z=5)),
Interval(hl.struct(x=0, y=3, z=6), hl.struct(x=0, y=10, z=10)),
```

If we restrict to “z” we have:

```
Partitioner(
Interval(hl.struct(z=0), hl.struct(z=5)),
Interval(hl.struct(z=6), hl.struct(z=10)),
Interval(hl.struct(z=0), hl.struct(z=5)),
Interval(hl.struct(z=6), hl.struct(z=10)),
```

If we restrict to “y” we have:

```
Partitioner(
Interval(hl.struct(y=0), hl.struct(y=3)),
Interval(hl.struct(y=0), hl.struct(y=3)),
Interval(hl.struct(y=3), hl.struct(y=10)),
Interval(hl.struct(y=3), hl.struct(y=10)),
```

A partitioner is “rectangular” if for every field, f, in its key type, the

partitioner restricted to f contains no partially overlapping intervals. Two

intervals partially overlap if they overlap but are not equal.

If a partitioner is rectangular, then permuting the key fields can be achieved

with only a local sort. The order of the partitions changes in the expected

way. For two-key rectangular partitioners, this corresponds to switching the

ordering between row-major and column-major.

This avoids a shuffle, which is already a huge win, but we can do even better. A

local sort can be avoided if our data structure has a cheap way to permute the

sort-by-indices. `hl.nd.array`

is one such data structure! If we open our mind

to partitions that are not just arrays of records but *ndarrays* of records,

then key permutations are just matrix transpositions (or whatever the tensor

version of that is called)!

Rectangular partitioners permit filtering the rows and columns of ndarrays

*without repacking the blocks*! Hail’s table join logic will find the

corresponding partitions. All that remains is to teach Hail how to work with

ndarrays as partitions instead of arrays.

This essay was inspired by working on DNDArray: