From Offsets to Pointers

with this, which is more towards a global optimization of the data layout within a stage

Yes, please! :)

Then if a field gets possibly-modified by an operator, it gets allocated into a different previously-empty struct offset, leaving the unmutated field values in place.

Yes, or we do a sharing/liveness analysis in the code generator and decide the original field is unused afterwards and just overwrite it.
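To make the two options concrete, here is a rough sketch of how a per-stage layout planner might choose between them. Everything in it (`StageLayout`, `Field`, `offset_for_write`, the `old_value_live_after` flag) is a hypothetical name for illustration, not the actual code generator, and it ignores alignment and assumes the new value has the same size as the old one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one field slot inside a stage's struct.
struct Field {
    std::string name;
    std::size_t offset;  // byte offset within the struct
    std::size_t size;    // byte size of the field
};

// Hypothetical per-stage layout: fields are packed one after another.
// Alignment is deliberately ignored to keep the sketch short.
struct StageLayout {
    std::vector<Field> fields;
    std::size_t size = 0;  // total bytes assigned so far

    // Append a new field at the next empty offset and return that offset.
    std::size_t add(const std::string& name, std::size_t bytes) {
        fields.push_back({name, size, bytes});
        size += bytes;
        return fields.back().offset;
    }

    // Decide where an operator writes the new value of fields[index].
    // If the old value is still live afterwards, the new version gets a
    // fresh, previously empty slot and the old value stays in place.
    // If the old value is dead, it is simply overwritten.
    std::size_t offset_for_write(std::size_t index, bool old_value_live_after) {
        if (!old_value_live_after) {
            return fields[index].offset;
        }
        return add(fields[index].name + "'", fields[index].size);
    }
};
```

The liveness flag is the whole decision here; a real analysis would compute it from which later operators in the stage read the field.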

We probably need to get in front of a whiteboard and make sure we’re heading the same way

Yes, I’d like that. Sorry I’ve been out all week. I’m on the up-swing. Happy to do this early next week.

making sure that the big data is actively managed rather than falling through holes into GC-and-auto-close()

Yes, these are the changes @dking is making now. They should make all malloc heap allocation explicit without relying on GC at all. This is essential for (1) getting values off the JVM heap so we can pass them to/from C++ and (2) allowing malloc and GC to peacefully coexist.

@dking is using the auto-close thing to free regions, but it’s totally explicit and not at all driven by GC. (I think he’s overly enamored with this pattern and it obscures the intention.)

A region is just a custom linear memory allocator with no-cost bulk free. There is no reason why the code generator can’t also allocate values elsewhere using other allocators like standard malloc/free or other specialized allocators.
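For anyone who hasn't seen one, a minimal sketch of that idea in C++ is below. This is not the real region implementation; the class name `Region`, the 64 KiB default block size, and the `block_count` helper are assumptions made for illustration. Allocation bumps an offset inside a malloc'd block, and freeing releases whole blocks at once, so the cost of a bulk free depends on the number of blocks rather than the number of values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical region: a linear (bump) allocator with bulk free.
class Region {
public:
    explicit Region(std::size_t block_size = 64 * 1024) : block_size_(block_size) {
        assert(block_size_ > 0);
    }
    Region(const Region&) = delete;
    Region& operator=(const Region&) = delete;
    ~Region() { clear(); }  // explicit, deterministic release; no GC involved

    // Return `size` bytes aligned to `align`. `align` must be a power of two
    // no larger than malloc's own alignment guarantee, because offsets are
    // aligned relative to the start of a malloc'd block.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (blocks_.empty() || start + size > capacity_) {
            // Current block is full (or there is none): start a new block,
            // sized up if a single value is larger than the default.
            std::size_t new_capacity = size > block_size_ ? size : block_size_;
            blocks_.reserve(blocks_.size() + 1);  // so push_back cannot throw
            void* block = std::malloc(new_capacity);
            if (block == nullptr) throw std::bad_alloc();
            blocks_.push_back(block);
            capacity_ = new_capacity;
            start = 0;
        }
        used_ = start + size;
        return static_cast<char*>(blocks_.back()) + start;
    }

    // Bulk free: every value allocated from this region becomes invalid.
    void clear() {
        for (void* block : blocks_) std::free(block);
        blocks_.clear();
        capacity_ = 0;
        used_ = 0;
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    std::size_t block_size_;
    std::size_t capacity_ = 0;  // size of the current (last) block
    std::size_t used_ = 0;      // bytes consumed in the current block
    std::vector<void*> blocks_;
};
```

Nothing stops generated code from calling plain `malloc`/`free` next to a `Region` like this; the region is just one allocator among several.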

Once we’re all in C++, region values are just values: allocated data structures that might be allocated in regions (if suitable), the stack if they can be, or anywhere else.

OK, that makes sense to me. IIRC the Endeca/Oracle system I worked on took a similar approach of figuring out a good layout for each stage - but I never got deep into that part of query/layout-planning.

I don’t mean to be poking too many holes, I just have to start from skepticism and ask questions until I figure out what I’m missing. Think I’m getting the slightly-bigger picture now.