I’d like to make Interval more general, basically by adding the option to specify whether each endpoint is included or excluded. (I want this because I need different kinds of intervals in the partitioner, but I think it’d be nice to have more generally.)
TInterval would have three fields (itype, start, and end), where itype is an Int32 that encodes two bits of information (startInclusive and endInclusive). Interval.contains, Interval.overlaps, and Interval.isEmpty would need to account for all the possibilities.
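To make the cases concrete, here is a rough Python sketch of how the three methods could handle the inclusive/exclusive combinations. The bit assignment for itype, the class layout, and the method names are illustrative assumptions, not the actual implementation, and the point type is treated as densely ordered, so (1, 2) counts as non-empty.

```python
from dataclasses import dataclass

# Hypothetical bit layout for itype; the real encoding may differ.
START_INCLUSIVE = 0b01
END_INCLUSIVE = 0b10


@dataclass(frozen=True)
class Interval:
    start: int
    end: int
    itype: int = START_INCLUSIVE  # defaults to [start, end)

    @property
    def start_inclusive(self) -> bool:
        return bool(self.itype & START_INCLUSIVE)

    @property
    def end_inclusive(self) -> bool:
        return bool(self.itype & END_INCLUSIVE)

    def contains(self, p) -> bool:
        # A boundary point counts only if that endpoint is inclusive.
        above = p > self.start or (self.start_inclusive and p == self.start)
        below = p < self.end or (self.end_inclusive and p == self.end)
        return above and below

    def is_empty(self) -> bool:
        # start == end is non-empty only for the closed interval [a, a].
        if self.start == self.end:
            return not (self.start_inclusive and self.end_inclusive)
        return self.start > self.end

    def _starts_within_reach_of(self, other: "Interval") -> bool:
        # True if self's start does not lie past other's end.
        if self.start < other.end:
            return True
        return (self.start == other.end
                and self.start_inclusive and other.end_inclusive)

    def overlaps(self, other: "Interval") -> bool:
        if self.is_empty() or other.is_empty():
            return False
        return (self._starts_within_reach_of(other)
                and other._starts_within_reach_of(self))
```

With this layout, `Interval(1, 2)` and `Interval(2, 3, START_INCLUSIVE | END_INCLUSIVE)` do not overlap because the shared boundary 2 is excluded from the first, while making the first interval's end inclusive would make them overlap.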
Seems cool to me. Current intervals are inclusive of both, right?
In scala / RV land, I’d use two actual boolean fields for programmer-sanity. Two booleans in RV-land should be pretty efficient anyway (I think 8 bytes each).
So one thing that I ran into while propagating up into user-visible-land:
I updated the toString representation (and the parser) to recognize ("[", "(") and ("]", ")") around an interval, and interpret accordingly. The JSON representation just stores the booleans as additional fields.
How much of this do we want to expose to users? Right now the parser recognizes an interval written as start-end both with and without enclosing brackets; without brackets it defaults to [a, b). That keeps compatibility with the python-side Interval.parse as it currently works (I don’t believe the bracketless form is needed anywhere else). I think parsing something of the form [a, b) etc. would also work, though, and maybe it would be cleaner to require that from now on? I was going to document and expose this in python in a future pull request, if we want that to happen.
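For illustration, the parsing behavior described above might look roughly like this in Python. The function name `parse_interval`, the integer-only endpoints, and the returned tuple shape are assumptions made for the sketch; the real parser handles richer endpoint types.

```python
import re

# Optional opening bracket, "start-end", optional closing bracket.
_INTERVAL = re.compile(r"^\s*([\[(])?\s*(\d+)\s*-\s*(\d+)\s*([\])])?\s*$")


def parse_interval(text):
    """Return (start, end, start_inclusive, end_inclusive).

    Hypothetical helper: a bare "a-b" is read as [a, b) for backward
    compatibility; a bracketed form sets each flag from its bracket.
    """
    m = _INTERVAL.match(text)
    if m is None:
        raise ValueError(f"cannot parse interval: {text!r}")
    opening, start, end, closing = m.groups()
    if (opening is None) != (closing is None):
        raise ValueError(f"unbalanced brackets in interval: {text!r}")
    if opening is None:
        # Legacy bracketless form defaults to [a, b).
        return int(start), int(end), True, False
    return int(start), int(end), opening == "[", closing == "]"
```

Requiring brackets from now on would amount to dropping the `opening is None` branch and rejecting bare `a-b` input.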
I fixed up the IntervalList parser to take advantage of the new stuff—instead of taking end+1 as an exclusive endpoint, it just takes end as an inclusive one.
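A small Python sketch of that change, using a made-up tuple representation (contig, start, end, start_inclusive, end_inclusive) and hypothetical helper names: over integer positions, the old [start, end+1) reading and the new [start, end] reading cover the same positions, but the new one no longer needs the end+1 adjustment.

```python
def old_style(contig, start, end):
    # Previous approach: shift the end by one and treat it as exclusive.
    return (contig, start, end + 1, True, False)


def new_style(contig, start, end):
    # New approach: keep the end as given and mark it inclusive.
    return (contig, start, end, True, True)


def positions(interval):
    """List the integer positions covered by an interval tuple."""
    _, start, end, start_inclusive, end_inclusive = interval
    first = start if start_inclusive else start + 1
    last = end if end_inclusive else end - 1
    return list(range(first, last + 1))
```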