On interval endpoints

wang · February 6, 2018, 8:07pm

I’d like to make Interval more general, basically by adding the option to specify whether each endpoint is included or excluded. (I need different kinds of intervals in the partitioner, but I think it would be nice to have these more generally.)

TInterval would have 3 fields (itype, start, and end), where itype is an Int32 that holds the two bits of information (startInclusive and endInclusive). Interval.contains, Interval.overlaps, and Interval.isEmpty would need to account for all the possibilities.
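To make "account for all the possibilities" concrete, here is a rough Python sketch of the three operations, not the actual Scala implementation. The bit layout of itype, the snake_case method names, and the choice to treat the endpoint domain as densely ordered (so (1, 2) counts as non-empty) are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical bit layout for itype; the thread does not specify one.
START_INCLUSIVE = 0b01
END_INCLUSIVE = 0b10


@dataclass(frozen=True)
class Interval:
    itype: int
    start: float
    end: float

    @property
    def start_inclusive(self) -> bool:
        return bool(self.itype & START_INCLUSIVE)

    @property
    def end_inclusive(self) -> bool:
        return bool(self.itype & END_INCLUSIVE)

    def contains(self, x: float) -> bool:
        after_start = self.start <= x if self.start_inclusive else self.start < x
        before_end = x <= self.end if self.end_inclusive else x < self.end
        return after_start and before_end

    def is_empty(self) -> bool:
        # When start == end, only the closed interval [a, a] holds a point.
        if self.start_inclusive and self.end_inclusive:
            return self.start > self.end
        return self.start >= self.end

    def _starts_before_end_of(self, other: "Interval") -> bool:
        # A shared boundary value only counts if both sides include it.
        if self.start == other.end:
            return self.start_inclusive and other.end_inclusive
        return self.start < other.end

    def overlaps(self, other: "Interval") -> bool:
        return (
            not self.is_empty()
            and not other.is_empty()
            and self._starts_before_end_of(other)
            and other._starts_before_end_of(self)
        )
```

With this sketch, [1, 3) and [3, 5] do not overlap, while [1, 3] and [3, 5] do, because the shared point 3 is included on both sides only in the second case.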

dking · February 6, 2018, 8:14pm

Seems cool to me. Current intervals are inclusive of both, right?

In Scala / RV land, I’d use two actual boolean fields for programmer-sanity. Two booleans in RV-land should be pretty efficient anyway (I think 8 bytes each).

wang · February 6, 2018, 8:26pm

Current intervals are [a, b). I forgot we had booleans that were pretty small; that seems pretty good.

wang · February 8, 2018, 1:34pm

So one thing that I ran into while propagating up into user-visible-land:

I updated the toString representation (and the parser) to recognize ("[", "(") and ("]", ")") around an interval, and interpret accordingly. The JSON representation just stores the booleans as additional fields.
How much of this do we want to expose to users? Right now, the parser recognizes an interval as start-end both with and without enclosing brackets (without brackets, it defaults to [a, b)). This maintains compatibility with the Python-side Interval.parse as it currently works (I don’t believe the bracketless version is needed anywhere else), but I think parsing something of the form [a, b) etc. would also work, and maybe it would be cleaner to require that from now on? I was going to document and expose this in Python in a future pull request, if we want that to happen.
I fixed up the IntervalList parser to take advantage of the new stuff: instead of taking end+1 as an exclusive endpoint, it just takes end as an inclusive one.
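For illustration, the parsing behaviour described above could look roughly like the following in Python. This is only a sketch with plain non-negative integer endpoints; the function names, the tuple return shape, and the regular expression are hypothetical and are not the real parser.

```python
import re

# Optional opening bracket, start-end, optional closing bracket.
_INTERVAL = re.compile(r"^\s*([\[(])?\s*(\d+)\s*-\s*(\d+)\s*([\])])?\s*$")


def parse_interval(text):
    """Return (start, end, includes_start, includes_end)."""
    match = _INTERVAL.match(text)
    if match is None:
        raise ValueError(f"cannot parse interval: {text!r}")
    open_bracket, start, end, close_bracket = match.groups()
    if (open_bracket is None) != (close_bracket is None):
        raise ValueError("brackets must be given on both sides or on neither")
    # With no brackets, fall back to the existing default of [a, b).
    includes_start = open_bracket != "("
    includes_end = close_bracket == "]"
    return int(start), int(end), includes_start, includes_end


def from_interval_list_row(start, end):
    """Interval-list rows have inclusive ends, so keep `end` as given and
    mark it inclusive instead of storing end + 1 as an exclusive endpoint."""
    return start, end, True, True
```

For example, `parse_interval("57-258")` and `parse_interval("[57-258)")` give the same result, while `parse_interval("(57-258]")` flips both flags.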

Are there other things that I might be missing?

dking · February 14, 2018, 6:05pm

I like [l,r) as a syntax, personally. I think some of our users will always prefer l-r, even though it’s kind of ambiguous.

cseed · February 16, 2018, 7:35pm

I feel like @tpoterba is going to lament the genetics usability hit of this suggestion, but here goes:

We can do a good job on input: support the [a, b) syntax as well as 2:57-258 for intervals,
parse_interval should support the above formats, but shouldn’t be genetics-specific,
consistently use the precise and unambiguous [a, b) syntax on output,
add genetics_interval_str(interval) -> str (name? ugh.) that converts an interval to the 2:57-258 syntax.
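A possible shape for the output side of this proposal, sketched in Python. The Locus and Interval types, the exact bracketed output format, and the normalisation of other endpoint combinations to [start, end) are assumptions for illustration. In particular, reading 2:57-258 as [57, 258) follows the default mentioned earlier in the thread, and the same-contig restriction is just one way to handle intervals the compact form cannot express.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    contig: str
    position: int


class Interval(NamedTuple):
    start: Locus
    end: Locus
    includes_start: bool = True
    includes_end: bool = False


def interval_str(interval: Interval) -> str:
    """Precise, unambiguous form, e.g. [2:57, 2:258)."""
    left = "[" if interval.includes_start else "("
    right = "]" if interval.includes_end else ")"
    start, end = interval.start, interval.end
    return f"{left}{start.contig}:{start.position}, {end.contig}:{end.position}{right}"


def genetics_interval_str(interval: Interval) -> str:
    """Compact form, e.g. 2:57-258, read here as [57, 258) on contig 2."""
    if interval.start.contig != interval.end.contig:
        raise ValueError("compact form needs both endpoints on one contig")
    # Positions are integers, so any endpoint combination can be
    # rewritten as an equivalent start-inclusive, end-exclusive pair.
    start = interval.start.position + (0 if interval.includes_start else 1)
    end = interval.end.position + (1 if interval.includes_end else 0)
    return f"{interval.start.contig}:{start}-{end}"
```

Under these assumptions, [2:57, 2:258) and [2:57, 2:257] both print as 2:57-258 in the compact form, which is the kind of ambiguity the bracketed output avoids.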

Thoughts?
