Better Python Error Messages Idea

I had an idea last week about a potential way to give users better error messages in some cases, and I wanted to see if you all thought it sounded feasible. This is inspired by Tim’s trick with ArrayRef here: https://github.com/hail-is/hail/pull/7792/ , though I don’t want to individually update IR nodes like this, and I don’t want to send the python trace to the JVM.

Proposal

  1. Create a new IR node, working name is UserError. This represents an error that happens because a user makes a mistake that won’t be identified until runtime, like an index out of bounds, bad dictionary key or something. It would look like: UserError(msg: String, id: Int). This is a lot like the current Die, except that it 1. Has an id number that will correspond to a saved python stack trace, and 2. Won’t throw a FatalError, but instead some new type of error.

  2. Either augment CaseBuilder.or_error or create some new function that both returns one of these IRs and also saves the current python stack trace using traceback to an array or something on HailContext. In the ArrayRef case, we’d now generate an IR in Python that does a bounds check before indexing, throwing a UserError if out of bounds

  3. Add a new type of error on the Java side that is thrown when UserError nodes are evaluated. When it’s thrown, it’ll include the id present in the UserError IR. In Python, we’ll receive this error, use the id to look up the corresponding stack trace, and then throw the python stack trace that we saved. That should give an effect similar to ArrayRef, without bloating the IR.

For the most part, this seems like a reasonable proposal to me. My 2 concerns are

  1. Writing checks in Python instead of Scala. While I like the idea / developer experience of writing checks in Python, I worry there’s the potential to emit slower code this way.

  2. Is the stack_traces array that I’m saving on the python HailContext going to grow forever? When can I clear it? I think maybe this can be solved by not saving to the HailContext, but instead saving it on the UserError IR, and then only when we are about to send to JVM will we do a pass that pulls the stack traces out of the IR, replaces them with ID numbers, and saves them in a stack_traces array which then only lives until the JVM responds with a result.

Even if this proposal doesn’t work out, I think improving the error messages story is something worth discussing.

I think what we really need is some kind of weak reference / weak map. Each UserError IR needs some kind of unique identifier. Upon creation, a UserError IR adds itself to a weak dictionary using the unique identifier as the key. When you serialize, that identifier is included instead of the stack trace. In case of error, we look up the IR/stacktrace in the dictionary. Looking up the identifier in the dictionary must succeed because the IR is necessarily still gc-live until at least this line of execute:

value = ir.typ._from_json(result['value'])

I think doing this all in Python with a single node is going to be too expensive. I think it’ll also be as much work as the following proposal, which may be simpler:

Add an integer argument to any IR node / function which should return a readable invalid-input user error. These functions should generate code that throws an error with a specific pattern recognizable by the python client, using that unique integer to identify the IR source.

Each function we want to convert to the better-error model is a small bit of refactoring work.

That proposal would also work. I was intentionally trying to avoid making this a something individual IRs would have to carry around, but this would still be ok. Just to make this concrete though, you’d be worried about an ArrayExpression.__getItem__(i) implementation that looked roughly like:

def __getitem__(self, i):
    return ArrayRef(self._ir, hl.cases().when(hl.len(self) < i, i).or_error("Array index out of bounds")

You think that would be significantly slower than handwriting Code that does that check? I guess we don’t want to do it twice, and maybe we still want to do the check on IRs we generate during simplifications, so maybe it’s not ok to only check in Python. Some of my NDArray stuff only has checks in python at this time.

The UserError proposal empowers methods developers to provide more useful stack traces when data does not meet their expectations. I think it’s a valuable proposal independent of how Hail internally handles error messages.

I suppose if you can make .or_error somehow leverage the IR integer argument idea, then that satisfies my desire to empower library developers.

Yeah, I think any of these implementations are compatible with the same interface properties we all want.

I’m still curious why you think something like Better Python Error Messages Idea would be slower. I don’t disagree, I just feel like I don’t know it well enough to know what to expect. Is it just not wanting a Python check and a Scala check?

I am worried about the IR expansion and duplication causing problems in our optimizer. The correct rule for ArrayExpression.index will be something like:

hl.case().when((i >= len(self)) | ((i < 0) & (-i >= len(self)), error).default(current_behavior)

We have lots of functions that can throw error messages. I don’t think it’s possible for us to push checks into Python that prevent any invalid-input error coming from scala, so I want a solution that lets us throw errors from the same places in the code generator, but with a way to include context.

Ok, that’s reasonable / makes sense. I agree it would be good to be able to throw errors from within the Scala code that still highlights a python line.

As a first experiment, I’ll probably change Die in the way you described. That’ll cover anyone who does the or_else thing, which will hit a few ndarray checks that motivated this in the first place and will also meet Dan’s requirement.