Just trying to learn a bit about the codebase.
This probably doesn’t reach issue status, so let me know if there’s a more appropriate place for it. I’m looking at https://github.com/hail-is/hail/blob/master/batch/batch/server.py#L291.
Should we persist jobs rather than storing them in an in-memory dictionary? I could imagine it being desirable for the running Batch service to be able to go down without losing its jobs, as would happen if a bug crashed it or if we wished to update Batch. For instance, in Node some classes of errors can crash the process pretty easily, like “header already sent” (this probably doesn’t happen in Flask, as Flask is synchronous).
Options could be using something like beanstalkd to track submitted/completed/failed jobs via queue states rather than in the Batch process, or just using a db. We probably don’t need relations, so something simple like Redis with a short persistence window should be enough.
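
To make the Redis idea a bit more concrete, here is a rough sketch of what I have in mind. It is not based on how Batch is actually structured: `JobStore`, the `batch:jobs` key, and the record fields are all names I made up for illustration. The store only needs a client with `hset`, `hget`, and `hgetall`, which a redis-py client (e.g. `redis.Redis(decode_responses=True)`) provides; the `InMemoryClient` below is just a stand-in so the snippet runs without a Redis server.

```python
import json


class InMemoryClient:
    """Stand-in exposing only the hash commands used below.

    Assumption: in a real deployment this would be a redis-py client,
    which has hset/hget/hgetall methods with the same shape.
    """

    def __init__(self):
        self._hashes = {}

    def hset(self, name, key, value):
        self._hashes.setdefault(name, {})[key] = value

    def hget(self, name, key):
        return self._hashes.get(name, {}).get(key)

    def hgetall(self, name):
        return dict(self._hashes.get(name, {}))


class JobStore:
    """Hypothetical wrapper: one hash, keyed by job id, JSON-encoded records."""

    KEY = "batch:jobs"  # hypothetical key name

    def __init__(self, client):
        self.client = client

    def save(self, job_id, state, **fields):
        # Overwrites any earlier record for this job id.
        record = {"id": job_id, "state": state, **fields}
        self.client.hset(self.KEY, str(job_id), json.dumps(record))

    def get(self, job_id):
        raw = self.client.hget(self.KEY, str(job_id))
        return json.loads(raw) if raw is not None else None

    def all(self):
        items = self.client.hgetall(self.KEY).items()
        return {int(k): json.loads(v) for k, v in items}


if __name__ == "__main__":
    client = InMemoryClient()
    store = JobStore(client)
    store.save(1, "submitted", image="alpine")
    store.save(1, "completed", image="alpine", exit_code=0)

    # Simulate a Batch restart: a fresh JobStore over the same backing client.
    restarted = JobStore(client)
    print(restarted.get(1))
```

The point is just that job state lives outside the Batch process, so a fresh process can rebuild its view by calling `all()` on startup. With a real Redis server, how much survives a Redis restart depends on how its persistence is configured.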