First of all, I would like to thank you for your work developing and maintaining the Hail library.
Our journey deploying Hail on AWS started last year with the recipe from the Avillach Lab mentioned in the Hail Docs.
Over the past few months, we have developed an AWS CloudFormation template that creates an EMR cluster with Hail installed.
Notably, this template leverages AWS CloudFormation to:
- Streamline the deployment of an EMR cluster with more flexible bootstrap scripts.
- Enable the deployment of the EMR cluster within a private VPC.
- Install Zeppelin notebook alongside Jupyter notebook. Zeppelin allows unhindered use of the BokehJS plotting library, working around a shortcoming of JupyterLab notebooks that prevents the effective use of BokehJS in multi-node clusters.
- Facilitate the optional, concurrent installation of Ensembl VEP to fully leverage
Altogether, in addition to relying on more recent versions of Hail (0.2.59), EMR (6.1.0 vs 5.23.0) and Spark (3.0.0 vs 2.4.0), we believe this AWS CloudFormation template improves the user experience compared to the initial CloudFormation recipe from the Avillach Lab.
We have made our Hail-on-AWS CloudFormation template publicly available on GitHub (https://github.com/c-BIG/Hail-on-AWS) and would be delighted if you mentioned it in the Hail Docs.
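
For anyone who prefers to script the deployment rather than use the AWS console, here is a minimal sketch of how a CloudFormation stack could be launched from Python with boto3. The stack name and the parameter names (`SubnetId`, `InstallVep`) are hypothetical placeholders, not the template's actual parameters, and the IAM capability is an assumption. Please check the repository for the real parameter list.

```python
from typing import Dict, List


def build_stack_request(stack_name: str, template_body: str,
                        parameters: Dict[str, str]) -> dict:
    """Build the keyword arguments for boto3's CloudFormation create_stack call."""
    param_list: List[dict] = [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in sorted(parameters.items())
    ]
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": param_list,
        # Assumption: the template creates IAM roles, which CloudFormation
        # only permits when this capability is acknowledged explicitly.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def deploy(request: dict, region: str) -> str:
    """Submit the request and return the stack ID (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("cloudformation", region_name=region)
    response = client.create_stack(**request)
    return response["StackId"]


# Hypothetical parameter names, for illustration only.
example_request = build_stack_request(
    "hail-emr",
    "{}",  # placeholder; in practice, the contents of the template file
    {"SubnetId": "subnet-0123", "InstallVep": "false"},
)
```

Calling `deploy(example_request, "us-east-1")` with a real template body and valid credentials would then start the stack creation. Keeping the request builder separate from the AWS call makes it easy to inspect the parameters before anything is launched.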