Repo and distribution layout

I want to make some layout changes, to both the repo and the distribution, for the transition to C++ code generation.

First, the layout of the code. I don’t want the C++ to live inside the JVM src tree, and I want Python, C++ and JVM languages to be parallel (as well as any other future languages we embrace). So I propose:

hail/ - the monorepo
  hail/ - the hail project
    Makefile - top-level Makefile
    python/ - already exists
    cxx/ - for C/C++
      src/
    jvm/ - for JVM languages
      build.gradle
      src/
      ...

So basically add cxx and jvm, move src into jvm, and move src/main/c into cxx, and add a top-level Makefile to drive the compilation process.

Second, building the hail project should populate a directory with the distribution. That will look like:

distribution/
  bin/
    c++ - compiler
  include/
    hail/ - hail include files
    ... - std, other includes
  lib/
    libhail.so - or in the jar?
    ... - std, other libs
  jars/
    hail-all-spark.jar
  docs/ - and other stuff
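
To make the proposed layout checkable after a build, something like the following could work. This is only a sketch: the `EXPECTED` list mirrors the tree above (leaving out libhail.so, since its location is still an open question), and the name `missing_entries` is made up for illustration.

```python
from pathlib import Path

# Entries copied from the proposed distribution tree. Treating exactly these
# as required is an assumption, not something the build enforces today.
EXPECTED = [
    "bin/c++",
    "include/hail",
    "lib",
    "jars/hail-all-spark.jar",
    "docs",
]


def missing_entries(dist_root):
    """Return the expected entries that do not exist under dist_root."""
    root = Path(dist_root)
    return [entry for entry in EXPECTED if not (root / entry).exists()]
```

An empty list from `missing_entries("distribution")` would mean every entry in the tree is present.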

The distribution will have to be present on the master, but not necessarily the workers. @dking how will this interact with the pip package? I assume we can just include this structure there?

I think keeping libhail.so inside the jar is the easiest for distribution (it doesn’t change how we distribute it now). It also avoids a breaking change for users not using cloud tools (i.e. users managing JAR distribution themselves).


setup.py packages typically have this structure:

name/
  name/
    __init__.py
    ...
  bin/
    run-the-thing
  setup.py

Currently, we cp the jar into the name/name/ directory before executing setup.py. This ensures it ends up in the package root after installation.

If we go with this structure, we could copy the whole distribution folder into name/name/ and load the jar from there. We just need to change a couple lines to reference the jar inside the distribution folder.
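
A rough sketch of what that could look like, assuming the folder keeps the name `distribution` once copied into the package. The helper names `bundle_distribution` and `jar_path` are hypothetical and not part of the current build.

```python
import shutil
from pathlib import Path


def bundle_distribution(dist_root, package_root):
    """Copy the built distribution folder into the package directory.

    package_root stands for the name/name/ directory from the layout above.
    The destination folder name "distribution" is an assumption.
    """
    dest = Path(package_root) / "distribution"
    # dirs_exist_ok requires Python 3.8+; it lets a rebuild overwrite in place.
    shutil.copytree(dist_root, dest, dirs_exist_ok=True)
    return dest


def jar_path(package_root):
    """Return where the jar would live inside the bundled distribution."""
    return Path(package_root) / "distribution" / "jars" / "hail-all-spark.jar"
```

Inside the installed package, the lookup would be something like `jar_path(Path(__file__).parent)`, run before the jar is handed to Spark.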

Once we require a C++ compiler and its dependencies, plus the Hail header files (which are currently packaged in the jar resources and extracted, but I don’t want to keep doing that), JAR-only (or JAR and ZIP) distribution doesn’t work anymore. My thinking was that libhail.so can be distributed the way compiled shared object files usually are. I also expect non-JVM code will increasingly want to access libhail.
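
As an illustration of non-JVM access, Python could load the library straight from the distribution with `ctypes`. This sketch assumes libhail.so ends up at `lib/libhail.so` under the distribution root, which is still undecided in this thread, and the helper names are hypothetical.

```python
import ctypes
from pathlib import Path


def libhail_path(dist_root):
    """Return the assumed location of libhail.so in the distribution."""
    return Path(dist_root) / "lib" / "libhail.so"


def load_libhail(dist_root):
    """Load libhail.so from the distribution, failing clearly if it is absent."""
    path = libhail_path(dist_root)
    if not path.is_file():
        raise FileNotFoundError(f"libhail.so not found at {path}")
    # ctypes.CDLL loads the shared object; its exported symbols then become
    # attributes on the returned handle.
    return ctypes.CDLL(str(path))
```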