List of C++ tools

Since Richard is starting to work on creating the C++ side of Hail, I thought I should share some of the tools I found while researching C++. I haven’t had a chance to really look into any of these, so this is mostly just a link dump, with a short summary of my impression of how each is viewed by the community.

Here is another much more comprehensive list of C++ tools.

Test frameworks

Gtest and Boost.Test are the old standards, but I’m guessing they’re heavier than we need or want. Catch seems very popular, and doctest is a newer framework inspired by Catch that is supposedly faster and lighter. Both look pretty nice to me.

Package managers

As far as I know, Conan is the most mature attempt at a package manager for C++. It’s still young, but recently hit v1.0. I’ve heard good things, but haven’t looked at it in any depth. Cget appears to be something much lighter. A package manager isn’t necessary, but if one has reached sufficient maturity, it could make our lives easier. I know this is an area the C++ community is actively trying to improve (along with build systems).

Build systems

CMake is the de facto standard, but I’ve never heard of anybody who actually likes it. We might not have a choice, but it’s probably worth looking at some of the alternatives. Ideally we could migrate to a single build system that works for our entire codebase across C++/Python/Scala/Java.

In the past I have used the very lightweight (header-only) hamlest for writing C++ tests; see https://github.com/martinmoene/hamlest

For a compiler I favor llvm/clang over g++, because g++ is remarkably slow when compiling heavily templated code. For some kinds of code the Intel compiler could give a performance benefit, but if we can get a long way with open-source llvm/clang, then I’m not in a hurry to get into the licensing issues around using a non-free compiler. (For really heavy computation, there’s likely to be a much bigger payoff from GPGPU libraries than from trying to squeeze every last drop out of x86_64/AVX cores.)

I don’t have any experience with package managers for C++.

For build tools, I have mostly used plain (GNU) make, handling platform-specific and site-specific configuration with embedded $(shell foo) commands that choose appropriate variable settings. I think this approach can continue to work fairly well for Hail’s platforms, i.e. Mac OS X and various flavors of Linux. CMake has always seemed fairly gruesome, and I don’t think you need that level of configurability unless you’re covering a much wider range of platforms.

Right now we have gradle calling make for C++ code, and that seems to work well enough.

One aspect of the C++/C culture which seems significantly different from the Scala/Java culture is an emphasis on code-generator tools such as yacc/bison, lex/flex, etc. The Java-world answer to a hard but already-solved problem seems to be to pick up a class library from somewhere, whereas the C++ answer is to pick up a combination of code-generating tools and libraries, with the code-generating tools then needing to be integrated into the build process. I’m not quite sure how that affects the functionality needed in the build system.

I made (simple) use of Catch in a previous project and was happy with it. doctest looks nice: it is loosely based on Catch (it reuses some code) but focuses on compile speed, which is significant.

I think we should be able to build with gcc or llvm, and to each their own. I don’t see a need for non-free compilers these days.

I also think make should be fine. I don’t have experience with C++ package managers either, but I don’t see us having a lot of dependencies you can’t install with apt or brew. (Or a lot of dependencies, period.)

I think we should have support for testing under valgrind, using the clang/g++ sanitizers and using fuzzers like afl: http://lcamtuf.coredump.cx/afl/. I had very good luck finding bugs with afl in another project.
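For reference, a fuzz target is usually just a function that pushes arbitrary bytes through the code under test. The sketch below uses the `LLVMFuzzerTestOneInput` entry point defined by libFuzzer; as far as I know, AFL can drive the same kind of target through a small driver program, so one harness can serve both. `parse_decimal` is a hypothetical stand-in for whatever Hail parsing code we would actually want to fuzz.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical function under test: parse a non-negative decimal integer,
// rejecting empty input, non-digit characters, and values above INT32_MAX.
std::optional<std::int32_t> parse_decimal(std::string_view s) {
  if (s.empty()) return std::nullopt;
  constexpr std::int64_t kMax = std::numeric_limits<std::int32_t>::max();
  std::int64_t value = 0;
  for (char ch : s) {
    if (ch < '0' || ch > '9') return std::nullopt;
    // Cannot overflow int64: value <= kMax before this step.
    value = value * 10 + (ch - '0');
    if (value > kMax) return std::nullopt;
  }
  return static_cast<std::int32_t>(value);
}

// libFuzzer-style entry point, called repeatedly with fuzzer-generated bytes.
// With clang, building with -fsanitize=fuzzer,address links in libFuzzer's
// own main(), and AddressSanitizer turns silent memory errors into crashes
// that the fuzzer records.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size) {
  std::string_view input(reinterpret_cast<const char*>(data), size);
  std::optional<std::int32_t> result = parse_decimal(input);
  // Property check: a successful parse must never yield a negative value.
  if (result && *result < 0) std::abort();
  return 0;
}
```

The same target also runs under valgrind or the sanitizers when called from an ordinary test, which keeps the three kinds of checking on one code path.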

I would like to enforce use of a single compiler/version for the dynamically-generated code. If you allow whatever-compiler-the-user-likes, then a) your generated code is restricted to the lowest common denominator of features and syntax, and b) there are simply too many compiler versions out there to test against them all.

One feature that is not altogether standard (at least not in C++11, but maybe it got in since then) is g++’s block-with-final-value-as-expression, which clang also supports: if you put a block inside parentheses, as in ({ ... }), it becomes an expression taking the value of the last expression in the block (just like Scala). That, together with object-valued expressions, is going to be really handy for dealing with missingness.
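To make the idea concrete, here is a minimal sketch of generated code evaluating `(a + b) * c` where any operand may be missing, using nested statement expressions that each yield an object value. `NullableInt` and `add_then_multiply` are hypothetical names for illustration, not existing Hail code; the `({ ... })` syntax is a GNU extension that g++ and clang accept, and `-pedantic` will warn about it.

```cpp
// Hypothetical value type pairing a result with a missingness flag.
struct NullableInt {
  bool missing;
  int value;
};

// Hypothetical generated code for "(a + b) * c" under missingness semantics:
// the result is missing if any operand is missing.
NullableInt add_then_multiply(NullableInt a, NullableInt b, NullableInt c) {
  // GNU statement expression: the whole ({ ... }) is an expression whose value
  // is that of its last expression statement. Each block evaluates its operands
  // once into block-scoped temporaries, so blocks can nest arbitrarily.
  return ({
    NullableInt l = ({
      NullableInt x = a, y = b;
      (x.missing || y.missing) ? NullableInt{true, 0}
                               : NullableInt{false, x.value + y.value};
    });
    NullableInt r = c;
    (l.missing || r.missing) ? NullableInt{true, 0}
                             : NullableInt{false, l.value * r.value};
  });
}
```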

But keeping track of what compiler you’ve got and which compilers support which features could just get distracting. My preference is to package the required compiler into the release, in a way which won’t interfere with anything else on the machine. Then we avoid a lot of potential problems, at the cost of a few extra MB of disk.

[Aside: this is in line with the kind of packaging done by Bitnami, which I have found to be a fairly painless way of managing complex software stacks without going all the way to separate containers.]

I would vote for fixing a C++ standard for all our C++ code (written and generated) rather than tying it to a compiler. Given that current versions of llvm and gcc support C++17, I see no reason why it shouldn’t be C++17. I think this largely solves the “lowest common denominator” problem. I think we have no obligation to test on all versions of all compilers, but we should document and support the versions we do test (and I’m happy if that’s just the latest llvm, although I do like the idea of testing against both llvm and gcc, as they have different warnings, static analyzers, sanitizers, etc.).
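As a sketch of how generated code could pin its dialect, a preamble prepended to each generated translation unit might inspect the standard predefined macro `__cplusplus` together with the compiler-specific `__clang__`, `__clang_major__` and `__GNUC__` macros. The helper names below are hypothetical; a real preamble would probably `static_assert` on them, so that a mismatched toolchain fails at compile time instead of misbehaving later.

```cpp
// Hypothetical preamble helpers for generated code.

// Year of the language standard being compiled against, derived from
// __cplusplus (201103L for C++11, 201402L for C++14, 201703L for C++17).
constexpr long language_standard_year() { return __cplusplus / 100; }

// True when compiling as C++17 or later.
constexpr bool at_least_cxx17() { return __cplusplus >= 201703L; }

// Short compiler name. clang also defines __GNUC__, so test __clang__ first.
constexpr const char* compiler_name() {
#if defined(__clang__)
  return "clang";
#elif defined(__GNUC__)
  return "gcc";
#else
  return "unknown";
#endif
}

// Major version of the compiler, or -1 if it is neither clang nor gcc.
constexpr int compiler_major_version() {
#if defined(__clang__)
  return __clang_major__;
#elif defined(__GNUC__)
  return __GNUC__;
#else
  return -1;
#endif
}

// How a preamble might enforce the policy (commented out so that this
// sketch compiles under any standard level):
// static_assert(at_least_cxx17(), "generated code requires -std=c++17");
```

One caveat for a version check: Apple’s Xcode clang numbers `__clang_major__` by its own release scheme rather than upstream LLVM’s, so a check would need to tell the two apart (Apple’s build also defines `__apple_build_version__`).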

If we want our dialect to include extensions implemented by both llvm and gcc (I think most of the GNU extensions are), we should consider those on a case-by-case basis.

You propose the GNU statement expression extension. I don’t think it’s included in later standards (gcc -std=c++17 -pedantic warns on it). However, I don’t think it’s necessary, because it can easily be simulated with C++11 lambdas:

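```cpp
#include <stdio.h>

void foo() {
  printf("in foo\n");
}

int
main()
{
  // GNU statement expression (non-standard):
  // printf("%d\n", ({ foo(); 5; }));

  // Standard C++11 equivalent: an immediately invoked lambda.
  printf("%d\n", ([]() { foo(); return 5; })());
}
```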
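One detail worth noting if generated code uses this trick: the lambda above can be capture-less (`[]`) only because `foo` is a free function. A statement expression can read the enclosing function’s locals directly, so a lambda standing in for one needs to capture them, most simply with `[&]`. A minimal sketch, using a hypothetical `NullableInt` type to show that the lambda can also yield an object value for missingness:

```cpp
// Hypothetical value type pairing a result with a missingness flag.
struct NullableInt {
  bool missing;
  int value;
};

// Hypothetical generated code for "(a + b) * c", using immediately invoked
// lambdas in place of GNU statement expressions.
NullableInt add_then_multiply(NullableInt a, NullableInt b, NullableInt c) {
  // [&] captures the enclosing locals by reference, which is what a statement
  // expression sees implicitly; a capture-less [] would not compile here.
  return [&]() -> NullableInt {
    // Sub-expression "a + b", evaluated once into a temporary.
    NullableInt sum = [&]() -> NullableInt {
      if (a.missing || b.missing) return NullableInt{true, 0};
      return NullableInt{false, a.value + b.value};
    }();  // the trailing () invokes the lambda immediately
    if (sum.missing || c.missing) return NullableInt{true, 0};
    return NullableInt{false, sum.value * c.value};
  }();
}
```

One behavioral difference to keep in mind: a `return` inside a GNU statement expression returns from the enclosing function, whereas a `return` inside a lambda only ends the lambda, which is what lets the sketch use early returns for the missing case.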

I do like the idea of releasing with a supported compiler; we’d certainly ship with libllvm if we generated LLVM IR directly. (“A few MB” seems slightly optimistic: cc1plus (stripped) for a recent gcc build on my machine is 23MB, and I’m not sure what other dependencies are required, although that still seems plenty small.)

At some point I tried the standard Xcode C++, and whatever version it is didn’t support the “-std=c++17” option, but did support “-std=c++14”. Right now I’m using the latest clang (6.0.0), but with “-std=c++14”.

Using the lambda instead of the statement-expression syntax is a good trick, as long as we have a sufficient supply of ‘(’ and ‘)’ brackets 🙂

On the size of the compiler installation, IIRC the whole llvm/clang install, built for C/C++ only, comes to somewhere around 50MB. So yes, I’m using “few MB” loosely, in the sense of “won’t take long to download and won’t make much of a dent in your 100s-of-GB disk space”, not in the sense of “fits on a couple of floppy disks”.

In general, I’m a pessimist about specifying the C++ standard but not the compiler+version. Individual compilers have quirks, and those quirks show up most acutely on algorithm-generated source doing things that wouldn’t occur to humans. At Physics Speed, we were forced to switch from g++ to clang because some generated query code using templates could take over 100 sec to compile with g++, but less than 5 sec with clang. That code was standard-compliant, but the compile time was a showstopper for dynamic codegen.

I agree with Cotton about using as many compilers as possible to get more thorough error/warning reporting, which is advice I’ve seen commonly repeated. The compilers usually suggested for this are gcc, clang, and Visual Studio, but unless we want to find a Windows machine to build on, the latter is out.

I don’t see any reason not to follow that advice for our human-written code. For generated code, what if we start out testing it with both compilers, and only restrict to a single compiler if we come across something that forces us to? In your case, where g++ was too slow to compile, it might still be worth using g++ in testing and CI, but only using the fastest compiler in production.