How to Tame Your Dragon: C++ for Scala Programmers


#1

How to Tame Your Dragon: C++ for Scala Programmers

Richard Cownie, Hail Team, 2018-05-21

Introduction

We already have a bunch of C++ code queued up for code review, and
a shortage of C++ developer bandwidth to get it done quickly. And over
the next few months we hope to push critical functionality from
generated-JVM-bytecode into generated-C++. So now is a good time to
get everyone up to speed on C++. This document assumes working
knowledge of Scala and OOP, and focuses on the differences between
C++ and Scala, with particular attention to the C++ features which
are most relevant to Hail development.

Syntax

C++ syntax is gnarly:

  • comment with /comment/ or //comment-up-to-newline

  • apart from that, newline is ignored, so statements
    are terminated with semicolon (unless they end with a brace)

  • declarations are more Java-like “int foo;” than Scala-like
    “foo: int”

  • declaration syntax for pointers and arrays is just weird

  int foo[8]; // foo is an array of int's, length 8
  char* str;  // str is a pointer to char
  Foo* const p; // p is a constant pointer to Foo
  const Foo* q; // q is a pointer to const Foo

Understanding really complex C++ declarations becomes absurd,
see the “Clockwise/Spiral Rule” described here http://c-faq.com/decl/spiral.anderson.html
and used to parse this:

void (*signal(int, void (*fp)(int)))(int);
// signal is a function passing 
//   an int and 
//   a pointer to 
//     a function passing an int returning nothing (void) 
// returning a pointer to a function passing an int returning nothing (void)

The good news is that such declarations are so mind-twistingly awful
that real code tends to keep things fairly simple.

Correspondence of some basic types and syntax

Scala       C++

Unit        void
Boolean     bool
Byte        char, int8_t   (also uint8_t for unsigned)
Short       short, int16_t (also uint16_t for unsigned)
Int         int, int32_t   (also uint32_t)
Long        long, int64_t  (also uint64_t, intptr_t)
Float       float
Double      double
String      std::string
Generic[T]  Generic<T>     generic/template type instantiation
array(idx)  array[idx]     array/vector indexing

C++ tends to use at least two different representations of text strings:

  • “char*” pointer to a buffer of (8-bit) char terminjkated by 0x00 NUL byte
    may be interpreted as ASCII or as modified-UTF8
    Standard UTF8 encodes the 0x0000 codepoint as a 0x00 byte; modified UTF8
    uses non-standard 2-byte encoding so that the 0x00 byte can still be used
    as string-termination.

  • std::string is a relatively heavyweight (but not as heavy as Scala String)
    heap-allocated mutable variable-length array of bytes, also usually
    interpreted as modified-UTF8.

String literals in C++ are of type “const char*”. This can be implicitly
converted to std::string; and you can convert back with str.c_str().

C++ has both signed and unsigned integer types. One source of awkwardness
is that the STL container classes have many interfaces using “size_t”, which
is an unsigned integer type, and comparisons between signed and unsigned types
give compiler warnings. So it’s almost impossible to avoid using size_t.

Correspondence of special method names

// Scala default constructor
class Foo(val ival: Int, val sval: String) {

  // or secondary constructor
  def this(sval: String) = this(0, sval)
  
  def apply(n: Int) = ival+n
}

// C++ constructors
class Foo {
 public:
  int ival_;
  std::string sval_;

  Foo(int ival, std::string sval) :
    ival_(ival),
    sval_(sval) {
  }
  
  Foo(std::string sval) :
    ival_(0),
    sval_(sval) {
  }
  
  // Destructor
  ~Foo() {
    printf("Foo::~Foo() ...\n");
  }
  
  // Functor method corresponds to Scala "apply"
  int operator()(int n) {
    return(ival_ + n);
  }   
}

Scala “package” and C++ “namespace”

There’s a rough correspondence between the use of Scala “package”
and C++ “namespace”". In each case the package/namespace is used to
subdivide the name hierarchy.

However, while Scala makes it easy to import everything from a package,
it’s slightly more complicated to import everything from a C++ namespace
to be used as an unqualified name:

import is.hail.somepackage._

is roughly equivalent to:

#include "hail/somepackage.h"
using namespace hail::somepackage;

So to avoid excessive clutter of many “using namespace hail::foo” declarations,
it may helps to use fewer and coarser-grain C++ namespaces.

Confusingly, C++ also has “anonymous namespaces” like this:

namespace { // anonymous namespace
void FuncVisibleOnlyInThisCompilationUnit() { }
}

which may be used for function/class/variable declarations visible
only within the current compilation unit (usually corresponding to
one foo.cpp file).

Objects and their lifecycle

On the JVM, the lifecycle of an object is simple:

  • create new object and call one of its constructors

  • implicitly garbage-collect the object when no longer referenced

  val foo = new Foo(someArg0, someArg1)

In C++, there is much more fine-grain control over all stages in
the object lifecycle: memory allocation (where the object is)
may be separated from initialization; objects may be embedded in
other objects, rather than separately allocated; objects may have
non-trivial destructors as well as constructors.

But let’s start with a simple case:

  auto foo = new Foo(someArg0, someArg1) // "auto" infers type Foo*
  delete foo; // destroy the object and free its memory

We can also declare a local stack-allocated object:

  Foo foo(someArg0, someArg1);
  // the destructor will be called precisely when foo goes out of scope

In this case we have created an object on the stack, and the object will
be implicitly destroyed when it goes out of scope, either due to a normal
exit from the code block, or due to an exception unwinding the stack.
Note that nothing in the compiler or the language will stop us taking
the address of that stack-allocated object and storing it somewhere else,
leaving a dangling reference to a destroyed object. That’s the programmer’s
problem, and is one of the many ways C++ allows you to blow your foot off.

Sometimes - though usually only in fairly deep systems-programming - you
may want to create an object in an already-allocated memory buffer:

  Foo* createFooInBuffer(char* buf, std::string someArg0, int someArg1) {
    return new (buf) Foo(someArg0, someArg1);
  }
  
  void destroyButKeepMemory(Foo* foo) { foo->~Foo(); }

… or override the normal memory-allocation policy for a class:

  class Foo {
    // other methods
    void* operator new(size_t n) {
      // could be special memory allocator for this class
      auto allocatedBlock = malloc(n);
      return allocatedBlock;
    }
    
    void operator delete(void* p) {
      // could be matching special deallocator
      free(p);
    }
  }

Stack-allocated objects with non-trivial destructors are especially
important for implementing various kinds of “smart pointer”, and for
controlling other resources such as locks on mutexes.

References, pointers, and smart pointers

In Scala, all class-instances and Array-instances are handled as
references to (probably-)heap-allocated Objects.

In C++, the choice of memory-allocation is decoupled from the type,
and the programmer is free to deal with an object without necessarily
having any kind of pointer or reference. This introduces a distinction
between the object contents, and the object reference (if any) - and also
allows references and pointers to basic types such as integers, as well
as to classes and array.

In the good old days of C, there was just one flavor of reference -
the pointer:

  int foo = 5;
  int* p = &foo; // p is a pointer-to-int, initialized to point at foo
  printf("*p = %d\n", *p); // print the value referenced by p

But of course it is more commonly used to point at compound objects:

  struct IntList { struct IntList* next_; int val_; };
  
  int sum(IntList* list) {
    int n = 0;
    for (; list != nullptr; list = list->next_) n += list->val_;
    return n;
  }

Note the syntax “list->val_” using the “->” operator to dereference a
pointer and select a member of the referenced object.

The designers of C++ kept pointers, but for no very obvious reason they
introduced a second flavor known as “reference”, which by convention
tends to be used only where the reference must point to a valid object,
rather than possibly being null.

  void increment(int& foo) { foo += 1; }

A reference tries hard to disguise the indirection and look just like
the object it refers to, e.g.

  void push_n_times_5(std::vector<int>& vec, int n) {
    vec.push_back(n*5); // note "." instead of "->"
  }
  
  struct Complex {
    double re_;
    double im_;
    
    Complex() : re_(0.0), im_(0.0) { }
    
    Complex(double re, double im) : re_(re), im_(im) { }
    
    // copy-constructor
    Complex(const Complex& b) : re_(b.re_), im_(b.im_) { }

    // copy-assign    
    Complex& operator=(const Complex& b) { re_ = b.re_; im_ = b.im_; }
    
    bool operator==(const Complex& b) const { return ((re_ == b.re_) && (im_ == b.im_)); }
  };
  
  void test() {
    Complex a(1.0, 2.0);
    Complex& a_ref(a);
    Complex b(1.0, 2.0);
    auto& b_ref = b; // "auto&" infers reference-type of RHS
    // We have refs to two different objects with the same contents
    assert(a_ref == b_ref); // content-comparison using Complex::operator==()
    Complex c(2.0, 3.0);
    b_ref = c; // copies the contents of c into the object referenced by b_ref
    assert(a != b); // so now contents of b have changed
  }

It’s common to get confused when dealing with assignment and comparison
operations on reference variables.

My recommendation is to use const-reference variables “const Foo& x” mostly
as function parameters for “borrowing” immutable objects only for the duration
of the function call. And to use non-const reference variables only where
required to fit in with the language or STL conventions (e.g. the return value
from the “operator=” assignment method).

Smart pointers - std::shared_ptr and std::make_shared

In the 1990s everyone trying to use C++ for GUI and application programming
noticed they were spending a lot of effort trying to get their memory
management to work correctly using only “new” and “delete”. And with compute
throughput getting cheaper, and developers getting more expensive, much
development moved towards managed-memory (garbage-collected) languages such
as Java and C#.

C++ experts responded by using the template abstraction mechanisms of C++
to implement “smart pointers” which could be passed around and referenced
in ways that looked just like a “Foo*” built-in pointer, but which would
implement various memory-manangement policies. The “Boost” library in
particular popularized these innovations, and eventually these were incorporated
in the standard library for the C++11 standard.

Most similar to a normal pointer, and thus (IMO) most useful, is std::shared_ptr,
which provides automatic reference-counted memory management. The refcount for an
object is the number of std::shared_ptr’s pointing at the object.
The std::shared_ptr template class can adjust the refcount in the
shared_ptr constructor/destructor/copy-assign to keep everything correct, and
whenever a refcount reaches zero the object is destroyed.

The refcount updates are performed with thread-safe atomic operations, so that
several threads can have shared_ptr’s to the same object, so this manages the
object lifetime correctly even in the presence of multiple threads - note however
that shared_ptr does not synchronize access to data in the object, it only
guarantees that the object won’t be destroyed while you have a reference to it.

To ensure consistent refcounting throughout the object lifetime, objects should
be allocated with std::make_shared(args), which calls the T::constructor
and immediately creates the shared_ptr.

With these features, we can rewrite our earlier example:

  val foo = new Foo(someArg0, someArg1)

becomes:

  auto foo = std::make_shared<Foo>(someArg0, someArg1);

… which is now only slightly more verbose than Scala, but also has the
benefit of avoiding the costs of garbage-collection.

Reference-counted garbage collection doesn’t solve all problems. But abstract
syntax trees and dataflow-related structures tend to be acyclic. And in many
cases explicit unlinking (setting shared_ptr’s to nullptr) of a few pointers
in a data structure will break cycles and cause the rest of the structure to
evaporate.

When you more, std::weak_ptr is a reference which doesn’t force the object
to stay alive. But you can’t dereference a weak_ptr directly - you have to convert
it to a shared_ptr (using weak_ptr::lock()), which will either give you a shared_ptr
to the live object, or a null pointer if the object has already been destroyed.

In keeping with the general philosophy of C++ (why have one way to do something when
you can have 5 different ways and keep everyone happy, except the people who have to
read it ?), they put in yet another flavor of smart pointer, std::unique_ptr.
The advantage of std::unique_ptr is that it has zero space overhead, and zero time
overhead on de-referencing, compared to a normal “Foo*” pointer.

The disadvantage is that it has weird semantics. The essential feature of pointers
is that you can have multiple pointers referring to the same object. But you can’t
do that with std::unique_ptr - those objects are in 1-1 correspondence with non-null
unique_ptr’s. An object has precisely one unique_ptr, and when the unique_ptr stops
pointing at the object, the object goes away.

How the heck do they make that work ? The answer is that you can’t copy a unique_ptr;
you can only move it, and when you move it from one variable to another, the old variable
is cleared. And to make that work you need some rather weird language/type-system
features described in the next section …

Move semantics and Rvalue references

It might seem as though two different flavors of reference (with a choice
of const and non-const toppings) would be enough for any reasonable language.
But who ever said C++ was going to be reasonable ? So the C++11 standard
introduced a third flavor, the “Rvalue reference”. Ignore the name -
it’s really a reference to something that you know you’re about to throw
away.

Why is that a useful thing to have ? Consider a dynamic-resized vector
of complicated objects: when you hit the capacity limit, you allocate a larger
buffer, and you want to move all the old objects into the new buffer.
Up to C++11, you could do that correctly, by copying the old objects
(which might invoke a copy-constructor on each of the new objects, followed
by the destructor on each of the old objects). But you couldn’t do it
in the most efficient way in all cases, which might involve re-using
parts of the old Foo objects in the new copies.

To make that work you want “move semantics”, which means roughly “I want to
make the destination object look like the source object, and I don’t care if
the source object gets trashed”.

So the R-value reference means “you’re allowed to trash this object”. You don’t
have to trash the object. And after “trashing”, it still is a valid object,
but you shouldn’t trust its contents - the only sane things to do to it are to
copy-assign from another object, or to destroy it.

Now in some circumstances the compiler can do dataflow and escape analysis to
figure out that an object is about to be destroyed and thus that it can be passed
as an R-value reference; but more often that is higher-level knowledge, and the
programmer forces it by using std::move(foo) to convert a normal reference into
an it’s-ok-to-trash-foo reference. Most importantly, the STL container classes
know what they’re doing and will use std::move() to optimize for objects which
have an Rvalue-constructor’s and/or an Rvalue-copy-assign method. And the STL
container classes themselves implement Rvalue-constructor and Rvalue-copy-assign.

// A non-trivial class with Rvalue methods
class Foo {
public:
  std::map<std::string, int> col_map_; // map names to column-index
  std::vector<int> col_info_;
  
public:
  Foo() { }
  
  Foo(const Foo& b) :
    col_map_(b.col_map_),
    col_info_(b.col_info_) {
  }
  
  // move-constructor
  Foo(Foo&& b) :
    col_map_(std::move(b.col_map_)),
    col_info_(std::move(b.col_info_)) {
  }
  
  Foo& operator=(const Foo& b) {
    col_map_ = b.col_map_;
    col_info_ = b.col_info_;
    return *this;
  }
  
  // move-assign
  Foo& operator=(Foo&& b) {
    col_map_ = std::move(b.col_map_); // the std::move() is needed
    col_info_ = std::move(b.col_info_);
  }

  int get_col_info(const std::string name) const {
    auto p = col_map_.find(name);
    if (p == col_map_.end()) {
      return -1;
    } else {
      return col_info_[p->second];
    }
  }
};

As you can see, this is quite a lot of boilerplate code. If what you want
is precisely the obvious member-by-member copy or move, then the compiler can
fill in the details:

  Foo(const Foo& b) = default;
  Foo(Foo&& b) = default;
  Foo& operator=(const Foo& b) = default;
  Foo& operator=(Foo&& b) = default;

My recommendation for Rvalue references and std::move is to use them as little
as possible. If you find yourself tempted to use std::move(), think long and hard
about whether you should instead be using some kind of pointer to individually
allocated objects. That often gives the desired effect of moving things around
without excessive copying, and in a way which is easier to understand, less prone
to error, and which naturally corresponds to the semantics and idioms of
Scala/JVM-like managed languages. Just because you can copy object contents
in C++ doesn’t mean that you should

Templates and zero-overhead abstraction

Earlier on I glossed over precisely how C++ implements features such
as std::shared_ptr in the standard library rather than as part of the core
language. But now we’ll give a quick introduction to the C++ template mechanism
and how it allows us to build powerful abstractions with minimal run-time overhead.

A template class or template function can have parameters which are types
and/or integer-type values or enum values. There may be several distinct
template specializations which give totally different behavior for different
parameters. Here’s a brief example.

// Primary template
template<typename T, int N>
class FixedLengthArray {
 private:
  T vals_[N];
 public:
  T get(size_t idx) { return vals_[idx]; }
  
  void set(size_t idx, const T& v) { vals_[idx] = v; }
};

// Partial specialization with 8 bools packed into each byte
template<int N>
class FixedLengthArray<bool, N> {
 private:
  static const size_t kNumBytes = (N+7)>>3;
  int8_t bytes_[kNumBytes];
 public:
  bool get(size_t idx) {
    return ((bytes_[idx>>3] >> (idx&0x7)) & 1) != 0);
  }
  
  bool set(size_t idx, bool v) {
    auto bit = (1 << (idx&0x7));
    if (v) {
      bytes_[idx>>3] |= bit;
    } else {
      bytes_[idx>>3] &= bit;
    }
  }
};

Templates can also invoke themselves recursively - specialization
allows such recursion to be terminated when it matches a
non-recursive specialization.

It’s possible to play a lot of tricks with recursive templates, and
the STL does so. But such code can be very difficult to understand,
and deeply recursive templates can lead to enormously complicated
error messages. So this feature is to be used with caution.

STL Containers and Iterators

This is just going to be the Greatest Hits of the STL, rather than a
comprehensive description. There are plenty of good books and websites
(e.g. http://www.cppreference.com) for that.

STL container classes have corresponding iterator and const_iterator
types which are an abstract position in the container. A container class
has begin() and end() methods which return iterator, and the iterator
can at least be incremented with the “++” operator to advance to the
next position. The iterator can be dereferenced to access the contents
at the current position.

A simple scan of an STL-compatible container looks like this:

std::vector<int> vec;
sum = 0;
// The iterator is of type std::vector<int>::iterator
for (auto it = vec.begin(); it != vec.end(); ++it) sum += *it;

Which can also be abbreviated as:

for (auto n : vec) sum += n;

And the same scan will work for a different container:

std::unordered_set<int> my_set;
for (auto n : my_set) sum += n;

As written so far, each element is copied into the temporary variable “n”.
But if we want to modify the elements, or if the copying would be expensive,
then we can use “auto&” to get a reference to each element:

for (auto& n : my_set) { if (n > 0xff) n = 0xff; }

The difference between “auto” and “auto&” can really screw you up either
by not changing container values that you thought you were changing, or
by introducing unnecessary expensive temporary copies. So always write
and review that construct carefully.

An associative container is viewed as a container of (key, value) pairs,
implemented with the template std::pair<KeyT, ValT> whose members are called
“first” (the key) and “second” (the value).

void PrintKeys(const std::map<std::string, int>& map) {
  for (auto& kv : map) printf("key \"%s\"\n", kv.first.c_str());
}

void SumValues(const std::map<std::string, int>& map) {
  int sum = 0;
  for (auto& kv : map) sum += kv.second;
  return sum;
}

One slight weirdness of STL is that “iterator” type is also used as the
result of an associative lookup - which implies that it needs to support
an associative lookup giving a position in the container, and then
incrementing to the “next” position. That is somewhat meaningful in a
vector or a std::map (binary search tree), but is awkward in a
std::unordered_map (hash-table).

void insertOrIncrement(std::map<std::string, int>* map, const std::string& key) {
  auto p = map->find(key);
  if (p == map->end()) {
    (*map)[key] = 1;
  } else {
    p->second += 1;
  }
}

Great care is needed in modifying a container class during an iterator loop.
Inserting, removing, or resizing elements in a container may lead to undefined
behavior of iterators, and anything can happen. It’s generally best to avoid this.

Here’s a quick list of the most useful STL template classes:

std::vector<T>              // auto-resized vector
std::map<KeyT, ValT>  // associative map (binary search tree)
std::unordered_map<KeyT, ValT> // associative map (hash table)
std::set<T>                   // set (binary search tree)
std::unordered_set<T> // set (hash table)
std::queue<T>              // FIFO queue
std::priority_queue<T> // priority queue
std::pair<T0, T1>          // pair of T0 first; T1 second
std::function<RetT(ArgTs)> // function wrapper
std::tuple<T0,T1,...>     // tuple

#2

Thanks for writing this Richard! I’m just starting to read through it.

I don’t think the third line is right. I think this should be

int foo[8]; // foo is an array of int's, length 8
char* str; // str is a pointer to char
const Foo* q; // q is a pointer to const Foo
Foo const* p; // q is a pointer to const Foo (same as previous)
Foo *const p // p is a constant pointer to Foo
const Foo *const p // p is a constant pointer to const Foo

Other than the fact that the leftmost const can come before or after the type, you can read these off from right to left.


#3

Yeah, that one always confuses me. I’ll check and fix when i get a minute.

Also let me know anything unclear or more stuff I should add.

Thanks

Richard