The Oxford Tutorial

The Issues

Setup Issues:

  • “un-tarred” isn’t obvious to everyone; Tatyana thought it meant “don’t extract it”.

  • People naturally try to use Java 9 (it’s the first thing on that webpage)

  • Installing Java 8 after Java 9 is really annoying on a Mac (we should provide a tool for switching the default Java version)

  • parsimonious is not installed by default

  • Spark 2.2.1 is the default Spark download now, so people immediately try to download that.
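
A rough sketch of the kind of check an install script could run to catch the Java 9 problem early. It assumes Java 8 is the version we require (as the notes above imply) and that the version string has already been read from `java -version`; the function names are made up for illustration.

```python
import re


def java_major_version(version_string):
    """Return the major Java version from a string like '1.8.0_151' or '9.0.1'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version_string.strip())
    if match is None:
        raise ValueError("unrecognized Java version: %r" % version_string)
    first = int(match.group(1))
    second = match.group(2)
    # Java 8 and earlier report themselves as 1.x; Java 9 and later
    # put the major version first.
    if first == 1 and second is not None:
        return int(second)
    return first


def check_java(version_string, required=8):
    """Return 'ok' or a short message naming the version that was found."""
    major = java_major_version(version_string)
    if major == required:
        return "ok"
    return "found Java %d, but Java %d is required" % (major, required)


print(check_java("1.8.0_151"))
print(check_java("9.0.1"))
```

Something like this would not switch versions for the user, but it would at least turn a confusing failure into a clear message.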

UX/UI:

  • TString() is probably going to seem weird (isn’t that different from the table printout?). I saw this when printing an expression.

  • nNotMissing from stats aggregator should use underscores

  • should we have an option to print strings with quotes?

  • we should probably just use None consistently everywhere, since that’s the Python name for missing (or maybe NA?)

  • when do I use set versus hl.set? (never use set?)
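
For the underscore naming point, a small sketch of how we might scan a list of field names for camel case and propose replacements. The helper names and the sample list are hypothetical; only nNotMissing and TString come from the notes above.

```python
import re


def camel_to_snake(name):
    """Convert a camelCase name to snake_case, e.g. nNotMissing -> n_not_missing."""
    # Insert an underscore before every uppercase letter that is not
    # at the start of the name, then lowercase the result.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def find_camel_case(names):
    """Map each lower-camel-case name to its suggested snake_case form."""
    pattern = re.compile(r"[a-z]+(?:[A-Z][a-z0-9]*)+")
    return {n: camel_to_snake(n) for n in names if pattern.fullmatch(n)}


# Names starting with an uppercase letter (type names such as TString)
# are deliberately left alone by find_camel_case.
suggestions = find_camel_case(["nNotMissing", "mean", "TString"])
print(suggestions)
```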

Presentation:

  • run describe before and after select, annotate, and transmute

  • Make sure each slide fits on a page

  • the discussion on joins is a bit abstract, especially this table[expr] bit. I think it’s natural for us PL folks, but maybe it would help to show an example

  • in Joins, it would be helpful for the users if we showed the three tables and their fields first, so they can see the connections for themselves

  • A visual example of explode is probably more helpful (maybe actually use a small table and show()):

      a b c [1,2,3]
      =>
      a b c 1
      a b c 2
      a b c 3

  • I feel like we should use the VDS picture in the MatrixTable section

  • focus on the phrase “compound key” rather than “two keys” (I think the phrase “two keys” can be confusing with the matrix table’s two axes)
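
To make the explode picture above concrete, here is a plain-Python sketch of the semantics on a one-row table. It is only an illustration of the idea, not our actual API: rows are modelled as dicts, and the field names x, y, z, and values are made up.

```python
def explode(rows, field):
    """Return one output row per element of the array stored in `field`."""
    out = []
    for row in rows:
        for element in row[field]:
            new_row = dict(row)       # copy the other fields unchanged
            new_row[field] = element  # replace the array with one element
            out.append(new_row)
    return out


rows = [{"x": "a", "y": "b", "z": "c", "values": [1, 2, 3]}]
exploded = explode(rows, "values")
for row in exploded:
    print(row)
```

In this sketch a row whose array is empty produces no output rows, which is probably worth calling out on the slide as well.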

Action Items for the Team

We should distribute these items among the team:

  • eliminate camel case everywhere

  • automate as much of the install as possible for OS X and GNU/Linux systems

  • harmonize printed form of types (@tpoterba are you already working on this? what is the preferred printed form?)

  • consistent display and naming of the missing value

Regarding not overfilling slides: Cotton, did you know the standard browser zoom (ctrl +/-) works in presentation mode? That way, if there’s something that just can’t be made to fit in one slide, you can just zoom out to make it fit, and zoom back in on the next slide.