When I was a young warthog, I took a class on engineering distributed systems for processing large amounts of data. Many of the earlier papers focused on consensus algorithms (which we discussed in Hail Lab Meeting several weeks ago), but the class also included some papers about MapReduce and other work on the data processing systems themselves (rather than the underlying consensus systems).
The “Cloud Big Data Systems” class webpage contains links to all the papers. A couple that might be fun to talk about:
“A Comparison of Approaches to Large Scale Data Analysis”, in which Pavlo et al. compare Hadoop, Vertica, and an anonymous DBMS. As you may expect, Hadoop doesn’t exactly shine.
 Stonebraker created SciDB, which attempts to tackle problems similar to the ones Hail addresses, though the cited use cases seem to focus on physics and astrophysics rather than genetics.
 There’s a particular company that was/is famous for not letting researchers use their name in publications that benchmark their system. I think that company is Oracle, but I’m not certain.