This paper misses the bigger picture that genomics is a Big Data problem. Setting up pipelines to put together Perl, Bash, Python, and C++ programs is not where the field will be in a few years' time.

Agreed.

Hope you don't mind a plug here for ADAM, a genomics analysis platform built on Apache Spark, with specialized file formats defined using Apache Avro and stored as Apache Parquet.

https://github.com/bigdatagenomics/adam