I think you underestimate the diversity of genome research activities, technologies and methods out there :) It's such an incredibly fragmented field; sure, many ad-hoc pipelines eventually get productized beyond a pile of scripts and a dozen or so users, and there are definitely plenty of applications that demand HPC and "big data" techniques - but those describe a tiny fraction of all the research projects out there.

In any case, many parts of the field simply don't have the software engineering discipline to pull off proper "big data" workflows. Advances in commodity hardware, stronger tooling for ad-hoc programming, and "cloudification" toolchains will probably postpone the maturation of a lot of work that used to require proper engineering effort.

Not to mention there's plenty of fertile ground in problems that can by now be answered with merely "annoyingly non-small" techniques rather than "big data" ones.

(Minor) co-author on the paper here, just wanted to second your experience in the field.

The big win I've found with Nextflow is that once you've written a workflow, you have a lot of flexibility in the execution environment. Have all the tools already installed on your workstation or a large compute instance? Use the local executor to saturate the box with concurrently running jobs. Don't have, or don't want, all those tools installed? Use the local executor with Docker images. Have access to a traditional compute cluster (LSF, SGE, Torque, etc.)? Use the matching cluster executor, again with Docker images.
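
As a rough sketch of what that switching looks like (the profile names and the container image here are placeholders I made up, not anything from the paper), a single nextflow.config can hold all three setups and you choose one at run time with -profile:

    // nextflow.config -- hypothetical profiles illustrating executor switching
    profiles {
      local {
        process.executor = 'local'            // run tasks directly on this machine
      }
      docker {
        process.executor  = 'local'
        docker.enabled    = true              // same machine, but each task runs in a container
        process.container = 'quay.io/example/tools:latest'   // placeholder image
      }
      sge {
        process.executor  = 'sge'             // submit tasks to an SGE cluster
        docker.enabled    = true              // cluster nodes need Docker available
        process.container = 'quay.io/example/tools:latest'
      }
    }

The workflow script itself stays unchanged; you'd just run e.g. `nextflow run main.nf -profile sge` instead of `-profile local`.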

A couple other resources worth checking out:

Toil workflow engine https://github.com/BD2KGenomics/toil

Common Workflow Language (CWL) specification https://github.com/common-workflow-language/common-workflow-...