Hi! Author of the post here. I can try to answer any questions if need be, although it looks like others have already done a great job of that!

Hello! I'm wondering whether you've come across our suite of libraries and tools for doing genomics on Spark?

https://github.com/bigdatagenomics/adam

We've slipped from being the first Google hit over the past few years but are still quite relevant (e.g. Databricks' commercial offering uses ADAM under the hood). Drop by our Gitter some time!