What does HackerNews think of adam?
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
While using Apache Spark for bioinformatics [0] never really took off, I still think Parquet formats for bioinformatics [1] are a good idea, especially with DuckDB, Apache Arrow, etc. supporting Parquet out of the box.
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
https://github.com/bigdatagenomics/adam
We've fallen off the first Google hit the past few years but are still quite relevant (e.g. Databricks' commercial offering uses ADAM under the hood). Drop in our Gitter some time!
In my experience, translating domain data into Spark gives a 100X improvement in data analysis.
2) Read the computational papers on a subject you are interested in; replicate the results in the papers and open source your software/pipeline; apply the method to a newer dataset.
3) Contribute to an open source informatics toolchain used for the subject (e.g., https://github.com/bigdatagenomics/adam)
Hope you don't mind a plug here for ADAM, a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark and Parquet.