What does HackerNews think of adam?

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

Language: Scala

Rankings: #27 in Java, #124 in Python, #4 in Scala
We presented on using Parquet formats for bioinformatics at the Bioinformatics Open Source Conference (BOSC) around 2012/13 and got laughed out of the place.

While Apache Spark for bioinformatics [0] never really took off, I still think Parquet formats for bioinformatics [1] are a good idea, especially now that DuckDB, Apache Arrow, etc. support Parquet out of the box (see the sketch after the links below).

0 - https://github.com/bigdatagenomics/adam

1 - https://github.com/bigdatagenomics/bdg-formats
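
To make the "Parquet out of the box" point concrete, here is a minimal sketch of querying a Parquet file of aligned reads with plain Spark SQL in Scala. The file path and the column names (contigName, mapq) are hypothetical stand-ins rather than the actual bdg-formats schema, and DuckDB or Arrow could read the same file without Spark at all.

    // Minimal sketch: any Parquet-aware engine can query the same file.
    // File path and column names are illustrative, not the bdg-formats schema.
    import org.apache.spark.sql.SparkSession

    object ParquetReadsQuery {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("parquet-reads-query")
          .master("local[*]")
          .getOrCreate()

        // Read a (hypothetical) Parquet file of aligned reads.
        val reads = spark.read.parquet("alignments.parquet")
        reads.createOrReplaceTempView("reads")

        // Count well-mapped reads per contig with ordinary SQL.
        spark.sql(
          """SELECT contigName, COUNT(*) AS n
            |FROM reads
            |WHERE mapq >= 30
            |GROUP BY contigName
            |ORDER BY n DESC""".stripMargin
        ).show()

        spark.stop()
      }
    }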

We're here, still plugging along.

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

https://github.com/bigdatagenomics/adam

Hello! I'm wondering if you came across our suite of libraries and tools for doing Genomics on Spark?

https://github.com/bigdatagenomics/adam

We're no longer the first Google hit these days, but we're still quite relevant (e.g. Databricks' commercial offering uses ADAM under the hood). Drop in on our Gitter some time!

I feel like ADAM (https://github.com/bigdatagenomics/adam) is a huge step in the right direction. You convert from standard genomics formats to Parquet and then work with the resulting data in Spark using genomics-specific libraries.

In my experience, translating domain data into Spark gives something like a 100x improvement in data analysis.
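
As a rough sketch of that convert-then-analyze flow, assuming ADAM's Scala API: names like ADAMContext, loadAlignments, and saveAsParquet are used here from memory and may differ between ADAM versions, so treat this as illustrative rather than copy-paste ready.

    // Illustrative only: convert a BAM into ADAM's Parquet representation,
    // then analyze it with plain Spark. Check the ADAM docs for your version.
    import org.apache.spark.sql.SparkSession
    // Implicit loaders; the package path has changed across ADAM releases.
    import org.bdgenomics.adam.rdd.ADAMContext._

    object BamToParquet {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("bam-to-parquet")
          .master("local[*]")
          .getOrCreate()
        val sc = spark.sparkContext

        // Convert a standard format (BAM/SAM/CRAM) into Parquet on disk.
        val alignments = sc.loadAlignments("sample.bam")
        alignments.saveAsParquet("sample.alignments.adam")

        // From here, plain Spark (or any Parquet reader) can work with the data.
        val df = spark.read.parquet("sample.alignments.adam")
        println(s"aligned records: ${df.count()}")

        spark.stop()
      }
    }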

At the UC Berkeley AMPLab we're working on scaling genomics [0], all open source under the Apache 2 license. More generally, any of the Open Bioinformatics Foundation (OBF) [1] projects could use a hand; open source licenses vary.

[0] - https://github.com/bigdatagenomics/adam

[1] - https://www.open-bio.org/wiki/Main_Page

1) Join a lab as a programmer (e.g., the National Weather Service (climate) or the J. Craig Venter Institute (infectious diseases))

2) Read the computational papers on a subject you're interested in; replicate the results in the papers and open source your software/pipeline; apply the method to a newer dataset.

3) Contribute to an open source informatics toolchain used for the subject (e.g., https://github.com/bigdatagenomics/adam)

Agreed.

Hope you don't mind a plug here for ADAM, a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet.

https://github.com/bigdatagenomics/adam

Yes indeed. And new tools are being written; see the ADAM project for an interesting example (https://github.com/bigdatagenomics/adam) and its associated variant caller Avocado (https://github.com/bigdatagenomics/avocado). Others are also trying to get the old tools working on Hadoop, for instance Halvade (https://github.com/ddcap/halvade/wiki/Halvade-Manual), Hadoop-BAM (https://github.com/HadoopGenomics/Hadoop-BAM), SeqPig (http://seqpig.sourceforge.net/), and the folks at BioBankCloud (https://github.com/biobankcloud). It's going to take quite a while for this stuff to get fleshed out and for researchers to adopt it, but the sheer weight of data is going to force things in the Hadoop direction eventually. It is inevitable.