What does HackerNews think of bwa?

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)

Language: C

Start by downloading the SRA Toolkit: https://github.com/ncbi/sra-tools/wiki/02.-Installing-SRA-To...

Find some data of interest: https://www.ncbi.nlm.nih.gov/sra?term=(%22Homo%20sapiens%22[... (this searches SRA for human genome sequences from Illumina instruments with FASTQ files available)

Run fasterq-dump on the SRR (listed as "Runs" in the SRA page of your choice): fasterq-dump SRR21812682

Download a microbial genome of interest; here is the link for common yeast: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF...

Install an alignment tool like bwa: https://github.com/lh3/bwa

Unzip the genome file and create a bwa index: gunzip GCF_000146045.2_R64_genomic.fna.gz && bwa index GCF_000146045.2_R64_genomic.fna

Align: bwa mem GCF_000146045.2_R64_genomic.fna SRR21812682.fastq > SRR21812682.sam (or whatever the fastq files are named; bwa needs a subcommand such as mem, and it writes the SAM output to stdout)
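The steps above can be chained in a small script. This is only a minimal sketch: it assumes `fasterq-dump`, `gunzip`, and `bwa` are on the PATH, that the genome archive has already been downloaded into the working directory, and that the run yields a single `SRR21812682.fastq` file (paired-end runs are typically split into `_1`/`_2` files instead). The function names `build_commands` and `run_pipeline` and the output name `aligned.sam` are made up for illustration, and the alignment step uses the `bwa mem` subcommand.

```python
import shlex
import subprocess


def build_commands(run_id, genome_gz, fastq=None):
    """Return the walkthrough's steps as argument lists (nothing is executed)."""
    # Strip the .gz suffix to get the name gunzip will leave behind.
    genome = genome_gz[:-3] if genome_gz.endswith(".gz") else genome_gz
    # Assumption: single-end naming; adjust if fasterq-dump wrote _1/_2 files.
    fastq = fastq or f"{run_id}.fastq"
    return [
        ["fasterq-dump", run_id],        # download the reads as FASTQ
        ["gunzip", genome_gz],           # unpack the reference genome
        ["bwa", "index", genome],        # build the bwa index
        ["bwa", "mem", genome, fastq],   # align; SAM goes to stdout
    ]


def run_pipeline(run_id, genome_gz, sam_out="aligned.sam", dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    commands = build_commands(run_id, genome_gz)
    for cmd in commands[:-1]:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    # The last step needs its stdout redirected into the SAM file.
    print(shlex.join(commands[-1]), ">", sam_out)
    if not dry_run:
        with open(sam_out, "w") as out:
            subprocess.run(commands[-1], check=True, stdout=out)
    return commands


# Dry run: only prints the commands that would be executed.
run_pipeline("SRR21812682", "GCF_000146045.2_R64_genomic.fna.gz")
```

With `dry_run=False` the same calls are actually executed, stopping at the first command that fails.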

If you get any alignment results, you've "found" fungal DNA in a human sample. This is a highly simplified workflow, but it covers the basic ideas. One of the papers is free, and its methods section covers their workflow (it is much more complicated):

https://www.cell.com/cell/fulltext/S0092-8674(22)01127-8
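To check whether there are "any alignment results", one option is to count mapped records in the SAM output. The sketch below relies on the SAM convention that bit 0x4 of the FLAG column marks an unmapped read, and it skips secondary (0x100) and supplementary (0x800) records so each read is counted once. The function name `count_mapped` and the toy records are invented for illustration; they are not real data.

```python
def count_mapped(sam_lines):
    """Count primary mapped and unmapped records in an iterable of SAM lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & 0x900:
            continue  # secondary or supplementary record
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Toy records (made up): two mapped reads, one unmapped, one supplementary.
toy_sam = [
    "@SQ\tSN:ref1\tLN:1000",
    "read1\t0\tref1\t101\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
    "read3\t16\tref1\t201\t60\t4M\t*\t0\t0\tGGCC\tIIII",
    "read3\t2064\tref1\t301\t60\t4M\t*\t0\t0\tGGCC\tIIII",
]
print(count_mapped(toy_sam))
```

For a real file, pass the open file object instead of the list, for example `count_mapped(open("SRR21812682.sam"))`. A nonzero mapped count alone says little: low-quality or repetitive reads can align by chance, which is part of why the paper's workflow is more involved.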

Useful resources:

https://www.biostars.org/

https://rosalind.info/problems/list-view/?location=bioinform...

https://www.cancer.gov/about-nci/organization/ccg/research/s... (source data for this paper, cancer-specific sequencing data)

This is used in both bwa [1] and bowtie [2], two of the most popular DNA sequence aligners.

[1] https://github.com/lh3/bwa

[2] https://github.com/BenLangmead/bowtie

Aligning all the short reads that a next-generation sequencer produces (sequencing by synthesis, to be very specific; hundreds to thousands of millions of reads) to a reference genome is hard computational work indeed. No idea if you can use it for proof-of-work, but I like where this is going... But don't you need some definition of "correct" for PoW, the way a hash needs many leading zeros? How would that work for aligning reads to a reference genome? Maybe it could: finding the position of a read is hard (not really for a modern CPU, but relatively; I think, I'm not an expert on that), while verifying how correct an alignment is, is not hard (I think). Maybe we should ask the BWA devs [0].

[0]: https://github.com/lh3/bwa
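The find-versus-verify asymmetry in that comment can be illustrated with a toy example. This is a sketch under loud assumptions: it uses brute-force search and plain mismatch counting (Hamming distance), which is not how BWA works internally, and the function names and sequences are invented. Finding the best position scans every offset, while checking a claimed position only touches one window.

```python
def mismatches(reference, read, pos):
    """Count mismatching bases when read is placed at pos (no gaps allowed)."""
    if pos < 0 or pos + len(read) > len(reference):
        raise ValueError("read does not fit at this position")
    window = reference[pos:pos + len(read)]
    return sum(a != b for a, b in zip(window, read))


def find_best_position(reference, read):
    """The 'work': try every offset, roughly len(reference) * len(read) steps."""
    last = len(reference) - len(read)
    best = min(range(last + 1), key=lambda p: mismatches(reference, read, p))
    return best, mismatches(reference, read, best)


def verify_claim(reference, read, pos, max_mismatches):
    """The 'check': a single window comparison, roughly len(read) steps."""
    return mismatches(reference, read, pos) <= max_mismatches


reference = "ACGTACGTTAGCCGATAGGC"  # toy reference (made up)
read = "TAGCAGAT"                   # toy read with one sequencing "error"

print(find_best_position(reference, read))
print(verify_claim(reference, read, 8, max_mismatches=1))
```

One caveat this makes visible: the cheap check only confirms that a claimed position meets a mismatch threshold. Confirming that it is the best position would require repeating the search, so a threshold-style definition of "correct" is the part that would resemble the leading-zeros rule.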

Note: Heng Li [0] is a significant figure in bioinformatics software. Most notably, he is the author of BWA [1] (Burrows-Wheeler Aligner). BWA performs a large percentage of all sequence alignments worldwide.

[0] http://www.liheng.org

[1] https://github.com/lh3/bwa