https://github.com/TimothyStiles/poly
A large part of my project's community are devs that want to get into the field but can't tolerate the ridiculously low pay, laughably bad management, disrespect, and what amounts to 40+ years of technical debt that's endemic to biotech software.
I've had companies here in the Bay Area offer me 100K a year with a straight face. I've had companies tell me during interviews that they're looking for someone to help "set up GitHub". I've seen job listings for low-paid web dev positions require applicants to have PhDs.
The reality is that, except for a growing handful of places, management straight up won't know the difference between IT and software engineers. It's what I call the naive buyers problem.
The demand for software engineers in biotech is generated by naive buyers that don't know what they need, why they need it, or how to get it.
Benchling and Recursion Pharmaceuticals have reputations in the industry for paying "standard software salaries". So do the research divisions at places like DeepMind/Microsoft/Google, but in my experience there are even new multi-billion dollar institutes where senior management has never heard the term devops.
Most places advertise for "data scientist" positions, or some analog, instead of software engineers. This is mostly because upper management has never met an actual practicing software engineer in a professional setting. Many come from academia, where the culture and work requirements heavily disincentivize standard software engineering practices.
It's also not uncommon for a biotech company to either have a very underqualified CTO, whose main programming experience is whatever ML-research-like stuff they did during their PhD, or to not have one at all. Both have huge downstream consequences.
This week a software engineer trying to make the switch to biotech actually DM'd me to ask why they were seeing a ton of data science / ML job positions but no software engineering / devops positions.
They were worried that these companies were trying to save on costs by forcing their data scientists to create infrastructure, but it's actually worse than that. Most of these companies aren't even aware that there's supposed to be infrastructure.
Despite all of this, the future is looking better and I'm starting to find new companies and positions that are, well... reasonable. I learned about this thread last night at a party, from a friend who works at one of these companies. There's a small, strong new wave of companies and developers out there pushing biotech software forward. Hopefully some (including myself) make it big while pushing the idea that better tech equals better biotech.
https://github.com/TimothyStiles/poly
Goal is to have a suite of packages and databases that can be used to design entirely novel proteins, metabolic pathways, and DNA constructs at scale because right now that software ecosystem just doesn't exist.
Blew my mind reading through it, honestly. Just perfect.
Already starting something in the space, but I’d be happy to talk!
What you can do in principle and what you can actually pull off are different things. Genetic engineering and biological manipulation go as deep as software, and tacit knowledge about execution is non-trivial to the point where you WILL mess up experiments (so expect to repeat a lot).
That said, you can still do some fun stuff. I would recommend trying to do something very small but actually novel. For example, if you've done a GFP transformation into E. coli, try to get the GFP transformation working in a new organism (maybe a yogurt bacterium). Keep it small though, and keep it single-celled, or else you are putting yourself into the pit of despair.
Also check out the Poly project (https://github.com/TimothyStiles/poly). We're basically building (decent) open-source software for doing synthetic biology. Since you're a software developer, doing code reviews and reading our mega-comments (like https://github.com/TimothyStiles/poly/blob/prime/transformat...) might help you understand some more of the fundamental engineering problems we synthetic biologists are encountering. Also, in code reviews, if you don't understand something, a practicing synthetic biologist will explain it to you so that we can improve our docs.
sporenetlabs.com: building a method for massive, affordable DNA distribution. I'm still working on the backend for that one.
The 2 that come to mind that are a bit more organized and in development are SBOL (https://sbolstandard.org/) and Poly (https://github.com/TimothyStiles/poly, self plug).
SBOL is a project looking to make a better format for sharing DNA sequences. For context, the same format (GenBank format) has been used as the standard for sharing biological sequences since 1982. As you can imagine, our understanding of DNA sequences has improved since that point, but we have no way to encode much of that information in a shareable format. So it doesn't get shared. That means a lot of genetic elements can't be computed on (the data is there, it just isn't shared), so synthetic biologists keep it floating in their heads to make experiments work. SBOL is a new format that has been running for about 10 years trying to change that - it is pretty much the definition of "design by committee", but their design is extremely well thought out, and they have the community reach necessary to get acceptance. However, the tools just ain't there yet for widespread adoption of SBOL. I do believe it'll get there, but they don't have enough skilled programmers.
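To make the flat-file point a bit more concrete, here is a minimal sketch in Go of reading a few header fields and the sequence out of a GenBank-style record and emitting them as JSON. It only uses the standard library. The `Record` struct, its JSON field names, `parseGenBank`, and the toy record are all made up for illustration; this is not SBOL's data model or any existing library's schema, and it ignores the feature table and multi-line fields entirely.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// Record holds a few fields from a GenBank flat file.
// The field selection and JSON names are illustrative assumptions.
type Record struct {
	Locus      string `json:"locus"`
	Definition string `json:"definition"`
	Accession  string `json:"accession"`
	Sequence   string `json:"sequence"`
}

// sample is a toy record invented for this example, not real data.
const sample = `LOCUS       DEMO_GFP        12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Toy record for illustration.
ACCESSION   DEMO0001
ORIGIN
        1 atggtgagca ag
//
`

// parseGenBank reads a minimal subset of the format: LOCUS, DEFINITION,
// ACCESSION, and the ORIGIN sequence block. Continuation lines and the
// FEATURES table are skipped.
func parseGenBank(text string) Record {
	var rec Record
	var seq strings.Builder
	inOrigin := false
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 {
			continue
		}
		switch {
		case fields[0] == "//":
			// End-of-record marker.
			inOrigin = false
		case inOrigin:
			// Sequence lines look like "1 atgc atgc"; drop the leading offset.
			for _, chunk := range fields[1:] {
				seq.WriteString(strings.ToUpper(chunk))
			}
		case fields[0] == "LOCUS" && len(fields) > 1:
			rec.Locus = fields[1]
		case fields[0] == "DEFINITION":
			rec.Definition = strings.Join(fields[1:], " ")
		case fields[0] == "ACCESSION" && len(fields) > 1:
			rec.Accession = fields[1]
		case fields[0] == "ORIGIN":
			inOrigin = true
		}
	}
	rec.Sequence = seq.String()
	return rec
}

func main() {
	out, err := json.MarshalIndent(parseGenBank(sample), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Anything that does not fit one of those fixed keywords has nowhere obvious to live in the flat file, which is roughly the gap a richer format is trying to close.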
Poly, which is a project I've been involved in, is aiming to make good CLI tools for biotech (in addition to a kickass Golang library). It's fairly early, but it's run by an actual software engineer, and we're making pretty good progress on tooling and on building a decent community of biologists. Some examples of stuff we're working on right now for version 1.0.0: universal identifiers for genetic sequences, JSON representations of GenBank files (for interop), codon optimization / DNA synthesis optimization, primer design, and plasmid cloning simulation. One of the exciting parts of the project is an algorithm I designed that is able to extremely efficiently index genetic sequences for search. The goal is to index ALL public genetic sequences and make them searchable through a simple API (should get a 100-1000x speedup over what is currently available).
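For readers unfamiliar with sequence indexing, here is a generic sketch of one common approach, a k-mer inverted index, written in Go with only the standard library. This is NOT the algorithm mentioned above, whose details aren't described here, and it is not Poly's API; `KmerIndex`, `NewKmerIndex`, `Add`, and `Search` are hypothetical names chosen for the example. It only illustrates the basic idea of trading memory for lookup speed.

```go
package main

import (
	"fmt"
	"sort"
)

// KmerIndex maps each k-length substring (k-mer) to the set of
// sequence IDs that contain it. Hypothetical type for illustration.
type KmerIndex struct {
	k     int
	posts map[string]map[string]struct{}
}

// NewKmerIndex creates an empty index for k-mers of length k.
func NewKmerIndex(k int) *KmerIndex {
	return &KmerIndex{k: k, posts: make(map[string]map[string]struct{})}
}

// Add registers every k-mer of seq under the given ID.
func (ix *KmerIndex) Add(id, seq string) {
	for i := 0; i+ix.k <= len(seq); i++ {
		kmer := seq[i : i+ix.k]
		if ix.posts[kmer] == nil {
			ix.posts[kmer] = make(map[string]struct{})
		}
		ix.posts[kmer][id] = struct{}{}
	}
}

// Search returns the sorted IDs of sequences that contain every distinct
// k-mer of query. Sharing all k-mers is necessary but not sufficient for
// containing query as a substring, so a real tool would verify candidates.
func (ix *KmerIndex) Search(query string) []string {
	if len(query) < ix.k {
		return nil
	}
	seen := make(map[string]bool)
	counts := make(map[string]int)
	distinct := 0
	for i := 0; i+ix.k <= len(query); i++ {
		kmer := query[i : i+ix.k]
		if seen[kmer] {
			continue
		}
		seen[kmer] = true
		distinct++
		for id := range ix.posts[kmer] {
			counts[id]++
		}
	}
	var hits []string
	for id, c := range counts {
		if c == distinct {
			hits = append(hits, id)
		}
	}
	sort.Strings(hits)
	return hits
}

func main() {
	ix := NewKmerIndex(4)
	// Toy sequences invented for the example.
	ix.Add("seqA", "ATGGTGAGCAAGGGC")
	ix.Add("seqB", "TTGACAGCTAGCTCA")
	fmt.Println(ix.Search("GTGAGCA"))
}
```

A lookup only touches the posting lists for the query's own k-mers rather than scanning every stored sequence, which is where the speed of index-based search generally comes from.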
SBOL does calls every Wednesday with updates on their work, and Jacob Beal kinda runs the show (awesome guy). Poly has a gaming night every Friday, which is a good way to figure out if the community is right for you. We're trying to get more issues on GitHub, but that is a work in progress!