What does HackerNews think of airbyte?
Airbyte is an open-source EL(T) platform that helps you replicate your data into your warehouses, lakes, and databases.
The context is that Airbyte is now (after pivoting 3x during YC: https://airbyte.com/blog/how-we-pivoted-3-times-in-the-1st-m...) the largest and fastest-growing open-source community of data pipeline connectors[0] (see our GitHub: https://github.com/airbytehq/airbyte), so in a sense the connectors have always been free if you are self-hosting. But now using them on Airbyte Cloud is going to be free as well, aka "we will do your ELT for free, no matter the volume, as long as our connectors are not GA yet".
This is a massive commitment to improving the quality of our connectors, which is also something we have been pushing the industry on (https://airbyte.com/blog/connector-release-stages):
Alpha: new, basic docs, works, passes acceptance tests
Beta: Alpha + at least 25 active users + >90% sync success rate + snapshot tests + all streams + severe issues handled + security + supports checkpointing (see the sketch after this list) + SLA on cloud
GA: Beta + >50 active users + >99% sync success rate + <24 hours downtime + polished docs + performant
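For those wondering what "supports checkpointing" means in practice: the connector emits state as it syncs, so an interrupted run resumes from the last checkpoint instead of starting over. A rough sketch of what that looks like in our Python CDK (the exact base classes and signatures vary across CDK versions, and `Invoices`/`updated_at` are made-up examples):

```python
from typing import Any, Iterable, Mapping, MutableMapping

from airbyte_cdk.sources.streams import Stream


class Invoices(Stream):
    primary_key = "id"
    # Declaring a cursor field marks the stream as incremental-capable.
    cursor_field = "updated_at"  # hypothetical timestamp field on each record

    def get_updated_state(
        self,
        current_stream_state: MutableMapping[str, Any],
        latest_record: Mapping[str, Any],
    ) -> Mapping[str, Any]:
        # Called as records flow through; the CDK periodically emits this
        # state as a checkpoint, so a failed sync resumes from the cursor.
        latest = latest_record.get(self.cursor_field, "")
        current = current_stream_state.get(self.cursor_field, "")
        return {self.cursor_field: max(latest, current)}

    def read_records(
        self, sync_mode, cursor_field=None, stream_slice=None, stream_state=None
    ) -> Iterable[Mapping[str, Any]]:
        # Use stream_state to fetch only records newer than the checkpoint.
        yield from []  # sketch: a real stream would call the source API here
```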
It's been going very well; you can see how many connectors we promote to GA each month in our Slack (https://slack.airbyte.io/) and changelogs, and our new low-code CDK (https://www.youtube.com/watch?v=i7VSL2bDvmw) is helping new connectors insta-promote to Beta.
We hope to set the new standard in data integration and this is still only day 1.
[0]: for the uninitiated, a good explainer on why companies are moving towards ELT in the first place: https://airbyte.com/blog/elt-pipeline
Wondering what kinds of projects are/are not suitable for this. The only context I have is from working at open-source devtool companies that provide Docker builds for people to pull down. Might speed up the release process slightly. I suspect my company https://github.com/airbytehq/airbyte/ could benefit. But is it also useful for internal usage?
I think some of the points made here about ETL scripts being just 'ETL scripts' are very relevant. Definitely been on the other side of the table arguing for a quick 3-hour script.
Having written plenty of ETL scripts - in Java with Hadoop/Spark, Python with Airflow, and pure Bash - that later morphed into tech-debt monsters, I think many people underestimate how quickly these snowball into proper products with actual requirements.
Unless one is extremely confident an ETL script will remain a non-critical, nice-to-have part of the stack, I believe evaluating and adopting a good ETL framework, especially one with pre-built integrations, is a good case of 'sharpening the axe before cutting the tree' and well worth the time.
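To make that concrete, here is a hypothetical version of that "quick 3-hour script" (all names, endpoints, and tables invented for illustration); each comment marks a requirement that tends to show up later and turn the script into a product:

```python
import json
import urllib.request

import psycopg2  # assumes a reachable Postgres warehouse

# Extract: one API, one endpoint.
# (Later: pagination, rate limits, retries, credential rotation.)
with urllib.request.urlopen("https://api.example.com/orders") as resp:
    orders = json.load(resp)

conn = psycopg2.connect("dbname=warehouse user=etl")
cur = conn.cursor()

# Load: truncate-and-reload the whole table.
# (Later: incremental syncs, checkpointing, deduplication.)
cur.execute("TRUNCATE raw_orders")
for order in orders:
    # Hard-coded columns. (Later: an upstream schema change breaks this silently.)
    cur.execute(
        "INSERT INTO raw_orders (id, amount, updated_at) VALUES (%s, %s, %s)",
        (order["id"], order["amount"], order["updated_at"]),
    )
conn.commit()
# (Later: alerting for when the nightly cron stops running.)
```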
We've been very careful to minimise Airbyte's learning curve. Starting up Airbyte is as easy as checking out the git repo and running 'docker compose up'. A UI allows users to select, configure, and schedule jobs from a list of 120+ supported connectors. It's not uncommon to see users successfully using Airbyte within tens of minutes.
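Concretely, a first run looks something like this (exact commands and the default UI port may vary by release):

```sh
git clone https://github.com/airbytehq/airbyte.git
cd airbyte
docker compose up  # then open the UI, which defaults to http://localhost:8000
```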
If a connector is not supported, we offer a Python CDK that lets anyone develop their own connectors in a matter of hours. We are committed to supporting community-contributed connectors, so there is no need to worry about contributions going to waste.
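To give a feel for the shape of a connector, here is a minimal sketch against a made-up https://api.example.com API (the precise CDK base-class signatures vary by version):

```python
from typing import Any, Iterable, List, Mapping, Optional, Tuple

import requests
from airbyte_cdk.sources import AbstractSource
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.http import HttpStream


class Customers(HttpStream):
    url_base = "https://api.example.com/v1/"  # stand-in API
    primary_key = "id"

    def path(self, **kwargs) -> str:
        return "customers"

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        return None  # single page for the sketch; real APIs usually paginate

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping[str, Any]]:
        yield from response.json()["data"]


class SourceExample(AbstractSource):
    def check_connection(self, logger, config) -> Tuple[bool, Any]:
        return True, None  # a real connector would probe the API here

    def streams(self, config: Mapping[str, Any]) -> List[Stream]:
        return [Customers()]
```

Roughly speaking, the CDK then handles the Airbyte protocol plumbing (spec/discover/read and state messages) around these classes, so the connector code stays focused on the source API.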
Everything is open source, so anyone is free to dive as deep as they need or want to.
We also build in the open and have single-digit hour Slack response time on weekdays. Do check us out - https://github.com/airbytehq/airbyte!