As someone who works in the data {engineering, science, etc.} space, I really don’t understand the community’s obsession with unwieldy tools like dbt.
It’s like there’s some collective commitment to ignoring any established practice from wider software engineering, or any innovation from the PL theory space, in favour of…gluing SQL together with string templates?
It seems like, for the most part, there’s very little innovation or progress in the space. It’s just 15 variations on a theme: configure some YAML for your Jinja-interpolated strings to dump ever more data into the cost-centre black hole known as a “modern data lake”.
I’m sure there are interesting things going on in small corners, but they’re difficult to find, and whatever does exist is being studiously ignored by mainstream tooling.
People don't have enough time to step back from their tools and think about what the ideal thing would look like.
I'm convinced this entire space should be visual. We already sketch data pipelines in our minds and on whiteboards...well, visually. Two-way code-diagram syncing would give the best of both worlds: serialize the pipeline to YAML from imperative code, but let manipulating the diagram modify that imperative code in turn.
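A minimal sketch of the code-to-YAML half of that round trip, in Python with PyYAML (the Pipeline/step API here is made up, not any real tool's):

    import yaml  # PyYAML

    class Pipeline:
        """A DAG defined imperatively in code."""
        def __init__(self):
            self.steps = []

        def step(self, name, depends_on=()):
            self.steps.append({"name": name, "depends_on": list(depends_on)})

        def to_yaml(self):
            # Serialize the DAG; this file is what a diagram editor would render...
            return yaml.safe_dump({"steps": self.steps}, sort_keys=False)

        @classmethod
        def from_yaml(cls, text):
            # ...and edits made in the diagram get loaded back into code-land here.
            p = cls()
            for s in yaml.safe_load(text)["steps"]:
                p.step(s["name"], s["depends_on"])
            return p

    p = Pipeline()
    p.step("raw_orders")
    p.step("clean_orders", depends_on=["raw_orders"])
    p.step("daily_revenue", depends_on=["clean_orders"])
    assert Pipeline.from_yaml(p.to_yaml()).steps == p.steps  # round-trips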
Ideally you also want to track the dependencies of every atom of data in your org, then have something cache results and incrementally recompute only what changed.
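A toy sketch of the caching half, assuming pure transforms (every name here is hypothetical): fingerprint each node's inputs and skip recomputation when the fingerprint hasn't changed.

    import hashlib
    import json

    cache = {}  # node name -> (input fingerprint, cached result)

    def fingerprint(deps):
        # Content-hash the inputs, so unchanged upstreams hash identically.
        return hashlib.sha256(json.dumps(deps, sort_keys=True).encode()).hexdigest()

    def compute(name, deps, fn):
        key = fingerprint(deps)
        hit = cache.get(name)
        if hit and hit[0] == key:
            return hit[1]  # inputs unchanged: reuse instead of recomputing
        result = fn(*deps)
        cache[name] = (key, result)
        return result

    orders = [10, -5, 20, 30]
    cleaned = compute("clean_orders", [orders], lambda o: [x for x in o if x > 0])
    total = compute("daily_revenue", [cleaned], sum)  # cached until cleaned changes
    print(total)  # 60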
The biggest thorn in this vision is SQL and the relational model, and I don't think a lot of people realize it.
It favors representing data so that the query planner can optimize query execution...rather than so that you can track data dependencies and visualize data flow. It wasn't designed with today's world of complex ETL pipelines and many external data sources in mind.
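Concretely: even table-level lineage has to be re-parsed out of the query text after the fact. A sketch using sqlglot's parse_one/exp API (column-level lineage is far harder still):

    from sqlglot import exp, parse_one

    sql = """
        SELECT o.customer_id, SUM(o.amount) AS revenue
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        GROUP BY o.customer_id
    """

    # The engine sees a relational expression to optimize; the dependency
    # edge "this model reads orders and customers" is only recoverable by
    # parsing the string back apart.
    tables = {t.name for t in parse_one(sql).find_all(exp.Table)}
    print(tables)  # {'orders', 'customers'} (in some order)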
At my last 2 jobs I spent entirely too much time debugging Matillion jobs, which are visual. So I have my doubts that visual tooling is the panacea it appears to be.
That said, you may find Enso particularly interesting: https://github.com/enso-org/enso