Great work! I was once a fan of custom, well-tailored DSLs for each tool. Nowadays, however, I really like the current direction of providing a lot of tooling that needs only SQL to use, even with the dialects differing a bit.

I've made my own contribution here too, with a tool to join and analyse data in various databases and file formats (JSON, CSV, Excel) using plain SQL, OctoSQL: https://github.com/cube2222/octosql

This is really cool stuff, thanks for sharing!

I'm working on a desktop application which has a feature that lets users query data across a bunch of different data sources, like CSV, Excel, and some others. The way I've implemented this is by first scanning all applicable files and populating a SQLite database with all the data from these files, a table per file essentially, and then allowing users to execute queries against the database. The database is updated whenever any of these files change on disk, but writes to the database are not persisted back to the files, so it's really a one-way kind of flow. This was all implemented mostly as a proof of concept, but it's been massively useful, so we're probably going to expand on it this year, perhaps by allowing bi-directional workflows.
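For anyone curious, here's a minimal sketch of that one-way flow, using only Python's standard library. To be clear, this is not the actual application code: the function names (`load_csv_as_table`, `sync_directory`), the use of the file stem as the table name, and the polling on modification time instead of a real file watcher are all assumptions made for illustration. It also assumes every CSV has a header row and that each row has as many fields as the header.

```python
import csv
import sqlite3
from pathlib import Path


def quote_ident(name: str) -> str:
    """Quote a name for use as an SQLite identifier."""
    return '"' + name.replace('"', '""') + '"'


def load_csv_as_table(conn: sqlite3.Connection, path: Path) -> str:
    """Drop and recreate the table named after the file stem (hypothetical naming rule)."""
    table = path.stem
    with path.open(newline="") as f:
        rows = list(csv.reader(f))
    conn.execute(f"DROP TABLE IF EXISTS {quote_ident(table)}")
    if rows:
        header, data = rows[0], rows[1:]
        cols = ", ".join(quote_ident(c) for c in header)
        marks = ", ".join("?" for _ in header)
        # Columns are left untyped, so every value is stored as text.
        conn.execute(f"CREATE TABLE {quote_ident(table)} ({cols})")
        conn.executemany(
            f"INSERT INTO {quote_ident(table)} VALUES ({marks})", data
        )
    conn.commit()
    return table


def sync_directory(conn: sqlite3.Connection, directory, seen_mtimes: dict) -> list:
    """Reload every CSV whose modification time changed since the last call.

    Files flow into the database only; nothing is written back to disk.
    """
    reloaded = []
    for path in sorted(Path(directory).glob("*.csv")):
        mtime = path.stat().st_mtime_ns
        if seen_mtimes.get(path) != mtime:
            reloaded.append(load_csv_as_table(conn, path))
            seen_mtimes[path] = mtime
    return reloaded
```

With `users.csv` and `orders.csv` in a directory, one call to `sync_directory(conn, directory, {})` creates both tables, and something like `SELECT u.name, COUNT(*) FROM users u JOIN orders o ON o.user_id = u.id GROUP BY u.name` then runs as an ordinary SQLite query.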

Even if the use cases and approach are different, it's nice and validating to see other projects with similar thoughts and ideas. Again, thanks for sharing!

cube2222

Glad you like it!

I think I've seen an open source project that takes the approach you describe, too.

However, I think we differ philosophically in that one of our goals is to push down as much work as possible to the underlying databases, and our next big upcoming milestone is streaming (Kafka, possibly database change streams).
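To illustrate what "pushing work down" means in general, here's a rough sketch. It is not how OctoSQL is implemented; it just uses an in-memory SQLite database as a stand-in for an underlying data source, and the `orders` table and all function names are made up for the example. The same filter is run two ways, and each function also reports how many rows had to leave the source.

```python
import sqlite3


def make_orders_db(n: int) -> sqlite3.Connection:
    """Build a stand-in 'underlying database' with n rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(i, i) for i in range(n)]
    )
    conn.commit()
    return conn


def filter_in_engine(conn: sqlite3.Connection, min_total: int):
    """No pushdown: fetch every row from the source, then filter locally."""
    rows = conn.execute("SELECT id, total FROM orders ORDER BY id").fetchall()
    matches = [r for r in rows if r[1] >= min_total]
    return matches, len(rows)  # second value: rows pulled out of the source


def filter_pushed_down(conn: sqlite3.Connection, min_total: int):
    """Pushdown: the source evaluates the predicate and returns only matches."""
    rows = conn.execute(
        "SELECT id, total FROM orders WHERE total >= ? ORDER BY id",
        (min_total,),
    ).fetchall()
    return rows, len(rows)
```

Both functions return the same matching rows; the difference is that the first one has to move the whole table before it can filter, while the second one only moves the rows that match.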

But yes, it's great that a kind of ecosystem is forming around those ideas!

mstade

Absolutely agree regarding the difference in approach – and I'm by no means suggesting that yours is in any way worse! In fact, I believe yours is much, much better given the more general use cases it supports. Ours is inherently stateful and requires careful coordination, which works for us because the system is pretty much self-contained and that makes things easier, but I don't think it's a good generic solution, to be honest.

My point was really that it's nice and validating to see that the whole idea of a unified way to "query all the things!", as it were, isn't all that novel. :o)

By the way, if you can think of the name (or a URL even!) of that project you mention, I'd be very interested in taking a look at that too. Much obliged!

cube2222

I didn't want to come off as suggesting you meant any of that, not at all! Just comparing approaches.

I think it was this: https://github.com/simonw/datasette though it supports only CSV files.

There's also Apache Drill, which works with a lot of data sources and is philosophically closer to OctoSQL.