What does HackerNews think of pgsql-ivm?

IVM (Incremental View Maintenance) development for PostgreSQL

Language: C

Amen!

TypeDB[1] is also doing interesting things. Work on incremental view maintenance carries the deductive-database vision forward too, just under other banners, such as Materialize[2] (built on differential dataflow, but speaks SQL) and pgsql-ivm[3].

[1]: https://vaticle.com/typedb
[2]: https://materialize.com
[3]: https://github.com/sraoss/pgsql-ivm/
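For anyone who hasn't looked at pgsql-ivm: as I understand the patch, it lets you declare a materialized view as incrementally maintained, so base-table writes update it automatically instead of requiring a full REFRESH MATERIALIZED VIEW. A minimal sketch (the `orders` table here is hypothetical):

```sql
-- Hypothetical base table.
CREATE TABLE orders (id int, region text, amount numeric);

-- With the pgsql-ivm patch applied, an incremental matview is
-- kept current as its base tables change:
CREATE INCREMENTAL MATERIALIZED VIEW sales_by_region AS
  SELECT region, sum(amount) AS total
  FROM orders
  GROUP BY region;

INSERT INTO orders VALUES (1, 'eu', 100);

-- Reflects the insert immediately, no manual refresh step.
SELECT * FROM sales_by_region;
```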

> pgBackRest is probably strictly better than wal-g for our use-case, but I already knew wal-g was at least satisfactory in solving our problems (from previous usage in implementing something Heroku-Dataclips-alike before), so I threw it on and filed an issue for doing a compare-and-contrast down the line. So far it hasn't caused too many issues, and does indeed solve our main issue: allowing truly-async replication (e.g. replication to DCs in other countries, where the link has a huge RTT, high potential packet loss, and might even go down for hours at a time due to, say, politics — without that causing any kind of master-side buffering or slowdown.)

I always thought Dataclips was just them running on ZFS, but honestly this would make more sense. I don't use Heroku much, so no limitation ever jumped out at me as "oh, they have that limitation because they're doing pgBackRest".
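(For anyone following along: the truly-async setup described above is, as far as I can tell, just standard WAL archiving pointed at object storage. A minimal sketch, assuming wal-g with its storage environment variables already configured:)

```sql
-- On the primary: ship WAL segments to object storage instead of
-- streaming directly to replicas, so a slow or flaky link never
-- back-pressures the master. (archive_mode needs a restart.)
ALTER SYSTEM SET archive_mode = 'on';
ALTER SYSTEM SET archive_command = 'wal-g wal-push %p';

-- On each replica (PG 12+): pull segments back down on demand.
ALTER SYSTEM SET restore_command = 'wal-g wal-fetch %f %p';
```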

> Not yet, but we've certainly been considering it. Probably something we'll do when we have a bit more breathing room (i.e. when we get out from under all the scaling we're doing right now.)

Yup, sounds reasonable. I'm certainly not a customer (nor really in a position to be one, as I don't do anything blockchain), so satisfying me brings you almost negative ROI. Scaling sounds fun in and of itself (when it isn't terrifying) -- thanks for sharing so much about the setup!

> Nothing like that per se (though it'd very much be a data-layer match for how we're doing infra management using k8s "migrations" through GitOps. Imagine SQL schema-definition k8s CRDs.)

Now this is fascinating -- I'm also very much sold on k8s, and SQL-schemas-as-CRDs sounds like it'd be relatively easy to implement with very high ROI. Migrations were always a sort of weird infrastructure-mixed-with-dev thing, and being able to tie the migration CRDs to the StatefulSet running the DB and the Deployment running the application seems like it could open up a whole new world of more intelligent deployment. Cool stuff.

> But that's mostly because — to our constant surprise — our PG cluster's query plans improve the more we normalize our data. We hardly build matviews or other kinds of denormalized data representations at all (and when we do, we almost always eventually figure out a partial computed-expression GIN index or something of the sort we can apply instead.)

I don't know how people use open-source DBs that aren't Postgres. I mean, I do, but I can't wait until zheap removes the most common reason people choose MySQL in the first place.
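For reference, the kind of index the parent describes looks something like this (the `events` table is a hypothetical stand-in); it often removes the need for a denormalized matview entirely:

```sql
-- Hypothetical table standing in for a normalized events store.
CREATE TABLE events (id bigint, payload jsonb, archived boolean);

-- Partial, computed-expression GIN index: covers only live rows,
-- and only the derived value the hot queries actually filter on.
CREATE INDEX events_live_tags_gin ON events
  USING gin ((payload -> 'tags'))
  WHERE NOT archived;

-- A query matching both the expression and the predicate can use it:
SELECT id FROM events
WHERE payload -> 'tags' @> '["billing"]'
  AND NOT archived;
```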

> We are planning to offer our users something like dbt, and we may or may not use dbt under the covers to manage it. (Analogous to how GitHub created their own git library to use in their Rails backend.)

dbt does look pretty good; it seems like just enough structure. I do want to bring up ddbt[0] as well, in case y'all do a research/exploratory sprint on it.

> One thing we do want to enable is avoiding the recomputation of intermediate "steps" within long chained CTE queries (imagine: taking 10 seconds to compute A, and then using it in a join to quickly select some Bs; then, in another query, taking another 10 seconds to compute A again, because now you need to select Cs instead), without keeping around any sort of persistent explicitly-materialized intermediate, or writing ugly procedural logic. https://materialize.com/ seems promising for that. (Just wish it was a library I could shove inside PG, rather than an external system that is apparently going to grow to having its own storage engine et al.)

Interesting -- would Incremental View Maintenance[2] work for this? Don't know if this is useful to you, but the dbt conference I got sucked into watching most of had a talk with the Materialize CEO and some other people. I don't remember it being particularly high-signal (let's be real, it's basically a three-way ad for the CEOs involved), but it might be worth a watch[1].
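To make the question concrete: with plain Postgres you'd either recompute the expensive CTE in every statement or park it in a session temp table; an incrementally maintained view would keep the intermediate fresh without the manual step. A sketch with a hypothetical schema:

```sql
-- Hypothetical schema: "a" is the expensive 10-second intermediate.
CREATE TABLE raw (id int, v numeric);
CREATE TABLE b (id int);
CREATE TABLE c (id int);

-- Plain-Postgres stopgap: compute A once per session.
CREATE TEMP TABLE a AS
  SELECT id, sum(v) AS total FROM raw GROUP BY id;
ANALYZE a;

SELECT * FROM a JOIN b USING (id);  -- fast
SELECT * FROM a JOIN c USING (id);  -- fast, no second 10s recompute

-- Under pgsql-ivm, "a" could instead be a
-- CREATE INCREMENTAL MATERIALIZED VIEW that stays current as raw
-- changes -- though that is exactly the persistent materialized
-- intermediate the parent said they wanted to avoid.
```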

> We've always mentioned to people that Tableau has first-class support for connecting to/querying Postgres. Given the type of data we deal with, that's usually all they need to know. :)

I really underestimate how much Tableau is the standard (probably because I'm not a data person) -- between that, Looker, and Power BI, I'm not sure people use much else, even when the alternative is open source/free.

[0]: https://github.com/monzo/ddbt

[1]: https://www.youtube.com/watch?v=0DDNjB4O0As

[2]: https://github.com/sraoss/pgsql-ivm

> Curious to see sql-jobber here. I'm the author, if you have any questions.

Well, it's good software! I've been thoroughly impressed with a lot of the stuff you all have put out -- Listmonk immediately struck me as something that could be a better mailing-list solution for me (I currently use Mailtrain). Listmonk is so good it feels like I could run an aaS off it alone.

> Yep. The model makes sense when you have large databases (for us, it's hundreds of billions of rows in multi-DB Postgres instances) that have to directly serve complex read queries to millions of users pulling reports.

Yeah, this was interesting to read about! Have you all seen the Incremental View Maintenance[0] feature that's currently being worked on? I'm not sure it's actually better than what you have now, since it feels like it would add load on the write side, and if what you have is working well there's no reason to change.

> 2) Easy traffic control and queuing when you have large volumes of read requests coming through. This can be beneficial to large DBs serving reports to users, irrespective of the kind of reports. In addition, when there is a big, unexpected surge in user traffic, the reads just get queued and users simply have to wait longer to see their results, but the big databases never get overloaded.

This is a huge benefit, and the thing I picked up on that I hadn't seen much before -- having people running reports wait is so much better than having the DB go down, and I don't think I'd seen the approach you all figured out suggested much elsewhere. People sort of get at this with the general idea of a data warehouse or a data lake, but this seems like a step before that level of complication, and it's pretty accessible to administrators/users (no one has to spend a few days figuring out what a data warehouse is, etc.).
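(Not how sql-jobber itself is built -- it queues jobs outside the database, as far as I know -- but the queue-the-reads idea can be sketched in plain SQL with a hypothetical jobs table:)

```sql
-- Hypothetical queue table: surplus report requests wait here
-- instead of all hitting the big DB at once.
CREATE TABLE report_jobs (
  id    bigserial PRIMARY KEY,
  query text NOT NULL,
  state text NOT NULL DEFAULT 'queued'
);

-- Each worker claims at most one job, so total DB concurrency is
-- capped at the worker count no matter how big the traffic surge is.
BEGIN;
SELECT id, query
FROM report_jobs
WHERE state = 'queued'
ORDER BY id
FOR UPDATE SKIP LOCKED
LIMIT 1;
-- ...run the report, stash results in the cache DB, then:
-- UPDATE report_jobs SET state = 'done' WHERE id = <claimed id>;
COMMIT;
```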

[0]: https://github.com/sraoss/pgsql-ivm

I agree with you. If you are focused on PostgreSQL, you might find the work being done on IVM interesting: https://wiki.postgresql.org/wiki/Incremental_View_Maintenanc... - https://github.com/sraoss/pgsql-ivm