I stumbled upon this while checking out Listmonk[0] and reading through the Zerodha blog (the post on Postgres[1] was nice), and was pretty impressed by how light but effective this pattern is -- writing read results to a temporary database is something I hadn't seen before, and it's quite an interesting way to scale large read queries, though it's probably not the first approach one would think of (that would probably be scaling up or adding read replicas).

[0]: https://listmonk.app/

[1]: https://zerodha.tech/blog/working-with-postgresql/

Curious to see sql-jobber here. I'm the author -- happy to answer any questions.

> though it's probably not the first approach one would think of

Yep. The model makes sense when you have large databases (for us, it's hundreds of billions of rows in multi-DB Postgres instances) that have to directly serve complex read queries to millions of users pulling reports.

1) When you have users who access arbitrary reports over and over (for us, financial reports that don't change during the day), doing a simple `select *` from the cached report tables per user -- tables that can be thrown away at any time based on a TTL or other conditions -- can be way faster and cheaper than creating and maintaining read replicas of slow-ish multi-TB databases to directly serve large volumes of concurrent queries (there's a rough sketch of the flow after point 2).

2) Easy traffic control and queuing when you have large volumes of read requests coming through. This benefits large DBs serving reports to users irrespective of the kind of reports. In addition, when there is a big, unexpected surge in user traffic, the reads simply get queued and users have to wait a bit longer to see their results, but the big databases never get overloaded.
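To make that concrete, here's a minimal sketch of the flow -- not sql-jobber itself, and the schema, table names, and DSNs are all made up for illustration:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/lib/pq" // Postgres driver
)

const (
	maxInFlight = 4              // at most 4 heavy queries hit the big DB at once
	resultTTL   = 24 * time.Hour // cached report tables live for a day
)

// sem is the bounded queue: during a traffic surge, report requests block
// here and wait instead of piling concurrent queries onto the big DB.
var sem = make(chan struct{}, maxInFlight)

// runReportJob runs one heavy query against the big DB and materialises the
// rows into a disposable per-job table in the small results DB. jobID is
// assumed to be a sanitised identifier, never raw user input.
func runReportJob(big, cache *sql.DB, jobID string, userID int64) error {
	sem <- struct{}{}        // wait for a free slot
	defer func() { <-sem }() // release it when done

	rows, err := big.Query(`SELECT symbol, pnl FROM trades WHERE user_id = $1`, userID)
	if err != nil {
		return err
	}
	defer rows.Close()

	table := "results_" + jobID
	if _, err := cache.Exec(fmt.Sprintf(
		`CREATE TABLE %s (symbol TEXT, pnl NUMERIC)`, table)); err != nil {
		return err
	}
	// Register the table so the TTL sweep knows when to drop it.
	if _, err := cache.Exec(
		`INSERT INTO jobs (table_name, created_at) VALUES ($1, now())`, table); err != nil {
		return err
	}
	// Row-by-row insert for brevity; a real version would batch or COPY.
	for rows.Next() {
		var symbol string
		var pnl float64
		if err := rows.Scan(&symbol, &pnl); err != nil {
			return err
		}
		if _, err := cache.Exec(fmt.Sprintf(
			`INSERT INTO %s (symbol, pnl) VALUES ($1, $2)`, table), symbol, pnl); err != nil {
			return err
		}
	}
	return rows.Err()
}

// sweepExpired drops result tables past their TTL. Losing one is harmless:
// the report just gets re-run on the next request.
func sweepExpired(cache *sql.DB) {
	cutoff := time.Now().Add(-resultTTL)
	rows, err := cache.Query(`SELECT table_name FROM jobs WHERE created_at < $1`, cutoff)
	if err != nil {
		log.Println("sweep:", err)
		return
	}
	defer rows.Close()
	var stale []string
	for rows.Next() {
		var t string
		if err := rows.Scan(&t); err == nil {
			stale = append(stale, t)
		}
	}
	for _, t := range stale {
		cache.Exec(fmt.Sprintf(`DROP TABLE IF EXISTS %s`, t))
		cache.Exec(`DELETE FROM jobs WHERE table_name = $1`, t)
	}
}

func must(err error) {
	if err != nil {
		log.Fatal(err)
	}
}

func main() {
	big, err := sql.Open("postgres", "postgres://app@bigdb/reports?sslmode=disable")
	must(err)
	cache, err := sql.Open("postgres", "postgres://app@cachedb/results?sslmode=disable")
	must(err)
	_, err = cache.Exec(`CREATE TABLE IF NOT EXISTS jobs
		(table_name TEXT PRIMARY KEY, created_at TIMESTAMPTZ)`)
	must(err)

	must(runReportJob(big, cache, "user42_daily", 42))
	// User-facing reads now hit only the tiny results DB:
	// SELECT symbol, pnl FROM results_user42_daily
	sweepExpired(cache)
}
```

The semaphore is the whole traffic-control story: a surge just queues on it, and user-facing reads never touch the multi-TB primary.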

> Curious to see sql-jobber here. I'm the author -- happy to answer any questions.

Well, it's good software! I've been thoroughly impressed with a lot of the stuff you all have put out -- Listmonk immediately struck me as something that could be better than my current mailing list solution (Mailtrain). Listmonk is so good it feels like I could run a SaaS off of it alone.

> Yep. The model makes sense when you have large databases (for us, it's hundreds of billions of rows in multi-DB Postgres instances) that have to directly serve complex read queries to millions of users pulling reports.

Yeah, this was interesting to read about! Have you all seen the Incremental View Maintenance[0] feature that's currently being worked on? I'm not sure it's actually better than what you have now; it feels like it would put more load on the big DBs themselves, and if what you have is working well, there's no reason to change.
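For anyone else reading: as I understand the pgsql-ivm patch, it adds an INCREMENTAL keyword to CREATE MATERIALIZED VIEW so that Postgres keeps the view up to date as the base tables change. A rough sketch of what that would look like (the trades table and DSN are made up, and it needs a Postgres built from that branch, not stock Postgres):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Postgres driver
)

func main() {
	// Connect to a Postgres built with the pgsql-ivm patch; a stock
	// build will reject the INCREMENTAL keyword below.
	db, err := sql.Open("postgres", "postgres://app@bigdb/reports?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	// Postgres maintains the view incrementally as trades change; no
	// manual REFRESH MATERIALIZED VIEW, and reads against it stay cheap.
	if _, err := db.Exec(`
		CREATE INCREMENTAL MATERIALIZED VIEW daily_pnl AS
		SELECT user_id, sum(pnl) AS pnl
		FROM trades
		GROUP BY user_id`); err != nil {
		log.Fatal(err)
	}
}
```

The trade-off, as far as I can tell, is that the maintenance work happens on every write to the base tables, which is the extra load I mean above.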

> 2) Easy traffic control and queuing when you have large volumes of read requests coming through. This benefits large DBs serving reports to users irrespective of the kind of reports. In addition, when there is a big, unexpected surge in user traffic, the reads simply get queued and users have to wait a bit longer to see their results, but the big databases never get overloaded.

This is a huge benefit, and the thing I picked up on that I hadn't seen much before -- having people running reports wait a bit is so much better than having the DB go down, and I don't think I'd seen this approach suggested much anywhere. People approximate what you're doing with the general idea of a data warehouse or a data lake, but this seems like a step before that level of complication, and it's pretty accessible to administrators/users (no one has to spend a few days figuring out what a data warehouse is, etc.).

[0]: https://github.com/sraoss/pgsql-ivm