What does HackerNews think of noria?
Fast web applications through dynamic, partially-stateful dataflow
Sounds a bit like Noria: https://github.com/mit-pdos/noria
I'm so excited to see dynamic, partially-stateful data-flow for incremental materialized view maintenance becoming more wide-spread! I continue to think it's a _great_ idea, and the speed-ups (and complexity reduction) it can yield are pretty immense, so seeing more folks building on the idea makes me very happy.
The PlanetScale blog post references my original "Noria" OSDI paper (https://pdos.csail.mit.edu/papers/noria:osdi18.pdf), but I'd actually recommend my PhD thesis instead (https://jon.thesquareplanet.com/papers/phd-thesis.pdf), as it goes much deeper about some of the technical challenges and solutions involved. It also has a chapter (Appendix A) that covers how it all works by analogy, which the less-technical among the audience may appreciate :) A recording of my thesis defense on this, which may be more digestible than the thesis itself, is also online at https://www.youtube.com/watch?v=GctxvSPIfr8, as well as a shorter talk from a few years earlier at https://www.youtube.com/watch?v=s19G6n0UjsM. And the Noria research prototype (written in Rust) is on GitHub: https://github.com/mit-pdos/noria.
As others have already mentioned in the comments, I co-founded ReadySet (https://readyset.io/) shortly after graduating specifically to build off of Noria, and they're doing amazing work to provide these kinds of speed-ups for general-purpose relational databases. If you're using one of those, it's worth giving ReadySet a look to get these kinds of speedups there! It's also source-available @ https://github.com/readysettech/readyset if you're curious.
> Noria is a new streaming data-flow system designed to act as a fast storage backend for read-heavy web applications based on Jon Gjengset's Phd Thesis, as well as this paper from OSDI'18. It acts like a database, but precomputes and caches relational query results so that reads are blazingly fast. Noria automatically keeps cached results up-to-date as the underlying data, stored in persistent base tables, change. Noria uses partially-stateful data-flow to reduce memory overhead, and supports dynamic, runtime data-flow and query change.
Here's an excellent interview with the creator: https://corecursive.com/030-rethinking-databases-with-jon-gj...
Great that someone is productionalizing this!
Btw, is Jon involved in ReadySet?
[1] Partial State in Dataflow-Based Materialized Views - https://github.com/mit-pdos/noria
Once your developers have completed an iteration, your DB will see the same queries over and over again (if it doesn't, then it should be an OLAP aggregate). These databases optimize for writes and defer complexity to reads, which makes no sense whatsoever considering that you could see millions more reads than writes.
But I think you are wrong about Rust not having the right machinery for making high performance dbs. Two examples are Noria and Materialize
https://github.com/mit-pdos/noria
and in its 50k lines, in the immediate codebase, there are 40 uses of unsafe.
In Materialize's 125k of Rust, there are 76 direct uses of unsafe.
This is a very hard problem to do the right way and probably would need some changes on the RDBMS itself. You would need to monitor all tables that might affect your query for changes and how these changes affect your query (say you're just reading a value, aggregating with sum, doing average with count of rows, the list goes on). Add more complexity on top of that if you want to support querying from other views that also aggregate the data on your query.
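To make the sum/count bookkeeping above concrete, here is a minimal sketch of incrementally maintaining an average per group. It is a toy under stated assumptions, not how Noria or any RDBMS implements it: the class `IncrementalAverage` and its methods are hypothetical names, and it ignores joins, concurrency, and views stacked on other views.

```python
from collections import defaultdict


class IncrementalAverage:
    """Hypothetical toy view: AVG(value) GROUP BY key, kept up to date
    from row-level deltas instead of rescanning the base table."""

    def __init__(self):
        # Per-group running sum and row count; the average is derived on read.
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def apply(self, key, value, sign):
        """Apply one base-table delta: sign=+1 for an insert, -1 for a delete."""
        self._sum[key] += sign * value
        self._count[key] += sign
        if self._count[key] == 0:
            # The last row of the group was removed, so drop the group.
            del self._sum[key]
            del self._count[key]

    def update(self, key, old_value, new_value):
        """Model an UPDATE as a delete of the old row plus an insert of the new."""
        self.apply(key, old_value, -1)
        self.apply(key, new_value, +1)

    def read(self, key):
        """Return the current average for a group, or None if it has no rows."""
        if key not in self._count:
            return None
        return self._sum[key] / self._count[key]
```

Each write costs constant work per affected group, which is the appeal; the hard part the comment points at is doing the same thing for arbitrary queries and for views that read from other views.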
As pointed out in another comment, there's Noria [0], but I'm not sure how production-ready it is right now. You can get an idea of the complexity of the task from an interview with one of the project leads [1].
[0] https://github.com/mit-pdos/noria
[1] https://corecursive.com/030-rethinking-databases-with-jon-gj...
Jon Gjengset presents his work [2] on materialized views, cache maintenance, and incremental updates, in the context of web databases. His main contributions are around partial state materialization of the hot subset of the data, with downstream propagation of updates and upstream requests to create missing entries on demand. These ideas have been implemented in Noria [3], a MySQL-compatible database designed for mostly-read workloads. The performance measured on a Lobsters-like workload is really impressive.
[1] https://news.ycombinator.com/item?id=24853783
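As a rough illustration of the partial-state idea summarized above (materialize only the keys that get read, push updates downstream, and ask upstream on a miss), here is a single-threaded toy sketch. Everything in it is an assumption for illustration: `BaseTable`, `PartialVoteCount`, and their methods are hypothetical and are not Noria's API, and the sketch sidesteps the ordering problems between upstream requests and concurrent updates that a real system has to solve.

```python
class BaseTable:
    """Hypothetical stand-in for a persistent base table of (story_id, user) votes."""

    def __init__(self):
        self.rows = []
        self.views = []      # downstream views that receive every insert
        self.upqueries = 0   # how many times a view had to ask upstream

    def insert(self, story_id, user):
        self.rows.append((story_id, user))
        # Downstream propagation: forward the new row to every view.
        for view in self.views:
            view.on_insert(story_id)

    def count_votes(self, story_id):
        # Upstream request: recompute one key from the base data.
        self.upqueries += 1
        return sum(1 for sid, _ in self.rows if sid == story_id)


class PartialVoteCount:
    """Hypothetical partially materialized view: vote count per story."""

    def __init__(self, base):
        self.base = base
        self.state = {}  # only keys that have been read are kept here
        base.views.append(self)

    def on_insert(self, story_id):
        # Update the entry only if it is materialized; otherwise drop the update.
        if story_id in self.state:
            self.state[story_id] += 1

    def read(self, story_id):
        if story_id not in self.state:
            # Miss: fill the missing entry on demand from upstream.
            self.state[story_id] = self.base.count_votes(story_id)
        return self.state[story_id]

    def evict(self, story_id):
        # Free memory for a cold key; a later read will refill it.
        self.state.pop(story_id, None)
```

Writes to stories nobody has read cost the view nothing, and memory is bounded by the hot set rather than the whole table, which is the trade the comment describes.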