What does HackerNews think of noria?

Fast web applications through dynamic, partially-stateful dataflow

Language: Rust

> Automatically managed, application-transparent, physical denormalisation entirely managed by the database is something I am very, very interested in.

Sounds a bit like Noria: https://github.com/mit-pdos/noria

It feels more than a little bit coincidental to call it Noria when https://github.com/mit-pdos/noria exists (and has been posted about here on HN)... especially with the whole bit about incrementally computing changes.

:wave: Author of the paper this work is based on here.

I'm so excited to see dynamic, partially-stateful data-flow for incremental materialized view maintenance becoming more widespread! I continue to think it's a _great_ idea, and the speed-ups (and complexity reduction) it can yield are pretty immense, so seeing more folks building on the idea makes me very happy.

The PlanetScale blog post references my original "Noria" OSDI paper (https://pdos.csail.mit.edu/papers/noria:osdi18.pdf), but I'd actually recommend my PhD thesis instead (https://jon.thesquareplanet.com/papers/phd-thesis.pdf), as it goes much deeper into some of the technical challenges and solutions involved. It also has a chapter (Appendix A) that covers how it all works by analogy, which the less-technical among the audience may appreciate :) A recording of my thesis defense on this, which may be more digestible than the thesis itself, is also online at https://www.youtube.com/watch?v=GctxvSPIfr8, as well as a shorter talk from a few years earlier at https://www.youtube.com/watch?v=s19G6n0UjsM. And the Noria research prototype (written in Rust) is on GitHub: https://github.com/mit-pdos/noria.

As others have already mentioned in the comments, I co-founded ReadySet (https://readyset.io/) shortly after graduating specifically to build off of Noria, and they're doing amazing work to provide these kinds of speed-ups for general-purpose relational databases. If you're using one of those, it's worth giving ReadySet a look to get these kinds of speedups there! It's also source-available @ https://github.com/readysettech/readyset if you're curious.

It seems similar to MIT's Noria [1]

> Noria is a new streaming data-flow system designed to act as a fast storage backend for read-heavy web applications based on Jon Gjengset's Phd Thesis, as well as this paper from OSDI'18. It acts like a database, but precomputes and caches relational query results so that reads are blazingly fast. Noria automatically keeps cached results up-to-date as the underlying data, stored in persistent base tables, change. Noria uses partially-stateful data-flow to reduce memory overhead, and supports dynamic, runtime data-flow and query change.

[1] https://github.com/mit-pdos/noria

Materialize is really neat; also check out https://github.com/mit-pdos/noria. It inverts the query problem and processes the data on insert, which is exactly what most applications end up doing with a NoSQL solution.

Here's a database that does much better than this: it creates materialized views of the queries you're using and updates them at a fine granularity: https://github.com/mit-pdos/noria

Here's an excellent interview with the creator: https://corecursive.com/030-rethinking-databases-with-jon-gj...

This looks really nice. I was reading about this and realized it had similar ideas to this [1] PhD thesis from Jon Gjengset. I checked his Twitter [2], and it is indeed based on his work.

Great that someone is productionalizing this!

Btw, is Jon involved in ReadySet?

[1] Partial State in Dataflow-Based Materialized Views - https://github.com/mit-pdos/noria

[2] https://twitter.com/jonhoo/status/1537474261689872384

I haven't read much about Noria other than this readme [0], but would like to know if you are familiar enough to contrast Materialize [1] with it in terms of perf, overhead, approach, and fundamental (design) principles?

[0] https://github.com/mit-pdos/noria

[1] https://github.com/MaterializeInc/materialize

Write-expensive, read-cheap[1]: the exact opposite of the databases mentioned.

Once your developers have completed an iteration, your DB will see the same queries over and over again (if it doesn't, then it should be an OLAP aggregate). These databases optimize for writes and defer complexity to reads, which, considering that you could see millions more reads than writes, makes no sense whatsoever.

[1]: https://github.com/mit-pdos/noria

I always love your take even if I don't agree. SpaceCurve was a phenomenal system, one of the most pragmatic, high-performance, easy-to-use MPP database systems I have ever used. We never met, btw; I was just a user.

But I think you are wrong about Rust not having the right machinery for making high-performance DBs. Two examples are Noria and Materialize.

https://github.com/mit-pdos/noria

In Noria's 50k lines, in the immediate codebase, there are 40 uses of unsafe.

In Materialize's 125k of Rust, there are 76 direct uses of unsafe.

https://github.com/MaterializeInc/materialize

> I guess it's relatively easy to do

This is a very hard problem to solve the right way and would probably need some changes to the RDBMS itself. You would need to monitor all tables that might affect your query for changes and work out how those changes affect your query (say you're just reading a value, aggregating with sum, computing an average with a count of rows; the list goes on). Add more complexity on top of that if you want to support querying from other views that also aggregate the data in your query.
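To make the sum/count/average case concrete, here is a minimal sketch in Rust of a view that is updated per change instead of being recomputed. Everything in it (`Delta`, `Agg`, `AvgView`) is a hypothetical name made up for illustration and is not Noria's API or any RDBMS's. It assumes a single base table of `(group, value)` rows, that every delete refers to a row that was previously inserted, and that an update is modeled as a delete followed by an insert.

```rust
use std::collections::HashMap;

/// One change to a hypothetical base table of `(group, value)` rows.
#[derive(Debug, Clone, Copy)]
pub enum Delta {
    Insert { group: u32, value: i64 },
    /// Assumed to refer to a row that was inserted earlier.
    Delete { group: u32, value: i64 },
}

/// Running state for one group: enough to answer SUM, COUNT, and AVG.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct Agg {
    pub sum: i64,
    pub count: u64,
}

/// A grouped-average view that is maintained incrementally.
#[derive(Debug, Default)]
pub struct AvgView {
    groups: HashMap<u32, Agg>,
}

impl AvgView {
    pub fn new() -> Self {
        Self::default()
    }

    /// Apply one base-table change without rescanning the table.
    pub fn apply(&mut self, delta: Delta) {
        match delta {
            Delta::Insert { group, value } => {
                let agg = self.groups.entry(group).or_default();
                agg.sum += value;
                agg.count += 1;
            }
            Delta::Delete { group, value } => {
                let mut now_empty = false;
                if let Some(agg) = self.groups.get_mut(&group) {
                    agg.sum -= value;
                    agg.count = agg.count.saturating_sub(1);
                    now_empty = agg.count == 0;
                }
                // Drop groups with no rows left so AVG reports "no rows".
                if now_empty {
                    self.groups.remove(&group);
                }
            }
        }
    }

    /// AVG is derived from the stored sum and count, never stored itself.
    pub fn avg(&self, group: u32) -> Option<f64> {
        self.groups
            .get(&group)
            .map(|agg| agg.sum as f64 / agg.count as f64)
    }
}
```

The average is kept as a sum plus a count because an average alone cannot be adjusted when a row is removed. This only covers one table and one aggregate; joins and views stacked on other views are where the real difficulty described above comes in.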

As pointed out in another comment, there's the Noria DB [0], but I'm not sure how production-ready it is right now. You can get an idea of the complexity of the task from an interview with one of the project leads [1].

[0] https://github.com/mit-pdos/noria

[1] https://corecursive.com/030-rethinking-databases-with-jon-gj...

There is also the research DB Noria[0] that's based on this idea. It maintains materialized views for queries and efficiently updates them when the data changes.

[0] https://github.com/mit-pdos/noria

This has already been posted [1], but I do think that this video deserves a better audience.

Jon Gjengset presents his work [2] on materialized views, cache maintenance, and incremental updates, in the context of web databases. His main contributions are around partial state: materializing only the hot subset of the data, propagating updates downstream, and issuing upstream requests to create missing entries on demand. These ideas have been implemented in Noria [3], a MySQL-compatible database designed for read-mostly workloads. The performance measured on a Lobsters-like workload is really impressive.
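As a rough illustration of the partial-state idea, here is a toy sketch in Rust. It is not Noria's code or API: `Votes`, `PartialVoteCount`, and the `upqueries` counter are hypothetical names, and the "upquery" is just a scan of an in-memory vector. It assumes a single vote-count view over one base table, with no concurrency and no multi-operator data-flow.

```rust
use std::collections::HashMap;

/// Hypothetical base table: one row per vote, stored as the article id.
#[derive(Debug, Default)]
pub struct Votes {
    rows: Vec<u32>,
}

impl Votes {
    /// Recompute the count for one key from the base data.
    /// This stands in for an upstream request (upquery).
    fn count_for(&self, article: u32) -> u64 {
        self.rows.iter().filter(|&&a| a == article).count() as u64
    }
}

/// A vote-count view that only holds state for keys that have been read.
#[derive(Debug, Default)]
pub struct PartialVoteCount {
    base: Votes,
    cached: HashMap<u32, u64>,
    /// Number of misses that had to go back to the base table.
    pub upqueries: usize,
}

impl PartialVoteCount {
    pub fn new() -> Self {
        Self::default()
    }

    /// Write path: always store the row, but only update the view if the
    /// key is already materialized. Updates for missing keys are dropped.
    pub fn insert_vote(&mut self, article: u32) {
        self.base.rows.push(article);
        if let Some(count) = self.cached.get_mut(&article) {
            *count += 1;
        }
    }

    /// Read path: serve from the view, or fill the missing entry on demand.
    pub fn read(&mut self, article: u32) -> u64 {
        if let Some(&count) = self.cached.get(&article) {
            return count;
        }
        self.upqueries += 1;
        let count = self.base.count_for(article);
        self.cached.insert(article, count);
        count
    }

    /// Evict a cold key to reclaim memory; a later read will refill it.
    pub fn evict(&mut self, article: u32) {
        self.cached.remove(&article);
    }
}
```

Reading a key that is present costs one hash lookup, while writes to keys nobody reads cost almost nothing in the view. The hard part that this sketch skips, and that the thesis spends most of its time on, is keeping such holes consistent across a whole graph of operators.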

[1] https://news.ycombinator.com/item?id=24853783

[2] https://jon.thesquareplanet.com/papers/phd-thesis.pdf

[3] https://github.com/mit-pdos/noria