I have been waiting for this since the moment I first read about Materialize a year or two ago. I think there's still a lot of work to be done, but at heart, if you pair technology like Materialize with an orchestration system like dbt, you can use dbt to keep your business logic extremely well organized while keeping all of your dependent views up to date all of the time, and even reuse the same layered analytical views for both analytical AND operational purposes.
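To make the pairing concrete, here's a minimal sketch of a dbt model targeting Materialize, assuming the dbt-materialize adapter; the materialization name and the `orders` model are illustrative, so check the adapter docs for the exact config value:

    -- models/customer_stats.sql
    -- Business logic lives in dbt; Materialize keeps the result fresh.
    {{ config(materialized='materializedview') }}

    select
        customer_id,
        count(*) as order_count,
        sum(amount) as lifetime_value
    from {{ ref('orders') }}
    group by customer_id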

The biggest issue I see is that it requires you to be all-in on Materialize, and as a warehouse (or as a database, for that matter), it's surely not as mature as Snowflake or Postgres.

Thank you for your kind words! We indeed have plenty of work to do (and are thus hiring)! I'm curious, however, why you think this requires you to be all-in on Materialize. As you said better than I could have, dbt is amazing at keeping your business logic organized. Our intention is very much for dbt to standardize the modeling/business-logic layer, which lets you use multiple backends as you see fit in a way that shares the catalog layer cleanly.

Our hope is that you have some BigQuery/Snowflake job whose bill you're tired of running up by hitting redeploy 5 times a day, and that you can cleanly port it over to Materialize with little work, because the adapter takes care of any small semantic differences in date handling, null handling, etc. So Materialize sits cleanly side by side with Snowflake/BigQuery, and you choose whether you want things incrementally maintained with a few seconds of latency by Materialize, or once a day by the batch systems.
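For a sense of what the ported job might look like on the Materialize side, here's a rough sketch in Materialize's SQL dialect (the broker, topic, and column names are invented, and the exact source syntax has changed across versions):

    -- Ingest the stream once...
    CREATE SOURCE orders
    FROM KAFKA BROKER 'broker:9092' TOPIC 'orders'
    FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://registry:8081';

    -- ...and the view is maintained incrementally from then on.
    CREATE MATERIALIZED VIEW daily_revenue AS
    SELECT order_date, sum(amount) AS revenue
    FROM orders
    GROUP BY order_date;

    -- Reads return the already-maintained answer; no redeploys.
    SELECT * FROM daily_revenue;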

My view is that you're likely to want to do data science with a batch system (when you're in "learning mode" you try to keep as many things as possible fixed, including not updating the dataset), and then, if the model becomes a critical automated pipeline, rather than rerunning the model every hour and uploading results to a Redis cache or something, you switch it over to Materialize and never have to worry about cache invalidation.
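As a sketch of that switch: the features the batch job used to precompute and push to Redis become a view that Materialize keeps current, and the serving path just queries it (all names here are invented for illustration):

    CREATE MATERIALIZED VIEW user_features AS
    SELECT user_id,
           count(*) AS purchase_count,
           max(created_at) AS last_purchase_at
    FROM purchases
    GROUP BY user_id;

    -- The serving path reads the view directly; there is no cache
    -- to invalidate because Materialize updates the result itself.
    SELECT * FROM user_features WHERE user_id = 42;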

I should clarify: I don't think that in the general case you have to go all-in on Materialize, but the case in my comment, where you are effectively using business logic within Materialize views as the "source of truth" of logic across both your analytics and your operations, does require buy-in. Additionally, if I'm _already_ sending all of my data to a database or to my data warehouse, ETLing all of that data to Materialize as well is rather burdensome. Just because I technically could run Materialize side by side with something doesn't mean I necessarily want to, especially given that the streaming use case requires a lot more maintenance to get right and keep running in production.

I fully agree with you that for many data science cases, you're likely to stick with batching. Where I see Materialize being most useful, and where I'd be inspired to use it and transform how we do things, is the overlap where the analytics team writes definitions (e.g., what constitutes an "active user"?), typically on the warehouse, but I want those definitions to be used, kept up to date, and made available everywhere in my stack: analytics, my operational database, and third-party tools like marketing tools.
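For example, a single canonical "active user" definition could live in one view that every consumer reads. Here's a sketch using Materialize's temporal-filter idiom (spelled mz_logical_timestamp() in older versions, mz_now() in newer ones; the events schema is invented):

    CREATE MATERIALIZED VIEW active_users AS
    SELECT DISTINCT user_id
    FROM events
    -- keep only events from the last 30 days
    WHERE mz_logical_timestamp() < event_ts_ms + 30 * 24 * 60 * 60 * 1000;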

Personally, I'm less interested in one-off migrations like you're suggesting. What I really want is to have something like Materialize embedded in my Postgres. (Such a thing should be doable at minimum by running Materialize + Debezium side by side with Postgres and then having Postgres talk to Materialize via foreign data wrappers. It would need some fancy tooling to make it simple, but it would work.) In such a scenario, a Postgres + Materialize combo could serve as the "center of the universe" for all the data AND business definitions for the company, and everything else stems from there. Even if we used a big data warehouse in parallel for large ad hoc queries (which I imagine Materialize wouldn't handle well, not being OLAP), I would ETL my data to the warehouse from Materialize, and I'd even be able to ETL the data from the materialized views, pre-calculated. If I wanted to send data to third-party tools, I'd use Materialize in conjunction with Hightouch.io to forward the data, including hooking into subscriptions when rows in the materialized views change.
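A sketch of the FDW half of that idea: since Materialize speaks the Postgres wire protocol, plain postgres_fdw should in principle be able to point at it (untested, and all hosts/names below are placeholders). The subscription half would be Materialize's TAIL (later renamed SUBSCRIBE):

    -- In Postgres:
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER materialize
      FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'materialize.internal', port '6875', dbname 'materialize');

    CREATE USER MAPPING FOR CURRENT_USER SERVER materialize
      OPTIONS (user 'materialize');

    CREATE FOREIGN TABLE active_users (user_id bigint)
      SERVER materialize OPTIONS (table_name 'active_users');

    SELECT count(*) FROM active_users;  -- answered by Materialize

    -- In Materialize, to stream changes to Hightouch-style tooling:
    TAIL active_users;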

For what I propose, there are some open questions about data persistence, high availability, how quickly an initial view can be materialized, and backfilling data, among other things. But I think this is where Materialize has a good chance of fundamentally changing how analytical and operational data are managed, and I think there's a world where data warehouses go away and you just run everything on Postgres + Materialize + S3 (+ Presto or similar for true OLAP queries). I could see myself using Materialize for log management or log alerting. I'm just as excited to see pieces of it embedded in other pieces of infrastructure as I am to use it as a standalone product.
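As a toy sketch of the log-alerting case (source and fields invented): maintain a running error count per service and have an alerter subscribe to its changes:

    CREATE MATERIALIZED VIEW error_counts AS
    SELECT service, count(*) AS errors
    FROM logs
    WHERE level = 'ERROR'
    GROUP BY service;

    -- An alerting process reacts to each diff as it arrives:
    TAIL error_counts;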

I agree with you. If you are focused on PostgreSQL, you might find the work being done on IVM (Incremental View Maintenance) interesting: https://wiki.postgresql.org/wiki/Incremental_View_Maintenance - https://github.com/sraoss/pgsql-ivm
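For reference, the linked patch adds an INCREMENTAL keyword to Postgres materialized views, roughly:

    CREATE INCREMENTAL MATERIALIZED VIEW active_users AS
    SELECT DISTINCT user_id FROM events;

    -- Subsequent writes to events update the view automatically,
    -- with no need for REFRESH MATERIALIZED VIEW.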