In "The Future of Data Systems", the author imagines a system where the application writes events to a Kafka-like distributed log. Consumers of the log do work like de-duping, committing the data to a RDBMS, invalidating caches, updating search indexes, etc. The application might read state directly from log updates, or have a system to sync w/ some sort of atomic state (e.g. RethingDB changefeeds).
The architecture seems to solve two big problems:
* Scaling the RDBMS (there are solutions like Cloud Spanner, but they rely heavily on Google's proprietary network for low latency and are expensive as balls)
* Keeping downstream systems in sync. A lot of companies have gone with some kind of "tail the RDBMS" approach, e.g. writing the MySQL binlog to a Kafka queue and having downstream consumers read from that, but this seems like a more elegant solution.
Are there any examples or experiences of people working with systems like this? What are some downsides, challenges, and actual benefits?
I have been involved in using a system built like this. All I can say is... It feels like you're building a database out of an event stream.
A shitty one at that... basically the write-ahead log part, only without a way to apply that state reliably the way a real database does. So you have to keep the log around basically forever. It's like being in the middle of a DB recovery all the time.
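To make "keep the log around forever" concrete, here's a sketch of rebuilding state by replaying a topic from the beginning. The `orders` topic, plain string encoding, and a fresh consumer group id are assumptions; a real rebuild would loop until it caught up to the head of the log.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Rebuild current state by replaying the whole topic. A fresh group id plus
// auto.offset.reset=earliest means consumption starts from the earliest offset.
public class ReplayState {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "state-rebuild-" + System.currentTimeMillis());
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "earliest");

        Map<String, String> state = new HashMap<>();  // the "database" being rebuilt
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            // A real rebuild loops until it reaches the end offsets; a few polls
            // are enough to show the shape of it.
            for (int i = 0; i < 10; i++) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    state.put(record.key(), record.value());  // last write wins, like replaying a WAL
                }
            }
        }
        System.out.println("rebuilt " + state.size() + " keys from the log");
    }
}
```

That loop is, in effect, the crash-recovery path of a database, except here it's the steady state.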
After insane amounts of research and deep thought, my personal opinion is that this is the wrong way to do scalable systems. Event sourcing and eventual consistency are taking the industry for a ride in the wrong direction.
In my quest to find a better way I found some research/leaks/opinions from Googlers, and I think they're right. Even Netflix admits that using eventual consistency means they have to build systems that go around and "fix up" data that ends up in bad states. Ew. Service RPC loops in any such system are a Pandora's box. Are these calls getting the most recently updated version of the data? Nobody knows. Even replaying the event log can't save you: the log may be strongly ordered, but the data state between services that call each other is partly determined by timing. Undefined behavior.
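A toy simulation of that timing problem, with two in-process "services" consuming the same strictly ordered log at different speeds (all names invented): the answer a cross-service read gets depends on how far each consumer happens to be at that instant, not on the log order.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Two projections apply the same ordered event log, but at different speeds.
// A read that spans both observes whatever each one holds at that moment,
// so the combined result is timing-dependent even though the log is ordered.
public class TimingDemo {
    static final List<String> LOG = List.of("price=10", "price=12", "price=15");

    static class Projection implements Runnable {
        final ConcurrentHashMap<String, String> state = new ConcurrentHashMap<>();
        final long delayMs;
        Projection(long delayMs) { this.delayMs = delayMs; }
        public void run() {
            for (String event : LOG) {
                try { TimeUnit.MILLISECONDS.sleep(delayMs); } catch (InterruptedException e) { return; }
                String[] kv = event.split("=");
                state.put(kv[0], kv[1]);   // apply events strictly in log order
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Projection serviceA = new Projection(5);    // fast consumer
        Projection serviceB = new Projection(50);   // slow consumer
        new Thread(serviceA).start();
        new Thread(serviceB).start();

        Thread.sleep(60);  // the "RPC" happens at an arbitrary moment
        // Same log, same order, yet the pair of answers varies run to run.
        System.out.println("A sees price=" + serviceA.state.get("price"));
        System.out.println("B sees price=" + serviceB.state.get("price"));
    }
}
```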
You'll notice that LinkedIn/Netflix/Uber etc. all seem to be building their systems using this pattern. Who is conspicuously absent? Google. The father of containers, VMs, and highly distributed systems is mum.
Researching Google's systems turns up a fascinating answer to the problem of distributed consistency, one I'm stunned hasn't seen more attention. Google decided as early as 2005 that eventually consistent systems were too hard to use and manage. All of their databases, BigTable, MegaStore, Spanner, F1... they're all strongly consistent in certain ways.
How does Google do it? They make the database the source of truth. Service RPC calls either fail or succeed immediately. Service call loops, while bad for performance, produce consistent results. Failures are easy to find because data updates either succeed or fail immediately, not at some unbounded future time.
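Contrast that with a plain synchronous transaction against the source-of-truth database. This JDBC sketch (table, schema, and connection string are invented) shows the property being described: the caller learns success or failure before the call returns, not at some unbounded future time.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// "Database as source of truth" style: the caller knows immediately whether
// the update took effect, instead of emitting an event and hoping downstream
// consumers eventually converge. All names here are hypothetical.
public class SynchronousUpdate {
    public static boolean debit(String account, long cents) {
        try (Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/bank", "app", "secret")) {
            db.setAutoCommit(false);
            try (PreparedStatement stmt = db.prepareStatement(
                    "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?")) {
                stmt.setLong(1, cents);
                stmt.setString(2, account);
                stmt.setLong(3, cents);
                if (stmt.executeUpdate() == 1) {
                    db.commit();      // success is known right now
                    return true;
                }
                db.rollback();        // and so is failure
                return false;
            }
        } catch (SQLException e) {
            return false;             // the caller sees the error immediately
        }
    }
}
```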
The rest of the industry is missing the point of microservices IMO. Google's massively distributed systems are enabled largely by their innovative database designs. The rest of the industry is trying to replicate the topology of Google's internal systems without understanding what makes them work well.
For microservices to be realistically usable for most use cases we need someone to come up with decent competition to Google's database systems. When you have a transactional distributed database, all the problems with data spread across multiple services go away.
HBase was a good attempt but doesn't get enough love. A point missed in the creation of HBase, one that becomes clear when reading the papers about MegaStore and Spanner, is that BigTable wasn't designed to be used as a data store by itself. Instead, it has the minimal features needed to build a MegaStore on top of it. The weirder features of HBase/BigTable (like keeping around 3 copies of changed data, and row-level atomicity without transactions) are clearly designed to make it possible to build a database on top.
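As a rough illustration of those primitives, here's a sketch against the classic (pre-2.x) HBase client API, assuming the "3 copies" remark refers to cell versioning; the `accounts` table and `d` column family are made up.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Two of the low-level building blocks a higher-level store could use:
// single-row atomic check-and-put, and multiple stored versions per cell.
public class HBasePrimitives {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("accounts"))) {

            byte[] row = Bytes.toBytes("alice");
            byte[] fam = Bytes.toBytes("d");
            byte[] col = Bytes.toBytes("balance");

            // Row-level atomicity without transactions: the put only applies if the
            // current value matches, enforced server-side within a single row.
            Put put = new Put(row);
            put.addColumn(fam, col, Bytes.toBytes("90"));
            boolean applied = table.checkAndPut(row, fam, col, Bytes.toBytes("100"), put);
            System.out.println("compare-and-set applied: " + applied);

            // Versioning: read back the last few versions of the cell, not just
            // the newest one, which is the kind of hook a MegaStore-like layer needs.
            Get get = new Get(row);
            get.setMaxVersions(3);
            Result result = table.get(get);
            System.out.println("versions kept: " + result.getColumnCells(fam, col).size());
        }
    }
}
```

None of this gives you cross-row transactions by itself; the point is that these are the hooks a MegaStore-style layer would build them out of.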
Unfortunately nobody thus far has taken up that challenge, and outside Google we're all stuck with the shitty databases that Google tossed away a decade ago.