Caching should be a by-product of an app's data flow design. When we start focusing on the cache itself, as in how much we cache, when we cache, and where we put the cache, we lose sight of the main point, which is the data flow.
There are two ways of projecting state changes through a data flow: push and pull. Pull is when the observer asks: data is fetched from the data stores and computed on demand. That's the "lazy" approach.
Push is when every change produces an event that updates the read models, which are therefore never stale; they're either immediately or eventually consistent. That's the "eager" approach.
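To make the distinction concrete, here's a toy Rust sketch (hypothetical names and domain, orders and per-customer totals): the pull side computes the read model at read time, while the push side maintains it at write time.

    use std::collections::HashMap;

    // Source of truth: order amounts (in cents) per customer.
    struct Orders {
        by_customer: HashMap<u32, Vec<u64>>,
    }

    // Pull ("lazy"): the read model is computed on demand; nothing is stored.
    fn total_spent_pull(orders: &Orders, customer: u32) -> u64 {
        orders
            .by_customer
            .get(&customer)
            .map_or(0, |amounts| amounts.iter().sum())
    }

    // Push ("eager"): every write also updates the read model,
    // so reads are O(1) and never stale.
    struct Totals {
        by_customer: HashMap<u32, u64>,
    }

    fn record_order(orders: &mut Orders, totals: &mut Totals, customer: u32, amount: u64) {
        orders.by_customer.entry(customer).or_default().push(amount);
        *totals.by_customer.entry(customer).or_default() += amount; // the "push"
    }

    fn main() {
        let mut orders = Orders { by_customer: HashMap::new() };
        let mut totals = Totals { by_customer: HashMap::new() };
        record_order(&mut orders, &mut totals, 7, 1250);
        record_order(&mut orders, &mut totals, 7, 300);
        assert_eq!(total_spent_pull(&orders, 7), 1550); // pull: computed right now
        assert_eq!(totals.by_customer[&7], 1550);       // push: already materialized
    }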
In the real world we want hybrid push/pull systems, tuned to the write/read ratio of each source model; otherwise the system is less efficient than it needs to be. A hybrid solution implies "buffers" in the middle: a push ends there, storing an intermediate result, and a pull then reads from that intermediate result rather than going back to the source.
Notice that in the above we have something like a cache, the intermediate result, but it doesn't get stale: push, by definition, never leaves a read model stale.
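Continuing the toy sketch from above: writes push into an intermediate totals map (the buffer), and an expensive derived view pulls from that buffer on demand instead of rescanning the raw source.

    use std::collections::HashMap;

    struct Store {
        orders: Vec<(u32, u64)>,   // source model: (customer, amount)
        totals: HashMap<u32, u64>, // the "buffer", kept fresh by pushes
    }

    impl Store {
        fn write(&mut self, customer: u32, amount: u64) {
            self.orders.push((customer, amount)); // the push ends here...
            *self.totals.entry(customer).or_default() += amount; // ...in the buffer
        }

        // Pull: derived on demand from the buffer, not from `orders`.
        // It can't be stale, because the buffer is push-maintained.
        fn top_customer(&self) -> Option<(u32, u64)> {
            self.totals
                .iter()
                .map(|(&c, &t)| (c, t))
                .max_by_key(|&(_, t)| t)
        }
    }

    fn main() {
        let mut store = Store { orders: Vec::new(), totals: HashMap::new() };
        store.write(1, 500);
        store.write(2, 900);
        store.write(1, 600);
        assert_eq!(store.top_customer(), Some((1, 1100)));
    }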
We get staleness when we pull, compute and store the result of that pull calculation, and then try to reuse it. This actually works great if the computation is known to derive from immutable data, OR if checking the source data for mutation events and reusing the cache is cheaper than recalculating it.
But if neither condition is met, then a cache, as in a "pull-derived stored result", is the wrong solution in the first place, and no amount of shuffling things from databases to memory to disk and back will fix anything.
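Here's a sketch of that second condition, where checking for mutations is cheaper than recomputing: the source keeps a version counter bumped on every write, and the cached pull result is reused only while the version matches. Names are made up for illustration.

    struct Source {
        version: u64,    // bumped on every mutation: the cheap staleness signal
        items: Vec<u64>,
    }

    impl Source {
        fn mutate(&mut self, x: u64) {
            self.items.push(x);
            self.version += 1;
        }
    }

    struct Cached {
        at_version: u64,
        value: u64,
    }

    struct SumCache {
        entry: Option<Cached>,
    }

    impl SumCache {
        fn get(&mut self, src: &Source) -> u64 {
            if let Some(c) = &self.entry {
                if c.at_version == src.version {
                    return c.value; // cheap check passed: reuse
                }
            }
            let value = src.items.iter().sum(); // recompute from the source
            self.entry = Some(Cached { at_version: src.version, value });
            value
        }
    }

    fn main() {
        let mut src = Source { version: 0, items: vec![1, 2, 3] };
        let mut cache = SumCache { entry: None };
        assert_eq!(cache.get(&src), 6);  // miss: computed and stored
        assert_eq!(cache.get(&src), 6);  // hit: version unchanged, reused
        src.mutate(4);
        assert_eq!(cache.get(&src), 10); // version bumped: recomputed
    }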
Gotta have the right data flow.
[0] makes it fairly easy to handle: insert the persistent queries into one of its native tables (which don't typically get persisted, though that could have changed in the past few months), and have the keep-alive messages from the viewer bump the expire-at field on the query's entry in the table. [1] is an example of how exactly the internal dataflow of such "demand-driven push" works, though it omits the expiry handling you'd want, and doesn't detail the interaction with the outside world needed for live feeds.
What this doesn't emphasize is that, due to the timestamping and consistent nature of the underlying dataflow engine, you can see exactly when the new query appears in your output stream, and you can also, e.g., bundle multiple queries together into one atomic transaction so you don't get any tearing-style output/display artifacts.
The underlying Rust framework, Differential Dataflow [2], is even more powerful, but also far less easy to use, due to the lack of a query optimizer. It arguably makes up for that by already supporting recursive/iterative computation and multi-temporal timestamps.
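For a flavor of how the demand-driven join in [1] looks at the dataflow level, here's a minimal sketch using the timely and differential-dataflow crates, patterned on the project's README example. The demand collection stands in for the query table, and retracting a demand entry plays the role of expiry; the actual feeds, keep-alives, etc. are omitted.

    use differential_dataflow::input::InputSession;
    use differential_dataflow::operators::Join;

    fn main() {
        timely::execute_from_args(std::env::args(), move |worker| {
            let mut data = InputSession::new();   // source: (key, value)
            let mut demand = InputSession::new(); // keys viewers subscribe to

            worker.dataflow(|scope| {
                let data = data.to_collection(scope);
                let demand = demand.to_collection(scope);

                // Only demanded keys flow through the join; the output
                // updates incrementally, with timestamps, as either side
                // changes.
                demand
                    .join(&data)
                    .inspect(|x| println!("update: {:?}", x));
            });

            data.advance_to(0);
            demand.advance_to(0);
            data.insert((1, 100));
            data.insert((2, 200));
            demand.insert((1, ())); // a viewer subscribes to key 1

            data.advance_to(1);
            demand.advance_to(1);
            demand.remove((1, ())); // "expiry": its output gets retracted
        })
        .expect("computation terminated abnormally");
    }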
[0]: https://materialize.com/a-simple-and-efficient-real-time-app...
[1]: https://materialize.com/lateral-joins-and-demand-driven-quer...
[2]: https://github.com/TimelyDataflow/differential-dataflow