I wonder if dormando, who sometimes comes around, would care to run memcached with the same traces used in this paper, which are available at https://github.com/twitter/cache-trace. I'm not sure I care about a cache that can scale to 24 cores: in my experience I usually end up with hundreds of caches with a few cores each rather than fewer, bigger cache servers. Still, it would be interesting to see what memcached can do.

Based on his response on Twitter, I'd guess not: https://twitter.com/dormando/status/1381706098290225152

But I also agree with you that high concurrency is usually not needed with distributed caching. As I said in my tweet response (https://twitter.com/thinkingfish/status/1382039915597164544), IO dominates, and there's the failure domain to consider. However, I've been asked a few times now about using the storage module by itself, or via messages over shared memory, in a local setting. That may very well present different requirements on cache scalability.

> However, if this design is used as a local cache over shared memory for a high throughput system (trading? ML feature store?) the scalability can be handy.

In that case, one might be concerned about the hit rates. While FIFO & LRU have been shown to work very well for a remote cache, especially on social network workloads, they are poor choices in many other cases. Database, search, and analytical workloads are LFU- & MRU-biased due to record scans. I'd be concerned that Segcache's design is not general purpose enough and relies too heavily on optimizations that work for Twitter's use cases.
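To make the scan problem concrete, here's a minimal sketch (mine, not from the paper) of how a single one-off scan flushes the entire hot set out of a strict LRU; the tiny capacity and key names are made up for illustration:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class ScanPollution {
        public static void main(String[] args) {
            final int capacity = 4;
            // accessOrder=true turns LinkedHashMap into a strict LRU.
            Map<String, String> lru = new LinkedHashMap<String, String>(capacity, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    return size() > capacity;
                }
            };

            // Hot working set that gets re-read all the time.
            for (String k : new String[] {"user:1", "user:2", "user:3", "user:4"}) {
                lru.put(k, "hot");
            }

            // A one-off table scan of cold records...
            for (int i = 0; i < 4; i++) {
                lru.put("scan:" + i, "cold");
            }

            // ...has evicted every hot entry, even though none of the scanned
            // keys will be read again. A frequency-aware (LFU-leaning) policy
            // would have kept the hot set instead.
            System.out.println(lru.keySet()); // [scan:0, scan:1, scan:2, scan:3]
        }
    }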

Unfortunately, as applications have dozens of local caches, these are rarely analyzed and tuned; instead, implementations have to monitor the workload and adapt. Popular local caches can concurrently handle 300M+ reads/s, use an adaptive eviction policy, and leverage O(1) proactive expiration. As they are much smaller, there is less emphasis on minimizing metadata overhead and more on system performance, e.g. using memoization to avoid SerDe roundtrips to the remote cache store. See, for example, Expedia using a local cache to reduce their DB reads by 50%, which allowed them to remove servers, hit SLAs, and absorb spikes (at a cost of ~500 MB) [1].

[1] https://medium.com/expedia-group-tech/latency-improvement-wi...
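To make that local-cache pattern concrete, here's a minimal sketch assuming Caffeine as the local cache; the fetchFromRemote loader, the size bound, and the TTL are placeholders I made up, not Expedia's actual configuration:

    import com.github.benmanes.caffeine.cache.Caffeine;
    import com.github.benmanes.caffeine.cache.LoadingCache;

    import java.util.concurrent.TimeUnit;

    public class LocalCacheExample {
        // Bounded by entry count; Caffeine's adaptive Window TinyLFU policy
        // shifts between recency- and frequency-biased eviction on its own.
        private final LoadingCache<String, String> cache = Caffeine.newBuilder()
            .maximumSize(100_000)                       // placeholder budget
            .expireAfterWrite(5, TimeUnit.MINUTES)      // proactive expiration
            .recordStats()
            .build(this::fetchFromRemote);              // memoizes the loaded value

        // Hypothetical loader: only called on a miss, so repeated reads of a hot
        // key skip the network hop and the SerDe roundtrip entirely.
        private String fetchFromRemote(String key) {
            return "value loaded from the remote cache/db for " + key;
        }

        public static void main(String[] args) {
            LocalCacheExample example = new LocalCacheExample();
            System.out.println(example.cache.get("user:42")); // miss -> load
            System.out.println(example.cache.get("user:42")); // hit, no remote call
            System.out.println(example.cache.stats());
        }
    }

The point is that the application code only ever calls get(); the policy, expiration, and memoization of the remote value are handled by the cache itself.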

I might be dumb about estimating throughput. According to https://github.com/Cyan4973/xxHash, the best hash functions can only do hundreds of millions of hashes per second, so how can a local cache run at such throughput? I assume that when measuring cache throughput, one needs to calculate the hash, look up the entry, (maybe) compare keys, and copy the data.
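For what it's worth, a back-of-envelope way to sanity-check that model is to time a single-threaded read loop (hash + probe + key compare, with no value copy for an on-heap cache, where a read just returns a reference); the sizes below are arbitrary, and any 300M+ reads/s figure would be an aggregate across many threads running loops like this concurrently:

    import java.util.HashMap;
    import java.util.Map;

    public class ReadCostSketch {
        public static void main(String[] args) {
            int entries = 1_000_000;                    // arbitrary cache size
            Map<String, String> cache = new HashMap<>(entries * 2);
            String[] keys = new String[entries];
            for (int i = 0; i < entries; i++) {
                keys[i] = "key-" + i;
                cache.put(keys[i], "value-" + i);
            }

            // Each get() hashes the key, probes the table, compares the key,
            // and returns a reference to the value.
            long reads = 50_000_000L;
            long checksum = 0;
            long start = System.nanoTime();
            for (long i = 0; i < reads; i++) {
                String v = cache.get(keys[(int) (i % entries)]);
                checksum += v.length();                 // keep the JIT honest
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%.1fM single-thread reads/s (checksum %d)%n",
                    reads / seconds / 1e6, checksum);
        }
    }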