I've worked on moderately busy backend platforms (~10K-20K rps handled on about four E5-2650s, aiming for 5 ms p95 response times).

It greatly depends on what you're doing, but for the majority of systems, which are read-heavy (and that most certainly includes "dynamic" sites like Amazon or Wikipedia), I hold to two major beliefs:

1 - Have very long TTLs on your internal cache servers, with a way to proactively purge (e.g. via message queues) and refresh in the background. Caching shouldn't be a compromise between freshness and performance. Have both!
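A minimal sketch of that purge-and-refresh idea, using a `queue.Queue` as a stand-in for a real message bus (RabbitMQ, Kafka, etc.) and a hypothetical `load_from_source` recomputation; entries are refreshed in place rather than deleted, so readers on the long-TTL path never see a miss:

```python
import queue
import threading

cache = {}                     # long-TTL cache; entries stay warm
invalidations = queue.Queue()  # stand-in for a real message queue

def load_from_source(key):
    # Stand-in for the expensive recomputation (DB query, render, etc.)
    return f"fresh:{key}"

def refresher():
    # Background worker: on an invalidation message, refresh the entry
    # in place instead of evicting it, so readers never miss.
    while True:
        key = invalidations.get()
        if key is None:        # shutdown sentinel
            break
        cache[key] = load_from_source(key)
        invalidations.task_done()

worker = threading.Thread(target=refresher, daemon=True)
worker.start()
```

Publishers of writes just drop the affected keys onto the queue; the synchronous read path stays a plain dictionary lookup.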

2 - Generate messages/payloads/views asynchronously in background workers and keep your synchronous path as streamlined as possible (select 1 column from 1 table with indexed filters). Avoid serialization. Precalculate and denormalize. Any personalization or truly dynamic content can be handled by: 1 - having the client make separate requests for that data; 2 - merging the data into the payload with some composition; 3 - gluing bytes together (easier/safer with protocol buffers than JSON).
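Option 3 can be sketched like this (the fragment names are my own invention): a background worker pre-serializes the shared part of the payload once, and the hot path only splices in the per-request bytes instead of re-serializing everything. With JSON this means careful brace surgery; protobuf is safer here because concatenated encodings merge field-wise.

```python
import json

# Precomputed by a background worker and cached as bytes:
product_bytes = json.dumps({"products": ["a", "b"]}).encode()

def personalize(user_id):
    # Small per-request fragment, cheap to serialize on the hot path.
    return json.dumps({"user": user_id}).encode()

def compose(user_id):
    # Glue bytes together: strip the outer braces of each fragment
    # and join them, avoiding a full decode/re-encode of the payload.
    return b"{" + product_bytes[1:-1] + b"," + personalize(user_id)[1:-1] + b"}"
```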

Do things asynchronously. Use message queues / streams.

Beyond that, GC becomes noticeable. For example, Go's net/http used to (and might still) allocate much more than third-party alternatives.

0 - Caching antipattern 101:

    key = calculate_cache_key()
    if not cache.has(key):
        # Race window: every request arriving between the has() check
        # and the store() below sees a miss and recomputes.
        data = expensive_calculation()
        cache.store(key, data)
    else:
        # Second race: the entry may be evicted between has() and get().
        data = cache.get(key)

Why is it an antipattern?

It's susceptible to the thundering herd problem, whereby more requests come in for the same cache key before the initial computation is finished, so you end up with lots of cache misses and duplicate work. The usual fix is to lock the cache key and have subsequent requests wait on the original computation, but that's a bit more complex to code.
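That per-key locking fix can be sketched in a few lines of Python (the class and method names here are my own, not from any particular library): the first miss computes; concurrent misses block on the same key's lock and reuse the result.

```python
import threading

class StampedeSafeCache:
    """Per-key locking: one computation per key, concurrent misses wait."""

    def __init__(self):
        self._data = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the lock table

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get_or_compute(self, key, compute):
        if key in self._data:            # fast path: hit, no locking
            return self._data[key]
        with self._lock_for(key):        # slow path: serialize per key
            if key not in self._data:    # double-check after acquiring
                self._data[key] = compute()
            return self._data[key]
```

Even if many threads miss on the same key at once, `compute` runs exactly once; the double-check after acquiring the lock is what closes the race in the antipattern above.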

I've heard it called Cache Stampede. Any decent framework for memoizing method calls would cover this case though.

Do you have an example of one? I'm curious what features they provide over ad hoc memoization mechanics.

If you're in the Java ecosystem, the CacheBuilder in Guava is pretty good: https://google.github.io/guava/releases/19.0/api/docs/com/go...

By default it handles the case of concurrent retrieval on the same key (the second one will just wait for the first one to finish and use that value rather than starting a duplicate computation). It also lets you configure more interesting things like eviction strategies, removal notifications, and statistics.

Last year was a lesson for me in why caches are a hard problem: I had to debug many cache issues from other people not thinking things through. (At least one of the issues was my own fault. :)) Since then, whenever someone suggests we use a cache, I instinctively pull out a set of questions[0] to ask. The three questions Guava has you consider can also lead you to using memcached or the like instead, but my set tries to answer the question "Do you even need a cache?" and, if so, generate helpful design documentation.

    Is the code path as fast as it can possibly be without the cache? Do you have numbers?
    Will the cache have a maximum size, or could it grow without bound?
    If it grows without bound, either because of unbounded entries or because of unbounded memory for any particular entry, under what conditions will it consume all system memory?
    If it has a maximum size, how does it evict things when it reaches that size?
    Are you trying to cache something that could change?
    If so, how is the cache invalidated?
    How can it be invalidated / evicted manually by another thread or signal? (Debuggability, testability, inspectability/monitorability, hit rate and other statistics?)
    Is there a race condition possibility for concurrent stores, retrieves, and evicts?
    How constrained are your cache keys, i.e. what do you need to know to create one?
    Do they need to take into account global info?
    Do they need to take into account contextual information (like organization ID if your application server runs in a multi-tenant system, or user ID, or browser type, or requested-language)?
    Or do they only depend on direct inputs to that code path?
[0] https://www.thejach.com/view/2017/6/caches_are_evil -- need to update it a bit but not much...

A coworker introduced me to Caffeine, which I think shares authors with Guava's cache. Kind of a 2.0 / lessons-learned iteration.

https://github.com/ben-manes/caffeine