There is actually quite a bit of complexity in load balancing, but the good news is that much of that complexity is well understood and configurable on the load balancer.

I think what Rachel calls a "suicide pact" is now commonly called a circuit breaker. After a certain number of requests fail, the load balancer simply removes all the backends for a certain period of time and causes all requests to fail immediately. This mitigates the cascading failure by isolating the backend from the frontend for a while. If you have something like a "stateless" web app that shares a database with the other replicas, and the database stops working, this is exactly what you want: no replica will be able to handle the request, so don't send it to any replica.
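
To make the idea concrete, here's a toy circuit breaker in Python (just a sketch of the pattern, not any particular load balancer's implementation; the failure count and cooldown numbers are made up):

    import time

    class CircuitBreaker:
        """Fail fast for a cooldown period after too many consecutive failures."""

        def __init__(self, max_failures=5, reset_after=30.0):
            self.max_failures = max_failures   # trip after this many failures in a row
            self.reset_after = reset_after     # seconds to keep rejecting requests
            self.failures = 0
            self.opened_at = None

        def call(self, fn, *args, **kwargs):
            if self.opened_at is not None:
                if time.monotonic() - self.opened_at < self.reset_after:
                    raise RuntimeError("circuit open: failing fast")  # the "suicide pact"
                self.opened_at = None          # cooldown over, give the backend another chance
                self.failures = 0
            try:
                result = fn(*args, **kwargs)
            except Exception:
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.opened_at = time.monotonic()  # trip the breaker
                raise
            self.failures = 0
            return result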

Another option to look into is the balancer's "panic threshold". Normally your load balancer checks which backends are healthy and only routes requests to those. That is what the load balancer in the article did, and the effect was that it overloaded the remaining backends to the point of failure (a fairly common failure mode). With a panic threshold set, once the fraction of healthy backends drops below the threshold, the balancer stops using health as a routing criterion and will knowingly send some requests to unhealthy backends. The healthy backends then receive a traffic load they can actually handle, so at least (healthy/total)% of requests will be served successfully instead of triggering a cascading failure.
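
Roughly what the routing decision looks like with a panic threshold, sketched in Python (illustrative only; 50% here is just a plausible setting, and I'm assuming each backend object has a healthy flag):

    import random

    PANIC_THRESHOLD = 0.5   # below 50% healthy, ignore health when routing

    def pick_backend(backends):
        """Pick a backend at random, ignoring health once too few hosts look healthy."""
        healthy = [b for b in backends if b.healthy]
        if healthy and len(healthy) / len(backends) >= PANIC_THRESHOLD:
            return random.choice(healthy)   # normal mode: route only to healthy hosts
        return random.choice(backends)      # panic mode: spread load over every host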

Finally, other posts mention a common case like running ab against apache/mysql/php on a small machine. The OOM killer eventually kicks in and starts killing things. Luckily, people are also more careful on that front now. Envoy, for example, has the overload manager, so you can configure exactly how much memory you are willing to use and what happens when you get close to the limit. For my personal site, I give Envoy 64M of RAM, and when it gets to 99% of that, it just stops accepting new connections. That sucks, of course, but it's better than getting OOM killed entirely. (A real website would probably want to give it more than 64M, but in my benchmarking I can't get anywhere near the limit even with 8000 requests/second going through it... and I'm never going to see that kind of load.)
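
The same idea in a toy Python accept loop (not how Envoy's overload manager actually works, just the shape of it; the 64M budget and 99% trigger mirror the numbers above, the memory check is Linux-specific, and listener/handle stand in for a bound socket and whatever serves the request):

    import os
    import time

    MEMORY_BUDGET = 64 * 1024 * 1024   # 64 MiB
    SHED_AT = 0.99                     # stop accepting new connections at 99% of the budget

    def resident_bytes():
        # Linux-specific: the second field of /proc/self/statm is resident pages.
        with open("/proc/self/statm") as f:
            pages = int(f.read().split()[1])
        return pages * os.sysconf("SC_PAGE_SIZE")

    def accept_loop(listener, handle):
        while True:
            if resident_bytes() >= SHED_AT * MEMORY_BUDGET:
                time.sleep(0.25)       # over budget: back off instead of getting OOM killed
                continue
            conn, _ = listener.accept()
            handle(conn)               # hand the connection to whatever serves requests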

I guess the TL;DR is that in 2011 it sounded scary to have a "suicide pact" but now it's normal. Sometimes you've got to kill yourself to save others. If you're a web app, that is.

While not actively developing infrastructure myself, I've always liked the concepts presented in the Hystrix package: https://github.com/Netflix/Hystrix

Even though it seems it is no longer maintained, its circuit breakers, failover modes, and all that are well documented.

And I don't know why Hystrix hasn't been adopted by a wider audience yet. It seems like a necessity in the microservice landscape.