I know some people might find it a little controversial, but I’m super excited about our load balancing future and that we probably have the biggest Envoy deployment in the world now. When we moved most of Dropbox traffic to Envoy, we had to seamlessly migrate a system that already handles tens of millions of open connections, millions of requests per second, and terabits of bandwidth. This effectively made us into one of the biggest Envoy users.
Well, a single server doesn't really need to do more than 10Gbps or 100k connections. Going above that is a "simple" matter of managing horizontal scaling.
What I wonder about is how you distribute the traffic at the higher level. I imagine there are separate clusters of Envoys to serve different configurations/applications/locations? How many datacenters does Dropbox have?
I was running a comparable setup at a large company, all based on HAProxy. There was a significant amount of complexity in routing requests to applications that might ultimately be in any of 30 datacenters.
* The first level of load balancing is DNS[2]. Here we try to map each user to the closest PoP based on metrics from our clients.
* The user-to-PoP path after that mostly depends on our BGP peering with other ISPs (we have an open peering policy[3], please peer with us!)
* Within the PoP we use BGP ECMP and a set of L4 load balancers (previously IPVS, now Katran[4]) that encapsulate traffic and DSR it to L7 balancers (previously nginx, now mostly Envoy).
Overall, we have ~25 PoPs and 4 datacenters.
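To make the tiers above a bit more concrete, here is a rough Python sketch of the selection logic only. It is not how Dropbox, Katran, or any router actually implements this: the function names, PoP names, and latency numbers are all made up for illustration, the ECMP step is modeled as a plain hash of the 5-tuple modulo the number of next hops, and the L4 step uses rendezvous hashing as a simple stand-in for whatever consistent hashing the real L4 balancer uses.

```python
import hashlib
from typing import Dict, List, Tuple

# (src ip, src port, dst ip, dst port, protocol); hypothetical flow key
FlowTuple = Tuple[str, int, str, int, str]


def _hash(*parts: object) -> int:
    """Stable 64-bit hash of the given parts (illustrative, not a router's hash)."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def pick_pop(latency_ms: Dict[str, float]) -> str:
    """DNS tier: return the PoP with the lowest measured client latency."""
    return min(latency_ms, key=lambda pop: latency_ms[pop])


def pick_l4(flow: FlowTuple, l4_nodes: List[str]) -> str:
    """ECMP tier: hash the flow and take it modulo the number of next hops."""
    return l4_nodes[_hash(*flow) % len(l4_nodes)]


def pick_l7(flow: FlowTuple, l7_nodes: List[str]) -> str:
    """L4 tier: rendezvous hashing, so every L4 node picks the same L7 backend
    for a flow, and removing an unrelated backend does not move the flow."""
    return max(l7_nodes, key=lambda node: _hash(node, *flow))


def route(
    flow: FlowTuple,
    latency_ms: Dict[str, float],
    l4_by_pop: Dict[str, List[str]],
    l7_by_pop: Dict[str, List[str]],
) -> Tuple[str, str, str]:
    """Walk a flow through all three tiers and return (pop, l4, l7)."""
    pop = pick_pop(latency_ms)
    return pop, pick_l4(flow, l4_by_pop[pop]), pick_l7(flow, l7_by_pop[pop])


if __name__ == "__main__":
    flow = ("203.0.113.7", 51234, "198.51.100.1", 443, "tcp")
    print(route(
        flow,
        {"pop-a": 12.0, "pop-b": 95.0},  # made-up client latency metrics
        {"pop-a": ["l4-1", "l4-2"], "pop-b": ["l4-3"]},
        {"pop-a": ["l7-1", "l7-2", "l7-3"], "pop-b": ["l7-4"]},
    ))
```

The point of hashing on the flow at the L4 tier, rather than keeping per-connection state on one box, is that any L4 node that receives a packet via ECMP lands on the same L7 backend, so an ECMP reshuffle does not break established connections.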
[1] https://dropbox.tech/infrastructure/dropbox-traffic-infrastr...
[2] https://dropbox.tech/infrastructure/intelligent-dns-based-lo...
[3] https://www.dropbox.com/peering
[4] https://github.com/facebookincubator/katran