What does HackerNews think of katran?

A high performance layer 4 load balancer

Language: C

HTTP/3/QUIC supports migrating connections between two networks, such as when a user switches from Wi-Fi to LTE. IPVS, or any UDP load balancer that balances only on addresses and ports, won't handle this scenario properly: the client's source address changes on migration, and the balancer doesn't inspect the QUIC header and load balance based on the QUIC connection ID. The connection ID is what keeps the connection stable when the device needs to switch networks. If operators have any sort of load balancer (like IPVS) between the client and the point where the HTTP/3 connection is terminated, they will need to ensure that it has proper support for QUIC. One example is Katran[1], which supports this method of load balancing.

[1] https://github.com/facebookincubator/katran
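
To make the connection-ID idea concrete, here is a minimal sketch of picking a backend from the QUIC destination connection ID rather than from the UDP 4-tuple. It is not Katran's actual code. The header layout follows the version-independent QUIC invariants (RFC 8999); the function names, the FNV-1a hash, and the `short_cid_len` parameter are assumptions made for illustration. A production balancer needs more than this, for example because the connection ID can change during the life of a connection.

```c
#include <stddef.h>
#include <stdint.h>

/* Locate the destination connection ID (DCID) in a QUIC packet.
 * short_cid_len is an assumption: short headers do not carry the DCID
 * length, so the balancer must already know what length its servers issue.
 * Returns 0 on success, -1 if the packet is too short. */
static int quic_dcid(const uint8_t *pkt, size_t len, size_t short_cid_len,
                     const uint8_t **cid, size_t *cid_len)
{
    if (len < 1)
        return -1;
    if (pkt[0] & 0x80) {
        /* Long header: flags(1) version(4) dcid_len(1) dcid(dcid_len) ... */
        if (len < 6)
            return -1;
        size_t n = pkt[5];
        if (len < 6 + n)
            return -1;
        *cid = pkt + 6;
        *cid_len = n;
    } else {
        /* Short header: flags(1) dcid(short_cid_len) ... */
        if (len < 1 + short_cid_len)
            return -1;
        *cid = pkt + 1;
        *cid_len = short_cid_len;
    }
    return 0;
}

/* 32-bit FNV-1a hash, used here only as a simple stand-in. */
static uint32_t fnv1a(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Map a QUIC packet to a backend index in [0, n_backends).
 * The client's IP and port are never consulted, so the result does not
 * change when the client moves to another network.
 * Returns -1 if the packet cannot be parsed or there are no backends. */
int pick_backend(const uint8_t *pkt, size_t len, size_t short_cid_len,
                 uint32_t n_backends)
{
    const uint8_t *cid;
    size_t cid_len;

    if (n_backends == 0 || quic_dcid(pkt, len, short_cid_len, &cid, &cid_len) != 0)
        return -1;
    return (int)(fnv1a(cid, cid_len) % n_backends);
}
```

Two packets that carry the same connection ID land on the same backend regardless of which source address they arrive from, which is the property a plain 4-tuple hash loses on migration.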

While I don't know the actual answer, a good place to look may be one of the eBPF load balancers like "Katran" from Facebook. I imagine it needs to do that sort of thing, but I have no idea if it's attaching at the same level. I haven't really explored eBPF outside of tracing.

https://github.com/facebookincubator/katran

We had a large rundown of our Traffic Infrastructure some time ago[1]. TL;DR is:

* The first level of loadbalancing is DNS[2]. Here we try to map each user to the closest PoP based on metrics from our clients.

* After that, the user-to-PoP path mostly depends on our BGP peering with other ISPs (we have an open peering policy[3], please peer with us!)

* Within the PoP we use BGP ECMP and a set of L4 loadbalancers (previously IPVS, now Katran[4]) that encapsulate traffic and DSR it to L7 balancers (previously nginx, now mostly Envoy.)

Overall, we have ~25 PoPs and 4 datacenters.

[1] https://dropbox.tech/infrastructure/dropbox-traffic-infrastr...

[2] https://dropbox.tech/infrastructure/intelligent-dns-based-lo...

[3] https://www.dropbox.com/peering

[4] https://github.com/facebookincubator/katran
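
As a rough illustration of the "encapsulate and DSR" step, the sketch below wraps a client packet in an outer IPv4 header addressed to the chosen L7 balancer. It assumes IPv4-in-IPv4 encapsulation (RFC 2003, protocol number 4); the function names and parameters are hypothetical and not taken from Katran or from Dropbox's setup.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum (RFC 1071) over a byte buffer. */
uint16_t inet_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Store a 32-bit value in network byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Prepend a 20-byte outer IPv4 header to an inner IP packet.
 * lb_addr and backend_addr are host-order IPv4 addresses (hypothetical
 * parameters). The inner packet is copied unmodified, so it still carries
 * the client's source address and the service's destination address.
 * Returns the total length written, or 0 if it does not fit. */
size_t ipip_encap(const uint8_t *inner, size_t inner_len,
                  uint32_t lb_addr, uint32_t backend_addr,
                  uint8_t *out, size_t out_cap)
{
    size_t total = 20 + inner_len;
    if (total > 0xffff || total > out_cap)
        return 0;

    memset(out, 0, 20);
    out[0] = 0x45;                    /* version 4, header length 5 words */
    out[2] = (uint8_t)(total >> 8);   /* total length, big-endian */
    out[3] = (uint8_t)(total & 0xff);
    out[8] = 64;                      /* TTL (arbitrary choice) */
    out[9] = 4;                       /* protocol 4 = IPv4-in-IPv4 */
    put_be32(out + 12, lb_addr);      /* outer source: the L4 balancer */
    put_be32(out + 16, backend_addr); /* outer destination: the L7 balancer */

    uint16_t csum = inet_checksum(out, 20);
    out[10] = (uint8_t)(csum >> 8);
    out[11] = (uint8_t)(csum & 0xff);

    memcpy(out + 20, inner, inner_len);
    return total;
}
```

Because the inner packet is left untouched, the L7 balancer can strip the outer header and reply to the client directly from the service address, so return traffic does not have to pass back through the L4 layer.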

Cilium's container networking and security product [0]; Facebook's Katran [1]; Netflix's flowsrus (not public yet, but see my tcplife tool in BCC[2]). That's just the beginning. It's not just for performance, it's also security, as while BPF programs run in kernel-mode they have a limited and secured API for interacting with the system (BPF helpers).

Since extended BPF is in the Linux kernel (and will be in other kernels in the future), everyone is getting it, and we'll see more use cases over time. In some ways it's like the birth of JavaScript for the browser, and all the new applications it made possible. But it goes further than that: we could still analyze and debug JavaScript applications using traditional tools. BPF programs, however, are neither process-space nor kernel routines, and are outside the view of everything. No visibility in ps(1), top(1), or lsmod(8). We're having to create new tools to even see what's running on the CPUs. Every performance monitoring product that shows a process table with CPU consumption will now need a BPF program table as well.

[0] https://cilium.io/

[1] https://github.com/facebookincubator/katran

[2] http://www.brendangregg.com/blog/2016-11-30/linux-bcc-tcplif...