I have a hard time understanding what you would use it for. I can imagine a use case, but I fail to see why it would be that useful.

I have a sense it allows much better performance for horizontal scaling, but I'm not sure...

Some examples: Cilium's container networking and security product [0]; Facebook's Katran [1]; Netflix's flowsrus (not public yet, but see my tcplife tool in BCC [2]). That's just the beginning. It's not just about performance, it's also about security: although BPF programs run in kernel mode, they have a limited and secured API for interacting with the system (BPF helpers).

Since extended BPF is in the Linux kernel (and will be in other kernels in the future), everyone is getting it, and we'll see more use cases over time. In some ways it's like the birth of JavaScript for the browser, and all the new applications it made possible. But it goes further than that: we could still analyze and debug JavaScript applications using traditional tools, whereas BPF programs are neither process-space nor kernel routines, and are outside the view of everything. They have no visibility in ps(1), top(1), or lsmod(8). We're having to create new tools to even see what's running on the CPUs. Every performance monitoring product that shows a process table with CPU consumption will now need a BPF program table as well.
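
As a rough illustration of what such a "BPF program table" could look like, here is a minimal Python sketch. It assumes bpftool is installed and that its JSON output for `prog show` carries `id`, `type`, `name`, and (only when the kernel.bpf_stats_enabled sysctl is on) `run_time_ns` fields. The function names and the sample data are made up for the example; this is not how any particular monitoring product does it.

```python
import json
import subprocess


def read_bpftool_json():
    """Return the JSON text printed by bpftool (assumed installed; needs root)."""
    result = subprocess.run(
        ["bpftool", "--json", "prog", "show"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def bpf_program_table(json_text):
    """Build (id, type, name, run_time_ns) rows, busiest program first."""
    rows = []
    for prog in json.loads(json_text):
        rows.append((
            prog["id"],
            prog.get("type", "?"),
            prog.get("name", ""),
            # Assumption: absent unless run-time stats are enabled.
            prog.get("run_time_ns", 0),
        ))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


def format_table(rows):
    """Render the rows as a fixed-width text table, like a process listing."""
    lines = [f"{'ID':>6} {'TYPE':<16} {'NAME':<16} {'RUN_NS':>12}"]
    for prog_id, prog_type, name, run_ns in rows:
        lines.append(f"{prog_id:>6} {prog_type:<16} {name:<16} {run_ns:>12}")
    return "\n".join(lines)


# Made-up sample in the assumed shape, so the sketch runs without root or bpftool.
SAMPLE = """[
  {"id": 7,  "type": "kprobe",     "name": "trace_connect", "run_time_ns": 1200},
  {"id": 9,  "type": "xdp",        "name": "lb_ingress",    "run_time_ns": 98000},
  {"id": 12, "type": "cgroup_skb", "name": ""}
]"""
```

Calling `print(format_table(bpf_program_table(SAMPLE)))` prints the sample table; on a live system, `read_bpftool_json()` would supply the input instead of `SAMPLE`.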

[0] https://cilium.io/
[1] https://github.com/facebookincubator/katran
[2] http://www.brendangregg.com/blog/2016-11-30/linux-bcc-tcplif...