The article doesn't explain why the kernel is so slow at processing network packets. I'm not a kernel programmer, so this may be utterly wrong, but wouldn't it be possible to trade some features for speed by disabling them in the kernel code?
The example in the article found a single core handling 1.4M packets per second. If you're running a web server shoveling data out to clients, those packets are going to be close to the maximum size, which, if I haven't screwed up the math, works out to something like this:
1.4M packets * 1400 bytes (assuming a low MTU) * 8 (bytes -> bits) ≈ 15.7 Gbps
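For what it's worth, a trivial sanity check of that arithmetic (the 64-byte case is my own contrast for minimum-size packets, not from the article, to show why small packets are the painful case):

    /* Quick sanity check: packets/s * payload bytes * 8 -> bits/s. */
    #include <stdio.h>

    int main(void) {
        double pps = 1.4e6;              /* packets per second from the article's example */
        double sizes[] = {1400.0, 64.0}; /* near-MTU payload vs. minimum-size packets */
        for (int i = 0; i < 2; i++) {
            double gbps = pps * sizes[i] * 8.0 / 1e9;
            printf("%4.0f-byte packets: %.2f Gbps\n", sizes[i], gbps);
        }
        return 0; /* prints ~15.68 Gbps and ~0.72 Gbps */
    }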
That's not to say there isn't still plenty of room to improve, and, as lukego noted, there's a lot of work in progress (see e.g. https://lwn.net/Articles/615238/ on batching operations to avoid paying some of the processing costs for every packet), but on the average server you'd hit bottlenecks in the database, application logic, request handling, client network capacity, etc. long before network stack overhead is your greatest challenge. The people who do run into it tend to be CDN vendors like CloudFlare and security people who need to filter, analyze, or generate traffic at a scale at least that of a large company (e.g. https://github.com/robertdavidgraham/masscan).
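On the batching point: the LWN work is about batching inside the kernel's transmit path, but the same general idea is already visible at the syscall boundary with recvmmsg(2), which pulls many packets per system call instead of one recvfrom() per packet and so amortizes the fixed per-call overhead. A minimal sketch (UDP, port 9000 is arbitrary, error handling omitted):

    #define _GNU_SOURCE           /* recvmmsg() is a GNU/Linux extension */
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <stdio.h>

    #define BATCH 64
    #define PKT   2048

    int main(void) {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(9000);             /* arbitrary example port */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(fd, (struct sockaddr *)&addr, sizeof(addr));

        static char bufs[BATCH][PKT];
        struct iovec iov[BATCH];
        struct mmsghdr msgs[BATCH];
        memset(msgs, 0, sizeof(msgs));
        for (int i = 0; i < BATCH; i++) {
            iov[i].iov_base = bufs[i];
            iov[i].iov_len  = PKT;
            msgs[i].msg_hdr.msg_iov    = &iov[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }

        /* One syscall can return up to BATCH packets; MSG_WAITFORONE
         * makes it return as soon as at least one has arrived. */
        int n = recvmmsg(fd, msgs, BATCH, MSG_WAITFORONE, NULL);
        printf("received %d packets in one call\n", n);
        return 0;
    }

That obviously doesn't remove the per-packet work inside the stack, it only spreads the syscall cost across a batch, which is part of why the kernel-internal batching work linked above exists.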