The article doesn't mention why the kernel is so slow at processing network packets. I'm not a kernel programmer, so this may be utterly wrong, but wouldn't it be possible to sacrifice some features for speed by disabling them in the kernel code?

One thing to remember is that high-speed packet filtering is an unusual workload, and CloudFlare operates at a much greater scale than most of us ever see: most Linux devices aren't connected to 10G networks, much less 100G ones, and they're usually doing far more work than looking at a packet and deciding whether to accept or reject it. The fact that the APIs and the kernel network stack were designed many years before those kinds of speeds were possible doesn't matter much in practice, because most sites don't have that much traffic and most server applications will bottleneck on other work well before that point.

The example in the article found a single core handling 1.4M packets per second. If you're running a web server shoveling data out to clients, those packets are going to be close to the maximum size, which, if I haven't screwed up the math, works out to something like this:

1.4M packets/sec * 1400 bytes (assuming a low MTU) * 8 (bytes -> bits) ≈ 15.7 Gbps
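
If you'd rather check it in code, here's the same back-of-the-envelope calculation; the 1400-byte payload is just my assumption about a typical MTU-sized packet, not a figure from the article:

    #include <stdio.h>

    int main(void) {
        const double pkts_per_sec = 1.4e6;   /* single-core figure from the article */
        const double bytes_per_pkt = 1400.0; /* assumed near-MTU payload */
        double gbps = pkts_per_sec * bytes_per_pkt * 8.0 / 1e9;
        printf("%.1f Gbps\n", gbps);         /* prints 15.7 */
        return 0;
    }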

That's not to say there isn't still plenty of room to improve, and, as lukego noted, there's a lot of work in progress (see e.g. https://lwn.net/Articles/615238/ on batching operations so that some of the per-packet processing costs are paid once per batch). But for the average server you'd hit bottlenecks in the database, application logic, request handling, client network capacity, etc. well before network stack overhead becomes your biggest problem. The people who do run into it tend to be CDN vendors like CloudFlare and security people who need to filter, analyze, or generate traffic at levels which are at least the scale of a large company (e.g. https://github.com/robertdavidgraham/masscan).
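
To make the batching idea concrete: you can already see the same technique from userspace with recvmmsg(), which pulls many datagrams out of a socket per syscall so the fixed per-call cost is amortized across the batch rather than paid for every packet. This is just a minimal sketch of that idea (UDP, arbitrary port, minimal error handling), not anything from the article or the LWN patch set:

    /* Sketch: batch-receive UDP packets with recvmmsg() to amortize syscall cost. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    #define BATCH   64
    #define PKT_MAX 2048

    int main(void) {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(9000);            /* arbitrary example port */
        bind(fd, (struct sockaddr *)&addr, sizeof(addr));

        static char bufs[BATCH][PKT_MAX];
        struct iovec iov[BATCH];
        struct mmsghdr msgs[BATCH];
        memset(msgs, 0, sizeof(msgs));
        for (int i = 0; i < BATCH; i++) {
            iov[i].iov_base = bufs[i];
            iov[i].iov_len  = PKT_MAX;
            msgs[i].msg_hdr.msg_iov    = &iov[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }

        for (;;) {
            /* One syscall can return up to BATCH datagrams. */
            int n = recvmmsg(fd, msgs, BATCH, 0, NULL);
            if (n < 0)
                break;
            for (int i = 0; i < n; i++) {
                /* msgs[i].msg_len is the length of the i-th packet; a filter
                 * would inspect bufs[i] here and decide to accept or drop it. */
                (void)msgs[i].msg_len;
            }
        }
        return 0;
    }

The kernel-side work in the LWN link applies the same amortization further down the stack; the userspace version just makes the cost model easy to see.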