Count min sketches are really neat. A colleague at my old company used them to implement DDoS mitigations in eBPF and wrote it up: https://blog.cloudflare.com/building-rakelimit/

The code is also open source, and I've improved on it a bit: https://github.com/lmb/socklimit Not production ready but a cool idea and implementation.

I was wondering anyone could shed some light on the "These hash functions should be pairwise independent" part. I don't know what pairwise independent means. My background is mostly in web so I am familiar with things like `sha256`, `sha1`, `md5` things like that. Would you use these kind of functions for Count-min Sketch?

I saw in your socklimit project looks like `fasthash64` and `hashlittle` I'm not familiar with those any insight or recommended reading to understand these hash functions?

p.s. googling pairwise independent hash functions did get me some college class reading but doesn't mention any named hash functions developed out in the world.

We ended up with fasthash64 and lookup3 by looking for a fast hash that is easy to port to the restricted subset of C supported by eBPF with minimal changes. https://github.com/rurban/smhasher is a great resource for that.

I would probably choose different, more robust hash functions if I was targeting regular C.