What does HackerNews think of xxHash?

Extremely fast non-cryptographic hash algorithm

Language: C

#8 in C

If you're only using the hash for non-cryptographic applications, there are much faster hashes: https://github.com/Cyan4973/xxHash

SHA1 and MD5 are the most widely accessible, though, and I agree it's fine to use them if you don't care about security.

> For protocol 30 and beyond (first supported in 3.0.0), the checksum used is MD5. For older protocols, the checksum used is MD4.

Newer versions (≥3.2?) support xxHash and xxHash3:

* https://github.com/WayneD/rsync/blob/master/checksum.c

* https://github.com/Cyan4973/xxHash

* https://news.ycombinator.com/item?id=19402602 (2019 XXH3 discussion)

Agree with everything you say, except that the post didn't mention non-cryptographic hashing algos that can be driven that hard. xxHash[1] (and especially XXH3) is almost always the fastest hashing choice, as it is both fast and has wide language support.

Sure, there are some other fast ones out there like cityhash[2], but there aren't good Java/Python bindings that I'm aware of, and I wouldn't recommend using it in production given its lack of widespread use compared with xxhash, which is used internally by LZ4 and in databases all over the place.

[1] https://github.com/Cyan4973/xxHash [2] https://github.com/google/cityhash

How does it compare to XXH3?

https://github.com/Cyan4973/xxHash

https://github.com/rurban/smhasher#readme says XXH3 is 16GB/s while meow is 17GB/s.

I might be dumb about estimating throughput. According to https://github.com/Cyan4973/xxHash, the best hash function can only do hundreds of millions of hashes per second, so how can a local cache run at such throughput? I assume that measuring cache throughput means computing the hash, looking up the entry, (maybe comparing keys), and copying the data.

Over the last few months I've been comparing other hashes against MD5 for comparing and identifying large photo and movie files. In my experience, xxhash is much faster than good old MD5.

You can find some numbers on its site here: https://github.com/Cyan4973/xxHash
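For anyone wanting to try the same kind of comparison, here is a minimal sketch of hash-based duplicate detection for large files. It is an illustration under assumptions, not anyone's actual tool: the names `file_digest`, `find_duplicates`, `hash_factory`, and `chunk_size` are hypothetical, and it defaults to MD5 from Python's standard `hashlib` so it runs without extra packages.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, hash_factory=hashlib.md5, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large media files never sit fully in memory."""
    hasher = hash_factory()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(paths, hash_factory=hashlib.md5):
    """Return lists of paths whose size and content digest both match."""
    # Files of different sizes cannot be identical, so only hash size collisions.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    by_digest = defaultdict(list)
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue  # unique size, no need to read the file at all
        for path in same_size:
            by_digest[(size, file_digest(path, hash_factory))].append(path)

    return [group for group in by_digest.values() if len(group) > 1]
```

Any constructor with a hashlib-style `update()`/`hexdigest()` interface can be passed as `hash_factory`. The third-party `xxhash` package for Python should fit that shape (for example `xxhash.xxh3_64`), but that is an assumption to check against its documentation rather than something shown here.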

Or maybe xxHash would work for duplicate finding? https://github.com/Cyan4973/xxHash

Also in the super fast, but not designed to be cryptographically secure category:

xxhash (https://github.com/Cyan4973/xxHash) with 32/64-bit output. The latest version, xxh3, supports up to 128-bit output.

meow hash (https://github.com/cmuratori/meow_hash)

The recently released Blake3, which is designed to be cryptographically secure, is also very fast (https://github.com/BLAKE3-team/BLAKE3)

OK, after looking into it for just a few minutes, I found out: 1) CRC32C is not nearly as collision-resistant as many common fast non-cryptographic hashes; 2) there are many portable hash functions that are faster (I think this is due to their bit size). See https://github.com/Cyan4973/xxHash and https://github.com/leo-yuriev/t1ha

So, my initial assumption was wrong. I'll be trying Meow Hash and some others out!
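To put rough numbers on how much output width matters for collisions, here is a small sketch of the standard birthday-bound approximation. It assumes an idealized hash with uniformly distributed output, which is a simplification (CRC32C in particular was designed for error detection, not as a general-purpose hash), and the function name `collision_probability` is made up for this example.

```python
import math


def collision_probability(num_items, hash_bits):
    """Birthday-bound estimate of P(at least one collision) for an ideal hash."""
    # p ~= 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps precision when p is tiny.
    exponent = -num_items * (num_items - 1) / 2.0 ** (hash_bits + 1)
    return -math.expm1(exponent)


# Compare common output widths for one million distinct inputs.
for bits in (32, 64, 128):
    p = collision_probability(10**6, bits)
    print(f"{bits:3d}-bit output, 1e6 items: p = {p:.3e}")
```

Under this model, a million items in a 32-bit space are all but guaranteed to collide somewhere, while a 64-bit output brings the probability down to the order of 1e-8. Width alone would already separate a 32-bit checksum from 64-bit or 128-bit hashes, whatever the quality of the mixing.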