you might want to particularly look into murmur, spooky, and metrohash. I'm not exactly sure of what the tradeoffs involved are, or what your need is, but that site should serve as a good starting point at least.
I would probably choose different, more robust hash functions if I was targeting regular C.
What would you consider a short string? The latency of AESENC is 5 if I remember correctly. In that time you can FNV-1a a "short" string.
For longer strings I agree that WyHash and xxHash is better. I have a benchmark of both further down on FastHash's github page[1]. Of course, that is only a measure of speed, but both are high quality as well[2].
From the top of my head, t1ha, falkhash, meowhash and metrohash are using AES-NI and none of them are particularly fast on short inputs, and at least two of them have severe issues, despite guarding against lots of vulnerabilities, which your construction does not.
You are right regarding bulk-keying with SIMD outperforming any kind of scalar hashing one-at-the-time. I've build a database that does just that (requires keys to be available in bulk - like when doing bulk inserts) using SIMD, and the latency becomes negligible.
So the fastest hash functions on x86_64 without quality problems are: xxh3low, wyhash, ahash64, t1ha2_atonce, komihash...
In short: it shows why a simple word mixer alone is not enough for a proper hash. It is insecure, fails all tests, and is not as fast as expected. I would not even use it for PRNG's, but I haven't tested that usage yet.
It's indeed pretty fast ( among the top 5), but not particularly good. It fails some basic tests. Rather use xxh3, t1ha or farmhash
> SMHasher is a test and benchmark suite for a variety of hash function implementations. It can provide data for claims like “our new Foo hash function is faster than the widely used Bar, Baz and Qux hash functions”. However, when comparing Foo to CRC-32, be aware that a SIMD-accelerated CRC-32 implementation can be 47x faster than SMHasher’s simple CRC-32 implementation.
In fact smhasher implements both, the slow soft crc32 he cites, and the fastest crc32_pclmul, which is not just 47x faster, but 5000x faster. Other than wuff's implementation of the crc32_pclmul variant.
Compare https://github.com/rurban/smhasher/#smhasher
crc32 392.10 vs crc32_pclmul 1972140.38 MiB/sec
Reminds me a lot on ATSIt may be the fastest rust hash, but certainly not faster than other fast hashes. More like 2x slower.
xxh3, t1ha0, wyhash are all much faster on the 2 machines I tested it on, an old 7 years old Intel i5-2300, and a new Ryzen 3200U.
This is a good survey paper about the old, known, fast, simple one-line hashes. I've tested all of them in smhasher also. They are all *not* recommended for production use.
Passes all smhasher tests for a tiny size (157 byte on x64), but slow as every more secure hash. I would put it as alternative to people using siphash. Personally I see no use case for siphash at all. Slow, big, insecure, easily brute forcable.
https://www.strchr.com/hash_functions Is also a great write up of how some of the classic hashing algorithms perform of various data sets.
https://nullprogram.com/blog/2018/07/31/ is an interesting approach on generating hashing functions automatically.
What is the memory bandwidth on that instance? I ask because I'm not seeing that listed [1] but it would be a useful point of comparison. Maybe run Doctor Bandwidth's STREAM benchmark [2].
DiscoHash is included in SMHASHER [3] but its benchmark results aren't.
[1] https://cloud.google.com/compute/docs/machine-types
FarmHash
CityHash
SMHasher
MetroHash
FNV1a_YT
Note that if you're just using it for a hash table, where you'll compare the full strings to resolve hash collisions, you don't actually need good randomness. FNV1A_VT is faster than most, has worse collision properties, yet provides a faster overall hash table, because the faster hash (every operation) outweighs the extra full string checks (rare).
Simple benchmark is included (git close && make). The quality and speed could be checked by Reini Urban's (https://github.com/rurban/smhasher) and Yves Orton's (https://github.com/demerphq/smhasher) forks of SMHasher suite.
Reasonable to note: t1ha have two injection points of each data word inside the loop of entropy sponge.
Note that in contrast with what Andy or DJB say, the collision safety is not a problem of the hash function per se, as you cannot fix collision attacks with any "safer" hash function. You can easily brute-force even the worst of all siphash in under 4min. Safety begins with 256 bits, in a hash table you got typically 10-14, max 32 to attack. It is only making it marginably safer, but much slower. It is only doable by checking collision attacks in the collision scheme, either by collision counting or using a non-attackable collision scheme.
Unfortunately most implementations have no idea, and just go with the slowest of all. The latest such sin done in the linux network stack. I thought at least those guys were above that, but apparently not.
This simple multiplication hash is indeed superior, just with linear probing it has problems. You would use double hashing with another similar constant then. Or find two such constants dynamically, as this would be universal then.
And also note that the upper bits of a hash function are always superior to the lower bits. power-by-2 & checks are thus always worse than right-shifting as with this one.
The current leaders are on my smhasher site, https://github.com/rurban/smhasher/
Speed & number of cycles/hash is better with metro hash.
Both rurban and I have (independently) spent considerable time testing about 100 hash functions using the hash test suite smhasher. In addition I have greatly enhanced smhasher, and included test results and graphs in my version. Included in my fork is the --stream option so you can test different hash functions in counter mode with tools like the dieharder RNG test suite.
https://github.com/demerphq/smhasher https://github.com/rurban/smhasher
These are the FNV test results from my version of SMHasher:
FNV1a 32 32 32 FAILED 99 of 183 tests. Avalanche: 17-41, Crc-MultiCollision: 81, 83, 85-86, 88-91, 94, 96-98, 100, Cyclic: 42-52, Hi-Lo: 157-158, HiBit-Null: 150-152, Highbits2: 147-149, Highbits: 144-146, LowBit-Null: 153-155, Lowbits: 142-143, Murmur2-MultiCollision: 101-110, Murmur3A-MultiCollision: 111, 113, 117, 119-120, Murmur3F-MultiCollision: 122-127, 129-130, OverAll: 183, Text: 160-165, TwoBytes: 54-56, 63, Zeroes: 167-168
FNV1a_YT 32 32 32 FAILED 67 of 181 tests. Avalanche: 15-39, Differential: 9-14, Hi-Lo: 154-156, HiBit-Null: 148-150, Highbits2: 145-147, Highbits: 142-144, Lowbits: 139-141, OverAll: 181, Sparse: 74-78, Text: 157-158, 160-163, TwoBytes: 51, 53-61
FNV64 64 64 64 FAILED 66 of 183 tests. Avalanche: 17-41, Cyclic: 43-45, 47, 49, 51-52, Hi-Lo: 157-158, HiBit-Null: 151-152, Highbits2: 148-149, Highbits: 145-146, LowBit-Null: 154-155, Lowbits: 142-143, OverAll: 183, Sparse: 65-67, 69, 71, 73, 75, 77, 79-80, Text: 160-162, 164-165, TwoBytes: 54-56, 58, 60, 62-63, Words: 182, Zeroes: 167-168
You can see text view of the full list at:
https://github.com/demerphq/smhasher/blob/master/doc/summary...
Also after reading this thread I added some graphs comparing performance of other hashes, including some that I have developed using genetic algorithm for use in Perl. Some of those hashes have hash-time performance at the top of the envelope (StadtX is very fast).
https://github.com/demerphq/smhasher/blob/master/doc/graph/F... https://github.com/demerphq/smhasher/blob/master/doc/graph/F... https://github.com/demerphq/smhasher/blob/master/doc/graph/F... https://github.com/demerphq/smhasher/blob/master/doc/graph/F...
You may also be interested in looking at
https://github.com/demerphq/smhasher/blob/master/HashAnatomy...
Good luck!
PS - I didn't know a lot of the terms you used, but the Finite State Entropy (FSE) link you provided does a good job intro'ing some of them, and the linked paper Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding [4] (ANS) seems interesting.
[1] https://github.com/Cyan4973/xxHash
[2] https://github.com/rurban/smhasher
There are a lot of things that fundamentally deal with either systems that users interact with or that run on real machines with physical world implications that also make things perform differently, from user interaction latency and pre-computing and pre-caching, to SSD block level reads/writes, cpu caches and other caches, network latency implications, etc.
Similarly, on the algorithm front from things like Boyer–Moore to work being done on non-cryptographic hashes (ie, https://github.com/rurban/smhasher/ ) that is sometimes mind boggling.
The fastest and mostly-secure 64bit hash functions are Metro, Spooky, xxHash64 and Multilinear-HM by the same author. See "Strongly universal string hashing is fast", Daniel Lemire and Owen Kaser, 2014.
CLHash doesn't even pass the avalanche test from smhasher https://github.com/rurban/smhasher I'll add it there for better comparison, with the new leader falkhash: https://github.com/gamozolabs/falkhash
From the good enough hash functions, metrohash are by far the fastest, so they should be the new default on 64bit machines. On x86_64 and arm8 the crc variants are the best.
For 32bit we are stuck with Murmur still.
Comparable results for xxHash and the new metrohash will be uploaded soon. These things take time.
But I'm more interested in comparing it to HW CRC32-C, the fastest and best hash so far, 2x faster than metro.