FNV-1a seems like a questionable choice. You can get higher quality and throughput for short strings from a hash that uses AES instructions or wyhash, or for long strings maybe from xxhash.
> If your input is made of integers, or is a short, fixed length, use an integer permutation, particularly multiply-xorshift. It takes very little to get a sufficient distribution. Sometimes one multiplication does the trick. Fixed-sized, integer-permutation hashes tend to be the fastest, easily beating fancier SIMD-based hashes, including AES-NI. For example:
This is... probably not true? If you want to hash exactly 16 bytes, you can do it in 4 instructions on x64 like this:
const u8x16 chunk0 = _mm_loadu_si128((u8x16*)data);
const u8x16 h10 = _mm_aesenc_si128(chunk0, key);
const u8x16 h20 = _mm_aesenc_si128(h10, key);
const u64 hash = _mm_extract_epi64(h20, 0);
The total latency of these instructions is a bit higher than the total latency of the 8 instructions in the provided uuid1_hash, but if you want to hash more than one value, you may find that the 4-instruction approach achieves higher throughput than the 8-instruction approach, since independent hashes can overlap in the pipeline. Even 2 values at a time should do it. For example, on Haswell the combined latency seems to be 16 cycles for load+aesenc+aesenc+store vs. 12 for 2 shifts, 2 multiplies, 2 xors and 2 adds (the adds and multiplies are arranged in such a way that lea may not be used). I tried uuid1_hash on godbolt.org and it actually emitted a bunch of movs in addition to what I expected.
I am of course not an expert in these matters, and the fine author is invited to publish a benchmark or to demonstrate the superiority of his hash table and his method for hashing short strings at https://highload.fun/tasks/2
What would you consider a short string? The latency of AESENC is 5 if I remember correctly. In that time you can FNV-1a a "short" string.
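For reference, 64-bit FNV-1a is one xor and one multiply per input byte, with each step depending on the previous one, so its cost grows with string length. A minimal sketch using the standard 64-bit offset basis and prime (the function name fnv1a_64 is only illustrative, not taken from any library mentioned here):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a: for every byte, xor it into the state, then multiply
   by the FNV prime. The name fnv1a_64 is illustrative. */
static uint64_t fnv1a_64(const void *data, size_t len)
{
    assert(data != NULL || len == 0);

    const unsigned char *p = (const unsigned char *)data;
    uint64_t h = UINT64_C(0xcbf29ce484222325);   /* FNV-1a 64-bit offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];                               /* xor first (the "1a" order) */
        h *= UINT64_C(0x100000001b3);            /* then multiply by the prime */
    }
    return h;
}
```

Calling fnv1a_64("a", 1) hashes a one-byte string; an empty input returns the offset basis unchanged.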
For longer strings I agree that WyHash and xxHash are better. I have a benchmark of both further down on FastHash's GitHub page[1]. Of course, that is only a measure of speed, but both are high quality as well[2].
Off the top of my head, t1ha, falkhash, meowhash and metrohash use AES-NI, and none of them are particularly fast on short inputs. At least two of them have severe issues, despite guarding against lots of vulnerabilities, which your construction does not.
You are right that bulk-keying with SIMD outperforms any kind of one-at-a-time scalar hashing. I've built a database that does just that using SIMD (it requires keys to be available in bulk, like when doing bulk inserts), and the latency becomes negligible.
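The bulk idea can be sketched in portable C without intrinsics. This is only an illustration under stated assumptions: mix64 is the MurmurHash3 64-bit finalizer, used here as a stand-in multiply-xorshift mixer, and hash_batch is a hypothetical name; neither is taken from the database or the libraries mentioned above. Because no loop iteration depends on another, the CPU can overlap several hashes, and a compiler may vectorize the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MurmurHash3's 64-bit finalizer: a multiply-xorshift permutation of a
   64-bit integer. Stand-in mixer, chosen for illustration only. */
static inline uint64_t mix64(uint64_t x)
{
    x ^= x >> 33;
    x *= UINT64_C(0xff51afd7ed558ccd);
    x ^= x >> 33;
    x *= UINT64_C(0xc4ceb9fe1a85ec53);
    x ^= x >> 33;
    return x;
}

/* Hash n keys in one call (hypothetical helper). Each out[i] depends
   only on keys[i], so the iterations form independent dependency chains. */
static void hash_batch(const uint64_t *keys, uint64_t *out, size_t n)
{
    assert((keys != NULL && out != NULL) || n == 0);

    for (size_t i = 0; i < n; i++)
        out[i] = mix64(keys[i]);
}
```

Whether this beats hashing one key at a time depends on the compiler and the CPU, so it is worth benchmarking rather than assuming.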