Don't do this. Use a real hash function that guarantees a highly random distribution, make your hash tables power-of-two sized, and map from hash value to table index using (hash & (size-1)).

The fibonacci constant thing will help clean up the distribution of a bad hash function, but it does nothing for collision resistance if the underlying hash function is weak.

-Austin, author of Murmurhash and SMHasher

aappleby

Also, there is nothing magic about his fibonacci constant. Any large odd constant with roughly half the bits set at random will do the same thing.

fis

I also did not follow what is special about 2^n/phi. I found this more concise justification: http://mathforum.org/kb/message.jspa?messageID=431065

"Knuth's finding was that the dispersion of indexes for a sequence of consecutive keys is maximized when M is chosen this way, thus a multiplicative hash table with a dense set of keys will have the fewest possible collisions when M approx= 2^N * R."

Although, if the keys are dense, then we could just use the low bits directly. I guess the unstated assumption is that in the real world, we'll have a mix of keys that are sequential and keys that are strided in such a way that using low bits directly would cause a lot of collisions. So we need a multiplicative hash to take care of the latter and we should use 2^n/phi to take care of the former.

Austin, you've done a lot of great work on this. What is the hash function you'd use today for small (<=32byte) keys?

rurban

The best is currently Leonid Yuriev's t1ha. See https://github.com/rurban/smhasher/

Note that in contrast with what Andy or DJB say, the collision safety is not a problem of the hash function per se, as you cannot fix collision attacks with any "safer" hash function. You can easily brute-force even the worst of all siphash in under 4min. Safety begins with 256 bits, in a hash table you got typically 10-14, max 32 to attack. It is only making it marginably safer, but much slower. It is only doable by checking collision attacks in the collision scheme, either by collision counting or using a non-attackable collision scheme.

Unfortunately most implementations have no idea, and just go with the slowest of all. The latest such sin done in the linux network stack. I thought at least those guys were above that, but apparently not.

This simple multiplication hash is indeed superior, just with linear probing it has problems. You would use double hashing with another similar constant then. Or find two such constants dynamically, as this would be universal then.

And also note that the upper bits of a hash function are always superior to the lower bits. power-by-2 & checks are thus always worse than right-shifting as with this one.