> -- 8 bytes gives a collision p = .5 after 5.1 x 10^9 values

So yeah, a fifty-fifty chance of a collision after only five billion values. You’re at a 10% chance before even two billion, and 1% after 609 million. I wouldn’t care to play this random game with even a million keys; the 64-bit key space is just not large enough to pick IDs at random. UUIDs are 128 bits; that’s large enough that you can reasonably assume in most fields that no collisions will occur.
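For anyone who wants to check those figures, here is a rough sketch using the standard birthday-bound approximation n ≈ sqrt(2 · N · ln(1 / (1 − p))). The function name is made up for illustration, and the formula is an approximation, not an exact count.

```python
import math

def values_for_collision_probability(bits: int, p: float) -> float:
    """Approximate how many uniformly random values from a 2**bits space
    can be drawn before the chance of at least one collision reaches p.
    Uses the birthday-bound approximation, so treat the result as a ballpark."""
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))

for p in (0.5, 0.1, 0.01):
    n = values_for_collision_probability(64, p)
    print(f"64 bits, p = {p}: about {n:.3g} values")
```

For 64 bits this lands at roughly 5.1 billion, just under 2 billion, and about 609 million values for p = 0.5, 0.1 and 0.01. Passing `bits=128` shows how much further away the same probabilities are for a UUID-sized space.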

Storing it as a string is also inefficient: it wastes more than three bytes, and probably eight or more (I’m not certain of the implementation details in PostgreSQL), which grows index sizes and makes comparisons slower. It’s more efficient to store it as a number and convert to and from the string form only when needed.

> UUIDs are 128 bits; that’s large enough that you can reasonably assume in most fields that no collisions will occur.

I'd go a step further than that, honestly. Even at 80 bits I'd say you can really be sure that it will never happen. Mind that 2^64 is quite large, and every extra bit on top of that halves the chance. Anything above 80 bits is quite safe even when someone purposefully tries to find your input, let alone by random chance. The 128 bits include a decent security margin.

Plain MD5 is pretty much the fastest thing you can find. A decent GPU can do 25 GH/s on that. With a $1,000,000 hardware budget, it would take you 150 years to have a 50/50 chance of cracking a secret from a 2^80 space with Hashcat. (Ballpark numbers: they can be off by an order of magnitude, but it won't be <10 years for that budget if you use today's hardware and today's hashcat.)
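The arithmetic behind that kind of estimate can be sketched as follows. The 25 GH/s rate is the figure from the comment; the GPU count is a hypothetical number (the thread does not say how many GPUs the budget buys), picked here only because it lands near the quoted ballpark. It also ignores power, hosting and hardware failures.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_even_odds(bits: int, hashes_per_second_per_gpu: float, gpu_count: int) -> float:
    """Years of brute force until half of a 2**bits space has been tried,
    i.e. a 50/50 chance of having hit one specific secret."""
    guesses_needed = 2.0 ** (bits - 1)  # half the space
    total_rate = hashes_per_second_per_gpu * gpu_count
    return guesses_needed / total_rate / SECONDS_PER_YEAR

# Assumption: 5000 GPUs at 25 GH/s each (hypothetical fleet size).
print(f"{years_to_even_odds(80, 25e9, 5000):.0f} years")
```

Halving or doubling the assumed fleet size moves the answer by the same factor, which is why the result is only good to within an order of magnitude.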

We haven't even cracked the 3DES key used for Adobe's user database backup, which I'm sure a lot of people would be interested in. Not $1M interested maybe, but still.

2^128 is really gigantic and no collision will ever occur there unless you have a quantum computer. It's more than a "reasonable assumption in most fields".

Of course, I agree that 2^64 is too small to just generate a token and assume it's unique without checking if it exists. I'm just trying to prevent someone from assuming that we need 512-bit hashing or something and creating crypto soup (as a security dude, this happens all too often).
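As a minimal sketch of "check if it exists" for a 64-bit token: the function name and the in-memory set are stand-ins for illustration; in a real database you would more likely rely on a unique constraint and retry on a conflict.

```python
import secrets

def new_unique_token(existing: set, bits: int = 64) -> int:
    """Draw random tokens until one is found that is not already in use,
    then record it. `existing` stands in for whatever store holds the IDs."""
    while True:
        token = secrets.randbits(bits)
        if token not in existing:
            existing.add(token)
            return token

issued = set()
first = new_unique_token(issued)
second = new_unique_token(issued)
print(first != second)
```

With 128 random bits the retry branch should effectively never run, which is the point being made above.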

> Plain MD5 is pretty much the fastest thing you can find.

Faster than BLAKE3?

Again, ballpark numbers, so you might well have been right. Going by the sibling comment's benchmarks, though, it seems MD5 is indeed even faster than BLAKE3 (5.4 cycles/byte for BLAKE3 vs 4.95 for MD5).

Over the last few months I've been benchmarking other hashes against MD5 for comparing and identifying large photo and movie files. My experience is that xxhash is much faster than good old MD5.

You can find some numbers on its site here: https://github.com/Cyan4973/xxHash