Honestly, a good showcase of how expensive UUIDs are. If you can map them to integers, you should really consider it. Not only does a UUID take up 2x the space of a 64-bit integer, it also compresses horribly by comparison. Integer compression is so good these days that you can often get integers down to roughly half a byte each instead of 8. Doing that with UUIDv4s is fundamentally not going to work.

The article mentions RLE, which is also incredibly good for integer IDs if you diff-encode them first, since IDs are typically sequential with no or only small gaps. Diff + RLE can turn your encoded structure into ~1 byte.
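A minimal sketch of what diff + RLE looks like for sorted integer IDs (the function names here are illustrative, not from any particular library):

```python
# Delta-encode sorted IDs, then run-length encode the deltas.
def diff_rle_encode(ids):
    if not ids:
        return None, []
    deltas = [b - a for a, b in zip(ids, ids[1:])]
    runs = []  # list of (delta_value, run_length) pairs
    for d in deltas:
        if runs and runs[-1][0] == d:
            runs[-1] = (d, runs[-1][1] + 1)
        else:
            runs.append((d, 1))
    return ids[0], runs

def diff_rle_decode(first, runs):
    out = [first]
    for value, count in runs:
        for _ in range(count):
            out.append(out[-1] + value)
    return out

ids = list(range(1, 10_001))     # 1, 2, 3, ..., 10000
first, runs = diff_rle_encode(ids)
print(first, runs)               # 1 [(1, 9999)]
assert diff_rle_decode(first, runs) == ids
```

Ten thousand sequential IDs collapse to one starting value and a single (delta, count) pair; any compact serialization of that pair gets you to a handful of bytes total.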

Also, incredible website. The interactivity is so fun.

I did something similar to this very recently where I took a dataset and just continuously applied data encoding methods to it. It was much smaller in memory and compressed with zstd to a smaller size as well. I've found that 'prepping' data before using a generalized compression algorithm yields significant gains in both encode/decode performance and output size. These were, incidentally, CRDT operations :D
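As a toy version of that prepping step: delta-encode a block of mostly-sequential 64-bit IDs before handing it to a general-purpose compressor (using stdlib zlib here as a stand-in for zstd; the effect is the same in kind):

```python
import struct
import zlib

ids = list(range(1_000_000, 1_100_000))        # 100k sequential IDs

raw = struct.pack(f"<{len(ids)}q", *ids)       # plain 64-bit little-endian
deltas = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
prepped = struct.pack(f"<{len(deltas)}q", *deltas)

plain_size = len(zlib.compress(raw, 9))
delta_size = len(zlib.compress(prepped, 9))
print(plain_size, delta_size)    # the delta-encoded input compresses far smaller
assert delta_size < plain_size
```

After delta-encoding, the payload is almost entirely repeated bytes, which any general-purpose compressor eats for breakfast.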

Your blog posts are great, keep it up

> Integer compression is so good these days you can compress integers such that they basically end up taking up ~1/2 a byte instead of 8

This needs an explanation.

Well, if your integers are sequential you can encode huge numbers of them using diff + RLE in just a few bytes, likely far less than half a byte per integer on average for the right dataset (in theory you can store 1, 2, 3, 4, 5...10_000 in 2 bytes).
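A quick sanity check of that claim with a hypothetical varint-based run encoding (the exact byte count depends on the scheme; this simple one lands at 4 bytes rather than 2, still well under half a byte per integer):

```python
def varint(n):
    """LEB128-style variable-length encoding: 7 bits of payload per byte."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)   # continuation bit set, more bytes follow
        else:
            out.append(b)
            return bytes(out)

ids = list(range(1, 10_001))
deltas = [b - a for a, b in zip(ids, ids[1:])]
assert set(deltas) == {1}          # one run: delta 1, repeated 9999 times

# start value + (delta, run length) -> 1 + 1 + 2 = 4 bytes total
encoded = varint(ids[0]) + varint(1) + varint(len(deltas))
print(len(encoded))                # 4
```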

But for other integer datasets there's FastPFOR

https://github.com/lemire/FastPFor

The papers linked there describe techniques for packing multiple 32-bit integers into a single byte, etc. Integer compression is pretty powerful if your data isn't random. The problem with UUIDs is that the data is mostly random: even a UUIDv7 contains a significant amount of random bits.
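To make "multiple 32-bit integers per byte" concrete, here's a minimal bit-packing sketch in that spirit (an illustration of the idea, not FastPFor's actual API): if every value in a block fits in `width` bits, you pack them that tight.

```python
def pack(values, width):
    """Pack integers LSB-first at `width` bits each; all values must fit."""
    acc = bits = 0
    out = bytearray()
    for v in values:
        acc |= v << bits
        bits += width
        while bits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            bits -= 8
    if bits:
        out.append(acc & 0xFF)
    return bytes(out)

def unpack(data, width, count):
    acc = bits = 0
    it = iter(data)
    out = []
    for _ in range(count):
        while bits < width:
            acc |= next(it) << bits
            bits += 8
        out.append(acc & ((1 << width) - 1))
        acc >>= width
        bits -= width
    return out

values = [3, 1, 2, 0, 3, 2, 1, 1] * 4   # 32 "32-bit" ints, all < 4
packed = pack(values, width=2)           # 2 bits each: four integers per byte
assert unpack(packed, 2, len(values)) == values
print(len(packed))                       # 8 bytes for 32 integers
```

Real codecs like FastPFOR add block headers, exception handling for outliers, and SIMD, but the core win is the same: spend only as many bits as the data actually needs.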