Let’s not forget that a decompressor for such a format would require running the entire model, in this case a 70B-parameter decompressor. It’s perhaps not surprising that you can compress files better when given a (very) large dictionary to refer to. This is why any reasonable compression benchmark includes the decompressor’s size in the size score.
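For concreteness, here's the scoring rule as I understand it (a minimal sketch; the Hutter Prize, for example, counts the decompressor program, or equivalently a self-extracting archive):

    import os

    def size_score(compressed_path: str, decompressor_path: str) -> int:
        # The payload is meaningless without the program that
        # reconstructs the original, so both files count.
        return os.path.getsize(compressed_path) + os.path.getsize(decompressor_path)

Under that rule, a 70B-parameter model at roughly 2 bytes per weight adds on the order of 140 GB to every score, no matter how small the payload gets.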
Could a distributed decompressor be valid? E.g., something like BitTorrent, which provides a DHT for known files.
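A toy sketch of the idea, assuming the "compressed" file is nothing but a content hash and the network already stores the bytes (names here are mine, not any real API):

    import hashlib

    # Stand-in for a DHT like BitTorrent's: a shared table that maps
    # content hashes to the bytes themselves.
    DHT: dict[bytes, bytes] = {}

    def publish(data: bytes) -> bytes:
        # "Compress" a known file down to its 32-byte SHA-256 digest.
        key = hashlib.sha256(data).digest()
        DHT[key] = data
        return key

    def retrieve(key: bytes) -> bytes:
        # "Decompress" by fetching whatever the digest points to.
        return DHT[key]

By the size-counting argument above, though, the real decompressor is the whole network's storage, so this only relocates the dictionary rather than shrinking it.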
This has been implemented: https://github.com/philipl/pifs
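For anyone who hasn't seen it: pifs's conceit is that every file already "exists" somewhere in the digits of pi, so "storing" a file just means recording its offset. A toy version of the lookup (pure Python, using Gibbons' unbounded spigot for the digits; the function names are mine, not pifs's API):

    def pi_digits():
        # Gibbons' unbounded spigot: yields 3, 1, 4, 1, 5, 9, ...
        q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
        while True:
            if 4*q + r - t < n*t:
                yield n
                q, r, n = 10*q, 10*(r - n*t), (10*(3*q + r))//t - 10*n
            else:
                q, r, t, k, n, l = (q*k, (2*q + r)*l, t*l, k + 1,
                                    (q*(7*k + 2) + r*l)//(t*l), l + 2)

    def pi_index(pattern: str, limit: int = 100_000):
        # Slide a window over the digit stream; return the offset of
        # the first match, or None if not found within `limit` digits.
        window = ""
        for i, d in enumerate(pi_digits()):
            if i >= limit:
                return None
            window = (window + str(d))[-len(pattern):]
            if window == pattern:
                return i - len(pattern) + 1

    print(pi_index("26535"))  # 6: pi = 3.14159 26535...

The punchline, of course, is that the expected offset of an n-digit string is around 10^n, so the index takes about as many digits as the file it replaces, which is the same decompressor-size accounting problem in disguise.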