Let’s not forget that a decompressor for such a format would require running the entire model, in this case a 70B-parameter decompressor. It’s perhaps not surprising that you can compress files better when given a (very) large dictionary to refer to. This is why any reasonable compression benchmark includes the decompressor’s size in the size score.
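For concreteness, here's the scoring rule as I understand it (a minimal sketch; the Hutter Prize, for example, counts the decompressor program, or equivalently a self-extracting archive):

    import os

    def size_score(compressed_path: str, decompressor_path: str) -> int:
        # The payload is meaningless without the program that
        # reconstructs the original, so both files count.
        return os.path.getsize(compressed_path) + os.path.getsize(decompressor_path)

Under that rule, a 70B-parameter model at roughly 2 bytes per weight adds on the order of 140 GB to every score, no matter how small the payload gets.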
Could a distributed decompressor be valid? E.g., something like BitTorrent, which provides a DHT for known files.
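A toy sketch of the idea, assuming the "compressed" file is nothing but a content hash and the network already stores the bytes (names here are mine, not any real API):

    import hashlib

    # Stand-in for a DHT like BitTorrent's: a shared table that maps
    # content hashes to the bytes themselves.
    DHT: dict[bytes, bytes] = {}

    def publish(data: bytes) -> bytes:
        # "Compress" a known file down to its 32-byte SHA-256 digest.
        key = hashlib.sha256(data).digest()
        DHT[key] = data
        return key

    def retrieve(key: bytes) -> bytes:
        # "Decompress" by fetching whatever the digest points to.
        return DHT[key]

By the size-counting argument above, though, the real decompressor is the whole network's storage, so this only relocates the dictionary rather than shrinking it.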
This has been implemented: https://github.com/philipl/pifs
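For anyone who hasn't seen it: pifs's conceit is that every file already "exists" somewhere in the digits of pi, so "storing" a file just means recording its offset. A toy version of the lookup (pure Python, using Gibbons' unbounded spigot for the digits; the function names are mine, not pifs's API):

    def pi_digits():
        # Gibbons' unbounded spigot: yields 3, 1, 4, 1, 5, 9, ...
        q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
        while True:
            if 4*q + r - t < n*t:
                yield n
                q, r, n = 10*q, 10*(r - n*t), (10*(3*q + r))//t - 10*n
            else:
                q, r, t, k, n, l = (q*k, (2*q + r)*l, t*l, k + 1,
                                    (q*(7*k + 2) + r*l)//(t*l), l + 2)

    def pi_index(pattern: str, limit: int = 100_000):
        # Slide a window over the digit stream; return the offset of
        # the first match, or None if not found within `limit` digits.
        window = ""
        for i, d in enumerate(pi_digits()):
            if i >= limit:
                return None
            window = (window + str(d))[-len(pattern):]
            if window == pattern:
                return i - len(pattern) + 1

    print(pi_index("26535"))  # 6: pi = 3.14159 26535...

The punchline, of course, is that the expected offset of an n-digit string is around 10^n, so the index takes about as many digits as the file it replaces, which is the same decompressor-size accounting problem in disguise.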