So we've got a system that backs up to an XML file nightly. I noticed a couple of weeks ago that the bucket was using a not-insignificant amount of disk space.
These XML files are full backups rather than deltas, so each one contains everything the previous file did plus the newly added data.
My assumption was that if I grabbed a swath of the old ones, say a year's worth, and compressed them together, I would get really decent savings.
I was pretty disappointed: gzip knocked about 10 GB off of roughly 100 GB of data. I did some research and found people saying that 7-Zip and its sliding-dictionary-size options were the answer. After multiple tries, each 7-Zip run taking multiple days, I was able to get it down to about 70 GB from 100. Better than gzip, but frankly nowhere near what I would expect.
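For reference, a sketch of the kind of 7-Zip invocation this involves, assuming the p7zip command-line build; the file names are placeholders:

```
# Archive a year of nightly backups with LZMA2 at maximum level.
# -md sets the dictionary size; it must be larger than the gap
# between repeats of the same data, or the sliding window can
# never match across files. Bigger dictionaries need more RAM.
7z a -t7z -m0=lzma2 -mx=9 -md=1536m backups-year.7z nightly-*.xml
```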
Does a compression tool exist that can better handle this sort of ever-expanding document?
Long Range ZIP or LZMA RZIP
https://github.com/ckolivas/lrzip
"A compression utility that excels at compressing large files (usually > 10-50 MB). Larger files and/or more free RAM means that the utility will be able to more effectively compress your files (ie: faster / smaller size), especially if the filesize(s) exceed 100 MB. You can either choose to optimise for speed (fast compression / decompression) or size, but not both."