So we've got a system that backs up to an XML file nightly. A couple of weeks ago I noticed the bucket was using a not-insignificant amount of disk space.

These XML files are full backups rather than deltas - so each one contains everything in the previous file plus whatever data was added since.

My assumption was that if I grabbed a swath of the old ones, say a year's worth, and compressed them together, I would get really decent savings, since each file is mostly a copy of the one before it.

I was pretty disappointed - gzip knocked about 10 GB off of roughly 100 GB of data. I started doing some research and found people saying 7-Zip and its sliding dictionary size options were the answer, which makes sense: gzip's DEFLATE window is only 32 KB, far too small to ever match data across whole files. After multiple tries, each 7-Zip run taking multiple days, I was able to get it down to about 70 GB from 100. Better than gzip, but frankly nowhere near what I would expect.
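For reference, cranking up the dictionary looks something like this (the archive name and paths are placeholders; `-md` sets the LZMA2 dictionary size, `-ms=on` keeps the archive solid so matches can span files, and `-mmt=1` avoids the multithreaded chunking that would otherwise shrink the effective window):

```
# Solid 7z archive with a 1 GB LZMA2 dictionary, single-threaded so
# the input isn't split into independently compressed chunks (slow!)
7z a -t7z -m0=lzma2 -mx=9 -md=1g -ms=on -mmt=1 backups-2023.7z backups/2023/*.xml
```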

Is there a compression tool that can better handle this sort of ever-growing, highly redundant set of documents?

lrzip

Long Range ZIP or LZMA RZIP

https://github.com/ckolivas/lrzip

"A compression utility that excels at compressing large files (usually > 10-50 MB). Larger files and/or more free RAM means that the utility will be able to more effectively compress your files (ie: faster / smaller size), especially if the filesize(s) exceed 100 MB. You can either choose to optimise for speed (fast compression / decompression) or size, but not both."