What does HackerNews think of lrzip?

lrzip: Long Range ZIP or LZMA RZIP

Language: C

https://github.com/ckolivas/lrzip

"A compression utility that excels at compressing large files (usually > 10-50 MB). Larger files and/or more free RAM means that the utility will be able to more effectively compress your files (ie: faster / smaller size), especially if the filesize(s) exceed 100 MB. You can either choose to optimise for speed (fast compression / decompression) or size, but not both."

I typically don't use standard archive formats these days for my own file storage. If I want pure maximum compression, I'll often run a long-range matcher like FreeArc's srep [1] or lrzip [2] first, then follow it with either fast-lzma2 [3] (via a p7zip fork [4]) for fast, multithreaded compression, or with mcm [5] or zpaq for maximum ratio at the cost of longer compression time.

However, my truly preferred way is dwarfs [6], which features really good deduplication and (by default) zstd compression while remaining mountable. Most of my files stay highly compressed yet easily accessible, without needing to fully decompress them; a minimal sketch of that workflow follows the links below. I even made a small script to convert and create AppImages that use dwarfs instead [7]. Admittedly, I don't use PAR2 or anything of the sort, but I could just do that the traditional way if I so wished.

[1]: https://github.com/Phantop/srep

[2]: https://github.com/ckolivas/lrzip

[3]: https://github.com/conor42/fast-lzma2

[4]: https://github.com/jinfeihan57/p7zip

[5]: https://github.com/mathieuchartier/mcm

[6]: https://github.com/mhx/dwarfs/

[7]: https://github.com/Phantop/appdwarf/
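
The dwarfs workflow described above boils down to two commands; a minimal sketch, assuming the mkdwarfs/dwarfs tools from [6] are installed and using hypothetical paths:

  # build a compressed, deduplicated image from a directory (zstd is the default backend)
  ~$ mkdwarfs -i ~/photos -o photos.dwarfs

  # mount it read-only via FUSE and browse files without decompressing the whole image
  ~$ dwarfs photos.dwarfs /mnt/photos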

The best I know of for that is still something like lrzip, but even then it's probably not state of the art. https://github.com/ckolivas/lrzip

It'll also take a hell of a long time to do the compression and decompression. It'd probably be better to do some kind of chunking and deduplication instead of compression itself, simply because you're never going to have enough RAM to hold a dictionary that could effectively cover that much data. You'd also not want to have to re-read and reconstruct that dictionary just to get at some random image.
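
As a rough illustration of that chunk-and-deduplicate idea (not the commenter's setup; real systems use content-defined, rolling-hash chunking rather than fixed-size splits, and the filenames here are hypothetical):

  # split into 1 MiB chunks, hash each one, and count the distinct chunks;
  # storing only the unique chunks plus an index is the deduplication step
  ~$ split -b 1M huge-dataset.tar chunk.
  ~$ sha256sum chunk.* | awk '{print $1}' | sort -u | wc -l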

I always find [lrzip](https://github.com/ckolivas/lrzip) is underappreciated when it comes to compression discussions; it doesn't suit all circumstances, but it works really well in the ones it does (we're using it with the no-compress flag and then running zstd on the output, hence why it comes to mind :-) ). A minimal sketch of that workflow follows the edit below.

Edit: it's not well suited to real-time...
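
A minimal sketch of that workflow, assuming lrzip's -n/--no-compress flag (rzip long-range matching only, no backend compressor) followed by zstd; the filename is hypothetical:

  # long-range dedup pass only, producing backups.tar.lrz
  ~$ lrzip -n backups.tar
  # then compress the pre-processed output with zstd
  ~$ zstd -19 backups.tar.lrz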

The next default compressor might be lrzip [1] by Con Kolivas; I've only seen it a couple of times in the wild so far, but for certain files it can increase the compression ratio quite a bit.

[1] https://github.com/ckolivas/lrzip

  # 151M    linux-4.14-rc6.tar.gz
  # GZIP decompression
  ~$ time gzip -dk linux-4.14-rc6.tar.gz

  real    0m4.518s
  user    0m3.328s
  sys     0m13.422s

  # 787M    linux-4.14-rc6.tar
  # LRZIP compression
  ~$ time lrzip -v linux-4.14-rc6.tar
  [...]
  linux-4.14-rc6.tar - Compression Ratio: 7.718. Average Compression Speed: 13.789MB/s.
  Total time: 00:00:56.37

  real    0m56.533s
  user    5m35.484s
  sys     0m9.422s

  # 137M    linux-4.14-rc6.tar.lrz
  # LRZIP decompression
  ~$ time lrzip -dv linux-4.14-rc6.tar.lrz
  [...]
  100%     786.16 /    786.16 MB
  Average DeCompression Speed: 131.000MB/s
  Output filename is: linux-4.14-rc6.tar: [OK] - 824350720 bytes
  Total time: 00:00:06.35

  real    0m6.524s
  user    0m8.031s
  sys     0m1.766s

  # Results
  ~$ du -hs linux* | sort -h
  137M    linux-4.14-rc6.tar.lrz
  151M    linux-4.14-rc6.tar.gz
  787M    linux-4.14-rc6.tar

Tested on WSL (Ubuntu Bash on Windows 10).

edit:

  ~$ time xz -vk linux-4.14-rc6.tar
  linux-4.14-rc6.tar (1/1)
    100 %        98.9 MiB / 786.2 MiB = 0.126   3.0 MiB/s       4:25

  real    4m25.189s
  user    4m23.828s
  sys     0m1.094s
  
  ~$ du -hs linux* | sort -h
  99M     linux-4.14-rc6.tar.xz
  137M    linux-4.14-rc6.tar.lrz
  151M    linux-4.14-rc6.tar.gz
  787M    linux-4.14-rc6.tar

It looks like xz still has the best compression ratio here, but it also took the longest (real) time.
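
For completeness, lrzip can also trade ratio for time by swapping its backend compressor; the flags below come from lrzip's documentation, and these runs were not measured here:

  # ZPAQ backend: usually the smallest output, but much slower
  ~$ lrzip -z linux-4.14-rc6.tar

  # LZO backend: much faster, at a noticeably worse ratio
  ~$ lrzip -l linux-4.14-rc6.tar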