What do you mean "in-memory"? Is there any other kind of data compression algorithm?
How does it compare with zstd? I find that zstd, given the right compression level, can beat almost every other compression algorithm nowadays on at least one or two metrics, and very often on all metrics.
For example, I just tried to compress the `enwik9` dataset with both lz4 and `zstd -1 --single-thread` and zstd was both faster on compression and decompression and also produced a 357 MB file compared to 509 MB with lz4. According to these results, apparently it would beat LZAV on every metric?
That said, even though I used `--single-thread`, somehow zstd still used 150% of CPU according to `time` (?).
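A minimal sketch of the commands, in case anyone wants to reproduce this (output file names are placeholders; I timed each step separately with `time`):

```sh
# compress enwik9 at level 1 with each tool, keeping the original file
time zstd -1 --single-thread -k enwik9 -o enwik9.zst
time lz4 -1 -k enwik9 enwik9.lz4

# compare the compressed sizes
ls -l enwik9.zst enwik9.lz4
```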
lz4 ALWAYS decompresses faster (> 2x) than zstd
No, lz4 consistently decompresses slower than zstd under these particular conditions. `lz4 -d` consistently finishes in ~1.95s of wall-clock time, while `zstd -d --single-thread` finishes in ~1.50s.
For reference, I'm using the latest versions of both programs from nixpkgs/NixOS (lz4 1.9.4 and zstd 1.5.5), both compiled with `-march=znver1`. I used the `-1 --single-thread` options for zstd, all files were cached in memory, and the dataset was `enwik9` as I mentioned.
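Something along these lines reproduces the decompression comparison (a sketch: paths are placeholders, and decompressing to stdout and discarding it keeps output-file writes out of the measurement):

```sh
# make sure both compressed files are in the page cache
cat enwik9.zst enwik9.lz4 > /dev/null

# time decompression only, discarding the decompressed output
time lz4 -d -c enwik9.lz4 > /dev/null
time zstd -d --single-thread -c enwik9.zst > /dev/null
```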
Other than picking a zstd compression level that is competitive with lz4, I didn't cherry-pick the above scenario; it's just what I had at hand. I chose `enwik9` as the dataset because I wanted something that could be referenced rather than just some random files on my computer, and specifically because it's what the README of this LZAV project mentions.
There are a few things to note:
1. For some reason, zstd uses around 145-150% CPU during compression and consistently uses 113% CPU during decompression, so the `--single-thread` option doesn't seem to be doing what it advertises [1]. This may be giving zstd an unfair advantage.
2. However, even taking into account the unfair 13% extra CPU usage, zstd is still coming out ahead in terms of decompression speed vs lz4.
3. Notably, I've used one of the "normal" zstd compression levels (level 1), at which zstd seems to beat lz4 on every metric. If I use one of the "fast" zstd compression levels, zstd could be even faster. For example, with `--fast=1`, `zstd -d --single-thread` finishes in 1.24s on my machine compared to lz4's 1.95s (invocation sketched after this list).
4. Not to mention, if I use multiple threads for I/O and compression, zstd can compress faster than lz4 by an order of magnitude (also sketched after this list). Of course, in one sense this is quite an unfair advantage, but on the other hand the lz4 CLI tool has no option for multi-threaded compression, so it is quite relevant in terms of usability.
5. Interestingly, when I run the exact same benchmark in the exact same conditions on my AMD Zen 3 server rather than my Zen 1 laptop, then lz4 consistently decompresses almost 2x faster than zstd on this compression level. I'm not sure why there is such a large discrepancy.
My only guesses for the large discrepancy are that my Zen 1 laptop has the `RETBleed: Mitigation: untrained return thunk` CPU mitigation enabled in the Linux kernel, which can cause a very large performance degradation [2], plus slightly different `Spectre v2` mitigations and 2x-16x smaller CPU caches; perhaps this somehow affects lz4 more than zstd (apparently the RETBleed mitigation is not needed on Zen 3 CPUs).
[1] Adding `-T1` before `--single-thread` makes no difference in either compression or decompression. Adding it after `--single-thread` makes no difference in decompression speed or CPU usage, but it does make compression even faster and use more CPU. My guess is that `-T1` then overrides `--single-thread`, still using only one compression/decompression thread but more threads for I/O.
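For points 3 and 4, the invocations I have in mind look roughly like this (a sketch, not exactly what I typed; output names are placeholders and `-T0` tells zstd to use all available cores):

```sh
# point 3: one of zstd's "fast" levels instead of the normal level 1
time zstd --fast=1 --single-thread -c enwik9 > enwik9.fast1.zst
time zstd -d --single-thread -c enwik9.fast1.zst > /dev/null

# point 4: multi-threaded zstd compression; the lz4 CLI has no equivalent option
time zstd -1 -T0 -c enwik9 > enwik9.mt.zst
```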
You are comparing zstd with asynchronous I/O + multithreading against a simple single-threaded lz4 CLI.
Additionally, you're using exotic, unrepresentative hardware.
In theory, the only single-threaded decompression case where zstd can beat lz4 is when you have slow (read) I/O.
lz4 & zstd are mostly used as libraries, and real benchmarks don't consider I/O or multithreading.
This is what I'm getting with TurboBench & enwik9: lz4 decompresses 2.8x faster than zstd.
    Compressed size (bytes)   Ratio %   Comp MB/s   Decomp MB/s   Codec
    356828015                 35.7      400.71      1557.59       zstd 1
    416874817                 41.7      312.31      1967.66       lzav
    509199084                 50.9      630.78      4395.08       lz4 1
Download TurboBench [1] from [2] and run the tests on your own machine.
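If you'd rather not build TurboBench, the lz4 and zstd CLIs also ship a built-in in-memory benchmark mode that already keeps file I/O out of the picture; a quick sketch, assuming enwik9 is in the current directory (it won't cover LZAV, of course, since as far as I know LZAV is only a library):

```sh
# built-in benchmark modes: compress and decompress in memory, no output files
zstd -b1 -T1 enwik9
lz4 -b1 enwik9
```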