What does HackerNews think of FiniteStateEntropy?

New generation entropy codecs : Finite State Entropy and Huff0

Language: C

#43 in C
It's obsolete. The DEFLATE format is limited to a 32 KB LZ window with Huffman coding.

Zstd can use a much larger window (8 MB recommended) and a much better entropy coder: https://github.com/Cyan4973/FiniteStateEntropy

Sounds similar to: https://github.com/Cyan4973/FiniteStateEntropy

https://arxiv.org/abs/1311.2540

> The modern data compression is mainly based on two approaches to entropy coding: Huffman (HC) and arithmetic/range coding (AC). The former is much faster, but approximates probabilities with powers of 2, usually leading to relatively low compression rates. The latter uses nearly exact probabilities - easily approaching theoretical compression rate limit (Shannon entropy), but at cost of much larger computational cost. Asymmetric numeral systems (ANS) is a new approach to accurate entropy coding, which allows to end this trade-off between speed and rate: the recent implementation [1] provides about 50% faster decoding than HC for 256 size alphabet, with compression rate similar to provided by AC. This advantage is due to being simpler than AC: using single natural number as the state, instead of two to represent a range. Beside simplifying renormalization, it allows to put the entire behavior for given probability distribution into a relatively small table: defining entropy coding automaton. The memory cost of such table for 256 size alphabet is a few kilobytes. There is a large freedom while choosing a specific table - using pseudorandom number generator initialized with cryptographic key for this purpose allows to simultaneously encrypt the data. This article also introduces and discusses many other variants of this new entropy coding approach, which can provide direct alternatives for standard AC, for large alphabet range coding, or for approximated quasi arithmetic coding.
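To make the abstract's "single natural number as the state" concrete, here is a minimal sketch of a range-style ANS variant (commonly called rANS), written without renormalization so it only handles short messages. It is not the table-based (tANS) coder that FiniteStateEntropy implements. The three-symbol model, the `ans_*` names, and the 20-symbol limit are assumptions made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model for illustration: 3 symbols with P = 5/8, 2/8, 1/8.
 * ANS_TOTAL is the sum of the frequencies; ANS_MAX_LEN keeps the state
 * inside 64 bits because this sketch never renormalizes. */
enum { ANS_SYMBOLS = 3, ANS_TOTAL = 8, ANS_MAX_LEN = 20 };
static const uint32_t ans_freq[ANS_SYMBOLS] = { 5, 2, 1 };
static const uint32_t ans_cum[ANS_SYMBOLS]  = { 0, 5, 7 }; /* running sums */

/* Push symbol s onto state x. The state grows by roughly ANS_TOTAL / freq,
 * i.e. by about log2(1/p) bits. */
static uint64_t ans_push(uint64_t x, unsigned s)
{
    return (x / ans_freq[s]) * ANS_TOTAL + ans_cum[s] + (x % ans_freq[s]);
}

/* Pop the most recently pushed symbol and restore the previous state. */
static unsigned ans_pop(uint64_t *x)
{
    uint32_t slot = (uint32_t)(*x % ANS_TOTAL);
    unsigned s = 0;
    while (s + 1 < ANS_SYMBOLS && slot >= ans_cum[s + 1])
        s++;                                   /* find the symbol owning slot */
    *x = (uint64_t)ans_freq[s] * (*x / ANS_TOTAL) + slot - ans_cum[s];
    return s;
}

/* Encode n symbols into one integer. Symbols are pushed in reverse order
 * so that they pop back out in forward order (ANS behaves like a stack). */
uint64_t ans_encode(const unsigned char *msg, size_t n)
{
    uint64_t x = 1;                            /* arbitrary start state */
    assert(n <= ANS_MAX_LEN);
    while (n > 0) {
        n--;
        assert(msg[n] < ANS_SYMBOLS);
        x = ans_push(x, msg[n]);
    }
    return x;
}

/* Decode n symbols from state x into out. */
void ans_decode(uint64_t x, unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned char)ans_pop(&x);
}
```

Calling `ans_encode` on the symbols 0, 0, 1, 0, 2, 0, 0, 1 and then `ans_decode` with the resulting integer and a length of 8 should return the original message. Frequent symbols grow the state slowly and rare ones grow it quickly, which is where the compression comes from. For this distribution the Shannon entropy is about 1.30 bits per symbol, whereas a Huffman code (code lengths 1, 2, 2) averages 1.375 bits. A production coder also streams the low bits of the state out as it grows (renormalization), which this sketch omits.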

Check out his other papers and the GitHub project (they looked super interesting and similar).

Let's see:

- WavPack [1], which is a rough contemporary but offers three tiers of presets (normal scale, high scale, extra high scale) and an innovative (and optional) lossy/hybrid mode

- TAK [2], which compressed better and decoded faster than either, but was initially closed-source until the dev was persuaded to open it up

- LossyWAV [3], which isn't lossless but chops off least-significant bits while using noise shaping to pre-process audio and make it compress better when fed to a lossless compressor

Most of these developments were first publicized on Hydrogenaudio. But as for innovations in the last two years, not that I'm aware of.

[1] http://wiki.hydrogenaud.io/index.php?title=WavPack

[2] http://wiki.hydrogenaud.io/index.php?title=TAK

[3] http://wiki.hydrogenaud.io/index.php?title=LossyWAV

EDIT (for some more background): in lossless audio compression you generally use linear prediction to estimate the next few samples, then encode the difference between the prediction and the actual signal (the residual) with an entropy coder such as Golomb-Rice codes, Huffman coding, or arithmetic coding. Most of Zstandard's improvements are algorithmic or implementation-related rather than advances in information theory. The part that could show promise is the tANS entropy coder [4] used in Zstandard, but Golomb-Rice codes already perform well on the residuals that linear predictors produce, so I'm not sure what to expect [5].

[4] https://github.com/Cyan4973/FiniteStateEntropy

[5] 'Benchmarks' section under [4]
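
The prediction-plus-residual scheme described in the EDIT above can be sketched as follows. This is a minimal illustration, not the bitstream of FLAC or any other codec: the fixed second-order predictor, the zigzag mapping, the unary convention (ones terminated by a zero), and all function names are assumptions chosen for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Predict each sample by linear extrapolation from the two previous ones
 * (pred = 2*x[n-1] - x[n-2]) and store the prediction error in res.
 * Assumes samples are small enough (e.g. 16- or 24-bit audio) that the
 * arithmetic cannot overflow int32_t. */
void predict_residuals(const int32_t *x, int32_t *res, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t pred = 0;                       /* no history yet */
        if (i == 1)
            pred = x[0];                        /* only one sample of history */
        else if (i >= 2)
            pred = 2 * x[i - 1] - x[i - 2];
        res[i] = x[i] - pred;
    }
}

/* Map signed residuals to unsigned so small magnitudes stay small:
 * 0, -1, 1, -2, 2, ... becomes 0, 1, 2, 3, 4, ... */
uint32_t zigzag(int32_t v)
{
    uint32_t u = (uint32_t)v << 1;
    return v < 0 ? ~u : u;
}

/* Write the Rice code of u with parameter k into buf as '0'/'1' characters:
 * the quotient u >> k in unary (ones, then a zero), followed by the k low
 * bits of u, most significant first. Returns the number of bits written.
 * The caller must supply a buffer of at least (u >> k) + k + 2 bytes. */
size_t rice_put(uint32_t u, unsigned k, char *buf)
{
    size_t n = 0;
    assert(k < 32);
    for (uint32_t q = u >> k; q > 0; q--)
        buf[n++] = '1';
    buf[n++] = '0';
    for (unsigned b = k; b > 0; b--)
        buf[n++] = ((u >> (b - 1)) & 1u) ? '1' : '0';
    buf[n] = '\0';
    return n;
}
```

For the samples 10, 12, 14, 15, 15, 14 the residuals come out as 10, 2, 0, -1, -1, -1: after the first sample the values cluster near zero, which is the distribution Golomb-Rice codes are designed for. As a single-code example, `rice_put(9, 2, buf)` writes `11001` (quotient 2 in unary, then remainder 1 in two bits).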