I wonder how it would fare compared to a "software-only" solution, that is, re-architecting your algorithm to work with compressed data directly.
For example, consider an analytical engine scanning the values in a database column to sum them up. To increase the effective bandwidth, you could compress the values (for example, with some form of Huffman encoding), and then you have two options:
1. Decompress the values before passing them on to the calculation, then do the calculation on the plain values. This is what the paper proposes (transparently in HW, as the data are used from the cache).
2. Do the "addition" directly on the compressed values. This requires more logic, but it doesn't have to be implemented purely in software; it could be hardware-assisted by an FPGA or something like that. (See the sketch after this list for the contrast between the two options.)
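To make the contrast concrete, here is a minimal sketch of the two options. I'm using run-length encoding instead of Huffman so the "compute on compressed data" path stays readable, and the `struct run` layout and function names are made up for illustration, not anything from the paper:

```c
#include <stdint.h>
#include <stdio.h>

/* One run: `count` consecutive occurrences of the same value.
   (Hypothetical layout; a real engine would pack this much tighter.) */
struct run {
    int64_t  value;
    uint32_t count;
};

/* Option 1: decompress first, then sum the plain values. */
static int64_t sum_decompressed(const struct run *runs, size_t n_runs,
                                int64_t *scratch /* large enough */) {
    size_t out = 0;
    for (size_t i = 0; i < n_runs; i++)
        for (uint32_t j = 0; j < runs[i].count; j++)
            scratch[out++] = runs[i].value;   /* materialize every value */

    int64_t sum = 0;
    for (size_t i = 0; i < out; i++)
        sum += scratch[i];
    return sum;
}

/* Option 2: operate on the compressed representation directly:
   each run contributes value * count, no expansion needed. */
static int64_t sum_compressed(const struct run *runs, size_t n_runs) {
    int64_t sum = 0;
    for (size_t i = 0; i < n_runs; i++)
        sum += runs[i].value * (int64_t)runs[i].count;
    return sum;
}

int main(void) {
    struct run col[] = { {5, 1000}, {7, 250}, {5, 4000} };
    int64_t scratch[5250];                    /* 1000 + 250 + 4000 rows */
    printf("%lld %lld\n",
           (long long)sum_decompressed(col, 3, scratch),
           (long long)sum_compressed(col, 3));
    return 0;
}
```

The point of the sketch is just that option 2 touches only the compressed runs, while option 1 has to stream the full expanded column through memory (or the cache) before it can add anything.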
So I wonder, instead of compressing the cache, wouldn't it be better to implement the 2nd solution?
The disadvantage of the 1st solution is that, since it has to work in the general case, the compression is more likely to be poor in special cases, where the 2nd solution, with a specialized compression/processing combo, could be superior.
How does one work with compressed data directly? If you are dealing with the data in blocks, then your compression is only as good as the repetition in each block (perhaps the dictionary can be shared).
But even in this case, you don't deal with the compressed data directly; you just decompress one block at a time.
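Here is what that block-at-a-time structure might look like with a shared dictionary, as a rough sketch (the layout and names are hypothetical). With dictionary codes, the per-block sum can actually be done from code counts, so the expanded values are never materialized:

```c
#include <stdint.h>
#include <stdio.h>

/* Shared dictionary: code -> value, common to all blocks.
   (Hypothetical layout, purely for illustration.) */
#define DICT_SIZE 4
static const int64_t dict[DICT_SIZE] = { 5, 7, 42, 100 };

#define BLOCK_LEN 8

/* A block stores one small code per row instead of the full value. */
struct block {
    uint8_t codes[BLOCK_LEN];
};

/* Sum a block without materializing the decompressed values:
   count how often each code occurs, then weight by the dictionary. */
static int64_t sum_block(const struct block *b) {
    uint32_t counts[DICT_SIZE] = {0};
    for (size_t i = 0; i < BLOCK_LEN; i++)
        counts[b->codes[i]]++;           /* stays in the compressed domain */

    int64_t sum = 0;
    for (size_t c = 0; c < DICT_SIZE; c++)
        sum += dict[c] * (int64_t)counts[c];
    return sum;
}

int main(void) {
    struct block blocks[2] = {
        { {0, 0, 1, 0, 2, 0, 0, 3} },
        { {1, 1, 0, 0, 0, 0, 0, 0} },
    };
    int64_t total = 0;
    for (size_t i = 0; i < 2; i++)
        total += sum_block(&blocks[i]);  /* still one block at a time */
    printf("%lld\n", (long long)total);
    return 0;
}
```

So the processing is still block-granular, but whether you truly "decompress" depends on the encoding: with a shared dictionary the aggregation can work on the codes themselves, whereas with something like Huffman you pretty much have to decode each block before doing arithmetic on it.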