I've dealt with image decoding for a game engine before. The images in question were large pixel-art texture atlases, stored as PNG files and loaded at startup. Their slow decoding speed caught me by surprise, given that the file format is 25 years old!

The reason turned out to be that Deflate's design makes it very hard to parallelise or SIMD-accelerate. Even the best available decoders are basically forced to process a byte at a time, in a straight line from start to finish, which obviously isn't a good match for modern CPUs. The 3x to 4x decompression speedup here is nice, but I can't help but feel a little sad about how poorly this format is taking advantage of available compute resources. (The ultimate dream would be a format which is so parallel-friendly that you can just send the binary blob to the GPU and decompress it in a compute shader!)
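
To make the serial dependency concrete, here is a self-contained sketch (simplified, not taken from any real decoder) of the LZ77 half of the problem: a back-reference is allowed to overlap the bytes it is producing, so the copy has to advance one byte at a time. The Huffman half is even less parallel-friendly: since codes are variable-length, you can't even find where the next symbol starts without fully decoding the current one.

```rust
// Sketch of Deflate's LZ77 back-reference copy. Because `dist` may be
// smaller than `len`, the copy can read bytes that this same call has
// just written, forcing strictly serial, byte-at-a-time progress.
fn lz77_copy(out: &mut Vec<u8>, dist: usize, len: usize) {
    for _ in 0..len {
        let b = out[out.len() - dist]; // may read a byte written a moment ago
        out.push(b);
    }
}

fn main() {
    // dist=2, len=6 expands the 2-byte seed "ab" into "abababab":
    // every output byte depends on one produced just before it.
    let mut out = b"ab".to_vec();
    lz77_copy(&mut out, 2, 6);
    assert_eq!(out, b"abababab");
}
```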

Even a rule like "each row is compressed separately, with a table of row lengths at the beginning of the file" might have been enough - this would have made compression ratios slightly worse, but complexity wouldn't have suffered too much. With only six different chunk types, we could perhaps imagine a branchless decoder where each row's decoding state is stored in its own SIMD lane, and the results for several rows are all emitted at the same time...
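
Here's a rough Rust sketch of what that layout would buy, with one thread per row standing in for SIMD lanes, and a placeholder `inflate_row` standing in for the real per-row codec (both names are my inventions, not an existing API). The point is just that the length table lets every row be located, and therefore decoded, independently:

```rust
use std::thread;

// Stand-in for whatever per-row codec the hypothetical format would use.
fn inflate_row(compressed: &[u8]) -> Vec<u8> {
    compressed.to_vec() // identity "decompression", just for the sketch
}

// `table` holds each row's compressed length; `payload` holds the rows
// back to back. The table is what makes parallelism possible: every row
// can be located up front instead of discovered by decoding in sequence.
fn decode_rows(table: &[usize], payload: &[u8]) -> Vec<Vec<u8>> {
    let mut rows = Vec::with_capacity(table.len());
    let mut off = 0;
    for &len in table {
        rows.push(&payload[off..off + len]);
        off += len;
    }
    // One thread per row for clarity; a real decoder would use a pool
    // (or, as above, one SIMD lane per row).
    thread::scope(|s| {
        let handles: Vec<_> = rows
            .into_iter()
            .map(|row| s.spawn(move || inflate_row(row)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let payload = b"redgreenblue";
    let table = [3, 5, 4]; // "red", "green", "blue"
    for row in decode_rows(&table, payload) {
        println!("{}", String::from_utf8(row).unwrap());
    }
}
```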

> you can just send the binary blob to the GPU and decompress it in a compute shader

Surely this exists already?

I think GPU texture compression is always lossy, so it's not directly comparable to Deflate or to PNG - which is why I don't think it exists. See ASTC and BC7.

Texture compression can be very advanced: https://cbloomrants.blogspot.com/2020/06/oodle-texture-slash...

Reading that blog post, I was surprised to learn that many modern games dedicate more than half of their file size to textures. I haven't played an AAA game in more than a decade, but I would have thought that meshes and (particularly) audio would use up more space.

It sounds like developers are stuck between a rock and a hard place. They need one of the specific compressed pixel formats that GPUs can read efficiently, but those formats are about ten times larger than a comparable JPEG, they don't losslessly compress well (modulo RAD Game Tools' discoveries here), and recompressing raw pixels into a GPU format at load time would be orders of magnitude too slow.
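
Rough numbers, to illustrate (using BC7's fixed rate of 16 bytes per 4x4 block, i.e. 1 byte per texel): a 4096x4096 texture is 16 MiB in BC7 regardless of content, while a JPEG of the same image at a typical ~1 bit per pixel would be around 2 MB. That's roughly where the "about ten times larger" figure comes from.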

RAD Game Tools' approach here is clever, but it feels like a bit of a hack. The obvious next step would be a lossy compressed image format which can decompress directly to BC7, exploiting spatial-correlation tricks similar to those which PNG uses to get better results than a gzipped BMP file. Has anybody tried this already?
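
For concreteness, the main spatial-correlation trick PNG uses is per-row filtering, and the cleverest filter is Paeth: each byte is predicted from its left, above, and upper-left neighbours, and only the residual gets Deflate-compressed. The predictor, translated from the spec's pseudocode into Rust:

```rust
// Paeth predictor from the PNG spec: pick whichever of the three
// neighbours is closest to the linear estimate a + b - c.
// a = left neighbour, b = above, c = upper-left.
fn paeth_predictor(a: u8, b: u8, c: u8) -> u8 {
    let p = a as i16 + b as i16 - c as i16; // initial estimate
    let pa = (p - a as i16).abs();
    let pb = (p - b as i16).abs();
    let pc = (p - c as i16).abs();
    if pa <= pb && pa <= pc { a } else if pb <= pc { b } else { c }
}

fn main() {
    // In a smooth gradient the prediction is exact, so the stored
    // residual is zero - which Deflate then compresses extremely well.
    assert_eq!(paeth_predictor(10, 12, 10), 12);
}
```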

I wouldn't call Oodle Texture a hack. But there's also https://github.com/BinomialLLC/crunch and https://github.com/BinomialLLC/basis_universal. The latter has the advantage that it can be transcoded to multiple different compressed texture formats, so all GPUs can be supported from a single file (different GPUs support different formats, so you can't necessarily send the same compressed texture to every GPU).