The charts ranking GPUs should be taken with a grain of salt: they don't reflect real-world results, and the surrounding text contains factual errors. For example:
> Shown is raw relative performance of GPUs. For example, an RTX 4090 has about 0.33x performance of a H100 SMX for 8-bit inference. In other words, a H100 SMX is three times faster for 8-bit inference compared to a RTX 4090.
RTX 4090 GPUs cannot actually use 8-bit (FP8) inference, because NVIDIA has not (yet) exposed the FP8 tensor cores via CUDA.
> 8-bit Inference and training are much more effective on Ada/Hopper GPUs because of Tensor Memory Accelerator (TMA) which saves a lot of registers
Ada does not have TMA; only Hopper does.
If people are interested, I can run my own benchmarks on the latest 4090 and compare it to previous generations.