The graphs ranking GPUs may not be accurate: they don't reflect real-world results, and the accompanying text contains factual inaccuracies. For example:

> Shown is raw relative performance of GPUs. For example, an RTX 4090 has about 0.33x performance of a H100 SMX for 8-bit inference. In other words, a H100 SMX is three times faster for 8-bit inference compared to a RTX 4090.

RTX 4090 GPUs cannot currently do FP8 inference: the Ada chips do have FP8 tensor cores, but NVIDIA has not (yet) exposed that capability through CUDA.
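
A quick way to see what the software stack exposes is to query the compute capability and probe for PyTorch's FP8 dtype. This is only a sketch: it assumes a CUDA build of PyTorch, and finding the dtype only shows the API surface exists, not that working FP8 matmul kernels ship for the card.

```python
import torch

# Ada (RTX 4090) reports compute capability 8.9; Hopper (H100) reports 9.0.
major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: {major}.{minor}")

# Newer PyTorch builds expose FP8 dtypes. Their presence does not prove that
# FP8 kernels are actually usable on this GPU, only that the dtype is defined.
print("FP8 dtype exposed:", hasattr(torch, "float8_e4m3fn"))
```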

> 8-bit Inference and training are much more effective on Ada/Hopper GPUs because of Tensor Memory Accelerator (TMA) which saves a lot of registers

Ada does not have TMA; only Hopper does.

If people are interested, I can run my own benchmarks on an RTX 4090 and compare it against previous generations.

Could you please run the benchmark (bench.py) from https://github.com/karpathy/nanoGPT?
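
For anyone who wants to try this locally, a minimal sketch of the steps (assuming git and a CUDA-enabled PyTorch install are available; bench.py is run with whatever defaults the repo ships):

```python
import subprocess

# Clone nanoGPT and run its bundled benchmark script with the repo defaults.
subprocess.run(["git", "clone", "https://github.com/karpathy/nanoGPT"], check=True)
subprocess.run(["python", "bench.py"], cwd="nanoGPT", check=True)
```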