> So long as Pytorch only practically works with Nvidia GPUs, everything else is little more than a rounding error.

This is changing.

https://github.com/merrymercy/awesome-tensor-compilers

There are more and better projects that can compile an existing PyTorch codebase into a more optimized format for a range of devices. Triton (which is part of PyTorch) TVM and the MLIR based efforts (like torch-MLIR or IREE) are big ones, but there are smaller fish like GGML and Tinygrad, or more narrowly focused projects like Meta's AITemplate (which works on AMD datacenter GPUs).

Hardware is in a strange place now... It feels like everyone but Cerebras and AMD/Intel was squeezed out, but with all the money pouring in, I think this is temporary.