I don't want to be too cynical about the state of hardware for ML, but I don't see where this is going. Nvidia does not lack for competitors trying (and sometimes nominally succeeding) to build faster/cheaper/more efficient hardware. Yet Nvidia is still overwhelmingly the vendor of choice, because the software story works. So long as PyTorch only practically works with Nvidia GPUs, everything else is little more than a rounding error.

I don't see MatX ending up any different from the legion of startups that came before it: either they get acquired by a bigger player, or they fade into obscurity.

> So long as PyTorch only practically works with Nvidia GPUs, everything else is little more than a rounding error.

This is changing.

https://github.com/merrymercy/awesome-tensor-compilers

There are more and better projects that can compile an existing PyTorch codebase into an optimized form for a range of devices. Triton (which PyTorch's `torch.compile` uses to generate GPU kernels), TVM, and the MLIR-based efforts (like Torch-MLIR or IREE) are the big ones, but there are smaller fish like GGML and Tinygrad, as well as more narrowly focused projects like Meta's AITemplate (which also runs on AMD datacenter GPUs).

Hardware is in a strange place now... It feels like everyone but Cerebras and AMD/Intel was squeezed out, but with all the money pouring in, I think this is temporary.