I am confused. Is it another competitor to TensorFlow, JAX, and PyTorch? Or something else?

I believe this is more of an optimization layer to be used by libraries like TensorFlow and JAX, simplifying the interaction with low-level CUDA code.

I imagine these libraries, and possibly some users, would implement kernels on top of this language and reap some of the optimization benefits without having to maintain low-level, CUDA-specific code.

So is this similar to XLA?

XLA is a domain-specific compiler for linear algebra. Triton generates and compiles an intermediate representation for tiled computation; this IR supports more general functions, and Triton claims higher performance.
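For a sense of what "tiled computation" looks like in practice, here's a minimal sketch of a Triton kernel, loosely following the vector-add example from the Triton tutorials (exact API details may vary by version). You write the kernel in Python in terms of blocks of elements, and the compiler takes care of the low-level GPU code:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance processes one tile of BLOCK_SIZE elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard against out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    # Launch one program instance per tile.
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Note there's no explicit thread indexing or shared-memory management as in raw CUDA; the tile is the unit you reason about, and scheduling within a tile is left to the compiler.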

Obligatory reference to the family of work: https://github.com/merrymercy/awesome-tensor-compilers