Current tally of high-performance, deep-learning-oriented DSLs/IRs/compilers, in no particular order:
- TensorComprehensions (Facebook): https://github.com/facebookresearch/TensorComprehensions
- XLA (Google): https://www.tensorflow.org/performance/xla/
- taco (MIT): http://tensor-compiler.org/
- DLVM (UIUC): http://dlvm.org/
- nGraph (Intel): http://ngraph.nervanasys.com/docs/cpp/
- TVM (DMLC): https://github.com/dmlc/tvm
Honorable mention to Julia (http://julialang.org) as well.
As far as I know, Tile/PlaidML (Vertex.AI) is the only DSL+compiler in this space that's usable for real workloads across a variety of hardware: https://github.com/plaidml/plaidml
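These projects differ a lot in IR design and hardware targets, but most share the same surface idea: a tensor operation is written as an index expression (e.g. C(m, n) += A(m, k) * B(k, n)) that the compiler then lowers to optimized loops for the target. As a rough illustration only (using numpy's einsum rather than any of the DSLs above, so there's no compilation step here), a matmul in that style looks like:

```python
import numpy as np

# Sketch of the index-expression style shared by these tensor DSLs:
# the contraction C(m, n) += A(m, k) * B(k, n), written declaratively.
A = np.arange(6, dtype=np.float64).reshape(2, 3)
B = np.arange(12, dtype=np.float64).reshape(3, 4)

# Repeated index k is contracted (summed) over, exactly as the DSLs write it.
C = np.einsum("mk,kn->mn", A, B)

assert np.allclose(C, A @ B)
print(C.shape)  # (2, 4)
```

The DSLs listed above take essentially this declarative form as input and generate scheduled, hardware-specific code from it, which is where the interesting compiler work lives.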