Current tally of high-performance, deep-learning-oriented DSLs/IRs/compilers, in no particular order:
- TensorComprehensions (Facebook): https://github.com/facebookresearch/TensorComprehensions
- XLA (Google): https://www.tensorflow.org/performance/xla/
- taco (MIT): http://tensor-compiler.org/
- DLVM (UIUC): http://dlvm.org/
- nGraph (Intel): http://ngraph.nervanasys.com/docs/cpp/
- TVM (DMLC): https://github.com/dmlc/tvm
Honorable mention to Julia (http://julialang.org) as well.
As far as I know, Tile/PlaidML (Vertex.AI) is the only DSL+compiler in this space that's usable for real workloads across a variety of hardware: https://github.com/plaidml/plaidml
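These projects differ a lot in IR design and hardware targets, but most share the same surface idea: a tensor operation is written as an index expression (e.g. C(m, n) += A(m, k) * B(k, n)) that the compiler then lowers to optimized loops for the target. As a rough illustration only (using numpy's einsum rather than any of the DSLs above, so there's no compilation step here), a matmul in that style looks like:

```python
import numpy as np

# Sketch of the index-expression style shared by these tensor DSLs:
# the contraction C(m, n) += A(m, k) * B(k, n), written declaratively.
A = np.arange(6, dtype=np.float64).reshape(2, 3)
B = np.arange(12, dtype=np.float64).reshape(3, 4)

# Repeated index k is contracted (summed) over, exactly as the DSLs write it.
C = np.einsum("mk,kn->mn", A, B)

assert np.allclose(C, A @ B)
print(C.shape)  # (2, 4)
```

The DSLs listed above take essentially this declarative form as input and generate scheduled, hardware-specific code from it, which is where the interesting compiler work lives.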