Nice to see ML folks getting weaned off of Python and using a language that can optimally exploit the underlying hardware and doesn't require setting up a specialized environment to build and run.

Since when does C++ optimally exploit the underlying hardware? It has no vector instructions, does not run on the GPU, and is arguably too hard to make multithreaded. That leaves you with about 0.5% of the performance of a current PC.

> does not run on the GPU

Both CUDA and the Metal Shading Language are C++, and so are OpenCL since 2.1 (https://www.khronos.org/opencl/), AMD ROCm's HIP (https://github.com/ROCm-Developer-Tools/HIP), and SYCL (https://www.khronos.org/sycl/). C++ is pretty much the language that runs most on GPUs.

> no vector instructions,

There are a thousand different options for SIMD in C++, from #pragma omp simd to libraries such as std::experimental::simd (https://en.cppreference.com/w/cpp/experimental/simd/simd), Eve (https://github.com/jfalcou/eve), Highway (https://github.com/google/highway), Vc (https://github.com/VcDevel/Vc)...