Why not tools like https://github.com/ispc?

This seems really close to the metal: it either carries a non-negligible maintenance cost or can't fully exploit the hardware in use.

It doesn't always emit optimal SIMD code. Plus, when you get the hang of it, writing your own SIMD library is fairly simple, so you don't need a tool for it. C++ templates and operator overloading really shine here. For example, you can write sqrt(x*y+z) and have the template system select the best SIMD intrinsics depending on whether x, y, and z are int, float, int16, float8, double4, etc.
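
To make the sqrt(x*y+z) point concrete, here is a rough sketch of the idea, not anyone's actual library. The names `Vec`, `root_of_fma`, `run_demo`, and `SKETCH_HAVE_SSE2` are made up for illustration. It assumes an x86 target with SSE2 for the two specialized types and falls back to plain scalar loops for everything else (including non-x86 builds).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE and SSE2 intrinsics
#define SKETCH_HAVE_SSE2 1  // hypothetical macro, only used in this sketch
#endif

// Generic fallback: N lanes of T, processed one lane at a time.
template <typename T, std::size_t N>
struct Vec {
    T lane[N];

    static Vec load(const T* p) {
        Vec v;
        for (std::size_t i = 0; i < N; ++i) v.lane[i] = p[i];
        return v;
    }
    void store(T* p) const {
        for (std::size_t i = 0; i < N; ++i) p[i] = lane[i];
    }
    friend Vec operator+(Vec a, Vec b) {
        for (std::size_t i = 0; i < N; ++i) a.lane[i] = a.lane[i] + b.lane[i];
        return a;
    }
    friend Vec operator*(Vec a, Vec b) {
        for (std::size_t i = 0; i < N; ++i) a.lane[i] = a.lane[i] * b.lane[i];
        return a;
    }
    friend Vec sqrt(Vec a) {
        for (std::size_t i = 0; i < N; ++i)
            a.lane[i] = static_cast<T>(std::sqrt(a.lane[i]));
        return a;
    }
};

#ifdef SKETCH_HAVE_SSE2
// float x 4: one 128-bit register, single-precision intrinsics.
template <>
struct Vec<float, 4> {
    __m128 r;
    static Vec load(const float* p) { return Vec{_mm_loadu_ps(p)}; }
    void store(float* p) const { _mm_storeu_ps(p, r); }
    friend Vec operator+(Vec a, Vec b) { return Vec{_mm_add_ps(a.r, b.r)}; }
    friend Vec operator*(Vec a, Vec b) { return Vec{_mm_mul_ps(a.r, b.r)}; }
    friend Vec sqrt(Vec a) { return Vec{_mm_sqrt_ps(a.r)}; }
};

// double x 2: same register width, double-precision intrinsics.
template <>
struct Vec<double, 2> {
    __m128d r;
    static Vec load(const double* p) { return Vec{_mm_loadu_pd(p)}; }
    void store(double* p) const { _mm_storeu_pd(p, r); }
    friend Vec operator+(Vec a, Vec b) { return Vec{_mm_add_pd(a.r, b.r)}; }
    friend Vec operator*(Vec a, Vec b) { return Vec{_mm_mul_pd(a.r, b.r)}; }
    friend Vec sqrt(Vec a) { return Vec{_mm_sqrt_pd(a.r)}; }
};
#endif

// The user-facing expression is written once; overload resolution picks
// the intrinsics (or the scalar fallback) from the operand type.
template <typename V>
V root_of_fma(V x, V y, V z) {
    return sqrt(x * y + z);
}

// Small usage example: sqrt(1*3+1), sqrt(2*4+1), sqrt(3*5+1), sqrt(4*6+1).
inline void run_demo() {
    const float x[4] = {1, 2, 3, 4}, y[4] = {3, 4, 5, 6}, z[4] = {1, 1, 1, 1};
    float out[4];
    using F4 = Vec<float, 4>;
    root_of_fma(F4::load(x), F4::load(y), F4::load(z)).store(out);
    assert(out[0] == 2.0f && out[1] == 3.0f && out[2] == 4.0f && out[3] == 5.0f);
}
```

Adding a wider type such as float8 would mean one more specialization (AVX in that case), while `root_of_fma` stays untouched. This only covers the easy part; it says nothing about dispatching across CPUs at runtime or about compiler quirks.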

+1 to intrinsics or wrappers giving us more control over performance.

> Plus, when you get the hang of it, writing your own SIMD library is fairly simple

Hm, it's indeed easy to start, but maintaining https://github.com/google/highway (supports clang/GCC/MSVC, x86/ARM/RISC-V) is quite time-consuming, especially working around compiler bugs.