It's interesting that SIMD reached the mainstream 25 years ago, yet compilers, our applications, and programming-language technology are still quite far from using it effectively outside of nonportable, hand-written SIMD-aware compute kernels in glorified assembler. There are some exceptions, like ispc and GPU languages such as OpenCL and Futhark (GPU people say "cores" when they mean SIMD lanes!)...
SIMD instruction sets as we know them today are not standardized across vendors and architectures. See: https://www.phoronix.com/scan.php?page=news_item&px=Linus-To... That makes them harder to support in compilers and, in turn, harder to democratize.
There is an article on the web explaining the purpose of SVE/SVE2 ("Scalable Vector Extension"), which is intended as the successor to ARM's existing SIMD extensions: https://levelup.gitconnected.com/armv9-what-is-the-big-deal-...
Extract : "[...] the addition of SIMD instructions has led to an explosion in the number of instructions, especially for x86. And of course not every x86 processor will support all these instructions. Only the newer ones will support AVX-512. The beauty of SVE is that the same code will work for both the super-computer and the cheap phone. That is not possible with the x86 SIMD instructions."
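To make the vector-length-agnostic idea concrete, here is a rough sketch of my own (not from the article) using the SVE ACLE intrinsics from arm_sve.h; the function name is just for illustration. The loop never hard-codes a vector width, so the same binary runs whether the hardware vectors are 128 or 2048 bits wide:

    #include <arm_sve.h>
    #include <cstddef>
    #include <cstdint>

    // Vector-length-agnostic elementwise multiply. svcntw() reports how many
    // 32-bit lanes the hardware provides; the predicate masks off the tail.
    void MulVla(const float* a, const float* b, float* out, size_t n) {
      for (size_t i = 0; i < n; i += svcntw()) {
        svbool_t pg = svwhilelt_b32_u64(i, n);      // active lanes: i .. n-1
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, out + i, svmul_f32_x(pg, va, vb));
      }
    }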
There is also a Java proposal, the Vector API (third incubator), which among other things adds SVE support for doing SIMD in the Java world: https://openjdk.java.net/jeps/417
The same principle is being extended on ARM to matrices with the "Scalable Matrix Extension": https://community.arm.com/arm-community-blogs/b/architecture...
We can speculate that everyone will migrate to ARM / RISC-V at some point, or x86 will have similar instructions.
> the same code will work for both the super-computer and the cheap phone. That is not possible with the x86 SIMD instructions
Actually, when the code is expressed using "portable intrinsics" (https://github.com/google/highway), the same source compiles to SSE4/AVX2/AVX-512 as well as NEON, SVE, SVE2, and RISC-V V instructions.
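A rough sketch of what that looks like (static dispatch only; the dynamic-dispatch boilerplate and remainder handling are omitted, and the function name is just for illustration; it assumes n is a multiple of the lane count):

    #include <cstddef>
    #include "hwy/highway.h"

    namespace hn = hwy::HWY_NAMESPACE;

    // Multiplies two arrays lane-by-lane; the vector width is whatever the
    // target provides (e.g. 8 floats for AVX2, chosen by hardware for SVE).
    void MulArrays(const float* HWY_RESTRICT a, const float* HWY_RESTRICT b,
                   float* HWY_RESTRICT out, size_t n) {
      const hn::ScalableTag<float> d;   // descriptor: full vector of float
      for (size_t i = 0; i < n; i += hn::Lanes(d)) {
        const auto va = hn::Load(d, a + i);
        const auto vb = hn::Load(d, b + i);
        hn::Store(hn::Mul(va, vb), d, out + i);
      }
    }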
Disclosure: I am the main author of this library.