What does HackerNews think of amx?

Apple AMX Instruction Set

Language: C

Confusingly there are 2 mechanisms to do matrix operations on the new apple hardware - AMX (https://github.com/corsix/amx) - and the ANE (apple neural engine) - which is enabled by CoreML. This code does not run on the neural engine but the author has a branch for his whisper.cpp project which uses it here: https://github.com/ggerganov/whisper.cpp/pull/566 - so it may not be long before we see it applied here as well. All of this is to say that it actually could get significantly faster if some of this work was able to be handed to the ANE with CoreML.
You are correct, in that those are the four

My understanding is that the AMX is more tightly wound with the CPU, ultimately being accessible via an instruction set (https://github.com/corsix/amx), and it is useful if you need to do matrix multiplications interleaved with other CPU tasks. A common example would be a VIO loop or something where you want that data in the CPU caches.

The GPU and Neural Engine are not that – they take some time to set up and initialize. They also can parallelize tasks to a much higher degree. The GPU is more generalizable, because you can write compute shaders to do anything in parallel, but it uses a lot of resources. I'll have to check out the PR to see how exactly the MPS implementation matches up with the task at hand, because you could also consider writing Metal compute shaders by hand. Even if the performance is not much better, the CPU is free to do other things.

I know the least about the ANE, but it has specific hardware for running ML models, and you have to process the weights ahead of time to make sure they are in the right format. It can run ML models very efficiently and is the most battery friendly.

Interesting but the repo homepage itself is probably a more helpful starting point.

https://github.com/corsix/amx