One thing I haven’t seen much mention of is getting things to run on the M1’s Neural Engine instead of the GPU. The Neural Engine seems to have roughly 3x more compute capacity, and it is specifically optimized for this type of computation.

Has anyone spotted any work that lets a mainstream tensor library (e.g. JAX, TensorFlow, PyTorch) run on the Neural Engine?

George Hotz got his "for play" tensor library[a] to run on the Apple Neural Engine (ANE). The results were somewhat disappointing, however, and currently it only supports ReLU.

[a]: https://github.com/geohot/tinygrad
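For context on why "only ReLU" is a modest result: ReLU is a single elementwise activation, max(0, x), which is a long way from the matmuls and convolutions a full model needs. A plain-Python illustration of what that one op computes (a hypothetical `relu` helper, not tinygrad's code or its ANE path):

```python
def relu(xs):
    """Elementwise ReLU: replace each value with max(0, x)."""
    return [max(0.0, x) for x in xs]

print(relu([-2.0, -0.5, 0.0, 1.5, 3.0]))
# [0.0, 0.0, 0.0, 1.5, 3.0]
```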