Very interesting and useful to see.
And on an entirely different approach to vectorization for the masses: I do wish it were easier to access vectorization through BLAS, a library that is well supported across nearly all languages and is massively optimized, but is hard to install correctly.
The good news is that the Gonum team has been working on an optimized pure Go version of BLAS. It's at parity with netlib BLAS for some of the important functions (GEMV, etc.).
Why is this good news? Go is a very easy language to use, and it cross-compiles readily to different targets, which makes the library available across platforms. To install, one simply runs `go get gonum.org/v1/gonum`.
[1] GEMV is entirely limited by memory bandwidth, thus quite uninteresting from a vectorization standpoint. Maybe you meant GEMM?
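To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch in plain Go (standard library only, not Gonum's implementation). The function names `gemv`, `gemvIntensity` and `gemmIntensity` are made up for illustration, and the traffic model is an assumption: every float64 operand is read from memory exactly once and every result written once, which is the best case for both routines.

```go
package main

import "fmt"

// gemv computes y = alpha*A*x + beta*y for a row-major m-by-n matrix a.
// This is a naive reference loop, only meant to show what GEMV does.
func gemv(m, n int, alpha float64, a, x []float64, beta float64, y []float64) {
	for i := 0; i < m; i++ {
		var sum float64
		row := a[i*n : (i+1)*n]
		for j, v := range row {
			sum += v * x[j] // each matrix element is touched exactly once
		}
		y[i] = alpha*sum + beta*y[i]
	}
}

// gemvIntensity estimates flops per byte for an m-by-n GEMV, assuming
// A, x and y are each read once and y is written once (8-byte floats).
func gemvIntensity(m, n int) float64 {
	fm, fn := float64(m), float64(n)
	flops := 2 * fm * fn
	bytes := 8 * (fm*fn + fn + 2*fm)
	return flops / bytes
}

// gemmIntensity estimates flops per byte for an n-by-n GEMM
// (C = alpha*A*B + beta*C), assuming A, B and C are each read once
// and C is written once.
func gemmIntensity(n int) float64 {
	fn := float64(n)
	flops := 2 * fn * fn * fn
	bytes := 8 * 4 * fn * fn
	return flops / bytes
}

func main() {
	// Small GEMV: A = [[1 2] [3 4]], x = [1 1], y = [1 1].
	y := []float64{1, 1}
	gemv(2, 2, 2, []float64{1, 2, 3, 4}, []float64{1, 1}, 3, y)
	fmt.Println("y =", y)

	fmt.Printf("GEMV 1000x1000: %.3f flops/byte\n", gemvIntensity(1000, 1000))
	fmt.Printf("GEMM 1000x1000: %.3f flops/byte\n", gemmIntensity(1000))
}
```

Under that model GEMV stays near a quarter of a flop per byte no matter how large the matrix gets, while the GEMM ratio grows linearly with n, so only GEMM has enough reuse for wider vector units to pay off.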