https://github.com/flame/blis/blob/master/docs/BLISObjectAPI...
Searching "object" in BLIS's README (https://github.com/flame/blis) to see what they think of it:
"Objects are relatively lightweight structs and passed by address, which helps tame function calling overhead."
"This is API abstracts away properties of vectors and matrices within obj_t structs that can be queried with accessor functions. Many developers and experts prefer this API over the typed API."
In my opinion, this API is a strict improvement over BLAS. I do not think there is any reason to prefer the old BLAS-style API over an improvement like this.
Regarding your own experience, it's great that using BLAS proved to be a valuable learning experience for you. But your argument that the BLAS API is somehow uniquely helpful in terms of learning how to program numerical algorithms efficiently, or that it will help you avoid performance problems, is not true. It is possible to replace the BLAS API with a more modern and intuitive API with the same benefits. To be clear, the benefits here are direct memory management and control of striding and matrix layout, which create opportunities for optimization. There is nothing unique about BLAS in this regard---it's possible to learn these lessons using any of the other listed options if you're paying attention and being systematic.
[1] GEMV is entirely limited by memory bandwidth, thus quite uninteresting from a vectorization standpoint. Maybe you meant GEMM?
Both are well optimized for AMD CPUs.
For information of anyone who doesn't know about performance of level 3 BLAS: It doesn't come just from vectorization, but also cache use with levels of blocking (and prefetch). See the material under https://github.com/flame/blis. Other levels -- not matrix-matrix -- are less amenable to fancy optimization, with lower arithmetic density to pit against memory bandwidth, and BLIS mostly hasn't bothered with them, though OpenBLAS has.
BLIS actually tells you how to write a fast production large-matrix GEMM, and the papers linked from https://github.com/flame/blis would be a better reference than the Goto and van de Geijn.
For small matrices see the references from https://github.com/hfp/libxsmm but you're not likely to be re-implementing that unless you're serious about a version on non-x86_64.