I studied physics, and I did a bit of 3D graphics back in the day, and I could never remember the difference between column and row vectors (or co- and contravariant vectors, bra and ket vectors, ...) and when to use which. My guilty secret is that I don't care.
For me a vector is just abstractly three or four numbers together. (Maybe with some defined transformation properties if I'm doing physics.) Whether a matrix is stored row-major or column-major is an implementation detail I try to hide.
In fact I used to make indexing errors all the time. But now when I need to deal with vectors and matrices, I look up or decide on the "correct" convention, hide everything behind Vector and Matrix classes, implement dot and cross product once, and avoid falling back to the raw vector elements wherever possible.
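Something like this minimal C sketch (the names are just illustrative, not from any particular library):

    /* Minimal sketch of hiding raw elements behind a small vector type. */
    typedef struct { double x, y, z; } Vec3;

    static double vec3_dot(Vec3 a, Vec3 b) {
        return a.x * b.x + a.y * b.y + a.z * b.z;
    }

    static Vec3 vec3_cross(Vec3 a, Vec3 b) {
        Vec3 c = { a.y * b.z - a.z * b.y,
                   a.z * b.x - a.x * b.z,
                   a.x * b.y - a.y * b.x };
        return c;
    }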
Unfortunately this physics-guy/spherical-cow thinking is really common in people doing linear algebra on computers, and the result is poor performance. Abstractions are great for general programming, but when you are doing tight-loop math they really don't apply. I've seen this a lot recently with "clever" NDArray-style math libraries (not to mention that the utility of more than 2 dimensions is... very limited)
In my not-super-extensive experience, if you're programming and dealing with linear algebra problems - and you're past the MATLAB prototyping stage - then I really suggest using the BLAS/LAPACK API. You're not gonna beat decades of nerds programming ballistic missiles on punchcards. They're kinda weird and unwieldy and don't map to textbook math - but they've had a ton of thought put into them, and are made with as few footguns as possible. At the very least, re-implementing an algo with them a few times is really educational. You'll see that it forces you to think about your memory layout, and you'll realize that the back-of-napkin complexity calculations are actually tricky to massage into good algorithms that use memory well. The final result with something like the Intel MKL will be way, way faster than anything your abstraction can achieve.
(Unfortunately I haven't found a good primer on using the BLAS well)
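But just to give a flavor, here's roughly what a matrix multiply looks like through the CBLAS interface (a minimal sketch; the row-major/no-padding assumption is mine, so the leading dimensions are simply the row lengths):

    #include <cblas.h>

    /* C = 1.0 * A * B + 0.0 * C, with A (m x k), B (k x n), C (m x n),
       all stored row-major with no padding. */
    void matmul(int m, int n, int k,
                const double *A, const double *B, double *C) {
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k,
                    1.0, A, k,
                         B, n,
                    0.0, C, n);
    }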
MATLAB uses BLAS and LAPACK under the hood. You can write reasonably fast code in MATLAB if most of your time is spent calling out to faster low level implementations.
There are also many different implementations of BLAS, with varying performance. The API itself is not the reason for its performance. Actually, I would argue that it’s a pretty crufty API which should probably be done away with at this point and replaced with something less awkward. I don’t know how many times I’ve had to remind myself what a “leading dimension” was…
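For the record, the leading dimension is the stride between consecutive columns (or rows, in row-major storage) of the underlying allocation; it's what lets you point a BLAS call at a submatrix without copying. A rough sketch:

    #include <cblas.h>

    /* y = A_sub * x, where A_sub is the 3x2 block of an 8x5 column-major
       matrix A starting at row 2, column 1. The leading dimension is 8
       (the column stride of the parent allocation), not 3. */
    void subblock_matvec(const double *A, const double *x, double *y) {
        const double *A_sub = A + 1 * 8 + 2;   /* &A[2 + 1*8] */
        cblas_dgemv(CblasColMajor, CblasNoTrans,
                    3, 2,           /* rows, cols of the block */
                    1.0, A_sub, 8,  /* lda = 8, not 3          */
                    x, 1,
                    0.0, y, 1);
    }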
A good example is BLIS, which does provide the BLAS API but also has a more modern "object-style" C API which is significantly easier to work with, with no performance hit.
Also, depending on what you’re doing, abstractions can be very helpful indeed to keep around. Even at the BLAS level, groups of bits are thought of as singles, doubles, real or complex… a matrix has a size and shape… etc. Having this information is useful. Linear algebra is full of type information which can be used to dispatch different algorithms.
Lots of other alternatives to BLAS which are a bit higher level but still very useful: Eigen and Armadillo in C++, Julia, etc.
Anyway, I would say (based on my actual, significant experience ;-) ) people are beating the guys with punch cards all the time, no reason to stay in the 70s…
"The API itself is not the reason for its performance"
It eliminates many performance mistakes by making it generally immediately apparent when you do something dumb. This mostly boils down to nonsensical/slow operations usually being impossible and any memory copying being explicit.
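For example, to form D = A + B you have to spell the copy out yourself; a sketch using CBLAS (an expression-template library would happily hide the temporary behind operator+):

    #include <cblas.h>

    /* D = A + B for n-element buffers: the copy and the update are two
       explicit calls; nothing is allocated or copied behind your back. */
    void add_into(int n, const double *A, const double *B, double *D) {
        cblas_dcopy(n, A, 1, D, 1);        /* D = A       */
        cblas_daxpy(n, 1.0, B, 1, D, 1);   /* D = D + 1*B */
    }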
I can only tell you my own experience. Going through the steps of transforming a few "textbook algorithms" into BLAS made me think about the problems much more clearly and in a way I would have missed with a higher-level abstraction.
All the examples you give (MATLAB, Eigen, Armadillo, Julia) make it really easy to write really bad code (BLAS under the hood is kinda irrelevant). But you're totally right that they're useful. If you want to call out to code that does an SVD or something simple that exists in a library, then they're generally just fine and you can't mess it up too much.
I use BLAS from a REPL in Clojure (thanks to Neanderthal) so it's very painless. Sounds like you're using it in a compile/run loop from C/C++, which sounds like very much not fun.
https://github.com/flame/blis/blob/master/docs/BLISObjectAPI...
Searching "object" in BLIS's README (https://github.com/flame/blis) to see what they think of it:
"Objects are relatively lightweight structs and passed by address, which helps tame function calling overhead."
"This is API abstracts away properties of vectors and matrices within obj_t structs that can be queried with accessor functions. Many developers and experts prefer this API over the typed API."
In my opinion, this API is a strict improvement over BLAS. I do not think there is any reason to prefer the old BLAS-style API over an improvement like this.
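For a sense of what that looks like, here's a minimal sketch based on the documented object API (the dimensions and values are just illustrative):

    #include "blis.h"

    /* A sketch of the BLIS object API: C = A*B with double-precision
       matrices. Strides of 0 let BLIS pick its default storage. */
    int main(void) {
        obj_t a, b, c;

        bli_obj_create(BLIS_DOUBLE, 4, 3, 0, 0, &a);
        bli_obj_create(BLIS_DOUBLE, 3, 5, 0, 0, &b);
        bli_obj_create(BLIS_DOUBLE, 4, 5, 0, 0, &c);

        bli_randm(&a);                /* fill with random values */
        bli_randm(&b);
        bli_setm(&BLIS_ZERO, &c);     /* C = 0                   */

        /* C := 1*A*B + 0*C; dimensions, datatype, and strides all live
           inside the obj_t's, so there's no lda/ldb/ldc bookkeeping. */
        bli_gemm(&BLIS_ONE, &a, &b, &BLIS_ZERO, &c);

        bli_printm("C", &c, "%6.2f", "");

        bli_obj_free(&a);
        bli_obj_free(&b);
        bli_obj_free(&c);
        return 0;
    }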
Regarding your own experience, it's great that using BLAS proved to be a valuable learning experience for you. But the argument that the BLAS API is somehow uniquely helpful for learning how to program numerical algorithms efficiently, or that it will help you avoid performance problems, doesn't hold up. It is possible to replace the BLAS API with a more modern and intuitive API that keeps the same benefits. To be clear, the benefits here are direct memory management and control of striding and matrix layout, which create opportunities for optimization. There is nothing unique about BLAS in this regard: it's possible to learn these lessons using any of the other listed options if you're paying attention and being systematic.