I don't understand why people care about the CPython GIL. For computationally intensive work (numerics) running in the interpreter, the language is generally 60 times slower than PyPy and often 100 times slower than an equivalent C program. That means that if you want performance, you would need to dodge Amdahl's law and have 60-100 processors just for a hypothetical GIL-less CPython to match a single-threaded program in PyPy or C.
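To put rough numbers on that, here's a back-of-the-envelope sketch using Amdahl's law. The 60x target and the parallel fractions below are illustrative assumptions, not measurements:

    # Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n),
    # where p is the fraction of the work that parallelizes.
    # How many cores would a GIL-less CPython need to match a
    # single-threaded program that's 60x faster per core?

    def cores_needed(target_speedup, p):
        # Solve 1 / ((1 - p) + p / n) >= target_speedup for n.
        serial = 1 - p
        if target_speedup * serial >= 1:
            return None  # unreachable: the serial fraction alone caps the speedup
        return (target_speedup * p) / (1 - target_speedup * serial)

    print(cores_needed(60, 1.00))  # perfectly parallel: 60 cores
    print(cores_needed(60, 0.99))  # 99% parallel: ~148 cores
    print(cores_needed(60, 0.98))  # 98% parallel: unreachable (speedup caps at 50x)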

If all it was used for was scripting it would be fine, but what is strange is that Python has emerged as a major language for complex data analysis. So I am sometimes confronted with partial solutions in Python that solve a piece of the puzzle that is purely numerical (so it all runs on numpy etc.), but then extending it to actually solve the real problem ends up either extremely awkward (the whole architecture getting bent around Python's limitations) or being a complete rewrite. It's frustrating that the ecosystem for certain problems is so bent towards a language that is unsuitable for large parts of the job, mainly, I would add, because it sucks the oxygen away from alternatives that would be better in the longer term.

If you're doing complex data analysis in Python, the actual number crunching isn't happening in Python; it's happening in a C extension. Using numpy or TensorFlow, you're really just calling C code.
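A rough illustration of that split (timings are machine-dependent; the point is only that the numpy call spends its time in compiled code rather than the interpreter):

    import time
    import numpy as np

    a = np.random.rand(1_000_000)

    # Pure-Python loop: every iteration goes through the interpreter.
    t0 = time.perf_counter()
    total = 0.0
    for x in a:
        total += x * x
    t1 = time.perf_counter()

    # numpy: the same reduction runs inside compiled code
    # (typically a BLAS dot-product routine).
    t2 = time.perf_counter()
    total_np = np.dot(a, a)
    t3 = time.perf_counter()

    print(f"python loop: {t1 - t0:.3f}s, numpy: {t3 - t2:.3f}s")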

And Fortran code (many parts of numpy). C doesn't have a complete monopoly on high-performance code.

What is the advantage of Fortran over only C in numpy’s case specifically?

Fortran has the more heavily optimized libraries (BLAS) and is also capable of optimizations that C isn't (it guarantees that pointers don't alias).

Modern C has the restrict keyword for that. There isn't any competitive advantage left for Fortran over C or C++; the only reason Fortran is still part of the modern numeric stack is that BLAS, LAPACK, QUADPACK and friends run the freaking world, and nobody is ever going to rewrite them in C without a compelling reason to do so.

Note that while BLAS and friends aren't getting rewritten in C, there is an effort underway to write replacements in Julia. The basic reason is that metaprogramming and better optimization frameworks are making it possible to write these at a higher level, where you basically specify a cost model and generate an optimal method based on that. The big advantage is that this works for more than just standard matrix multiplies: the same framework can give you complex matrices, integer and boolean matrices, and matrices with different algebras (e.g. max-plus).
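For anyone unfamiliar with the "different algebras" part, here's a plain-Python sketch of the idea: one matmul parameterized by the algebra's operations. This only illustrates the genericity, not how the Julia packages are implemented (they generate optimized kernels from a cost model rather than looping like this):

    import operator

    # One matmul, many algebras: swap (+, *) for any other pair of
    # operations, e.g. (max, +) for the max-plus algebra.
    def generic_matmul(A, B, add, mul, zero):
        n, k, m = len(A), len(B), len(B[0])
        C = [[zero] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                acc = zero
                for t in range(k):
                    acc = add(acc, mul(A[i][t], B[t][j]))
                C[i][j] = acc
        return C

    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]

    # Ordinary (+, *) matrix product: [[19, 22], [43, 50]]
    print(generic_matmul(A, B, operator.add, operator.mul, 0))

    # Max-plus product: "add" is max, "multiply" is +, identity is -inf.
    print(generic_matmul(A, B, max, operator.add, float("-inf")))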

That's a cool idea. I don't know how realistic it is to achieve performance parity, but the "generic" functionality is definitely intriguing.

The initial results are that libraries like LoopVectorization can already generate optimal micro-kernels and are competitive with MKL (for square matrix-matrix multiplication) up to around size 512. With help on the macro-kernel side from Octavian, Julia is able to outperform MKL for sizes up to 1000 or so (and is about 20% slower for bigger sizes). https://github.com/JuliaLinearAlgebra/Octavian.jl