While I might not agree with the conclusion of the OP, the performance of a programming language should always be paired with its "average" programmers. A lot of people tend to straw-man a language they dislike and steel-man a language they like. For an average programmer, I think it might well be that Java produces more performant/maintainable code than C++.
This is very true. I've seen colleagues port Python to C++ and wonder why it's slower.
Of course Python is slow, but it will call into efficient C routines for a lot of things that aren't just available out of the box in C++. So what does the average programmer do? They implement their own, inefficient version of this in C++.
Now you've saved the overhead of copying your data from the Python world to the C world and back, but the main part of your computation is nowhere near numpy's level.
So naturally C++ can be much faster than Python, because it doesn't have this overhead, but you have to do it right, which may be more effort.
For that reason I generally steer people away from Python on grounds of correctness rather than performance per se. I can't be bothered to explain how compiler optimizations work, so I usually just resort to "it's magic" when it comes to performance.
Going from a statically typed language to doing number crunching in Python genuinely makes me want to vomit. I don't understand how people convince themselves that it's productive to write so many tests, check types at runtime, and so on.
I actually love Python's syntax but the language design seems like it was thrown together over a weekend.
Take a look at Nim if you haven't already -
It has a Pythonesque syntax, but strong static typing with extensive user control over the semantics you seem to care about; e.g. floats and ints do not convert automatically unless you explicitly "import lenientops"[0]. You can also define "operational transform" (term-rewriting) optimizations - such as rewriting c = c + a*b into multiply_accumulate(c, a, b), which is a big performance difference for e.g. matrices - that the compiler applies for you, so your code says what you mean (c = c + a*b) and yet the compiler gets to emit the efficient version (using the relevant BLAS routine).
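To make that concrete, here is a minimal sketch of a term-rewriting template (the feature is documented under Nim's experimental features, and the proc/template names below are mine, so treat it as illustrative rather than copy-paste ready):

    # Illustrative only: any statement matching `c = c + a * b` gets rewritten
    # by the compiler into a single call, so the source stays readable while
    # the generated code can dispatch to an optimized kernel (BLAS, fma, ...).
    proc mulAcc(c: var float; a, b: float) =
      c += a * b                    # stand-in for the optimized kernel

    template fuseMulAdd{c = c + a * b}(c: var float; a, b: float) =
      mulAcc(c, a, b)

    var c = 1.0
    c = c + 2.0 * 3.0               # compiled as mulAcc(c, 2.0, 3.0)
    echo c                          # 7.0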
It's young, but on the one hand it has the best FFI for C, C++, Objective-C, or JS you'll find anywhere, and on the other hand it already has a good deal of native implementations, including e.g. a pure Nim BLAS that comes within 5-10% of the best out there (including those with carefully hand-optimized ASM kernels).
I am involved with the foundation that runs the D programming language, so I've found my home.
Other than the syntax, obviously.
FFI: Can Nim use a C++ class and vtables? D does it all the time; nearly every language claims to have the best C++ interop, but only D actually seems able to do it. Templates, classes, structs, and more all work.
We also have Mir, which I haven't benchmarked for a while, but it was faster than OpenBLAS and Eigen a few years ago, and was recently shown on the forum to be faster than numpy (the C bits).
Well, it depends on how you define "best". Beyond an ABI, FFI is obviously a function of the implementation rather than the language per se.
The main Nim implementation can use C++ as a backend language, and when it does, it can use any C++ construct very easily by way of the .emit and .importcpp directives. For sure, classes, structs, and exceptions all work, and IIRC templates do too (although you might need to instantiate a header yourself for each concrete type or something ... haven't done that myself). This also means it can use any C++17 or C++20 construct, including lambdas and friends. Does D's C++ interop support C++17? C++20? Can you guarantee it will support C++27? Nim's implementation already does, on every single platform you'll be able to use C++27 on (as long as C++27 can still compile modern C++ code; there have been backward-incompatible changes over C++'s history).
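For a flavour of what that looks like, here is a small example adapted from the importcpp section of the Nim manual (it requires the C++ backend, i.e. compiling with "nim cpp"):

    # A C++ template class used from Nim: the generic Nim type maps onto
    # std::map, and the `#` placeholders splice the Nim arguments into the
    # C++ expression pattern.
    type
      StdMap[K, V] {.importcpp: "std::map", header: "<map>".} = object

    proc `[]=`[K, V](m: var StdMap[K, V]; key: K; val: V)
      {.importcpp: "#[#] = #", header: "<map>".}

    var x: StdMap[cint, cdouble]
    x[6] = 91.4        # becomes x[6] = 91.4; on a real std::map<int, double>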
You can't just #include a C or C++ header and call it a day; you need to have a Nim-compatible definition for any symbol (variable, macro, function, class, ...). There are tools that help you and make it almost as easy as #include, such as nimterop[0] and nimline[1], and "c2nim", which is included with the Nim compiler, is enough to generate the Nim definitions from the .h definitions (though it can't do crazy metaprogramming; if D can do that, then D essentially includes a C++ compiler, which is a fine way to get perfect C++ compatibility - Nim does that).
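For a plain C symbol, that Nim-compatible definition is typically a one-liner; a standard example:

    # Declare the existing C function; no body is needed, the compiler simply
    # emits calls to the real printf from <stdio.h>.
    proc printf(format: cstring) {.importc: "printf", header: "<stdio.h>", varargs.}

    printf("hello from %s\n", "Nim")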
But Nim can also do the same for JS when treating JS as a backend.
And it can basically do the same for Python, with nimpy[2] and nimporter, generating a single executable that works with your installed Python DLL (2.7, 3.5, 3.6, 3.7) - which is something not even Python itself can do. There was a similar Lua bridge, but I think that one is no longer maintained.
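A hedged sketch of what that looks like with nimpy (pyImport, to, and exportpy are its documented API; the "greeter" module/file names are made up for the example):

    import nimpy

    # Calling into the installed Python from Nim:
    let math = pyImport("math")
    echo math.sqrt(2.0).to(float)        # 1.4142135623730951

    # And the other direction: compile this file as a Python extension, e.g.
    #   nim c --app:lib --out:greeter.so greeter.nim
    # then `import greeter` in Python and call greeter.greet("world").
    proc greet(name: string): string {.exportpy.} =
      "Hello, " & name & "!"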
> We also have Mir, which I haven't benchmarked for a while, but it was faster than OpenBLAS and Eigen
There's quite a bit of a scientific stack built natively in Nim. It is far from self-sufficient, but the ease with which you can use a C library makes up for it. I haven't used it, but Laser[3] is on par with or exceeds OpenBLAS speed-wise, and generalizes to e.g. int32 and int64 matrix multiplication; Arraymancer[4] does not have all of numpy's functionality, but it does have quite a few nice bits from scikit-learn, supports CUDA and OpenCL, and you can fall back to numpy through nimpy if all else fails. Also notable is NimTorch[5]. Laser and Arraymancer are mostly developed by mratsim, who occasionally hangs out here on HN.
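To give a sense of the Arraymancer API (a sketch from memory of its documented interface, so operator spellings may have shifted between versions):

    import arraymancer

    # Two 2x2 float64 tensors; `*` is matrix multiplication for 2-D tensors.
    let a = [[1.0, 2.0],
             [3.0, 4.0]].toTensor
    let b = [[0.0, 1.0],
             [1.0, 0.0]].toTensor

    echo a * b        # [[2, 1], [4, 3]]
    echo a.sum()      # 10.0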
D is a fine language; I used it a little in the D1 days, and it was indeed a "better C++", but it did not deliver enough value to be worth it for me, so I stopped. I know D2 is much better, but I've already found my better C++ (and better Python at the same time!) in Nim, so I haven't looked at it seriously.
[0] https://github.com/nimterop/nimterop
[1] https://github.com/sinkingsugar/nimline
[2] https://github.com/yglukhov/nimpy
[3] https://github.com/numforge/laser