What does HackerNews think of numexpr?
Fast numerical array expression evaluator for Python, NumPy, PyTables, pandas, bcolz and more
https://github.com/ohadravid/poly-match/blob/main/poly_match...
Expecting Python engineers to be unable to read de facto standard numpy code, while expecting everyone to be able to read Rust...
Not to mention that the semi-vectorized code is still suboptimal. There are too many for loops, even though the author clearly knows they can all be vectorized.
For example, the author could instead just write something like:
np.argmin(
distances[distances<=threshold]
)
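One caveat with a masked argmin like the one above: the index it returns points into the filtered array, not into the original `distances`. A minimal sketch of mapping it back, assuming `distances` is a 1-D float array and `threshold` is a scalar (the sample values here are made up for illustration):

```python
import numpy as np

# Made-up sample data; only the names come from the snippet above.
distances = np.array([5.0, 0.7, 3.2, 0.4, 9.1])
threshold = 1.0

mask = distances <= threshold
# Original positions of the entries that pass the threshold.
candidates = np.flatnonzero(mask)

if candidates.size:
    # argmin indexes into the filtered array, so map it back through candidates.
    nearest = int(candidates[np.argmin(distances[mask])])
else:
    # np.argmin raises ValueError on an empty array, so handle that case explicitly.
    nearest = None
```

No Python-level loop is needed, and the empty case is handled without catching an exception.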
Also, in one place there is: np.xxx(np.xxx, np.xxx + np.xxx)
You can just slap numexpr on top of it to compile this line on the fly.
I'm not entirely sure what they're aiming for there. Usually when you see talk about "kernels" it's more of how particular filters/convolutions/low-level operations are optimized, and it is implied that kernels run on GPU (most of the time). They do talk a lot about microarchitectural details, size of caches and ARM/NEON operations, so it seems to be all implemented on CPU, but I don't really grasp how it ties with the vendor-specific implementations that you mention.
It could be that these are some new algorithms/implementations that focus on the strength of the systems (not particularly the CPU or the microarchitecture) and try to "go easy" on the memory bandwidth, for example, to get a better performance out of equivalent (maybe?) code.
This reminds me a bit of the numexpr[0] project, which accelerates numpy computations in Python by rearranging data in memory to be more cache-friendly.
One other place I've seen this done is in numexpr for Python.
https://github.com/pydata/numexpr
It does seem like this
ne.evaluate('a*b-4.1*a > 2.5*b')
could be ne.evaluate(lambda a, b: a*b - 4.1*a > 2.5*b)
The lambda is never executed because it compiles to machine code and not Python byte code, but that shouldn't make a difference. You should still be able to use the AST of the body as input to the compiler. And then a and b have to be pulled out of locals() automatically or something.
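On pulling a and b out of locals(): the string form already does something like this. When no dictionary is given, `ne.evaluate` resolves variable names from the calling frame, and it also accepts an explicit `local_dict`. A minimal sketch, assuming numexpr and numpy are installed (the array values and the names `x` and `y` are made up for illustration):

```python
import numpy as np
import numexpr as ne

# Made-up sample arrays.
a = np.array([1.0, 2.0, 3.0, 10.0])
b = np.array([10.0, 1.0, 5.0, 6.0])

# Names in the expression string are looked up in the caller's frame by default.
implicit = ne.evaluate("a*b - 4.1*a > 2.5*b")

# The same expression with the operands passed explicitly under other names.
explicit = ne.evaluate("x*y - 4.1*x > 2.5*y", local_dict={"x": a, "y": b})

# Plain NumPy version for comparison; it allocates a temporary for each step.
reference = a * b - 4.1 * a > 2.5 * b
```

A lambda-based front end would still need to recover the expression and its operand names somehow, so the explicit `local_dict` form is probably the closest existing equivalent.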