What does HackerNews think of numexpr?

Fast numerical array expression evaluator for Python, NumPy, PyTables, pandas, bcolz and more

Language: Python

Semi-vectorized code:

https://github.com/ohadravid/poly-match/blob/main/poly_match...

Expecting Python engineers to be unable to read de facto standard NumPy code, while at the same time expecting everyone to be able to read Rust...

Not to mention that the semi-vectorized code is still suboptimal: too many for loops, even though the author clearly knows they can all be vectorized.

For example, the author could instead just write something like:

    np.argmin(distances[distances <= threshold])
Also, in one place there is:

    np.xxx( np.xxx, np.xxx + np.xxx)
You can just slap numexpr on top of it to compile that line on the fly (a rough sketch follows the link below).

https://github.com/pydata/numexpr
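
As a rough sketch of what that looks like, with made-up arrays and a made-up expression standing in for the elided one:

    import numpy as np
    import numexpr as ne

    a = np.random.rand(1_000_000)
    b = np.random.rand(1_000_000)
    c = np.random.rand(1_000_000)

    # A nested elementwise NumPy expression...
    result_np = np.sqrt(a + b * c)

    # ...can be handed to numexpr as a string; it is compiled once and
    # evaluated in a single pass over the data.
    result_ne = ne.evaluate("sqrt(a + b * c)")

    assert np.allclose(result_np, result_ne)

ne.evaluate picks up a, b and c from the calling scope, so for simple elementwise expressions it really is a drop-in one-liner.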

Replying to your other comments (about how QNNPACK integrates and implementations of the Android NN API):

I'm not entirely sure what they're aiming for there. Usually when you see talk about "kernels", it's more about how particular filters/convolutions/low-level operations are optimized, and it's implied that kernels run on the GPU (most of the time). They do talk a lot about microarchitectural details, cache sizes, and ARM/NEON operations, so it all seems to be implemented on the CPU, but I don't really grasp how it ties in with the vendor-specific implementations that you mention.

It could be that these are some new algorithms/implementations that focus on the strengths of the system as a whole (not just the CPU or the microarchitecture) and try to "go easy" on the memory bandwidth, for example, to get better performance out of equivalent (maybe?) code.

This reminds me a bit of the numexpr[0] project, which accelerates NumPy computations in Python by arranging data in memory to be more cache-friendly.

[0] https://github.com/pydata/numexpr
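
Roughly, the difference it targets looks like this (a sketch; the arrays and expression are made up, and the sizes are chosen only so the temporaries are much larger than cache):

    import numpy as np
    import numexpr as ne

    a = np.random.rand(10_000_000)
    b = np.random.rand(10_000_000)

    # Plain NumPy materializes full-size temporaries for 2*a and 3*b
    # before the final add, so every intermediate makes a round trip to RAM.
    out_np = 2 * a + 3 * b

    # numexpr evaluates the whole expression in cache-sized blocks,
    # so the intermediates stay in cache.
    out_ne = ne.evaluate("2 * a + 3 * b")

    assert np.allclose(out_np, out_ne)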

Yeah that's a good point. I haven't used this ggplot library, but it seems like it could use lambdas. And then you don't break syntax highlighting.

One other place I've seen this done is in the numexpr for Python.

https://github.com/pydata/numexpr

It does seem like this

    ne.evaluate('a*b-4.1*a > 2.5*b') 
could be

    ne.evaluate(lambda a, b: a*b - 4.1*a > 2.5*b)
The lambda would never actually be executed as Python code, since numexpr compiles the expression itself rather than running Python bytecode, but that shouldn't make a difference: you could still use the AST of the lambda's body as input to the compiler.

And then a and b would have to be pulled out of locals() automatically, or something along those lines.
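
A rough sketch of how the lambda's body could be recovered for that purpose (expression_source is a made-up helper, not part of numexpr; it needs Python 3.9+ for ast.unparse and a lambda written on one line in a real source file):

    import ast
    import inspect

    def expression_source(fn):
        # Hypothetical helper: recover the body of a lambda as an expression
        # string that could then be handed to ne.evaluate().
        src = inspect.getsource(fn).strip()
        tree = ast.parse(src)
        # Grab the first Lambda node in the parsed source line.
        lam = next(node for node in ast.walk(tree) if isinstance(node, ast.Lambda))
        return ast.unparse(lam.body)

    expr = expression_source(lambda a, b: a * b - 4.1 * a > 2.5 * b)
    print(expr)  # a * b - 4.1 * a > 2.5 * b

Pulling the argument names out of fn.__code__.co_varnames would then tell you which arrays to fetch from the caller's scope.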

These days almost all code is cache-bound rather than cycle-bound for performance. So these gems, while they may still work, may not be the best way to achieve optimal performance. Optimizing cache access is what brings the biggest speedups today -- see for example NumExpr[1].

[1] https://github.com/pydata/numexpr