What does HackerNews think of ffi-overhead?

comparing the c ffi (foreign function interface) overhead on various programming languages

Language: C

#8 in C
It's not just that it's easy to embed, it's also tiny; I've run it on many embedded platforms. It wasn't until quickjs came along that there was anything even in the same ballpark.

When you factor in LuaJIT[1] and the incredible performance/zero-overhead-FFI[2] it really is a neat language.

[1] https://luajit.org/

[2] https://github.com/dyu/ffi-overhead

What about the other benchmarks on the same site? https://docs.sciml.ai/SciMLBenchmarksOutput/stable/Bio/BCR/ BCR takes about a hundred seconds and is pretty indicative of systems biology models, coming from 1122 ODEs with 24388 terms that describe a stiff chemical reaction network modeling the BCR signaling network from Barua et al. Or the discrete diffusion models https://docs.sciml.ai/SciMLBenchmarksOutput/stable/Jumps/Dif... which are the justification behind the claims in https://www.biorxiv.org/content/10.1101/2022.07.30.502135v1 that the O(1) scaling methods scale better than O(log n) methods for large enough models? There are lots of benchmarks on that site, covering models from small to large. And small models do matter too...

> If you use special routines (BLAS/LAPACK, ...), use them everywhere as the respective community does.

It tests both with and without BLAS/LAPACK (which isn't always helpful, as you'd of course see from the benchmarks if you read them). One key difference is that there are some pure Julia tools like https://github.com/JuliaLinearAlgebra/RecursiveFactorization... which outperform the respective OpenBLAS/MKL equivalents in many scenarios, and that's one noted factor in the performance boost (it's not trivial to wrap into the interface of the other solvers, so it's not done). There are other benchmarks showing that the comparison isn't apples to apples and is instead conservative in many cases: for example, https://github.com/SciML/SciPyDiffEq.jl#measuring-overhead shows that calling SciPyDiffEq with the Julia JIT optimizations gives lower overhead than direct SciPy+Numba, so we use the lower overhead numbers in https://docs.sciml.ai/SciMLBenchmarksOutput/stable/MultiLang....

> you must compile/write whole programs in each of the respective languages to enable full compiler/interpreter optimizations

You do realize that a .so can have lower call overhead from a JIT-compiled language than from a statically compiled language like C, because some of the binding can be optimized away at runtime? https://github.com/dyu/ffi-overhead is a measurement of that, and you see LuaJIT and Julia coming out faster than C and Fortran there. This shouldn't be surprising once it's clear how that works.

I mean yes, someone can always ask for more benchmarks, but now we have a site that auto-updates tons of ODE benchmarks, with systems ranging from size 2 to the thousands, covering as many methods as we can wrap in as many scenarios as we can. And we don't even "win" all of our benchmarks, because unlike for you, these benchmarks aren't for winning but for tracking development (somehow Hacker News folks ignore the utility part and go straight to language wars...).

If you have a concrete change you think can improve the benchmarks, then please share it at https://github.com/SciML/SciMLBenchmarks.jl. We'll be happy to make and maintain another.

Piling on about overhead (and SQLite): many high-level languages take some hit for going through an FFI, so you're still incentivized to avoid making tons of SQLite calls.

https://github.com/dyu/ffi-overhead

It depends on what's going on. JIT compilers have more information to optimize on, so they can do surprising things. For example, FFI calls into shared libraries are generally faster from fast JIT languages.

https://github.com/dyu/ffi-overhead

This is one reason why you can see Julia outperforming Fortran in cases where FFI speed matters. Fortran does have easier aliasing analysis (because you can't alias), which helps there, but other than that most of the compiler passes are pretty much the same.

> C interop is simple and elegant

C interop in go is super slow: https://github.com/dyu/ffi-overhead

> Probably Go will be the next hotness in 5 years.

I think it will be difficult to grow a large ecosystem for a language with very poor FFI performance [0] in the long run. Golang's poor FFI performance is the number 1 reason I wouldn't use it for my own projects.

[0]: https://github.com/dyu/ffi-overhead

Julia's `ccall` is great in terms of overhead[0], so calling Rust shared libraries is not a problem. On the Rust side, it took me a while to figure out passing in pointers and then constructing slices via from_raw_parts[_mut] so that I could transition to safe Rust. Perhaps that's obvious to more experienced Rust programmers, but I was left with the impression that receiving pointers and crunching numbers is not yet a common application for Rust (unlike C, C++, or Fortran), meaning there isn't a lot of introductory material coming from that angle at the moment. Additionally, getting good vectorization seems to require nightly and a fastfloat[1] library. In particular, you'd want associative math / fp-contract for FMA and SIMD instructions, and perhaps fno-math-errno to turn off branches in functions like sqrt.

I imagine calling Rust from Julia will be much more common than calling Julia from Rust. I know approximately nothing about this, but there are plenty of questions about embedding Julia into C/C++[2][3]. May be similar for Rust.

[0] https://github.com/dyu/ffi-overhead [1] https://github.com/robsmith11/fastfloat [2] https://discourse.julialang.org/t/support-with-embedding-jul... [3] https://discourse.julialang.org/t/api-reference-for-julia-em...

I think Julia is the dark horse to eventually take over a wide swath of computing - possibly wider than Java or C++. As others have pointed out there's an effort to produce static Julia executables, and I think it's already possible to produce libraries. One interesting datapoint is that Julia's C FFI is faster than that of C++...

https://github.com/dyu/ffi-overhead

(For those interested, the first few languages in decreasing order of speed are: lua-jit, julia, c(!), c++, zig, nim, d.)

It's extremely well thought out, concise, powerful, and readable. I think Julia's approach to types and multiple dispatch is a better alternative to traditional OO programming.

One thing the author didn't point out is that C++ (clang), Swift, Rust and Julia all use the LLVM infrastructure, resulting in extremely similar if not identical code generation. If datacenter efficiency truly becomes a priority, highly efficient languages like Julia, Rust and Swift will see increasing use for general purpose programming.

I think the problem is that the FFI overhead calling C from Go is really high, way higher than in any comparable language, because of the segmented stack thing. I mean, the nice thing about Love2d is that LuaJIT has the lowest FFI overhead when calling C.

https://github.com/dyu/ffi-overhead

> I think of Julia and swift as higher-level languages, more domain-specific. Different tools for different problem domains.

Well...my particular interest in these types of languages is primarily in the realm of soft real-time simulation, specifically various kinematic simulations.

Of the three, only Julia is garbage collected, and (unlike some other GC languages) it's fairly easy to avoid exercising the collector. I'm encouraged that this will continue to be the case, since there's an organization using it for robotics, which is implicitly a hard real-time use case.

http://www.juliarobotics.org/

Julia, Swift and Rust are all clearly general purpose languages. Swift is unabashedly general purpose, while Julia and Rust each have a primary niche - math/science and systems, respectively. All three use the excellent LLVM infrastructure.

Aside from determinism (which mainly requires pre-allocating nearly everything), my primary requirements are expressiveness/productivity, readability, and efficient runtime performance.

All three languages produce highly optimized code, and Rust probably has the edge as far as efficiency goes - but it clearly loses on the first two criteria, at least to Julia. If one needs access to machine level functionality in Julia, there's an extremely efficient C FFI, so mixing Rust and Julia (for instance) would be painless if needed.

https://github.com/dyu/ffi-overhead

It's a great time to be a software developer, and things will only get better as languages and tooling continue to improve!

good question! i'm not an expert.

my understanding is that go is slow because each function call has to copy over to a bigger stack (goroutines start with tiny stacks and grow on demand, but c code can't do that, natch) and because it has to tell the goroutine scheduler some stuff for hazy reasons.

this github issue has a lot of interesting discussion of go ffi: https://github.com/golang/go/issues/16051

these benchmarks https://github.com/dyu/ffi-overhead seem to show that a c call via cgo is about 14 (!) times slower than a c call from c# in mono, which itself is about 2 times slower than a plain c call.

Looking at the c ffi improvements posted, I thought I'd check whether things had actually improved.

Time in ms to call a c function 2 billion times (lower is better):

95536 - go 1.1.2
130105 - go 1.8.0

It is a rather crude example:

    // plusone is the ffi function
    int x = 0;
    while (x < 2000000000) x = plusone(x);

Here's the source: https://github.com/dyu/ffi-overhead

With other programming languages:

c: 4778
nim: 4746
rust: 5331
java7: 17922
java8: 17992
go: 130105

(edited for formatting)

OpenSSL and similar libraries spend most of their time processing short packets. For example, encrypting a few hundred bytes using AES these days should take only a few hundred CPU cycles. This means that the overhead of calling the crypto code should be minimal, preferably 0. This is in part what I meant by "first-class". Perhaps I should have written "zero-overhead" instead.

I googled around just now for some benchmarks on FFI overhead. I found this project [1], which measures the FFI overhead of a few popular languages. Java and Go do not look competitive there; Lua surprisingly came out on top, probably by inlining the call.

Before you retort with an argument that a few cycles do not matter that much, remember that OpenSSL does not run only in laptops and servers; it runs everywhere. What might be a small speed bump on x86 can be a significant performance problem elsewhere, so this is something that cannot be simply ignored.

[1] https://github.com/dyu/ffi-overhead