What does HackerNews think of rustler?
Safe Rust bridge for creating Erlang NIF functions
The big thing in the Elixir community right now is to write the native/performance-critical code in Rust[1] and interop with it from Elixir.
Managing concurrency outside of Rust and then calling Rust for the more focused and specialized work is a good combination IMO.
How so? Packages like neon [1] and rustler [2] suggest otherwise. I'm using both of those in a real product (I'm using neon directly, to write native modules for an Electron app; on the back-end, I depend on an Elixir package that uses rustler).
A recent solution is to precompile these NIF binaries, so users don't need a Rust toolchain installed. It's great.
[0] https://github.com/rusterlium/rustler
[1] https://dashbit.co/blog/rustler-precompiled
I'm really pleased with how it turned out. Inspired by the Nx architecture, which uses pluggable backends, I built a thin user-facing API that calls into pluggable dataframe backends (theoretically pluggable, as Polars is the only extant backend). What really excited me about this approach is that it permits something similar to what we see in dplyr [5], where you can manipulate in-memory data frames using the same API as remote databases or Spark dataframes.
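To make the architecture concrete, here's the shape of the idea rendered in Rust terms (a rough sketch only; Explorer's real backend contract is an Elixir behaviour, and all names here are hypothetical): a thin frontend that delegates every operation to whatever implements the backend contract.

    // Hypothetical Rust rendering of the pluggable-backend idea;
    // Explorer's actual contract is an Elixir behaviour.
    trait Backend {
        fn head(&self, n: usize) -> Self;
        fn select(&self, columns: &[&str]) -> Self;
    }

    // The thin user-facing API: identical no matter which backend
    // (Polars, a database, a pure-Elixir implementation) runs it.
    struct DataFrame<B: Backend> {
        inner: B,
    }

    impl<B: Backend> DataFrame<B> {
        fn head(&self, n: usize) -> Self {
            DataFrame { inner: self.inner.head(n) }
        }
    }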
Next up is to move to lazy-by-default. To be as unsurprising as possible, Explorer dataframes are (for all intents and purposes) immutable. This has required a fair amount of copying when using the eager API from Polars, and because Rustler NIF resources use atomic reference counting and the GC only sweeps intermittently, there can be some pretty bad memory performance. Fortunately Polars also has a lazy API. The plan is to use that with 'peeking' for display.
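To illustrate the eager/lazy difference with the Polars Rust crate directly (a minimal sketch with the crate's `lazy` feature enabled; this is not Explorer's actual internals):

    use polars::prelude::*;

    fn main() -> PolarsResult<()> {
        let df = df![
            "name"  => ["a", "b", "c", "d"],
            "score" => [10, 40, 20, 30]
        ]?;

        // Lazy: `lazy()` builds a query plan instead of materializing
        // intermediate frames, and nothing runs until `collect()`.
        // `limit` is the "peek": only a handful of rows are
        // materialized for display, not the whole result.
        let peek = df
            .lazy()
            .filter(col("score").gt(lit(15)))
            .limit(2)
            .collect()?;

        println!("{peek}");
        Ok(())
    }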
After that, I'd like to move on to additional backends. I'm particularly keen on Ecto (database) and Apache Arrow/Ballista for distributed and OLAP work. There is also work underway on a pure Elixir backend so the library can ship without a Rust dependency, and prebuilt binaries are in the works as well.
I'd love feedback on the API! I aimed for a dplyr-ish API, as I think it melds better with a functional language than pandas does; I also generally find dplyr more intuitive than pandas. The philosophy here is to get from brain to data as simply and intuitively as possible.
Finally, contributions and any other feedback are super, super welcome. It's early days and I'm also a startup founder so I haven't been able to dedicate as much time as I'd like, but I try to get some work done and add features at least once a week.
Thanks for looking!
--
[1] https://github.com/elixir-nx/nx
[2] https://github.com/elixir-nx/axon
[3] https://github.com/pola-rs/polars
Well, yeah, what I’m saying with “these types of modern frameworks don’t impose very many constraints on the language” is that there’s no reason that Qt, UWP, Interface Builder, etc. can’t support Rust (or most other languages, really), because in the end the tooling is just generating/editing data in a declarative markup language, which the language’s toolkit binding parses. You don’t have to modify the tooling in order to get it working with a new language; you just need a new toolkit binding. Just like you don’t need to modify an HTML editor to get it to support a web browser written in a new language. Qt et al., like HTML, are renderer-implementation-language agnostic.
> Distributed computing, again when thinking about distributed calls a la Akka, Orleans, SQL distributed transactions, I rather have the productivity of a GC.
I think I agree re: the productivity multiplier of special-purpose distributed-computing frameworks. I don’t think I agree that it’s a GC that enables these frameworks to be productive. IMHO, it’s the framework itself that is productive, and the language being GCed is incidental.
But, either way—whether it’s easy or hard—you could still have one of these frameworks in Rust. Akka wasn’t exactly easy to impose on top of the JVM, but they did it anyway, and introduced a lot of non-JVM-y stuff in the process. (I’d expect that a distributed-computing framework for Rust would impose Objective-C-like auto-release-pools for GC.)
> Web development with Rust is nowhere close[...]
Web development with Rust isn’t near there yet, but unlike distributed computing, I don’t see anything about web development that fundamentally is made harder by borrow-checking / made easier by garbage-collection; rather the opposite. I fully expect Rust to eventually have a vibrant web-server-backend component ecosystem equivalent to Java’s.
> Rust best place is for kernels, drivers and absolute no GC deployment scenarios.
Those are good use-cases, but IMHO, the best place for Rust is embedding “hot kernels” of native code within managed runtimes. I.e. any library that’s native for speed, but embedded in an ecosystem package in a language like Ruby/Python/Erlang/etc., where it gets loaded through FFI and wrapped in an HLL-native interface. Such native libraries can and should be written in Rust instead: you want the speed of a native [compiled, WPO-able] language; but you also want/need safety, to protect your HLL runtime from your library’s code that you’re forcing to run inside it; and you also want an extremely “thin” (i.e. C-compatible) FFI, such that you’re not paying too much in FFI overhead for calls from your managed code into the native code. Rust gives you all three. (I see this being an increasingly popular choice lately. Most new native Elixir NIF libraries that I know of are written using https://github.com/rusterlium/rustler.)
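As a concrete (if toy) example of such a hot kernel, here's roughly what a Rustler NIF looks like; the module and function names are hypothetical. Rustler generates the NIF boilerplate, decodes incoming Erlang terms into Rust types, and catches panics at the boundary so they surface as Elixir exceptions rather than VM crashes:

    use rustler::NifResult;

    // Rustler decodes the Erlang argument lists into Vec<f64> and
    // encodes the f64 result back into a term; a non-conforming
    // argument raises ArgumentError on the Elixir side.
    #[rustler::nif]
    fn dot(a: Vec<f64>, b: Vec<f64>) -> NifResult<f64> {
        if a.len() != b.len() {
            return Err(rustler::Error::BadArg);
        }
        Ok(a.iter().zip(&b).map(|(x, y)| x * y).sum())
    }

    // Registers the NIF under a (hypothetical) Elixir module; the
    // Elixir side declares stubs via `use Rustler`.
    rustler::init!("Elixir.MyApp.Native", [dot]);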
> number-crunchy, like a shoot em up, or HPC.
> something which requires mutable bitmaps (someone this past weekend brought up "minecraft server")
One thing I'd like to see for the BEAM community long-term is well-maintained libraries of NIFs[0] for high-performance and possibly mutable data structures. Projects like rustler[1] and the advances made on dirty schedulers make this more feasible than it used to be.
It would be cool to write all the high-level components of a Minecraft-esque game in Elixir and drop down to Rust when you need raw performance, similar to the relationship between Lua and C++ in some modern game engines.
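For the mutable-data-structure case, the usual Rustler pattern is a resource: Rust owns the mutable state, and Elixir holds an opaque handle to it. A hedged sketch (names hypothetical; the registration details vary a bit across Rustler versions):

    use rustler::{Env, ResourceArc, Term};
    use std::sync::Mutex;

    // Rust owns the mutable state; the BEAM only ever sees an
    // opaque handle. The data is dropped when the last reference
    // is garbage-collected.
    struct Buffer {
        data: Mutex<Vec<f64>>,
    }

    #[rustler::nif]
    fn new_buffer() -> ResourceArc<Buffer> {
        ResourceArc::new(Buffer { data: Mutex::new(Vec::new()) })
    }

    #[rustler::nif]
    fn push(buf: ResourceArc<Buffer>, value: f64) -> usize {
        let mut data = buf.data.lock().unwrap();
        data.push(value); // mutate in place, no copying
        data.len()
    }

    fn load(env: Env, _info: Term) -> bool {
        rustler::resource!(Buffer, env);
        true
    }

    rustler::init!("Elixir.Game.Native", [new_buffer, push], load = load);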
It's static typing that Erlang can't complete-the-triangle on; not typing generally.
The only problem is that "offline" type-checking like this does nothing to solve one of the main use-cases/pain-points where Erlangers want types (or, at least, think they want types): in the messages that actors receive. You can't make any sort of a type assertion about what other actors in the system are allowed to send this actor, and get that validated; because "other actors" in a distributed system necessarily include ones that aren't even part of your present codebase!
I have a philosophy about this—not sure where I picked it up, but I think it hews to the Zen of Erlang quite well:
If you already know the type of a message, then by definition, it's not a message any more, but just a regular data value. A message is an OOP concept (and Erlang is an OOP language, where processes are the "objects.") An OOP "message" is a piece of data the meaning of which is up to the interpretation of the recipient; where that interpretation can change as the recipient's internal state changes. The whole point of the "receiving a message" code that you write in an Erlang actor-process, is to allow you to do custom logic for that interpreting part. To use the value itself, in making the decision of what the value is.
In fact, I would extend that: the whole point of Erlang as a language is to do that "interpreting" part. Once you know what something is and have put it into a canonical validated structure, you may as well hand it off to native code (using e.g. https://github.com/rusterlium/rustler). If you think of native code as being a pure domain of strongly-typed values, then picture Erlang as the glue that lives in the ugly world of "not yet typed" values, making decisions under partial-information conditions on what types to try to conform received messages into, before they can enter that pure strongly-typed domain. That's Erlang's idiomatic use-case! (You can tell, because using it for that produces absolutely beautiful code; whereas using it to do e.g. crypto math, produces an abomination.)
Which is all to say: the interpretation (or, if you like, constraint) of a message into a typed value is a Turing-complete operation; and the logic for doing so is best represented as an Erlang program. Erlang doesn't need a type system for messages; Erlang is a type system for messages. :)
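You can see this division of labor at the NIF boundary itself. A sketch (the message shapes here are hypothetical): the Erlang side does the open-ended interpreting, and Rustler makes the final conformance check, turning a dynamically-typed term into a typed Rust value or rejecting it with badarg:

    use rustler::{Atom, Error, NifResult, Term};

    rustler::atoms! { fahrenheit }

    #[rustler::nif]
    fn into_celsius(reading: Term) -> NifResult<f64> {
        // A bare float is taken as Celsius already.
        if let Ok(c) = reading.decode::<f64>() {
            return Ok(c);
        }
        // A {:fahrenheit, f} tuple gets converted.
        if let Ok((tag, f)) = reading.decode::<(Atom, f64)>() {
            if tag == fahrenheit() {
                return Ok((f - 32.0) * 5.0 / 9.0);
            }
        }
        // Anything else doesn't conform to the typed domain.
        Err(Error::BadArg)
    }

    rustler::init!("Elixir.Thermo.Native", [into_celsius]);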
I would caveat that in a couple of ways.
First, suppose you have a web app where some requests involve heavy number crunching and others don't. In web frameworks where 1 request ties up 1 OS thread, a burst of heavy requests could gobble up all your available connections. Phoenix would use one cheap BEAM process per request, and the BEAM's preemptive scheduler would ensure that other requests are answered in a timely way and that all the heavy ones continue to make steady progress. So although the heavy requests might be completed more slowly than in another language, the overall system would remain more responsive.
Second, if you need heavy computation or data structures that work better with mutability, it's possible to (e.g.) use Rustler (https://github.com/rusterlium/rustler) to implement that part in Rust.
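One caveat worth knowing when you do this: a long-running NIF call blocks a BEAM scheduler and undermines exactly the responsiveness described above, so Rustler lets you mark such functions to run on the VM's dirty schedulers. A minimal sketch (the function itself is hypothetical):

    // "DirtyCpu" tells the VM to run this NIF on a dirty scheduler
    // thread so it can't starve the normal, preemptive schedulers.
    #[rustler::nif(schedule = "DirtyCpu")]
    fn crunch(input: Vec<i64>) -> i64 {
        // stand-in for real heavy number crunching
        input.iter().map(|x| x.wrapping_mul(*x)).sum()
    }

    rustler::init!("Elixir.MyApp.Heavy", [crunch]);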
See https://github.com/rusterlium/rustler
Discord in particular has several real-world use cases where they've implemented native code in Rust.