If you use threads, green or otherwise, you don't have to "implement" special code for composing things together; you get the full set of tools the language already has for composing code, which includes, in passing, state machines, among everything else. The alternative basically produces an Inner Platform Effect: an internal, data-based language for concurrency that your program has to interpret, which will A: forever be weaker than the exterior language (such is the nature of the Inner Platform Effect) and B: require a lot of work that essentially duplicates program control flow and all sorts of other things the exterior language already has.

There are some programming languages with semantics so impoverished that this is the best effort they can make towards concurrency.

But this is Rust. It's the language that fixes all the reasons to be afraid of threading in the first place. What's actually wrong with threads here? This isn't Java. And having written network servers with pretty much every abstraction mentioned in the article, I can say that green threads are a dream for writing network servers. You can hardly believe how much accidental complexity you're fighting every day in non-threaded solutions until you try something like Erlang or Go. Rust could be something I mention in the same breath for network servers. But not with this approach.

There's plenty to debate in this post and I don't expect to go unchallenged. But I would remind repliers that we are explicitly in a Rust context. We must talk about Rust here, not 1998-C++. What's so wrong with Rust threads, excepting perhaps them being heavyweight? (Far better to solve that problem directly.)

> What's so wrong with Rust threads, excepting perhaps them being heavyweight? (Far better to solve that problem directly.)

Nothing. If threads work fine, use them! That's what most Rust network apps do, and they work fine and run fast. A modern Linux kernel is very good at making 1:1 threading fast these days.
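
Concretely, "just use threads" needs very little ceremony. A rough, untested sketch of a thread-per-connection echo server (the address, buffer size, and echo behavior are placeholders, not anything from the article):

    use std::io::{Read, Write};
    use std::net::TcpListener;
    use std::thread;

    fn main() -> std::io::Result<()> {
        // One OS thread per connection; the kernel does all the scheduling.
        let listener = TcpListener::bind("127.0.0.1:8080")?;
        for stream in listener.incoming() {
            let mut stream = stream?;
            thread::spawn(move || {
                let mut buf = [0u8; 1024];
                // Plain blocking reads and writes, straight-line control flow.
                while let Ok(n) = stream.read(&mut buf) {
                    if n == 0 {
                        break; // peer closed the connection
                    }
                    if stream.write_all(&buf[..n]).is_err() {
                        break;
                    }
                }
            });
        }
        Ok(())
    }

No callbacks, no hand-written state machine; if a connection blocks, only its own thread waits.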

> And having written network servers with pretty much every abstraction so much as mentioned in the article, green threads are a dream for writing network servers.

Green threads didn't provide performance benefits over native threads in Rust. That's why they were removed.

Based on my experience, goroutine-style green threads don't really provide a performance benefit over native threads. The overhead that really matters for threads is in stack management, not the syscalls, and M:N threading has a ton of practical disadvantages (which is why Java removed it, for example). It's worth going back to the debate around the time of NPTL and looking at what the Linux community concluded (which was, essentially, that 1:1 is the way to go and M:N is not worth it).

There are benefits to be had with goroutine-style threads in stack management. For example, if you have a GC that can relocate pointers, then you can start with small stacks. But note that this is a property of the stack, not the scheduler, and you could theoretically do the same with 1:1 threading. It also doesn't get you to the same level of performance as something like nginx, which forgoes the stack entirely.

If you really want to eliminate the overhead and go as fast as possible, you need to get rid of not only the syscall overhead but also the stack overhead. Eliminating the stack means that there is no "simple" solution at the runtime level: your compiler has to be heavily involved. (Look at async/await in C# for an example of one way to do this.) This is the approach that I'm most keen on for Rust, given Rust's focus on achieving maximum performance. To that end, I really like the "bottom-up" approach that the Rust community is taking: let's get the infrastructure working first as libraries, and then we'll figure out how to make it as ergonomic as possible, possibly (probably?) with language extensions.
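
To make "the compiler has to be heavily involved" concrete, here is a hand-rolled sketch of the kind of state machine that transformation produces for "read a request, then write a response." The names and the Poll-like return type are made up for illustration, not any particular library's API, and the actual non-blocking I/O is elided:

    // The blocking points become explicit states, and the locals that would
    // have lived on a per-task stack become fields of those states.
    enum Conn {
        ReadingRequest { buf: Vec<u8> },
        WritingResponse { response: Vec<u8>, written: usize },
        Done,
    }

    enum Step {
        WouldBlock, // come back when the socket is ready again
        Finished,
    }

    impl Conn {
        // Driven by an event loop; nothing is retained between calls except
        // the enum itself.
        fn step(&mut self) -> Step {
            loop {
                match *self {
                    Conn::ReadingRequest { .. } => {
                        // ...a non-blocking read into `buf` would go here;
                        // return Step::WouldBlock if the socket isn't ready,
                        // otherwise advance to the next state:
                        *self = Conn::WritingResponse {
                            response: b"ok\n".to_vec(),
                            written: 0,
                        };
                    }
                    Conn::WritingResponse { .. } => {
                        // ...a non-blocking write of the unwritten tail would
                        // go here, again yielding Step::WouldBlock as needed.
                        *self = Conn::Done;
                    }
                    Conn::Done => return Step::Finished,
                }
            }
        }
    }

    fn main() {
        let mut conn = Conn::ReadingRequest { buf: Vec::new() };
        while let Step::WouldBlock = conn.step() {
            // A real server would wait on socket readiness (epoll/kqueue) here.
        }
    }

Writing that by hand for every request handler is exactly the kind of mechanical transformation you want the compiler, not the programmer, to perform.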

My overarching point is this: it's very tempting to just say "I/O is a solved problem, just use green threads". But it's not that simple. M:N is obviously a viable approach, but it leaves a lot of performance on the table by keeping around stacks and comes with a lot of disadvantages (slow FFI, for example, and fairness).

I'm down with all that. I'm really arguing in favor of threading here rather than any particular model of it.

Plus, if you write threaded code, you can swap the runtime out. Rust guarantees all the hard parts anyhow. I wouldn't even be surprised if, once Rust settles down further, it turns out there's some sort of hybrid solution that is better than either 1:1 or M:N on its own, because the compiler has superior insight into what its functions are doing.

However, if you are sitting down in front of an editor, faced with the task of writing a network server, and you immediately reach for overcomplicated solutions like this rather than starting with threads, you are paying an awfully stiff development price for a performance improvement that probably won't even manifest as any useful effect, because the window where this will actually save you is rather small. If you're sitting at 90% utilization and this sort of tweak can get you to 80%, that's essentially a no-op in the network server space, because you'd still better deploy a second server either way. If you're close to a capacity problem (and the code has been decently optimized to the point where that's no longer an option), the solution is not to rewrite your code to squeeze out the 10% inefficiency in stack handling; the solution is to deploy another server, and if a rewrite is needed, rewrite to make that possible/effective/practical.

In the Servo design space, it makes perfect sense to be upset that you lost 10% of your performance to green threads, because that's time a real human user is directly waiting, and what does a web browser need with 10,000 threads anyhow? (I mean, sure, give one the deliberate task of using that many threads and it can, but in practice you'd rather be doing real work than any sort of scheduling of that mess.) In the network space it's way less clear... if your infrastructure is vulnerable to 10% variances in performance, your infrastructure is vulnerable to 10% variances in incoming request rate, too.

(10% is a bit of a made-up number. I believe, if anything, it is an overestimate. The point holds even if it's a larger number anyhow.)

Are you talking about threads-the-programming-model (vs. events, callbacks, channels, futures/promises, dependency graphs), or are you talking about threads-the-implementation-technique (vs. processes or various select()-like mechanisms)? And if you are talking about threads-the-programming-model, which synchronization mechanism: locks, monitors, channels/queues, transactional memory?

If the various threads in your application don't share data, then you don't need to worry about all the pitfalls of threading. But if they don't share data and you don't care about small (~10%) performance differences, why not use processes? Then you can use a really straightforward blocking I/O model, and you get full memory isolation & security provided by the OS.
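
As a toy sketch of what I mean (not a network server; "./worker" is a hypothetical helper binary), you get plain blocking calls and hard OS isolation essentially for free:

    use std::io::Write;
    use std::process::{Command, Stdio};

    fn main() -> std::io::Result<()> {
        let mut children = Vec::new();
        for job in ["job-1", "job-2", "job-3"] {
            // Each job gets its own process: its own address space, no shared
            // memory, no locks, and the OS cleans everything up on exit.
            let mut child = Command::new("./worker") // hypothetical helper binary
                .stdin(Stdio::piped())
                .stdout(Stdio::piped())
                .spawn()?;
            child.stdin.as_mut().unwrap().write_all(job.as_bytes())?;
            children.push(child);
        }
        for child in children {
            // Plain blocking wait; stdin is closed first so the worker sees EOF.
            let output = child.wait_with_output()?;
            println!("{}", String::from_utf8_lossy(&output.stdout));
        }
        Ok(())
    }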

The interesting questions happen when you a.) want to squeeze as much performance out of the machine as possible or b.) need to share data between concurrent activities. Then all of the different programming models have pros and cons, and I'm not sure you can define a "best" approach without knowing your particular problem. (Which, IMHO, validates Rust's "provide the barest primitives you can for the problem, and let libraries provide the abstractions until it becomes clear that one library is a clear winner" approach.)

FWIW, my experience in distributed systems is that threads+locks is a terrible model for writing robust systems, and that once you're operating at scale, you really want some sort of dependency-graph dataflow system where you specify what inputs are required for each bit of computation, and then the system walks the graph as RPCs come back and input becomes available. This lets you attach all sorts of other information to nodes: timeouts, latency statistics, error statistics, whether or not this node is required and what defaults to substitute if it fails, tracing, logging, load-balancing, etc. It also adds a huge amount of cognitive overhead for someone who just wants to make a couple of database queries. I wouldn't use this for prototyping a webapp, but it's invaluable when you have a production system and ops people who need to be able to shut down a misbehaving service at a moment's notice while still keeping your overall product up.
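
The general shape, as a toy, single-threaded sketch with made-up names (a real system is driven by RPC completions and hangs timeouts, tracing, stats, and kill switches off the same nodes):

    use std::collections::HashMap;

    // Each node declares the inputs it needs plus per-node policy (here just
    // a default value to substitute on failure).
    struct Node {
        name: &'static str,
        deps: Vec<&'static str>,
        compute: fn(&HashMap<&'static str, String>) -> Result<String, String>,
        default: Option<&'static str>, // used if the node fails
    }

    fn run(nodes: Vec<Node>) -> HashMap<&'static str, String> {
        let mut results: HashMap<&'static str, String> = HashMap::new();
        let mut pending = nodes;
        // Walk the graph: run whatever has all of its inputs available.
        while !pending.is_empty() {
            let (ready, rest): (Vec<Node>, Vec<Node>) = pending
                .into_iter()
                .partition(|n| n.deps.iter().all(|d| results.contains_key(d)));
            if ready.is_empty() {
                break; // the rest can never run (missing or cyclic deps)
            }
            for node in ready {
                let value = (node.compute)(&results)
                    .unwrap_or_else(|_| node.default.unwrap_or("").to_string());
                results.insert(node.name, value);
            }
            pending = rest;
        }
        results
    }

    fn main() {
        let graph = vec![
            Node {
                name: "user",
                deps: vec![],
                compute: |_| Ok("user#42".to_string()), // stand-in for an RPC
                default: None,
            },
            Node {
                name: "recommendations",
                deps: vec!["user"],
                compute: |inputs| Ok(format!("recs for {}", inputs["user"])),
                default: Some("no recommendations"),
            },
        ];
        println!("{:?}", run(graph));
    }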

Do you have any links on the "dependency-graph dataflow system" that you are talking about? Sounds a little bit similar to what I'm trying to do, except at higher scale.

The specific systems I'm thinking of were Google-internal, but there's a close public analogue with Guice & Dagger:

https://github.com/google/guice

https://github.com/square/dagger

Imagine injecting Futures into your code. Now imagine injecting them, but having the Provider record a lot of metadata about who you were calling, how you were calling it, how long it took, whether an error occurred, letting an SRE turn off the call, etc.
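
In Rust-flavored terms (the Google pieces are Java, and every name below is made up), the injected "provider" wraps the real call and records metadata around it:

    use std::sync::atomic::{AtomicBool, Ordering};
    use std::time::Instant;

    struct InstrumentedCall {
        service: &'static str,
        method: &'static str,
        disabled: AtomicBool, // an SRE can flip this to shed a misbehaving dependency
        call: fn() -> Result<String, String>,
    }

    impl InstrumentedCall {
        fn invoke(&self) -> Result<String, String> {
            if self.disabled.load(Ordering::Relaxed) {
                return Err(format!("{}.{} is disabled", self.service, self.method));
            }
            let start = Instant::now();
            let result = (self.call)();
            // A real provider would feed latency/error statistics, tracing,
            // and logging keyed by (service, method) instead of printing.
            println!(
                "{}.{} took {:?}, error = {}",
                self.service,
                self.method,
                start.elapsed(),
                result.is_err()
            );
            result
        }
    }

    fn main() {
        let get_user = InstrumentedCall {
            service: "UserService",
            method: "GetUser",
            disabled: AtomicBool::new(false),
            call: || Ok("user#42".to_string()), // stand-in for the real RPC
        };
        println!("{:?}", get_user.invoke());
    }

Because call sites only ever see the injected provider, all of that bookkeeping and the operator-facing kill switch come along with every call for free.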