It was a while ago that I was deep into this mess, so forgive any ignorance, but iirc the thread-mutex dogma[1] has many pitfalls despite being so widely used. Primarily, mutexes are easy to misuse (deadlocks, holding a lock across a suspend point), and they have unpredictable performance because they reach so far into compiler, OS and CPU territory (instruction reordering, cache line invalidation, mode switches etc). On Arm it's also unclear to me whether mutexes are as cheap, given its more relaxed memory model(?). Finally, code with mutexes is hard to test exhaustively and prone to heisenbugs.
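To make the deadlock pitfall concrete, here's a minimal Rust sketch (my own illustration, not from any of the linked material) of the classic lock-ordering bug, with sleeps added only to widen the race window:

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

fn main() {
    // Two shared resources, each behind its own mutex.
    let a = Arc::new(Mutex::new(0u32));
    let b = Arc::new(Mutex::new(0u32));

    let (a1, b1) = (Arc::clone(&a), Arc::clone(&b));
    let t1 = thread::spawn(move || {
        let _ga = a1.lock().unwrap();              // thread 1 locks A first...
        thread::sleep(Duration::from_millis(10));  // widen the race window
        let _gb = b1.lock().unwrap();              // ...then waits for B
    });

    let (a2, b2) = (Arc::clone(&a), Arc::clone(&b));
    let t2 = thread::spawn(move || {
        let _gb = b2.lock().unwrap();              // thread 2 locks B first...
        thread::sleep(Duration::from_millis(10));
        let _ga = a2.lock().unwrap();              // ...then waits for A: deadlock
    });

    // With the sleeps this usually hangs forever; without them it only
    // deadlocks on unlucky schedules, i.e. a heisenbug.
    t1.join().unwrap();
    t2.join().unwrap();
}
```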
Now, many if not most of the above apply to anything built on atomics, so lock-free/wait-free won't help either. There's a reason a lot of concurrency work is ~PhD level on the theoretical side, as well as deeply coupled with the gritty realities of hardware/compilers/OS on the engineering side.
That said, I still think there's room for a slightly expanded concurrency toolbox for mortals. For instance, a well-implemented concurrent queue can be a significant improvement for many workflows, perhaps even with native OS support (io_uring style)? Another exciting example is concurrency permutation test frameworks[2] for atomics that reorder operations to synthetically trigger rare logical race conditions. I've also personally had great experience with the Golang race detector. I hope we see some convergence on this stuff within a few years; concurrency is still incredibly hard to get right.
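On the concurrent-queue point, here's a rough sketch of what I mean by handing work over a queue instead of mutating shared state under locks, using Rust's std mpsc channel as a stand-in (my own illustration; the names and sizes are arbitrary):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // Bounded multi-producer, single-consumer queue: producers hand work
    // items over the channel instead of sharing mutable state.
    let (tx, rx) = mpsc::sync_channel::<u64>(1024);

    let producers: Vec<_> = (0..4u64)
        .map(|id| {
            let tx = tx.clone();
            thread::spawn(move || {
                for i in 0..1_000u64 {
                    // send() blocks when the queue is full, giving backpressure
                    tx.send(id * 1_000 + i).expect("consumer hung up");
                }
            })
        })
        .collect();
    drop(tx); // drop the original sender so the consumer sees end-of-stream

    // A single consumer drains the queue until every sender is gone.
    let consumer = thread::spawn(move || rx.iter().count());

    for p in producers {
        p.join().unwrap();
    }
    println!("received {} items", consumer.join().unwrap());
}
```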
[1]: I say this only because CS degrees have preached mutexes as the silver bullet for decades.
For triggering race conditions in compiled binaries, you could try https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mo....
Anyways, https://github.com/tokio-rs/loom is used by every serious Rust library doing atomic ops/synchronization, and it blew me away with how quickly it can catch bugs like this.
> others that modify the compiler itself to replace the concurrency primitives with runtime functions, that can then execute them in a fuzzed order.
Loom[1] is somewhat like that. It's a testing system (not a runtime for a full app) which tests multiple permutations of program execution (all possible permutations, I think, limited by test-case complexity or an optional "maximum thread switch count"), as well as modeling atomic/weak-memory effects[0] to some degree (rough sketch below).
[0] https://preshing.com/20120930/weak-vs-strong-memory-models/ [1] https://github.com/tokio-rs/loom
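To give a flavor of how this looks in practice, here's a minimal sketch of a loom test (my own illustration, not taken from the loom docs; assumes loom as a dev-dependency). The increment is deliberately split into a separate load and store, and loom's interleaving search finds the schedule where an update is lost:

```rust
// In a test module, with `loom` as a dev-dependency.
#[test]
fn lost_update_is_caught_by_loom() {
    loom::model(|| {
        use loom::sync::atomic::AtomicUsize;
        use loom::sync::atomic::Ordering::{Acquire, Release};
        use loom::sync::Arc;
        use loom::thread;

        let counter = Arc::new(AtomicUsize::new(0));

        let handles: Vec<_> = (0..2)
            .map(|_| {
                let counter = counter.clone();
                thread::spawn(move || {
                    // Buggy read-modify-write: load + store instead of fetch_add,
                    // so two threads can both read 0 and both write 1.
                    let cur = counter.load(Acquire);
                    counter.store(cur + 1, Release);
                })
            })
            .collect();

        for h in handles {
            h.join().unwrap();
        }

        // loom re-runs the closure for every legal interleaving; on the
        // lost-update schedule this assert fails and loom reports the execution.
        assert_eq!(counter.load(Acquire), 2);
    });
}
```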