"Lockless" sounds very positive, and it's tempting to think that everything should be made lockless. However, it's less great than it sounds.

Mutexes are very cheap in the uncontended case, and in the contended case they have some nice properties. You should only consider going for a lockless algorithm if you can prove it is better than just using a mutex. Sometimes lockless algorithms are in fact slower than algorithms using locks, as things like atomic compare-and-exchange instructions can have a significant cost.

Another problem with lockless is that you can't turn every algorithm into a lockless one, it's actually only a handful of them that are practical to implement locklessly. Of course, the ones that are practical, like some queue and linked list algorithms, can give very nice benefits if used correctly.

> Mutexes are very cheap in the uncontended case

It was a while ago I was deep into this mess so forgive any ignorance–but–iirc the thread-mutex dogma[1] has many pitfalls despite being so widely used. Primarily they’re easy to misuse (deadlocks, holding a lock across a suspend point), and have unpredictable performance because they span so far into compiler, OS and CPU territory (instruction reordering, cache line invalidation, mode switches etc). Also on Arm it’s unclear if mutices are as cheap because of the relaxed memory order(?). Finally code with mutices are hard to test exhaustively, and are prone to heisenbugs.

Now, many if not most of the above apply to anything with atomics, so lock-free/wait-free won’t help either. There’s a reason why a lot of concurrency is ~phd level on the theoretical side, as well as deeply coupled with the gritty realities of hardware/compilers/os on the engineering side.

That said, I still think there’s room for a slightly expanded concurrency toolbox for mortals. For instance, a well implemented concurrent queue can be a significant improvement for many workflows, perhaps even with native OS support (io_uring style)?. Another exciting example is concurrency permutation test frameworks[2] for atomics that reorder operations in order to synthetically trigger rare logical race conditions. I’ve also personally had great experience with the Golang race detector. I hope we see some convergence on some of this stuff within a few years. Concurrency is still incredibly hard to get right.

[1]: I say this only because CS degrees has preached mutices to as the silver bullet for decades.

[2]: https://github.com/tokio-rs/loom