Happy to answer any questions.

Could you explain (in simple terms if possible) how Multicore OCaml achieves a memory model which is both simpler and more efficient than in Java or C (mentioned at https://github.com/ocaml-multicore/ocaml-multicore/wiki)?

I didn't see any mention of critical sections (mutexes) in the C++ examples in the paper ("Bounding Data Races in Space and Time"), and I'm not sure I understand the comparisons the authors are presenting.

(One of the authors of the mentioned paper [1])

Firstly, if you are using high-level synchronisation mechanisms such as mutexes and condition variables, or higher-level concurrency libraries such as java.util.concurrent, you shouldn't worry about the memory model. C++, Java and OCaml ensure that properly synchronised programs do not exhibit surprising behaviours. Such programs have sequentially consistent semantics, i.e., the observed behaviour is one of the permitted interleavings of the threads in the program. Rust inherits the C++ memory model [2], but if you are using the safe subset, then you will never have to think about it. The memory model matters only to those who write the concurrency libraries themselves. If you are still keen, read on.
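
For example, here is a minimal sketch (in Rust, since it comes up below; the counter and thread count are just for illustration) of what "properly synchronised" buys you: every access to the shared counter goes through a mutex, so the only observable behaviours are interleavings of the threads, on every architecture.

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        // Every access to the counter is inside the lock, so the program is
        // data-race free and sequentially consistent: the only observable
        // behaviours are interleavings of the two threads.
        let counter = Arc::new(Mutex::new(0u64));

        let handles: Vec<_> = (0..2)
            .map(|_| {
                let counter = Arc::clone(&counter);
                thread::spawn(move || {
                    for _ in 0..100_000 {
                        *counter.lock().unwrap() += 1;
                    }
                })
            })
            .collect();

        for h in handles {
            h.join().unwrap();
        }

        // Always prints 200000, regardless of the target's memory model.
        println!("{}", *counter.lock().unwrap());
    }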

The OCaml memory model is certainly simpler than the C++ and Java memory models, but being more efficient is not one of our goals. The C++ memory model offers a partially ordered lattice of memory accesses, ranging from accesses to non-atomic memory locations up to sequentially consistent accesses, with increasing cost as you move up the lattice. The OCaml memory model provides only two kinds of access -- atomic and non-atomic -- representing approximately the top and the bottom of that lattice.
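
Since Rust inherits the C++ model [2], its Ordering enum is a quick way to see that lattice; a rough sketch (not exhaustive, and the exact costs depend on the target):

    use std::sync::atomic::{AtomicUsize, Ordering};

    static X: AtomicUsize = AtomicUsize::new(0);

    fn main() {
        // Near the bottom of the lattice (plain non-atomic accesses sit
        // below even this): atomicity only, no ordering guarantees.
        X.store(1, Ordering::Relaxed);
        // Middle: pairwise ordering between a Release store and the
        // Acquire load that reads from it.
        X.store(2, Ordering::Release);
        let _ = X.load(Ordering::Acquire);
        // Top: all SeqCst operations form one global total order.
        X.store(3, Ordering::SeqCst);
        let _ = X.load(Ordering::SeqCst);
    }

OCaml instead exposes only the two ends of that spectrum: ordinary (non-atomic) mutable locations and the Atomic module.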

The OCaml memory model is also stronger than Java's in that our data races are bounded not only in space, as in Java (a data race on one variable does not affect the behaviour of other variables), but also in time (surprising behaviours stop affecting the program once the race ends, unlike in Java; see example 2 in [1]). This permits modular reasoning about the racy and non-racy parts of a program, which is not possible in C++ or Java.

The catch is that we have to disallow load-to-store reordering to get these stronger guarantees. Relaxed hardware memory models such as ARM and Power do permit this reordering, so we have to compile OCaml code (including sequential code) in a way that prevents it. This can be done fairly cheaply: it is free on x86, which does not perform load-to-store reordering, and has a small cost (up to 3%) on ARM and Power, whose memory models do permit it.
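
For the curious, the behaviour being ruled out is the classic "load buffering" litmus test. Here is a sketch using Rust's relaxed atomics (which follow the C++ rules [2]); the variable names are mine:

    use std::sync::atomic::{AtomicU32, Ordering::Relaxed};
    use std::thread;

    static X: AtomicU32 = AtomicU32::new(0);
    static Y: AtomicU32 = AtomicU32::new(0);

    fn main() {
        let t1 = thread::spawn(|| {
            let r1 = X.load(Relaxed); // load ...
            Y.store(1, Relaxed);      // ... then store
            r1
        });
        let t2 = thread::spawn(|| {
            let r2 = Y.load(Relaxed);
            X.store(1, Relaxed);
            r2
        });
        let (r1, r2) = (t1.join().unwrap(), t2.join().unwrap());

        // Under the C++/Rust model, the outcome r1 == 1 && r2 == 1 is
        // allowed (and the ARM and Power architectures permit it), because
        // each load may be reordered after the store that follows it. The
        // OCaml model forbids this outcome, which is why OCaml code must be
        // compiled so that load-to-store reordering cannot happen.
        println!("r1 = {}, r2 = {}", r1, r2);
    }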

[1] https://kcsrk.info/papers/pldi18-memory.pdf

[2] https://doc.rust-lang.org/nomicon/atomics.html

> Rust inherits C++ memory model [2], but if you are using the safe subset, then you will never have to think about it.

Small correction: atomics are part of the safe subset of Rust. At a certain point it's important to have a working knowledge of the memory model with regard to the atomic numeric types. Dealing with allocated types, now, that's a whole different area and is specialised knowledge.

Still learning Rust here. If I use the atomics in Rust's safe subset, am I right in thinking that I can write code that does something unexpected on some platforms, because I didn't ask for the Ordering I actually needed to make it do what I meant?

For example, if I implement an algorithm that needs Acquire/Release ordering (e.g. to build my own mutual exclusion) but tell Rust that Relaxed semantics are OK, it sounds like it will work anyway on this PC (an x86-64) but won't on some other systems.

And I'd have made this goof without writing any unsafe Rust, much as if I'd broken a directory-traversal algorithm by relying on semantics that not all file systems provide?

This is a good question. Yes, it's exactly the same kind of problem. There is potentially a gap between Rust's memory model and what any given target host actually provides. x86-64 has a "strong" memory model, so your Relaxed ordering will effectively behave as acquire/release anyway. Compare this to ARM, which is "weak": there your relaxed ordering will actually be relaxed. (There's quite a bit more nuance, discussed well here [0].) It's important to write code that is correct with respect to Rust's memory model so that it's portable, but that's tricky if you don't have a weakly ordered machine to test on. Loom [1] is helpful in this regard. This is true in any language that lets you write atomic code where you specify the ordering.
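
To connect this to the "custom mutual exclusion" example above, here's a toy spin-lock sketch (mine, not from any of the linked material). With Acquire/Release it is correct under Rust's model; swap both orderings for Relaxed and it will usually still appear to work on x86-64, while being a real bug that a weakly ordered machine (or Loom) can expose.

    use std::sync::atomic::{AtomicBool, Ordering};

    pub struct SpinLock {
        locked: AtomicBool,
    }

    impl SpinLock {
        pub const fn new() -> Self {
            SpinLock { locked: AtomicBool::new(false) }
        }

        pub fn lock(&self) {
            // Acquire orders the critical section's reads and writes after
            // taking the lock. With Ordering::Relaxed this still compiles
            // and usually "works" on x86-64, but it is wrong under Rust's
            // model and can fail on ARM.
            while self.locked
                .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_err()
            {
                std::hint::spin_loop();
            }
        }

        pub fn unlock(&self) {
            // Release publishes the critical section's writes to the next
            // thread that acquires the lock.
            self.locked.store(false, Ordering::Release);
        }
    }

The Acquire on lock and the Release on unlock are what make the writes inside the critical section visible to the next thread that takes the lock; they are part of the algorithm's correctness, not a performance knob.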

[0] https://preshing.com/20120930/weak-vs-strong-memory-models/

[1] https://github.com/tokio-rs/loom