What does HackerNews think of TSOEnabler?
Kernel extension that enables TSO for Apple silicon processes
Lemme just look up TSO and...
https://github.com/saagarjha/TSOEnabler
...Oh. Fair enough, my mistake :P
Is there not an instruction to switch into TSO mode, though? Wouldn't that technically count? :P
If TSO didn’t have a performance penalty, it wouldn’t need to be a separate mode. Also, it should be obvious that stricter ordering constraints inherently allow less parallelism, and therefore lower performance.
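To make the ordering constraint concrete, here is a minimal message-passing sketch in C++. The names (`Mailbox`, `produce`, `consume`, `run_message_passing`) are made up for illustration and have nothing to do with TSOEnabler itself. On x86 the hardware already keeps stores in program order, so the release store below typically compiles to a plain store; on ARM's weaker default model the compiler has to emit a store-release (or a barrier) to get the same guarantee. TSO mode effectively applies that kind of ordering to every store, which is the reordering freedom the CPU gives up.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical example type: a payload plus a flag that publishes it.
struct Mailbox {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // set once the payload is valid
};

// Producer: write the payload, then publish it.
// memory_order_release keeps the payload store ordered before the flag store.
void produce(Mailbox& box, int value) {
    box.payload = value;
    box.ready.store(true, std::memory_order_release);
}

// Consumer: wait for the flag, then read the payload.
// memory_order_acquire keeps the payload load ordered after the flag load.
int consume(Mailbox& box) {
    while (!box.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return box.payload;
}

// Runs one producer thread against a consumer on the calling thread.
int run_message_passing(int value) {
    Mailbox box;
    std::thread producer(produce, std::ref(box), value);
    int seen = consume(box);
    producer.join();
    assert(box.ready.load(std::memory_order_relaxed));  // sanity check
    return seen;
}
```

With the release/acquire pair in place, `run_message_passing(42)` returns 42 on either memory model. If both orderings were relaxed, the code would have a data race on `payload` as far as the C++ standard is concerned, even though TSO hardware would in practice still make the stores visible in order.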
If it weren't for the following project... I'd agree with you.
https://github.com/saagarjha/TSOEnabler
> A kernel extension that enables total store ordering on Apple silicon, with semantics similar to x86_64's memory model. This is normally done by the kernel through modifications to a special register upon exit from the kernel for programs running under Rosetta 2; however, it is possible to enable this for arbitrary processes (on a per-thread basis, technically) as well by modifying the flag for this feature and letting the kernel enable it for us on. Setting this flag on certain processors can only be done on high-performance cores, so as a side effect of enabling TSO the kernel extension will also migrate your code off the efficiency cores permanently.
--------
It's clear that Apple has implemented total store ordering on its chips (including the M1).
Here's a kernel extension someone built to manipulate this feature: https://github.com/saagarjha/TSOEnabler
Yes, handling indirect branches seems a bit complex, and I'm not a specialist in the field. But I'm pretty sure indirect branches are rare enough that an additional indirection is relatively inexpensive. A simple address mapping table should cover most cases.
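For what it's worth, the mapping table idea can be sketched in a few lines of C++. This is only an illustration of the general technique, not how Rosetta 2 or any real translator does it; `GuestAddr`, `HostAddr`, `BranchTargetTable`, and `kNotTranslated` are hypothetical names, and using 0 as the "not translated" sentinel is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical types for illustration only.
using GuestAddr = std::uint64_t;  // address in the original (e.g. x86) code
using HostAddr  = std::uint64_t;  // address of the translated (e.g. ARM) code

// Assumed sentinel: 0 means "no translation recorded for this target".
constexpr HostAddr kNotTranslated = 0;

class BranchTargetTable {
public:
    // Record where the translation of a guest address lives.
    void insert(GuestAddr guest, HostAddr host) {
        assert(host != kNotTranslated);  // 0 is reserved as the sentinel
        table_[guest] = host;
    }

    // Resolve the target of an indirect branch. On a miss, return the
    // sentinel so the caller can fall back to translating on demand.
    HostAddr lookup(GuestAddr guest) const {
        auto it = table_.find(guest);
        if (it == table_.end()) {
            return kNotTranslated;
        }
        return it->second;
    }

private:
    std::unordered_map<GuestAddr, HostAddr> table_;
};
```

Each translated indirect branch would then cost one table lookup at run time instead of a direct jump, which is the "additional indirection" mentioned above.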
An interesting question would also be whether Apple has added features to the hardware to improve the translation?
We know, for example, that Apple introduced a special register [1] to temporarily switch from the ARM consistency model to x86's TSO (Total Store Order) consistency model.