If you are exclusively moving numbers around in an array, SIMD sorting sounds great.

But when you want to actually sort things, it's different. You have objects in an array, the objects will have one member which is a Key that you will sort the objects on.

In order to create a sorted permutation of the list, you either need to rearrange the list of object pointers (fast), rearrange the memory of the objects (slow), or you simply output a list of array indexes that represents what the sort order should be (fast).

Code that doesn't physically move the object memory around creates indirection. Indirection makes any SIMD sorting depend on scatter-gather in order to get data in. Scatter-Gather causes random memory accesses which don't cache as well as sequential access.

Yeah I find the thing you generally want but which many languages don’t support well is to have a struct-of-arrays or data frame layout rather than an array of objects. Then you can do one of:

Sort one array and rearrange other arrays the same way

Compute the array of indices into another array such that array[index[i]] < array[index[i+1]], I.e. the sorted version of the array is then [array[i] for i in index]. If you have a vectorised index operation (eg APL, R, …) getting the sorted array is one line of code.

I think with simd you’d want to sort your array and simultaneously permute either a second array or the array of indices 1, 2, …, n so you can use that to do accesses to corresponding elements in your other associated columns.

Maybe Google's new "Rune" language will become prevalent https://github.com/google/rune, which supports SoA.