Might be worth noting that it's faster than quicksort, std::stable_sort, and timsort.
These are all pretty slow on 32-bit integers, which you're using for testing. Could you please include a comparison with pdqsort, or at least mention that pdqsort is to be preferred when an element fits in a register (as it certainly will be if you're only beating timsort by 20% or so)? It's frustrating to see sorting algorithms with some effort spent on benchmarking and still have to guess at whether it's practical based on how much slower the comparison points are than the state of the art.
Radix sorts (like your own wolfsort!) will also be much faster on the benchmarks presented, although that's sort of misleading since they are very good at uniformly random input and not so good at other distributions.
pdqsort isn't stable and still suffers from killer inputs.
Not sure how well this will paste:
Name | Items | Type | Best | Average | Loops | Samples | Distribution
-------- | -------- | ---- | -------- | -------- | --------- | ------- | ----------------
blitsort | 100000 | 32 | 0.005962 | 0.006532 | 1 | 100 | random order
pdqsort | 100000 | 32 | 0.002665 | 0.002802 | 1 | 100 | random order
| | | | | | |
blitsort | 100000 | 32 | 0.000069 | 0.000078 | 1 | 100 | ascending order
pdqsort | 100000 | 32 | 0.000098 | 0.000100 | 1 | 100 | ascending order
| | | | | | |
blitsort | 100000 | 32 | 0.001138 | 0.001196 | 1 | 100 | ascending saw
pdqsort | 100000 | 32 | 0.003245 | 0.003317 | 1 | 100 | ascending saw
| | | | | | |
blitsort | 100000 | 32 | 0.003777 | 0.003843 | 1 | 100 | generic order
pdqsort | 100000 | 32 | 0.000819 | 0.000851 | 1 | 100 | generic order
| | | | | | |
blitsort | 100000 | 32 | 0.000051 | 0.000060 | 1 | 100 | descending order
pdqsort | 100000 | 32 | 0.000202 | 0.000207 | 1 | 100 | descending order
| | | | | | |
blitsort | 100000 | 32 | 0.000815 | 0.000845 | 1 | 100 | descending saw
pdqsort | 100000 | 32 | 0.002307 | 0.002368 | 1 | 100 | descending saw
| | | | | | |
blitsort | 100000 | 32 | 0.001761 | 0.001774 | 1 | 100 | random tail
pdqsort | 100000 | 32 | 0.002560 | 0.002572 | 1 | 100 | random tail
| | | | | | |
blitsort | 100000 | 32 | 0.003328 | 0.003374 | 1 | 100 | random half
pdqsort | 100000 | 32 | 0.002651 | 0.002854 | 1 | 100 | random half
| | | | | | |
blitsort | 100000 | 32 | 0.000593 | 0.000938 | 1 | 100 | ascending tiles
pdqsort | 100000 | 32 | 0.002316 | 0.002482 | 1 | 100 | ascending tiles
You should explain these issues in the repository, not in Hacker News comments!
A benchmark of 100,000 32 bit integers suggests that you will be comparing against algorithms that do well on that benchmark when it's not the case at all. The concept of stability doesn't apply when elements that compare the same are indistinguishable, which is the case for integers. And I think it's an exaggeration to say pdqsort has "killer inputs". In your benchmarks the patterned data always sorts faster than random, so the most you can say is that merge sort exploits some kinds of structure better (I have seen patterns where it does worse, but not by much). I expect timsort has some similar advantages over blitsort because of its ability to find natural runs while blitsort always starts at size 32—try an up-down pattern with runs of length 50 for instance. The absolute worse case for pdqsort is a fallback to heapsort. With pivot randomization it's pretty much impossible on large arrays unless they are engineered to make pdqsort fail, and you end up sorting the array three times slower or something. Not much of a DOS attack.
Having a fast merge sort is definitely useful in a lot of situations (my thoughts at [1]), and I've been interested in quadsort particularly for that purpose (blitsort seems to emphasize lower external memory usage which I've never had a use for). Half as fast as pdqsort on random data is actually somewhat better than I expected. It feels dishonest to see "Blitsort has exceptional performance" followed by selective benchmarks showing worse algorithms, and it makes it much harder to figure out what blitsort is good for.
[1] https://mlochbaum.github.io/BQN/implementation/primitive/sor...
Well, blitsort's performance is exceptional in the sense that it's 15% faster than the next fastest stable in-place sort.
As for stability, it's useful when you need to sort a table with an integer index. So it might be of value to database software.
I probably should point out some of the weakness on the README.
Agreed on pdqsort being 3x slower worst case on "killer" input, but it's my understanding that's why it's not being used to replace std::sort.
As for runs of 50, haven't benched those, though I assume gridsort (https://github.com/scandum/gridsort) would handle those rather well.
This is just not true. Introsort (the current std::sort in all C++ compilers I know of) also has 'killer' inputs, and will switch to heapsort twice as slow than than pdqsort does in those. Worse, the libc++ implementation of std::sort has a quadratic killer sequence that I found 7 years ago, and it is still not fixed: https://bugs.llvm.org/show_bug.cgi?id=20837. In my opinion there is no good reason that pdqsort is not adopted yet as std::sort as it is faster for many patterns, faster for random input and rarely if ever slower.
In fact, pdqsort is the standard unstable sorting algorithm in Rust: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sor...
Note that any deterministic quicksort variant that does not spend a lot of time on choosing its pivots will always have some 'killer' pattern that forces it to use its fallback algorithm (or worse, go O(n^2)), as you can always shuffle the input array such that poor pivots are chosen. What pdqsort does is shuffle things around so that those 'killer' patterns are a lot less trivial, and are not as likely to occur in real-world input. For me determinism was very important so I didn't do it in https://github.com/orlp/pdqsort, but note that the variant implemented in Rust uses non-deterministic swapping and thus it is impossible to force a 'killer' input against it.