Basically: Timsort is to mergesort as pdqsort is to quicksort
Direct link to the GitHub home of pdqsort (in C++), which has some performance comparisons: https://github.com/orlp/pdqsort
I've long since implemented that in pdqsort [2], along with other techniques described by me in [3]. pdqsort is a drop-in replacement for std::sort that's ~twice as fast as std::sort for small data with branchless comparisons. It's also available in Boost.sort [4].
[1] https://arxiv.org/abs/1604.06697 [2] https://github.com/orlp/pdqsort [3] https://arxiv.org/abs/2106.05123 [4] https://www.boost.org/doc/libs/1_76_0/libs/sort/doc/html/sor...
This is just not true. Introsort (the current std::sort in all C++ compilers I know of) also has 'killer' inputs, and on those it switches to heapsort, ending up roughly twice as slow as pdqsort. Worse, the libc++ implementation of std::sort has a quadratic killer sequence that I found 7 years ago, and it is still not fixed: https://bugs.llvm.org/show_bug.cgi?id=20837. In my opinion there is no good reason pdqsort hasn't been adopted as std::sort yet: it is faster on many patterns, faster on random input, and rarely if ever slower.
In fact, pdqsort is the standard unstable sorting algorithm in Rust: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sor...
Note that any deterministic quicksort variant that does not spend a lot of time choosing its pivots will always have some 'killer' pattern that forces it into its fallback algorithm (or worse, into O(n^2) behavior), because you can always arrange the input array so that poor pivots are chosen. What pdqsort does is shuffle things around so that those 'killer' patterns are far less trivial to construct and unlikely to occur in real-world input. For me determinism was very important, so I didn't add randomization in https://github.com/orlp/pdqsort, but note that the variant implemented in Rust uses non-deterministic swapping, which makes it impossible to construct a 'killer' input against it.
https://github.com/orlp/pdqsort
Here are some of the results on an Ivy Bridge hackintosh:
    size      qsort   inline    sort   stable   pdqsort   radix7
    1000       88.0     60.6    31.2     37.5      24.6     12.8
    10000     109.3     77.1    45.6     51.6      29.1     12.0
    100000    137.1     97.7    57.7     71.9      33.3     12.5
    1000000   159.6    114.2    70.6     88.6      39.7     20.4
    10000000  185.2    133.7    82.0     99.7      43.6     19.4
Edit: it's not quite as easy to plug radix7 into pdqsort's more varied benchmark program, which expects a template function with a type parameter of int.

Zhangxp1998 ported quadsort to C++, compared it against std::sort, and found it to be slower: https://gist.github.com/zhangxp1998/0e2fa30656c894017d183e0d...
Shameless self promotion, if you want a faster sorting implementation, check out pdqsort (https://github.com/orlp/pdqsort) which is almost twice as fast as libstdc++'s std::sort for randomly shuffled integers, and has all sorts of tricks for input distributions with patterns (like pre-sorted, reverse sorted, single out-of-order element appended, etc).
[1] https://github.com/orlp/pdqsort [2] https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sor...
https://github.com/orlp/pdqsort
Basically, timsort is to mergesort as pdqsort is to quicksort.
If the input array consists of several concatenated ascending or descending sequences, then timsort is the best. After all, timsort was specifically designed to take advantage of that particular case. Pdqsort performs respectably, too, and if you have more than a dozen of these sequences or if the sequences are interspersed, then it starts winning over timsort.
Anyways, both pdqsort and timsort perform well when the input is not quite random. In particular, pdqsort blows introsort (e.g. typical C++ std::sort implementations) out of the water when the input is not random[1]. It's pretty much a strict improvement over introsort. Likewise, timsort (at least the variant implemented in Rust's standard library) is pretty much a strict improvement over merge sort (e.g. typical C++ std::stable_sort implementations).
Regarding radix sort, pdqsort can't quite match its performance (it's O(n log n) after all), but can perform fairly respectably. E.g. ska_sort[2] (a famous radix sort implementation) and Rust's pdqsort perform equally well on my machine when sorting 10 million random 64-bit integers. However, on larger arrays radix sort starts winning easily, which shouldn't be surprising.
I'm aware that benchmarks are tricky to get right, can be biased, and are always controversial. If you have any further questions, feel free to ask.
def LookupKind(id):
    return KIND_TABLE[id]  # table-driven: one memory load

def LookupKind(id):
    return 175 & id & ((id ^ 173) + 11)  # pure arithmetic: no memory access
> The second implementation is faster because it doesn't require any memory accesses.

This is not necessarily true. If this function is in a hot loop and the entire lookup table fits in L1 cache, you can treat KIND_TABLE[id] as taking only ~1-2 cycles.
> Jeff Dean also posed this question in his talk: How long does it take to quicksort 1 billion 4-byte integers? It surprised me that he estimated it by simply counting the number of branch mispredictions, as described in this post.
This is no longer true. In state-of-the-art implementations quicksort is implemented in a branchless fashion. See https://github.com/orlp/pdqsort for details (I'm the author of pdqsort, old username on HN). The trick is to replace directly swapping misplaced elements in a loop with first filling a buffer of offsets of elements on the left that should be on the right, and a buffer of offsets of elements on the right that should be on the left. Filling such a buffer can be done in a branchless manner:
    int buffer[64];
    int buffer_num = 0;
    const int buffer_max_size = 64;
    for (int i = 0; i < buffer_max_size; ++i) {
        // With a branch:
        //     if (elements[i] < pivot) { buffer[buffer_num] = i; buffer_num++; }
        // Without: unconditionally store the offset, conditionally advance.
        buffer[buffer_num] = i;
        buffer_num += (elements[i] < pivot);
    }
These buffers can then also be used to swap elements branchlessly.

I'd strongly suggest you read the readme I wrote for https://github.com/orlp/pdqsort, which explains in detail how I beat heapsort (and other sorting algorithms). It also shows how I prevent the O(n^2) worst case.
> I used to get torqued at considering big-O, and did this just because it ignores constants; as a programmer, I worked hard to do well with such constants. But in the end I gave in and gave up and just accepted big-O notation as the most appropriate, simple, single criterion.
We have already established that O(n log n) is the lower bound, and we have reached it. This is where big-O stops being useful. A metric that compares equal for every relevant algorithm (mergesort, introsort, pdqsort, timsort, smoothsort, heapsort) is not a useful metric.
> In the land of big-O, a factor of only 2 is considered not very exciting.
I live in the land of real-world performance, powered by actual benchmarks. In this land a factor of 2 is very exciting.
> So, for a hybrid sort routine, consider also radix sort.
I worked in the scope of std::sort, which is strictly a comparison sort.