What's the problem? From D. Knuth's The Art of Computer Programming: Sorting and Searching:

(1) if sorting n records by comparing pairs of keys, the Gleason bound shows that, on average, one can't sort faster than O( n ln(n) ).

(2) Heap sort sorts n records in both the worst case and the average case in time O( n ln(n) ).

Net, when sorting by comparing pairs of keys, in main memory, on a simple machine architecture, with a single processor, ..., one can't improve on Heap sort.

So, just use Heap sort. That's what I do.
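In C++, that choice is just a couple of standard-library calls; a minimal sketch (the setup here is purely illustrative):

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{5, 1, 4, 2, 3};

    // Heap sort in two standard steps: build a max-heap in O(n),
    // then repeatedly move the maximum to the back, O( n ln(n) ) total.
    std::make_heap(v.begin(), v.end());
    std::sort_heap(v.begin(), v.end());

    for (int x : v) std::cout << x << ' ';  // prints: 1 2 3 4 5
    std::cout << '\n';
}
```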

Big O hides all constants. A hybrid algorithm using Quicksort for the heavy lifting is O(n log n) in the worst case, but is significantly faster than heapsort, mainly due to cache effects.

See pdqsort for examples (benchmarks included): https://github.com/orlp/pdqsort.

The benchmarks show that, on average, it's twice as fast as heapsort on the random input distribution.

Now, and for a long time, the usual criterion of performance has been big-O notation, and that is what I considered; with the Gleason bound and heap sort I thus showed that, under various common assumptions, one can't do better than heap sort. As far as I know, I was correct.

So, to claim to beat heap sort, one has to set aside some of the usual assumptions and the big-O criterion.

Quicksort is great with a cache since it has great locality of reference when working with the partitions that fit into the cache. But the worst case of Quicksort is still O( n^2 ), so what happens when Quicksort encounters such a partition?
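To make the locality point concrete, here is a plain Hoare-style partition sketch (illustrative, not any particular library's code); both index scans walk sequentially through memory, which is exactly the access pattern caches reward:

```cpp
#include <utility>

// Hoare-style partition: i scans forward, j scans backward, both
// strictly sequentially, so once a partition fits in cache almost
// every access is a hit. If the chosen pivot repeatedly lands near
// the minimum or maximum (e.g., a first-element pivot on already
// sorted input), one side of each split is nearly empty and the
// recursion degenerates toward O( n^2 ).
int partition(int* a, int lo, int hi) {
    int pivot = a[lo + (hi - lo) / 2];
    int i = lo - 1, j = hi + 1;
    for (;;) {
        do { ++i; } while (a[i] < pivot);
        do { --j; } while (a[j] > pivot);
        if (i >= j) return j;
        std::swap(a[i], a[j]);
    }
}

void quicksort(int* a, int lo, int hi) {
    if (lo >= hi) return;
    int p = partition(a, lo, hi);
    quicksort(a, lo, p);
    quicksort(a, p + 1, hi);
}
```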

Just intuitively, it appears that heap sort has a tough time exploiting usual cache designs. We know that. There is a modification of heap sort that helps it work better when it has to do virtual memory paging.
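A sketch of why: in the standard array-backed heap, sift-down follows child links at index 2*i + 1, so each step jumps roughly twice as far into the array (illustrative code, not any particular library's):

```cpp
#include <utility>

// Sift-down in an array-backed max-heap: the children of node i live
// at 2*i + 1 and 2*i + 2, so successive probes land at i, 2i+1,
// 4i+3, ... Once the heap outgrows the cache, nearly every one of
// those probes is a cache miss.
void sift_down(int* a, int n, int i) {
    for (;;) {
        int child = 2 * i + 1;
        if (child >= n) return;                          // i is a leaf
        if (child + 1 < n && a[child + 1] > a[child]) ++child;
        if (a[i] >= a[child]) return;                    // heap property holds
        std::swap(a[i], a[child]);
        i = child;
    }
}
```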

I used to get torqued at big-O, just because it ignores constants; as a programmer, I worked hard to do well with such constants. But in the end I gave in, gave up, and just accepted big-O notation as the most appropriate, simple, single criterion. And for judging algorithms by how well they exploit the cache, I gave up on that, too: broadly, it's the responsibility of such speedup techniques to work on ordinary code written for a simple processor architecture, not the responsibility of such code to do tricky things to exploit tricky architectures.

In the land of big-O, a factor of only 2 is considered not very exciting.

If we relax some of the assumptions, then we can beat heap sort and O( n ln(n) ): use radix sort. That was the sorting algorithm of the old punch card sorting machines. E.g., if we have 100 billion records to sort where each key is an alphanumeric string 10 characters long, then radix sort does all the work in just 10 passes over the 100 billion records. Or, for keys m characters long, radix sort handles n records in O( nm ). So it's faster when m < ln(n). Or, to make the arithmetic easier, m < log(n). And log(100 billion) = 11, so radix sort should be faster with m = 10 and n = 100 billion.
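A minimal LSD radix sort sketch under those assumptions (fixed-length keys of byte-valued characters; names here are just for illustration):

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// LSD radix sort on fixed-length keys: m stable counting-sort passes,
// one per character position from last to first, O( nm ) total and no
// key comparisons at all. Assumes every key is exactly m bytes long.
void radix_sort(std::vector<std::string>& keys, std::size_t m) {
    std::vector<std::string> buf(keys.size());
    for (std::size_t pos = m; pos-- > 0;) {
        std::array<std::size_t, 256> count{};             // histogram
        for (const auto& k : keys) ++count[(unsigned char)k[pos]];
        std::size_t sum = 0;                              // prefix sums
        for (auto& c : count) { std::size_t t = c; c = sum; sum += t; }
        for (const auto& k : keys)                        // stable scatter
            buf[count[(unsigned char)k[pos]]++] = k;
        keys.swap(buf);
    }
}
```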

So, for a hybrid sort routine, consider also radix sort.

Am I excited about radix sort? Nope!

> But the worst case of Quicksort is still O( n^2 ), so what happens when Quicksort encounters such a partition?

I'd strongly suggest you read the readme I wrote for https://github.com/orlp/pdqsort, which explains in detail how I beat heapsort (and other sorting algorithms). It also shows how I prevent the O(n^2) worst case.
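For the impatient: the classic introsort-style guard below gives the flavor of such a fallback; the recursion carries a depth budget of roughly 2*log2(n) and switches to heapsort when the budget runs out, capping the worst case at O(n log n). This is an illustrative sketch of the general technique, not pdqsort's actual code, which the readme describes in full.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Introsort-style guard (the general technique, not pdqsort's code):
// quicksort while the recursion stays shallow, heapsort once a
// ~2*log2(n) depth budget is spent. Typical inputs keep quicksort's
// cache-friendly behavior; adversarial ones are capped at O(n log n).
static void guarded_sort(std::vector<int>& a, int lo, int hi, int budget) {
    if (hi - lo < 1) return;
    if (budget == 0) {                                   // fall back to heapsort
        std::make_heap(a.begin() + lo, a.begin() + hi + 1);
        std::sort_heap(a.begin() + lo, a.begin() + hi + 1);
        return;
    }
    int pivot = a[lo + (hi - lo) / 2];                   // Hoare partition
    int i = lo - 1, j = hi + 1;
    for (;;) {
        do { ++i; } while (a[i] < pivot);
        do { --j; } while (a[j] > pivot);
        if (i >= j) break;
        std::swap(a[i], a[j]);
    }
    guarded_sort(a, lo, j, budget - 1);
    guarded_sort(a, j + 1, hi, budget - 1);
}

void introsort(std::vector<int>& a) {
    if (a.size() > 1)
        guarded_sort(a, 0, (int)a.size() - 1,
                     2 * (int)std::log2((double)a.size()));
}
```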

> I used to get torqued at big-O, just because it ignores constants; as a programmer, I worked hard to do well with such constants. But in the end I gave in, gave up, and just accepted big-O notation as the most appropriate, simple, single criterion.

We have already established that O(n log n) is the lower bound, and we have reached this lower bound. This is where big O stops being useful. A metric that compares equal for every relevant algorithm (mergesort, introsort, pdqsort, timsort, smoothsort, heapsort) is not a useful metric.

> In the land of big-O, a factor of only 2 is considered not very exciting.

I live in the land of real-world performance, powered by actual benchmarks. In this land a factor of 2 is very exciting.

> So, for a hybrid sort routine, consider also radix sort.

I worked in the scope of std::sort, which is strictly a comparison sort.
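To make that concrete: std::sort's entire view of the data is a comparison predicate, so a radix pass, which needs to read individual key digits, simply doesn't fit the interface. A small illustration:

```cpp
#include <algorithm>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> keys{"b7", "a9", "a1"};

    // std::sort's entire view of the keys is this boolean predicate:
    // it may ask "does x order before y?" but never "what is the k-th
    // character of x?", which is exactly what a radix pass would need.
    std::sort(keys.begin(), keys.end(),
              [](const std::string& x, const std::string& y) { return x < y; });
    // keys is now {"a1", "a9", "b7"}
}
```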