September 05, 2016

nullprogram.com/blog/2016/09/05/

The C standard library includes a qsort() function for sorting arbitrary buffers given a comparator function. The name comes from its original Unix implementation, “quicker sort,” a variation of the well-known quicksort algorithm. The C standard doesn’t specify an algorithm, except to say that it may be unstable (C99 §7.20.5.2¶4) — equal elements have an unspecified order. As such, different C libraries use different algorithms, and even when using the same algorithm they make different implementation trade-offs.

I added a drawing routine to a comparison function to see what the sort function was doing for different C libraries. Every time it’s called for a comparison, it writes out a snapshot of the array as a Netpbm PPM image. It’s easy to turn concatenated PPMs into a GIF or video. Here’s my code if you want to try it yourself:

Adjust the parameters at the top to taste. Rather than call rand() in the standard library, I included xorshift64star() with a hard-coded seed so that the array will be shuffled exactly the same across all platforms. This makes for a better comparison.

To get an optimized GIF on unix-like systems, run it like so. (Microsoft’s UCRT currently has serious bugs with pipes, so it was run differently in that case.)

./a.out | convert -delay 10 ppm:- gif:- | gifsicle -O3 > sort.gif

The number of animation frames reflects the efficiency of the sort, but this isn’t really a benchmark. The input array is fully shuffled, and real data often not. For a benchmark, have a look at a libc qsort() shootout of sorts instead.

To help you follow along, clicking on any animation will restart it.

glibc

Sorted in 307 frames. glibc prefers to use mergesort, which, unlike quicksort, isn’t an in-place algorithm, so it has to allocate memory. That allocation could fail for huge arrays, and, since qsort() can’t fail, it uses quicksort as a backup. You can really see the mergesort in action: changes are made that we cannot see until later, when it’s copied back into the original array.

dietlibc (0.32)

Sorted in 503 frames. dietlibc is an alternative C standard library for Linux. It’s optimized for size, which shows through its slower performance. It looks like a quicksort that always chooses the last element as the pivot.

Update: Felix von Leitner, the primary author of dietlibc, has alerted me that, as of version 0.33, it now chooses a random pivot. This comment from the source describes it:

We chose the rightmost element in the array to be sorted as pivot, which is OK if the data is random, but which is horrible if the data is already sorted. Try to improve by exchanging it with a random other pivot.

musl

Sort in 637 frames. musl libc is another alternative C standard library for Linux. It’s my personal preference when I statically link Linux binaries. Its qsort() looks a lot like a heapsort, and with some research I see it’s actually smoothsort, a heapsort variant.

BSD

Sorted in 354 frames. I ran it on both OpenBSD and FreeBSD with identical results, so, unsurprisingly, they share an implementation. It’s quicksort, and what’s neat about it is at the beginning you can see it searching for a median for use as the pivot. This helps avoid the O(n^2) worst case.

BSD also includes a mergesort() with the same prototype, except with an int return for reporting failures. This one sorted in 247 frames. Like glibc before, there’s some behind-the-scenes that isn’t captured. But even more, notice how the markers disappear during the merge? It’s running the comparator against copies, stored outside the original array. Sneaky!

Again, BSD also includes heapsort(), so ran that too. It sorted in 418 frames. It definitely looks like a heapsort, and the worse performance is similar to musl. It seems heapsort is a poor fit for this data.

Cygwin

It turns out Cygwin borrowed its qsort() from BSD. It’s pixel identical to the above. I hadn’t noticed until I looked at the frame counts.

MSVCRT.DLL (MinGW) and UCRT (Visual Studio)

MinGW builds against MSVCRT.DLL, found on every Windows system despite its unofficial status. Until recently Microsoft didn’t include a C standard library as part of the OS, but that changed with their Universal CRT (UCRT) announcement. I thought I’d try them both.

Turns out they borrowed their old qsort() for the UCRT, and the result is the same: sorted in 417 frames. It chooses a pivot from the median of the ends and the middle, swaps the pivot to the middle, then partitions. Looking to the middle for the pivot makes sorting pre-sorted arrays much more efficient.

Pelles C

Finally I ran it against Pelles C, a C compiler for Windows. It sorted in 463 frames. I can’t find any information about it, but it looks like some sort of hybrid between quicksort and insertion sort. Like BSD qsort(), it finds a good median for the pivot, partitions the elements, and if a partition is small enough, it switches to insertion sort. This should behave well on mostly-sorted arrays, but poorly on well-shuffled arrays (like this one).

More Implementations

That’s everything that was readily accessible to me. If you can run it against something new, I’m certainly interested in seeing more implementations.