Using Mutable Arrays for Faster Sorting February 19, 2009

Various people have tried improving the Data.List.sort function in Haskell, see Data.List.sort, not so good? and Sorting At Speed. In this post, I will also attempt improving sort performance. I will use a simple algorithm:

Below we will see why converting a linked list to an array improves sorting performance.

Linked lists versus mutable arrays

I have depicted a linked list graphically below. Data constructors are shown as boxes, pointers as arrows, and elements as ellipses. As seen we need two pointers for each element. As Haskell’s linked lists are immutable we must recreate the list if we want to change one or more elements.

A Haskell mutable array is depicted below. We only use one pointer per element and the lookup operation and the update operation are both O(1).

Mergesort

Mergesort for immutable linked lists reallocates (recreates) a new list for each traversal. That is, it recreates the list for each time it has visited all elements once. There are log(n) traversals. All these memory allocations and latter cleanups by the garbage collector takes time. See Sorting At Speed for GHC’s implementation of mergesort.

Mergesort, using mutable arrays, do not need to recreate the list for each traversal. It uses two list operations, update and lookup, which are both O(1). Like mergesort for immutable linked lists, mergesort for mutable arrays still needs to traverse the list log(n) times.

The use of one pointer per element and no need to reallocate the list for each traversal, makes it seem likely that converting a linked list to a mutable array, sorting and converting back again, may be faster than sorting a linked list directly.

Benchmarks

Up until now it has all been theory and I will need to do measurements to back up the claim, that we can sort faster by converting to a mutable array. I do not know all the idiosyncrasies of modern hardware or which optimizations GHC may or may not do. There may be GHC optimizations, which makes immutable linked lists a lot faster than I imagine.

I have therefore implemented a mergesort using mutable arrays. I have used this program to benchmark my implementation. The program was compiled with GHC 6.10.1 using the optimization flag -O2. And I ran the benchmark program as: “./Benchmark +RTS -K32388608 -RTS >/dev/null”. I have tested both with lists of Ints and list of strings.

The benchmark program calls System.Mem.performGC before and after sorting. The last call to System.Mem.performGC is measured as part of the sorting. In this way I am internalizing the cost of garbage collection into the sort function. I use this in stead of measuring memory use directly. The the first call is done, so that we start with a clean sheet before each measurement.

Int sort results Iterations Number of elements Data.List.sort (ms) Array sort (ms) Speedup (%) 25000 10 264 272 -4 50000 10 684 572 16 100000 10 2396 2216 7 25000 20 1136 1056 7 10000 40 916 808 11 10000 50 1184 1016 14 10000 75 1872 1548 17 5000 100 1288 1048 18 5000 200 2912 2420 16 2000 400 2492 2024 18 2000 600 4000 3184 20 2000 800 5624 4276 23 1000 1000 3656 2744 24 20 10000 2276 1592 30 5 100000 8844 4260 51 2 1000000 62559 24677 60

String sort results Iterations Number of elements Data.List.sort (ms) Array sort (ms) Speedup (%) 10000 10 304 312 -3 20000 10 604 624 -4 40000 10 1216 1252 -3 10000 20 524 504 3 5000 50 632 544 13 5000 100 1320 1072 18 2000 200 1156 980 15 1000 400 1236 1008 18 750 600 1504 1184 21 500 800 1404 1080 23 500 1000 1828 1404 23 10 10000 992 716 27 5 100000 8860 4296 51

As can be seen, the speedups for large lists are quite good, but not for small lists. The speedup is even slighty negative for some small lists. That the benefits are better for large lists is not surprising, when considering that the cost of converting to and from arrays is O(n), whereas the gains are O(n * log(n)).

Sorting Ints shows better speedup than sorting strings. I speculate, that this is due to strings being more expensive to compare than Ints. As both Data.List.sort and my own sorting algorithm uses the same number of comparisons, the actual compare operations are equally expensive for both algorithms. When a constant part, the compare operations, is larger, then the (percentage) speedup gets smaller.

Taking the beauty out of functional programming?

Some might argue that using mutable arrays removes the beauty from functional programming – and they would be right. I do not see a problem though, as the nasty mutable code can be hidden behind a nice functional interface. Furthermore, a sorting algorithm only needs to be implemented once, but can be used man times.

Future Improvements

Although my sorting algorithm is, in most cases, faster than the stock GHC one, there is still room for improvement.

As shown, we are converting a linked list to an array, and back to a linked list again. As we are converting to an array, we might as well convert to an array of unboxed types in stead. Using unboxed types would avoid one pointer per element. Initial experiments shows significant speedups, as compared to my current sorting algorithm. Of cause, this only works for types which can be unboxed, which fortunately includes popular types such as Int, Double and Float. Using rewrite rules we can even hide the choice of using unboxed types from the user of the sorting algorithm.

Haskell has other types of arrays, such as Parallel arrays (module GHC.PArr) which is claimed to be fast. Choosing another array implementations may improve sorting performance.

Lastly, as Don Stewart explains in his blogpost “A Journal of Haskell Programming Write Haskell as fast as C: exploiting strictness, laziness and recursion” we may use GHC core to gain better performance.