decorate-sort-undecorate in Haskell

Wednesday, 24th June, 2009

Real World Haskell

Chapter 3: Defining types, streamlining functions

Section: End of chapter exercises

Exercise 6: Create a function that sorts a list of lists based on the length of each sublist. (You may want to look at the sortBy function from the Data.List module.)

Answering the question

The immediate exercise threw up enough problems to make it worthwhile. The functions sort and sortBy are in Data.List. sortBy takes a comparison function (and sort is just sortBy compare). So, here was my direct answer to the exercise:

    import Data.List

    longer :: [a] -> [a] -> Ordering
    longer a b = compare (length a) (length b)

    sortByLength :: [[a]] -> [[a]]
    sortByLength a = sortBy longer a

On type signatures: I knew that longer would have to return the same type as compare, namely Ordering:

    *Main> :type compare
    compare :: (Ord a) => a -> a -> Ordering
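The longer helper follows a common enough pattern that Data.Ord provides comparing for it. As a rough sketch (sortByLength' is a name made up here so it does not clash with the version above), the same sort could probably be written like this:

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- comparing f x y is defined as compare (f x) (f y),
-- so (comparing length) behaves like the hand-written longer.
-- sortByLength' is a hypothetical name used only for this sketch.
sortByLength' :: [[a]] -> [[a]]
sortByLength' = sortBy (comparing length)
```

Loaded into ghci, sortByLength' should order sublists the same way as sortByLength, since both hand sortBy a function that compares lengths.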

Decorate-sort-undecorate

That isn’t how I would write this kind of sort in Python (my main language). For a sort with any kind of custom comparison function, I would use the decorate-sort-undecorate (dsu) algorithm. Here’s a verbose implementation:

    >>> def dsu(func, xs):
    ...     dec_xs = [(func(x), x) for x in xs]
    ...     dec_xs.sort()
    ...     undec_xs = [x[1] for x in dec_xs]
    ...     return undec_xs
    ...
    >>> a = [[9], [1, 1, 1, 1, 1], [2, 2, 2]]
    >>> dsu(len, a)
    [[9], [2, 2, 2], [1, 1, 1, 1, 1]]
    >>> dsu(sum, a)
    [[1, 1, 1, 1, 1], [2, 2, 2], [9]]

Passing a comparison function to sort() directly results in it being called every time two elements of the list are compared. With dsu, the decorating function (func above) is called only once for each item in the list; sort() can then just use less than on the decorated tuples.
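To put rough numbers on that argument, here is a sketch in Haskell of a merge sort that also reports how often its comparison function was invoked. Everything in it (countingSortBy, keyCallsCompareBased, keyCallsDsu) is a hypothetical helper written for this illustration. It is a stand-in, not the algorithm Data.List or Python's sort() actually uses, so the exact counts will differ from a real library sort.

```haskell
-- A simple top-down merge sort that returns the sorted list together
-- with the number of times the comparison function was called.
countingSortBy :: (a -> a -> Ordering) -> [a] -> ([a], Int)
countingSortBy _ [] = ([], 0)
countingSortBy _ [x] = ([x], 0)
countingSortBy cmp xs = (merged, nl + nr + nm)
  where
    (left, right) = splitAt (length xs `div` 2) xs
    (sl, nl) = countingSortBy cmp left
    (sr, nr) = countingSortBy cmp right
    (merged, nm) = merge sl sr

    -- Each step of merge performs exactly one comparison.
    merge [] ys = (ys, 0)
    merge xs' [] = (xs', 0)
    merge (x:xs') (y:ys)
      | cmp x y == GT = let (rest, n) = merge (x:xs') ys in (y : rest, n + 1)
      | otherwise     = let (rest, n) = merge xs' (y:ys) in (x : rest, n + 1)

-- Comparing by length calls length twice per comparison.
keyCallsCompareBased :: [[a]] -> Int
keyCallsCompareBased xs = 2 * snd (countingSortBy byLength xs)
  where
    byLength a b = compare (length a) (length b)

-- Decorating calls the key function once per element.
keyCallsDsu :: [[a]] -> Int
keyCallsDsu xs = length xs
```

For anything but very short lists, keyCallsCompareBased should come out larger than keyCallsDsu, and the gap grows with the length of the list because a comparison sort needs on the order of n log n comparisons.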

Here’s my first naive and verbose dsu sort in Haskell:

    dsu :: (Ord a, Ord b) => (b -> a) -> [b] -> [b]
    dsu decFunc a = undecorate (sort (decorate decFunc a))

    decorate :: (t -> t1) -> [t] -> [(t1, t)]
    decorate decFunc [] = []
    decorate decFunc (x:xs) = ( ((decFunc x), x) : decorate decFunc xs )

    undecorate :: [(a,b)] -> [b]
    undecorate [] = []
    undecorate ( (_, y) : xs) = ( y : undecorate xs )

Trying it out with length and sum:

    *Main> let a = [[9], [1,1,1,1,1], [2,2,2]]
    *Main> dsu length a
    [[9],[2,2,2],[1,1,1,1,1]]
    *Main> dsu sum a
    [[1,1,1,1,1],[2,2,2],[9]]

A comment by gerg on the RWH website gave the following implementation of sort by length:

    sortByLength :: (Ord a) => [[a]] -> [[a]]
    sortByLength xss = map snd (sort (zip (map length xss) xss))

This is much terser and apparently more Haskellian than my sortByLength, and it is already almost a general dsu. With one small change, and a type signature, we have:

    dsu :: (Ord a, Ord a1) => (a1 -> a) -> [a1] -> [a1]
    dsu decFunc a = map snd (sort (zip (map decFunc a) a))
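One detail of this dsu is worth a sketch: because it sorts whole (key, item) pairs, items with equal keys are ordered by comparing the items themselves, which is why the signature needs Ord on both type variables. Recent versions of base (4.8 and later, so not the GHC of 2009) ship Data.List.sortOn, which is documented as a decorate-sort-undecorate that compares only the keys. The example below assumes such a version is available; tied, viaDsu and viaSortOn are names invented for the illustration.

```haskell
import Data.List (sort, sortOn)

-- The general dsu from above, repeated so this block stands alone.
dsu :: (Ord a, Ord a1) => (a1 -> a) -> [a1] -> [a1]
dsu decFunc xs = map snd (sort (zip (map decFunc xs) xs))

-- Two strings of equal length, deliberately out of alphabetical order.
tied :: [String]
tied = ["bb", "aa", "c"]

-- Ties on length are broken by comparing the strings themselves.
viaDsu :: [String]
viaDsu = dsu length tied        -- ["c","aa","bb"]

-- sortOn compares only the keys and is stable, so "bb" stays ahead of "aa".
viaSortOn :: [String]
viaSortOn = sortOn length tied  -- ["c","bb","aa"]
```

Which behaviour is preferable depends on the job; sortOn also works on items that have no Ord instance at all, since it never compares them.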

Type signatures

I must confess that I didn’t work out all those type signatures by myself. I used Haskell’s type inference to work them out for me: write the function without a type signature, then ask ghci what type it is:

    Prelude> import Data.List
    Prelude Data.List> let dsu decFunc a = map snd (sort (zip (map decFunc a) a))
    Prelude Data.List> :type dsu
    dsu :: (Ord a, Ord a1) => (a1 -> a) -> [a1] -> [a1]

Remember there’s no difference between a, b and t. They are type variables, not different kinds of type: just more or less arbitrary letters chosen by different parts of ghci’s type inference mechanism.
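To make the point about letters concrete, here is the same function with the type variables renamed to whole words. This is only a sketch; dsuNamed, key and item are names picked for the illustration, and any lowercase identifiers would do just as well.

```haskell
import Data.List (sort)

-- Identical to dsu apart from the names of the type variables.
-- The compiler treats key and item exactly as it treated a and a1.
dsuNamed :: (Ord key, Ord item) => (item -> key) -> [item] -> [item]
dsuNamed decFunc xs = map snd (sort (zip (map decFunc xs) xs))
```

The renamed signature describes the same type as before; the words only help the human reader see which variable is the sort key and which is the list element.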

Note that an argument which is itself a function appears in a signature as a parenthesised type such as (a -> b), meaning a function from a to b.
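The parentheses matter, as a small sketch may show. Both functions below are hypothetical examples written for this note, not part of the exercise.

```haskell
-- The parentheses mark a single argument that is itself a function.
applyTwice :: (a -> a) -> a -> a
applyTwice f x = f (f x)

-- Without parentheses the arrows associate to the right, so this is an
-- ordinary two-argument function: a -> (b -> a).
firstOf :: a -> b -> a
firstOf x _ = x
```

So in dsu's signature, (a1 -> a) is one argument (the decorating function) and [a1] is the second.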

Further work

I asked about decorate-sort-undecorate on Haskell-Beginners and received very helpful, thorough and friendly responses. Some of their implementations of dsu used Haskell syntax I haven’t come across yet. I’d like to look into this new syntax further and write it up, but that can be another story for another day.