Sorting algorithms Heapsort

With its time complexity of O ( n log( n )) heapsort is optimal. It utilizes a special data structure called heap. This data structure is explained in the following.

Definition: Let T = ( V , E ) an almost complete binary tree with a vertex labelling a : V M that assigns to each vertex u a label a ( u ) from an ordered set ( M , ). A vertex u V has the heap property if it has no direct descendant with a greater label, i.e. v V : ( u , v ) E a ( u ) a ( v ) T is a heap if all vertices have the heap property, i.e. ( u , v ) E : a ( u ) a ( v ) We call T a semi-heap if all vertices except possibly the root r have the heap property, i.e. ( u , v ) E , u ≠ r : a ( u ) a ( v )

Example: Figure 1: Heap with n = 10 vertices

Observe that each leaf automatically has the heap property regardless of its label, since it has no descendants.

The data structure of the heapsort algorithm is a heap. The data sequence to be sorted is stored as the labels of the binary tree. As shown later, in the implementation no pointer structures are necessary to represent the tree, since an almost complete binary tree can be efficently stored in an array.

Heapsort algorithm

The following description of heapsort refers to Figure 2 (a) - (e).

(a) (b) (c)

(d) (e) Figure 2: Retrieving the maximum element and restoring the heap

If the sequence to be sorted is arranged as a heap, the greatest element of the sequence can be retrieved immediately from the root (a). In order to get the next-greatest element, the rest of the elements have to be rearranged as a heap.

The rearrangement is done in the following way:

Let b be a leaf of maximum depth. Write the label of b to the root and delete leaf b (b). Now the tree is a semi-heap, since the root possibly has lost its heap property.

Making a heap from a semi-heap is simple:

Do nothing if the root already has the heap property, otherwise exchange its label with the maximum label of its direct descendants (c). Let this descendant be v . Now possibly v has lost its heap property. Proceed further with v , i.e. make a heap from the semi-heap rooted at v (d). This process stops when a vertex is reached that has the heap property (e). Eventually this is the case at a leaf.

Making a heap from a semi-heap can conceptionally be implemented by the following procedure downheap:

procedure downheap (v) Input: semi-heap with root v Output: heap (by rearranging the vertex labels) Method: while v does not have the heap property do choose direct descendant w with maximum label a ( w ) exchange a ( v ) and a ( w ) set v := w

Procedure downheap can be used to build a heap from an arbitrarily labelled tree. By proceeding bottom-up, downheap is called for all subtrees rooted at inner vertices. Since leaves are already heaps they may be omitted.

procedure buildheap Input: almost complete binary tree T of depth d ( T ) with vertex labelling a Output: heap (by rearranging the vertex labels) Method: for i := d ( T ) – 1 downto 0 do for all inner vertices v of depth d ( v ) = i do downheap (v)

A call of buildheap is the first step of procedure heapsort, which can now be written down as follows:

procedure heapsort Input: almost complete binary tree with root r and vertex labelling a Output: vertex labels in descending order Method: buildheap while r is not a leaf do output a ( r ) choose leaf b of maximum depth write label a ( b ) to r delete leaf b downheap (r) output a ( r )

This simulation illustrates the process of building a heap from an arbitrarily labelled tree.

(Java applet for simulation of function buildheap)

An almost complete binary tree with n vertices has a depth of at most log( n ). Therefore, procedure downheap requires at most log( n ) steps. Procedure buildheap calls downheap for each vertex, therefore it requires at most n ·log( n ) steps. Heapsort calls buildheap once; then it calls downheap for each vertex, together it requires at most 2· n ·log( n ) steps.

Thus, the time complexity of heapsort is T ( n ) O ( n ·log( n )). The algorithm is optimal, since the lower bound of the sorting problem is attained.

An almost complete binary tree with n vertices and vertex labelling a can be stored most efficiently in an array a :

the root is stored at position 0

the two direct descendants of a vertex at position v are stored at positions 2 v +1 and 2 v +2

All vertices at positions 0, ..., n /2-1 are inner nodes, all vertices n /2, ..., n -1 are leaves (integer division / ). The heap with n = 10 vertices of Figure 1 is shown in Figure 3 as an example.

Figure 3: Array representation of the heap of Figure 1

Using this array representation of the heap, heapsort can be implemented as an in-place sorting algorithm. In each step of heapsort, the root label a ( r ) is not output but stored at the position of the leaf b that is deleted in the following. Deleting leaf b means to consider just the array elements left of b as the remaining heap.

In other words: the four steps

output of the label of the root a ( r )

( ) choose leaf b of maximum depth

of maximum depth write label of b to the root

to the root delete leaf b

are replaced by an exchange of the root label and the label of b :

exchange the label of the root with the label of the last leaf and do not consider that leaf anymore

The following Java class HeapSorter encapsulates the functions downheap, buildheap and heapsort. In order to sort an array b , Heapsort is called with the statement HeapSorter.sort(b).

public class HeapSorter { private static int [] a; private static int n; public static void sort( int [] a0) { a=a0; n=a.length; heapsort(); } private static void heapsort() { buildheap(); while (n>1) { n--; exchange (0, n); downheap (0); } } private static void buildheap() { for ( int v=n/2-1; v>=0; v--) downheap (v); } private static void downheap( int v) { int w=2*v+1; while (w<n) { if (w+1<n) if (a[w+1]>a[w]) w++; if (a[v]>=a[w]) return ; exchange(v, w); v=w; w=2*v+1; } } private static void exchange( int i, int j) { int t=a[i]; a[i]=a[j]; a[j]=t; } }

With its time complexity of O ( n log( n )) heapsort is optimal. Unlike mergesort, heapsort requires no extra space. On the other hand, heapsort is not stable. A sorting algorithm is stable, if it leaves the order of equal elements unchanged.

The heap data structure can also be used for an efficient implementation of a priority queue. A priority queue is an abstract list data structure with the operations insert and extractMax. With each element in the list a certain priority is associated. By insert an element together with its priority is inserted into the list. By extractMax the element with the highest priority is extracted from the list. Using a heap, both operations can be implemented in time O (log( n )).

Heapsort was originally published by J.W.J. Williams as Algorithm 232 in the journal Communications of the ACM [Wil 64].