5.6. Heapsort

  • File: Heaps.ml (continued)

Let us now exploit the ability of a max-heap to always keep the element with the largest key at the beginning, as well as the ability to restore a heap from an “almost-heap” (i.e., one that has only a single offending triple) in \(O(\log n)\), to construct a very efficient sorting algorithm — Heapsort.

Heapsort starts by turning an arbitrary array into a max-heap (by means of build_max_heap). It then repeatedly takes the first element and swaps it with the “working” element in the tail, building a sorted array suffix backwards. After each swap, it shrinks the portion of the array that is still considered a heap (the heap_size mentioned above) and grows the already sorted suffix, restoring the heap structure within the new front. The following code is the final addition to the Heaps functor:

let heapsort arr =
  let len = Array.length arr in
  let heap_size = ref len in
  (* Turn the whole array into a max-heap *)
  build_max_heap arr;
  for i = len - 1 downto 1 do
    (* Move the current maximum to the end of the heap prefix *)
    swap arr 0 i;
    (* Shrink the heap: the suffix starting at i is now sorted *)
    heap_size := !heap_size - 1;
    (* Restore the max-heap property within the shrunk heap *)
    max_heapify !heap_size arr 0;
  done

5.6.1. Heapsort Complexity

The bulk of the work is done by build_max_heap arr (which is in \(O(n \log n)\)) and by the main loop. Since the loop runs \(n - 1\) iterations, and each restoration of the heap takes \(O(\log n)\) swaps, the overall complexity of Heapsort is \(O(n \log n)\).
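
As a quick sanity check, writing \(T(n)\) for the running time on an array of \(n\) elements and eliding constant factors, the total amount of work is

\[
T(n) \;=\; \underbrace{O(n \log n)}_{\texttt{build\_max\_heap}} \;+\; \sum_{i=1}^{n-1} \underbrace{O(\log n)}_{\texttt{max\_heapify}} \;=\; O(n \log n).
\]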

5.6.2. Evaluating Heapsort

We can now use our checker to make sure that heapsort indeed delivers sorted arrays:

module Checker = SortChecker(KV)

let c = generate_key_value_array 1000
let d = Array.copy c

The following are the results of the experiment:

# heapsort d;;
- : unit = ()
# Checker.sorted_spec c d;;
- : bool = true

5.6.3. Which sorting algorithm to choose?

By now we have seen three linearithmic sorting algorithms: merge sort, Quicksort, and Heapsort. The first two achieve efficiency via a divide-and-conquer strategy (structuring the computation as a tree). The last one exploits the properties of a maintained data structure (i.e., a heap), which also, coincidentally, turns out to be a tree.

It would be interesting to compare the relative performance of the three implementations we have by running them on three copies of the same array:

let x = generate_key_value_array 100000
let y = Array.copy x
let z = Array.copy x

let quicksort = kv_sort_asc

Let us now time the executions:

# time heapsort x;;
Execution elapsed time: 0.511102 sec
- : unit = ()
# time quicksort y;;
Execution elapsed time: 0.145787 sec
- : unit = ()
# time merge_sort z;;
Execution elapsed time: 0.148201 sec
- : unit = ()

We can repeat the experiment on a larger array (e.g., one with \(10^6\) elements).
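
Presumably, the three input arrays are regenerated in the same way as before, just with the larger size (a sketch reusing the same names):

let x = generate_key_value_array 1000000
let y = Array.copy x
let z = Array.copy x

The timings grow accordingly: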

# time heapsort x;;
Execution elapsed time: 6.943117 sec
- : unit = ()
# time quicksort y;;
Execution elapsed time: 2.049979 sec
- : unit = ()
# time merge_sort z;;
Execution elapsed time: 2.192766 sec
- : unit = ()

As we can see, the relative performance of the three algorithms remains the same, with our implementation of heapsort being about 3.5 times slower than both quicksort and merge_sort.

The reason why quicksort beats heapsort by a constant factor is that it performs almost no “unnecessary” element swaps, which are time-consuming. In contrast, even if the array is already ordered, heapsort is going to swap most of its elements around in order to establish the heap structure.
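
To see this effect concretely, here is a small self-contained sketch (working on plain integer arrays, with its own copies of the standard helpers, so the names below do not refer to the Heaps functor) that counts how many swaps heapsort performs on an input that is already sorted:

let swaps = ref 0

let swap arr i j =
  let t = arr.(i) in
  arr.(i) <- arr.(j);
  arr.(j) <- t;
  incr swaps

(* Standard max_heapify restricted to the first heap_size elements *)
let rec max_heapify heap_size arr i =
  let l = 2 * i + 1 and r = 2 * i + 2 in
  let largest = ref i in
  if l < heap_size && arr.(l) > arr.(!largest) then largest := l;
  if r < heap_size && arr.(r) > arr.(!largest) then largest := r;
  if !largest <> i then begin
    swap arr i !largest;
    max_heapify heap_size arr !largest
  end

let build_max_heap arr =
  let len = Array.length arr in
  for i = len / 2 - 1 downto 0 do
    max_heapify len arr i
  done

let heapsort arr =
  let len = Array.length arr in
  let heap_size = ref len in
  build_max_heap arr;
  for i = len - 1 downto 1 do
    swap arr 0 i;
    heap_size := !heap_size - 1;
    max_heapify !heap_size arr 0
  done

let () =
  let sorted = Array.init 10000 (fun i -> i) in
  heapsort sorted;
  Printf.printf "Swaps on an already sorted array: %d\n" !swaps

Even though this input requires no reordering at all, the counter comes out many times larger than \(n\): every element is first moved while building the heap and then moved again while the heap is being dismantled.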

However, on an almost-sorted array, heapsort will perform significantly better than quicksort and, unlike merge_sort, it will not require extra memory. In the experiment below, the first run of quicksort sorts a freshly generated random array; the second run takes the now-sorted x as its input and degrades towards quicksort’s quadratic worst case, while heapsort copes with the already-sorted array just fine:

# let x = generate_key_value_array 10000;;
...
# time quicksort x;;
Execution elapsed time: 0.014825 sec
- : unit = ()
# time quicksort x;;
Execution elapsed time: 3.650797 sec
- : unit = ()
# time heapsort x;;
Execution elapsed time: 0.044624 sec
- : unit = ()