5.5. Maintaining Binary Heaps

  • File: Heaps.ml (continued)

Let us now fix the broken heap bad_heap by restoring an order in it. As we can see, the issue there is between the parent (10, "c") and a left child (11, "f") that are out of order.

5.5.1. “Heapifying” elements of an array

What we need to do is to swap the offending parent with the children (assuming that both subtrees reachable from the children obey the descending order), and also make sure that the swapped element (10, "c") “sinks down”, finding its correct position in a reachable subtree. This procedure of “sinking” is what is implemented by the most important heap-manipulating function shown below:

(* 3. Restoring the heap property for an element i *)
let rec max_heapify heap_size arr i =
  let len = Array.length arr in
  assert (heap_size <= Array.length arr);
  if i > (len - 1) / 2 || i >= heap_size then ()
  else
    let ai = arr.(i) in
    let largest = ref (i, arr.(i)) in
    let l = left arr i in

    (* Shall we swap with the left child?.. *)
    if l <> None &&
       (fst (get_exn l)) < heap_size &&
       comp (snd (get_exn l)) (snd !largest) > 0
    then largest := get_exn l;

    (* May be the right child is even bigger? *)
    let r = right arr i in
    if r <> None &&
       (fst (get_exn r)) < heap_size &&
       comp (snd (get_exn r)) (snd !largest) > 0
    then largest := get_exn r;

    if !largest <> (i, ai)
    (* Okay, there is a necessity to progress further... *)
    then begin
      swap arr i (fst !largest);
      max_heapify heap_size arr (fst !largest)
    end

The implementation of max_heapify deserves some attention, as it is not entirely trivial. It takes three arguments, an integer heap_size (whose role will be explained shortly), and array arr representing the heap, and an index i of a parent element of an offending triple.

The heap_size serves the purpose of “limiting the scope” of a heap in an array and is always assumed to be less or equal than the array size. The reason why one might need it is because in some applications (as we will soon see), it is convenient to consider only a certain prefix of an array as a heap (and, thus obeying the heap definition), while the remaining suffix does not to be a part of it. One can, therefore, think of heap_size as of a “separator” between the heap-y and a non-heapy parts of an array.

The body of max_heapify is rather straightforward. It first assumes that the element at the position arr.(i) is the largest one. It then tries to retrieve its both children (if those are within the array size and heap size ranges), and determine the largest of them. If such one is present, it becomes the new parent, swapping with previous one. However, such a swap might have broken the heap-property in one of the subtrees, so the procedure needs to be repeated. Hence, the operation happens recursively for the new child (which used to be a parent, and now, after the swap, resides at the position !larger).

Question: Why does max_heapify terminate?

Let us now restore the heap using the max_heapify procedure:

let bad_heap =
   [|(16, "a"); (14, "b"); (9, "c"); (8, "d"); (7, "e"); (11, "f"); (3, "g");
   (2, "h"); (4, "i"); (1, "j"); (1, "k"); (10, "l"); (6, "m")|];;

# open KVHeaps;;
# is_heap bad_heap;;
- : bool = false

# is_heap_print ~print:true bad_heap;;
Out-of-order elements:
Parent: (2, (9, c))
Left: (5, (11, f))
Right: (6, (3, g))
- : bool = false

# max_heapify 13 bad_heap 2;;
- : unit = ()

# is_heap_print ~print:true bad_heap;;
- : bool = true

# bad_heap;;
- : (int * string) array =
[|(16, "a"); (14, "b"); (11, "f"); (8, "d"); (7, "e"); (10, "l"); (3, "g");
  (2, "h"); (4, "i"); (1, "j"); (1, "k"); (9, "c"); (6, "m")|]

As we can observe the two elements have now been correctly swapped.

5.5.2. Complexity of heapify

The maximal number of steps required to reach a child in a tree is called a height of a tree. Notice that the way max_heapify “walks” and array is by taking left/right child of an element. This way, it will make at most \(\log_2 n\) steps (which is the height of a heap). That is, the max_heapify procedure will terminate very quickly.

5.5.3. Building a heap from an array

We can now use max_heapify iteratively to turn an arbitrary array into a max-heap. The following code should be added to the Heap functor:

let build_max_heap arr =
  let len = Array.length arr in
  for i = (len - 1) / 2 downto 0 do
    max_heapify len arr i
  done

Question: Why does the for-loop start only from i = (len - 1) / 2, not from len - 1?

The complexity of build_max_heap can be over-approximated by analysing the complexity of each iteration of the while-loop, and the number of the iteration it makes.

Why does this procedure deliver a heap? This can be established by the following invariant, which we state in plain English (implementing it is a home exercise):

Invariant

At the start of each iteration of the for-loop in build_max_heap, each node i + 1, i + 2, len - 1 is a root of a max-heap.

Question: Why does this invariant holds for the elements from the second half of the array?

Question: What happens if we start building the heap from the beginning of the array, moving right. How correctness and performance will be affected? Justify your answer by talking about loop invariants.

We can test our procedure on some random arrays:

# let a = generate_key_value_array 10;;
val a : (int * string) array =
  [|(6, "ktesl"); (9, "herli"); (7, "etqiz"); (4, "wrnqu"); (3, "ceojd");
    (2, "cklpw"); (2, "mvcme"); (7, "uowmp"); (5, "yeuzq"); (4, "yuzdw")|]
# KVHeaps.build_max_heap a;;
- : unit = ()
# a;;
- : (int * string) array =
[|(9, "herli"); (7, "uowmp"); (7, "etqiz"); (6, "ktesl"); (4, "yuzdw");
  (2, "cklpw"); (2, "mvcme"); (4, "wrnqu"); (5, "yeuzq"); (3, "ceojd")|]
# is_heap a;;
- : bool = true