12.1. Representing Sets via Binary Search Trees¶
https://github.com/ilyasergey/ysc2229-part-two/blob/master/lib/week_12_BST.ml
Binary search trees (BST) are one of the most versatile representations of mutable sets, supporting a variety of operations, such as insertion, deletion, checking membership, finding minimums, maximums, predecessors and successors.
The key to this expressivity is the invariant of a tree-shaped data structure, that mandates that, for any node,
- any element in its left subtree is less or equal than the element in the node itself, and
- any element in its right subtree is greater or equal than the element in the node itself.
This invariant is maintained by all operations that modify the set, and is relied upon by all queries.
12.1.1. A Data Structure for Binary-Search Trees¶
We start by defining a linked in-memory data structure for BSTs in its own module.
The tree is populated by nodes, each of which carries a value (immutable), and also maintains references (mutable) to its left/right children and a parent, which can be either absent or some other nodes:
open Week_01
open Week_06
module BinarySearchTree = struct
type 'e tree_node = {
value : 'e;
parent : 'e tree_node option ref;
left : 'e tree_node option ref;
right : 'e tree_node option ref;
}
type 'e tree = {
root : 'e tree_node option ref;
size : int ref
}
(* More definitions coming here *)
end
For convenience, we define several operations to dereference various components of a tree and its nodes:
let left n = !(n.left)
let right n = !(n.right)
let parent n = !(n.parent)
let get_root t = !(t.root)
let get_size t = !(t.size)
let mk_node e =
{value = e;
parent = ref None;
left = ref None;
right = ref None}
let mk_tree _ = {root = ref None; size = ref 0}
Finally, since nodes are represented by an option
type, we introduce the following combinator, simplifying working with option
-wrapped values:
let map_option o f z = match o with
| None -> z
| Some n -> f n
In words map_option
returns applies f
to the value n
within o
, if o = Some n
, or returns z
otherwise.
12.1.2. Inserting an element into a BST¶
The defined above mk_tree
function creates an empty tree. Let us now implement a procedure for populating it with elements by inserting them one by one:
let insert t e =
let rec insert_element n e =
let m = mk_node e in
if e < n.value
then match left n with
| Some m -> insert_element m e
| None ->
m.parent := Some n;
n.left := Some m;
true
else if e > n.value
then match right n with
| Some m -> insert_element m e
| None ->
m.parent := Some n;
n.right := Some m;
true
else false
in
match !(t.root) with
| None -> (
t.root := Some (mk_node e);
t.size := 1;
true)
| Some n ->
if insert_element n e
then (t.size := !(t.size) + 1; true)
else false
Notice that the main working routine insert_element
is careful with respect to the BST property defined above: it positions the node m
with the element e
, so it would be in the right subtree (smaller-left/greater-right) with respect to its parent nodes. Finally, insert_element
returns a boolean to indicate whether the element has been indeed added (true
) or ignored as duplicated (false
). In the former case the size of the tree is increased, in the latter it remains the same.
12.1.3. Binary-Search-Tree Invariant¶
Let us now assert tree-manipulating operations such as insert
indeed preserve the BST property. For this, let us define the BST invariant in the form of the following function:
let check_bst_inv t =
let rec walk node p =
(p node.value) &&
let res_left = match left node with
| None -> true
| Some l -> walk l (fun w -> p w && w <= node.value)
in
let res_right = match right node with
| None -> true
| Some r -> walk r (fun w -> p w && w >= node.value)
in
res_left && res_right
in
match !(t.root) with
| None -> true
| Some n -> walk n (fun _ -> true)
The main recursive sub-function wal
works by “growing” a predicate p
that applies to each node further down the tree, making sure that it is correctly positioned with regard to all its parents. At the top level p
is instantiated with (fun _ -> true)
, as there are no restrictions imposed for the root of the tree, but more and more conjuncts added, as the checking proceeds recursively.
12.1.4. Testing Tree Operations¶
https://github.com/ilyasergey/ysc2229-part-two/blob/master/lib/week_11_Tests.ml
Let us put or invariant to work by using it to test the correctness of insert
.
We do so by frist defining a function for generating random trees from random arrays via insertion:
open Week_01
open Week_03
open Week_11_BinaryTree
open BinarySearchTree
let mk_tree_of_size n =
let t = mk_tree () in
let a = generate_key_value_array n in
for i = 0 to n - 1 do
insert t a.(i)
done;
t
Next, we check that the generated trees indeed satisfy the BST property:
let%test "Testing insertion" =
let n = 1000 in
let t = mk_tree_of_size n in
check_bst_inv t
12.1.5. Printing a Tree¶
It would be very nice if we could not only test but also visualise our binary search trees.
Unfortunately, printing a tree in a standard top-down fashion requires quite a bit of book-keeping of tree-specific information (implementation of a this procedure in a particular case is left to you as a homework assignment). Printing a tree left-to-right is, however, can be doen quite easily as follows:
let print_tree pp snum t =
let print_node_with_spaces l s =
for i = 0 to s - 1 do
Printf.printf " "
done;
print_endline (pp l.value);
in
let rec walk s node = match node with
| None -> ()
| Some n -> begin
walk (s + snum) (right n);
print_node_with_spaces n s;
walk (s + snum) (left n);
end
in
map_option (get_root t) (fun n -> walk 0 (Some n)) ()
The first auxiliary function print_node_with_spaces
Prints a string of s
spaces and the value of a node l
.
The second function walk
traverses the tree recursively, accumulating the “offset” proportionally to the depth of the tree node. It first prints the right sub-tree, then the node itself and then the left sub-tree, making use of the accumulated offset for printing the necessary number of spaces. Finally, it runs walk
for the top-level root node, if it exists.
Let us observe the effect of print_tree
by instantiating it to print trees of key-value pairs:
let print_kv_tree = print_tree
(fun (k, v) -> Printf.sprintf "(%d, %s)" k v) 12
We can now use utop
to experiment with it:
utop # open Week_11_BinaryTree;;
utop # open BinarySearchTree;;
utop # let t = mk_tree ();;
val t : '_weak1 tree = {root = {contents = None}}
utop # let a = Week_03.generate_key_value_array 10;;
val a : (int * string) array =
[|(4, "ayuys"); (7, "cdrhf"); (4, "ukobi"); (5, "hwsjs"); (8, "uyrla");
(0, "uldju"); (3, "rkolw"); (7, "gnzzo"); (7, "nksfe"); (4, "geevu")|]
utop # for i = 0 to 9 do insert t a.(i) done;;
- : unit = ()
utop # print_kv_tree t;;
(8, uyrla)
(7, nksfe)
(7, gnzzo)
(7, cdrhf)
(5, hwsjs)
(4, ukobi)
(4, geevu)
(4, ayuys)
(3, rkolw)
(0, uldju)
- : unit = ()
That is, on can see that (4, "ayuys")
is the root of the tree, and the whole structure satisfies the BST property.
12.1.6. Searching Elements¶
We define the search
function so it would return not just the element, but also the node that contains it. It does so by recursively traversing the tree, while relying on its BST property:
let search t k =
let rec walk k n =
let nk = n.value in
if k = nk then Some n
else if k < nk
then match left n with
| None -> None
| Some l -> walk k l
else match right n with
| None -> None
| Some r -> walk k r
in
map_option (get_root t) (walk k) None
In the absence of the abstract module signature, it is quite dangerous to return a node (node just its value), as one can break the BST properties, by checking its mutable components. However, returning a node also simplifies the implementation of various testing and manipulation procedures, specifically, deletion of tree nodes.
12.1.7. Tree Traversals¶
There are multiple ways to flatten a tree into a list, which can be convenient for the sake of testing and other inspections.
The simples way to do it is via an accumulator (implemented as a mutable queue) and a procedure, known as Depth-First-Search (DFS), which traverses the tree recursively, following its shape:
open DLLBasedQueue
let depth_first_search_rec t =
let rec walk q n =
enqueue q n.value;
(match left n with
| Some l -> walk q l
| None -> ());
(match right n with
| Some r -> walk q r
| None -> ());
in
let acc = (mk_queue 0) in
map_option (get_root t) (walk acc) ();
queue_to_list acc
Keeping in mind the correspondence between implicit call stack and explicit call stack, we can rewrite this procedure without relying on recursion, but using an explicit stack instead:
let depth_first_search_loop t =
let open ListBasedStack in
let loop stack q =
while not (is_empty stack) do
let n = get_exn @@ pop stack in
enqueue q n.value;
(match right n with
| Some r -> push stack r
| _ -> ());
(match left n with
| Some l -> push stack l
| _ -> ());
done
in
let acc = (mk_queue 0) in
let stack = mk_stack 0 in
(match get_root t with
| None -> ()
| Some n -> begin
push stack n;
loop stack acc;
end);
queue_to_list acc
With the stack (implicit or explicit), DFS traverses the tree in a Last-In-First-Out mode (LIFO). By replacing the stack with a mutable queue (First-In-First-Out, FIFO), we can obtain an alternative traversal, known as Breadth-First-Search (BFS), so it would accumulate tree elements by following its “layers”:
let breadth_first_search_loop t =
let loop wlist q depth =
while not (is_empty wlist) do
let n = get_exn @@ dequeue wlist in
enqueue q n.value;
(match left n with
| Some l -> enqueue wlist l
| _ -> ());
(match right n with
| Some r -> enqueue wlist r
| _ -> ());
done
in
let acc = (mk_queue 0) in
let wlist = mk_queue 0 in
(match get_root t with
| None -> ()
| Some n -> begin
enqueue wlist n;
loop wlist acc 0;
end);
queue_to_list acc
Notice that the code of depth_first_search_loop
and breadth_first_search_loop
is almost identical, modulo the used container data structure and its operations (e.g., enqueue
/push
and dequeue
/pop
).
We can also define all elements of the set in terms of the traversal:
let elements t = breadth_first_search_loop t
12.1.8. Testing Element Retrieval and Tree Traversals¶
As we know well how to work with lists, we can use traversals to test each other, as well as the search
function:
(******************************************)
(* Testing traversals *)
(******************************************)
let check_elem_in_tree t e =
let n = search t e in
(get_exn @@ n).value = e
let%test "Testing DFS" =
let n = 1000 in
let t = mk_tree_of_size n in
let l1 = depth_first_search_rec t in
let l2 = depth_first_search_loop t in
List.length l1 = n && l1 = l2 &&
List.for_all (fun e -> check_elem_in_tree t e) l1
let%test "Testing BFS" =
let n = 1000 in
let t = mk_tree_of_size n in
let l1 = depth_first_search_rec t in
let l2 = breadth_first_search_loop t in
List.length l1 = n &&
List.for_all (fun e -> List.mem e l2) l1 &&
List.for_all (fun e -> List.mem e l1) l2
(******************************************)
(* Testing retrieval *)
(******************************************)
let%test "Testing retrieval" =
let n = 1000 in
let t = mk_tree_of_size n in
let m = Random.int n in
let l = breadth_first_search_loop t in
let e = List.nth l m in
let z = search t e in
z <> None
12.1.9. More BST operations¶
Thanks to its invariant, a BST makes it almost trivial to implement operations, such as
- Getting minimum/maximum element in a set representing by a tree
- Find a successor/predecessor of an element
For instance, finding the minimal element of a subtree starting from a node n
can be achieved by the following operation:
let rec find_min_node n =
match left n with
| Some m -> find_min_node m
| None -> n
12.1.10. Deleting a node from BST¶
Deletion of a node from a BST is the most complicated operation, as it requires significant restructuring of the tree in order to maintain its invariant.
Deletion of a non-leaf node from a tree will require some other nod to take its place. This can be achieved by the following operation for performing “transplantation” of one node by another:
(* Replacing node U by (optional) node V in T. *)
let transplant t u v =
(match parent u with
| None -> t.root := v
| Some p ->
match left p with
| Some l when u == l -> p.left := v
| _ -> p.right := v);
(* Update parent of v *)
match v with
| Some n -> n.parent := parent u
| _ -> ()
Let us now discuss possible scenarios for removing a node z
from the tree T
by preserving the BST property.
- The simplest case is when
z
is a leaf, so we can simply remove it. - The node
z
has no left child. In this case, we can simply replace it by its right child (argue, why this is correct) as on the picture below:
![_images/delete-b.png](_images/delete-b.png)
- A similar situation takes place when
z
has only the left child, which replaces it (viatransplant
):
![_images/delete-c.png](_images/delete-c.png)
- In the case when
z
has two children, we need to look up for the node that corresponds to its successor in the ordering of elements. In this particular case, such a successor,y
, is the immediate right child ofz
that has no left child itself (convince yourself that in this casey
is indeed a successor ofz
), therefore we can transplatey
to replacez
:
![_images/delete-d.png](_images/delete-d.png)
- Finally, in the most nasty case,
y
, the successor ofz
, is deep belowz
, and potentially hasa right child (but no left child, otherwise it wouldn’t be the successor ofz
) . In this case we need to make to perform the transformation as follows:
![_images/delete-e.png](_images/delete-e.png)
Specifically, in the last case we first transplant y
and its right child x
and then make r
, the former right child of z
to be the right child of y
. After that we simply transplant y
to the place of z
.
The full code of deletion is as follows:
let delete_node t z =
t.size := !(t.size) - 1;
if left z = None
then transplant t z (right z)
else if right z = None
then transplant t z (left z)
else
(* Finding the successor of `z` *)
let z_right_child = (get_exn @@ right z) in
let y = find_min_node z_right_child in
(* Fact: `y` has no left child *)
(if parent y <> None &&
z != get_exn @@ parent y
then
(* If y is not immediately under z,
replace y by its right subtree *)
let x = right y in
(transplant t y x;
y.right := right z;
(get_exn @@ right y).parent := Some y));
(* Now `y` replaces `z` at its position *)
transplant t z (Some y);
y.left := !(z.left);
(get_exn @@ left y).parent := Some y
How would we test deletion? We can do so by generating a random BST, choosing a random node in it z
, and then checking the following properties for the modified tree after the deletion of z
:
- The tree still satisfies the BST invariant;
- It has the same number of elements;
- All elements from the modified tree plus the deleted one are the elements of the old tree.
These checks can be automatically performed by the following function, parameterised by the size of the tree:
let test_delete n =
let t = mk_tree_of_size n in
let m = Random.int n in
let l = breadth_first_search_loop t in
let e = List.nth l m in
let z = get_exn @@ search t e in
delete_node t z;
(* Checkign the tree invariant *)
assert (check_bst_inv t);
(* Checkign the tree size *)
let ld = breadth_first_search_loop t in
assert (List.length ld = n - 1);
(* Checking integrity *)
assert (List.for_all (fun x -> List.mem x ld || x == e) l)