# 5.3. Sorting in Linear Time¶

- File:
`LinearTimeSorting.ml`

As we have just determined, one cannot do comparison-based sorting better than
in \(O(n \log n)\) in the worst case. However, we can improve this
complexity if we base the logic of our algorithm *not* just on comparisons, but
will also exploit the intrinsic properties of the data used as keys for
elements to be sorted (e.g., integers). In this chapter we will see some
examples of such specialised sorting procedures.

## 5.3.1. Simple Bucket Sort¶

Bucket sort works well for the case, when the size of the set, from which we draw the keys is limited by a certain number bnum. In this case, we can allocate an auxiliary array of “buckets” (implemented as lists), which will serve to collect elements with the key corresponding to the bucket number. The code is as follows:

```
let simple_bucket_sort bnum arr =
let buckets = Array.make bnum [] in
let len = Array.length arr in
for i = 0 to len - 1 do
let key = fst arr.(i) in
let bindex = key mod bnum in
let b = buckets.(bindex) in
buckets.(bindex) <- arr.(i) :: b
done;
let res = ref [] in
for i = bnum - 1 downto 0 do
res := List.append (List.rev (buckets.(i))) !res
done;
list_to_array !res
```

Having created an array `buckets`

, the sort than traverses the
initial array `arr`

, and puts each element with a key `key`

into
the bucket with the corresponding index, obtained as ```
bindex = key
mod bnum
```

. Notice that if the all keys are in range limited by
`bnum`

, the `mod`

operation returns the key itself.

Therefore, the first `for`

-loop has a complexity \(\Theta(n)\), where
\(n\) is the size of `arr`

. The second loop walks through the array of
buckets all the buckets (making `bnum`

iterations) and concatenates all the
lists, returning the result as the array. It is straightforward to show that the
resulting complexity of the algorithm is in \(O(\mathtt{bnum} \cdot n)\) (it
can be made \(O(\mathtt{bnum} + n)\) if we use append-only buffers, so we
don’t have to re-traverse the lists), i.e., it is linear in `n`

.

We can see `simple_bucket_sort`

in action:

```
# let c =[|9; 9; 0; 9; 4; 7; 9; 2; 3; 3|];;
# simple_bucket_sort 10 c;;
- : int array = [|0; 2; 3; 3; 4; 7; 9; 9; 9; 9|]
```

## 5.3.2. Enhanced Bucket Sort¶

If the size of the space of keys exceeds the number of the buckets, one can still use the same idea, while also sorting each bucket individually with a suitable sorting, such as insertion sort (implemented for lists), as it will be operating on small and almost sorted sub-arrays:

```
let bucket_sort max ?(bnum = 1000) arr =
let buckets = Array.make bnum [] in
let len = Array.length arr in
for i = 0 to len - 1 do
let key = arr.(i) in
let bind = key * bnum / max in
let b = buckets.(bind) in
buckets.(bind) <- arr.(i) :: b
done;
let res = ref [] in
for i = bnum - 1 downto 0 do
let bucket_contents = List.rev (buckets.(i)) in
let sorted_bucket = InsertSort.insert_sort bucket_contents in
res := List.append sorted_bucket !res
done;
list_to_array !res
```

The code of `bucket_sort`

above takes an optional parameter `bnum`

for the number of buckets (default is 10, if omitted) and a parameter
`max`

to indicate the maximal possible key (should be guessed by the
client of the sorting). When allocating elements to the corresponding
buckets, it divides the entire space of keys (up to the maximal one)
into `bnum`

portions, and puts the corresponding element into the
appropriate bucket. Since elements with different keys (from the same
segment) may end up in the same bucket, and additional sorting is
required. Let us test this implementation:

```
# let e = generate_int_array 10000;;
val e : int array = [|4505; 6905; 5076; 9250; 5101; 2539; 1721; ... |]
# bucket_sort 10000 e;;
- : int array = [|0; 1; 3; 3; 5; 5; 5; 6; 6; 9; 10; ... |]
```

## 5.3.3. Stability of sorting¶

An important property of a sorting algorithm is **stability**. A sorting
algorithms is *stable* if it preserves the ordering between the elements
with equal keys in the initial array.

An example of a stable sorting algorithm is `kv_bucket_sort`

shown
below, which sorts an array of key-value pairs based on the keys:

```
let kv_bucket_sort bnum arr =
let buckets = Array.make bnum [] in
let len = Array.length arr in
for i = 0 to len - 1 do
let key = fst arr.(i) in
let bindex = key mod bnum in
let b = buckets.(bindex) in
buckets.(bindex) <- arr.(i) :: b
done;
let res = ref [] in
for i = bnum - 1 downto 0 do
res := List.rev_append buckets.(i) !res
done;
list_to_array !res
```

As an example, consider its following execution:

```
# let f = [|(3, "zqped"); (8, "esmup"); (7, "tvqej"); (8, "xhlzj"); (4, "blann");
(9, "ouors"); (0, "iocvx"); (3, "dacht"); (7, "rncpn");
(7, "khott")|];;
# kv_bucket_sort 10 f;;
- : (int * string) array =
[|(0, "iocvx"); (3, "zqped"); (3, "dacht"); (4, "blann"); (7, "tvqej");
(7, "rncpn"); (7, "khott"); (8, "esmup"); (8, "xhlzj"); (9, "ouors")|]
```

The initial array has elements `(7, "rncpn")`

and `(7, "khott")`

in this very order. In the same order, the appear in the resulting
array. Other stable sorting algorithm is insertion sort. Not all
sorting algorithms are stable though. Try to answer, whether merge
sort is stable? What about Quicksort?

## 5.3.4. Radix Sort¶

The stability comes into play, when one sorting algorithm uses another one as a black-box, relying on the fact that original order of elements in partially-sorted arrays with “almost-same” keys will be preserved.

As an example, radix sort is a linear-time sorting, building on the
idea of bucket-sort, but making it scale logarithmically, which is
necessary if the space of possible keys is too large (e.g., comparable
with the length of an array, in which case bucket sort’s complexity
becomes quadratic). It makes use of bucket sort as its component,
applying it iteratively and sorting a list of integer-keyed elements
*per key digit*, startgin from the smallest register:

```
let radix_sort arr =
let len = Array.length arr in
let max_key =
let res = ref 0 in
for i = 0 to len - 1 do
if arr.(i) > !res
then res := arr.(i)
done; !res
in
if len = 0 then arr
else
let radix = ref max_key in
let ls = array_to_list arr in
let combined = list_to_array (list_zip ls ls) in
let res = ref combined in
while !radix > 0 do
res := kv_bucket_sort 10 !res;
for i = 0 to len - 1 do
let (k, v) = !res.(i) in
!res.(i) <- (k / 10, v)
done;
radix := !radix / 10
done;
let result_list = array_to_list !res in
list_to_array result_list |> Array.map snd
```

It starts by determining the largest key `max_key`

in the initial
array. Next, it creates an array `combined`

, which pairs all
elements in the original array with their keys. In the `while`

loop,
it sorts elements, using `kv_bucket_sort`

, based on their digit. It
starts from the lowest register, and then keeps dividing the key
component of each element, “attached” for the sorting purposes, by 10,
repeating the bucket sort, until it runs out of registers.

How many iterations the `while`

-loop will make? Notice that each
time it divides the key space by 10, so it will only run for
\(\log_{10}( \mathtt{max\_key})\) iterations. This determines the
complexity of the radix sort, which is, therefore \(O(n
\log(\mathtt{max\_key}))\), i.e., it is linear if `max_key`

is
considered as a constant.

One can test the implementation of radix sort as follows:

```
let%test "radix-sort" =
let a = generate_int_array 1000 in
let b = radix_sort a in
array_sorted b &&
same_elems (array_to_list a) (array_to_list b)
```