11.1. File Input and Output in OCaml
Any realistic program interacts with the outside world, either by getting input from the user via a textual or graphical interface, or by reading from and writing to files.
Input/Output (IO) with files in OCaml can be implemented in multiple ways, and we will employ some of the state-of-the-art libraries that provide convenient mechanisms to do so. In order to compile and run the rest of this lecture, please make sure that you have the packages core and batteries installed via
opam install core batteries
Amongst other things, core redefines and enhances some of the familiar modules that we used before, such as List and Array. Specifically, it makes heavy use of named (labelled) arguments for functions. Such an argument must be passed together with its name, in the form ~name:value, and can therefore be placed at any position in the argument list. As an example, the following expression:
List.filter (fun x -> x > 1) [1; 2; 3];;
can be written using the version of List from core in either of the following ways:
List.filter ~f:(fun x -> x > 1) [1; 2; 3];;
List.filter [1; 2; 3] ~f:(fun x -> x > 1);;
Since the parameter
f is named, it can be located at any position.
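The same mechanism is available for user-defined functions. The following is a minimal sketch in plain OCaml; the name keep_if is a hypothetical helper introduced only for illustration, and the code relies on the standard library rather than on core.

```ocaml
(* keep_if is a hypothetical wrapper that gives the predicate a label ~f. *)
let keep_if ~f xs = Stdlib.List.filter f xs

(* Because ~f is labelled, both argument orders denote the same call. *)
let label_first = keep_if ~f:(fun x -> x > 1) [1; 2; 3]
let label_last = keep_if [1; 2; 3] ~f:(fun x -> x > 1)
```

Both label_first and label_last evaluate to the list [2; 3].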
11.1.1. Reading and Writing with Channels
In an operating system, files can be accessed concurrently for reading and writing by multiple applications, so access to them needs to be controlled. OCaml mediates this access via channels: a channel is an abstraction representing a file that the program has opened either for reading (an input channel) or for writing (an output channel).
A channel for reading can be used as in the following example, which reads all lines from a file with the path filename:
let read_file_to_strings filename =
  let file = In_channel.create filename in
  let strings = In_channel.input_lines file in
  In_channel.close file;
  strings
Notice that before the function returns its result, it has to close the channel, thus giving up the read access to it, so other applications could use it. This should be done because operating systems have a limit on the number of files that can be opened simultaneously for reading and writing.
In OCaml, the pattern of reading from a file and closing the channel after completing the operation can be expressed using the with_file function, which takes a file name and a function f that tells how to obtain a result from the input channel input of the file:
let read_file_to_single_string filename =
  In_channel.with_file filename ~f:(fun input ->
      In_channel.input_all input)
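To make the open/apply/close pattern concrete, here is a minimal sketch of how such a helper could be written with the standard library alone (assuming core is not opened in scope). The names with_input_file and read_whole_file are hypothetical; this is not how core implements with_file, only an illustration of the idea.

```ocaml
(* Hypothetical helper: open [filename], pass the channel to [f],
   and close the channel whether [f] returns normally or raises. *)
let with_input_file filename ~f =
  let ic = open_in_bin filename in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () -> f ic)

(* Read the whole file into one string using the helper above. *)
let read_whole_file filename =
  with_input_file filename ~f:(fun ic ->
      really_input_string ic (in_channel_length ic))
```

The point of Fun.protect here is that the channel is released even if reading fails halfway, which a plain "open, read, close" sequence does not ensure.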
Writing to files is done similarly, although the corresponding functions for manipulating write-channels take some additional parameters:
let write_string_to_file filename text =
  let outc = Out_channel.create ~append:false filename in
  Out_channel.output_string outc text;
  Out_channel.close outc

let write_strings_to_file filename lines =
  Out_channel.with_file ~append:false ~fail_if_exists:false filename
    ~f:(fun out ->
      List.iter lines ~f:(fun s -> Out_channel.fprintf out "%s\r\n" s))
Both Out_channel.create and Out_channel.with_file take optional parameters (that come with default values) ~append and ~fail_if_exists, which determine the behaviour when the file already exists. For instance, by passing ~append:false we indicate that the contents of the file need to be overwritten rather than appended to.
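The difference between overwriting and appending can be sketched with the standard library's open flags (assuming core is not opened in scope). The function write_text and its ~append parameter are hypothetical names chosen to mirror the text above; they are not part of core.

```ocaml
(* Hypothetical helper: write [text] to [filename], either truncating
   the existing contents or appending to them. *)
let write_text ~append filename text =
  let mode = if append then Open_append else Open_trunc in
  let oc =
    open_out_gen [Open_wronly; Open_creat; mode; Open_text] 0o644 filename
  in
  Fun.protect
    ~finally:(fun () -> close_out oc)
    (fun () -> output_string oc text)
```

Calling write_text ~append:false twice on the same file leaves only the second text in it, whereas a subsequent call with ~append:true adds to the end.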
11.1.2. Copying Files
We can use the functions above to copy files:
let copy_file old_file new_file =
  let contents = read_file_to_single_string old_file in
  write_string_to_file new_file contents
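Note that copy_file loads the entire file into memory before writing it out. For large files, one alternative is to copy in fixed-size chunks. The following is a minimal sketch using only the standard library (assuming core is not opened in scope); the name copy_file_chunked and the default chunk size are hypothetical choices.

```ocaml
(* Hypothetical alternative to copy_file: copy [src] to [dst] through a
   fixed-size buffer, so memory use does not grow with the file size. *)
let copy_file_chunked ?(chunk_size = 65536) src dst =
  let ic = open_in_bin src in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () ->
      let oc = open_out_bin dst in
      Fun.protect
        ~finally:(fun () -> close_out oc)
        (fun () ->
          let buf = Bytes.create chunk_size in
          let rec loop () =
            (* [input] returns 0 only at the end of the file. *)
            let n = input ic buf 0 chunk_size in
            if n > 0 then begin
              output oc buf 0 n;
              loop ()
            end
          in
          loop ()))
```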
Any Unix-like system comes with hash utilities to ensure that the contents of a file are intact by computing its checksum or hash. This can be done for a file
filename using either:
11.1.3. Representing Strings
One can think of a file as a sequence of 0s and 1s stored on a computer. How can one tell whether a file stores text or is binary?
Text files are identified (usually empirically) by the encoding used to represent the text in them. One of the most common encodings, ASCII, defines 128 characters, including the upper- and lowercase letters of the Latin alphabet, digits, and some punctuation marks; each character has a 7-bit code and is normally stored in one 8-bit byte (the OCaml type char). Unicode encodings such as UTF-8 and UTF-16 additionally cover the letters of most existing alphabets: UTF-16 uses one or two 16-bit units per character, while UTF-8 keeps ASCII characters as single bytes and uses two to four bytes for everything else. OCaml strings are treated as sequences of bytes (represented by the data type char). Therefore, ASCII characters are represented by char accurately, while non-ASCII characters are broken into several bytes when a string is viewed as a sequence of char values. The difference can be observed via the following example:
utop # let ascii_string = "ATR";;
val ascii_string : string = "ATR"
utop # String.length ascii_string;;
- : int = 3
utop # ascii_string.[2];;
- : char = 'R'
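Let us try a string that contains a Cyrillic character, which falls outside ASCII. Despite the variable name, the bytes shown below are those of the UTF-8 encoding, the usual encoding of source files and terminals: Я is stored as the two bytes 208 and 175, so the string has length 4.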
utop # let utf16_string = "ATЯ";;
val utf16_string : string = "ATЯ"
utop # String.length utf16_string;;
- : int = 4
utop # utf16_string.[2];;
- : char = '\208'
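To inspect the individual bytes of such a string, one can list their numeric codes. The sketch below uses the hypothetical helpers byte_codes and utf8_length and assumes the string is well-formed UTF-8. The Cyrillic letter is written with decimal escapes so that the example does not depend on the encoding of the source file.

```ocaml
(* The numeric code of every byte in the string. *)
let byte_codes s = List.init (String.length s) (fun i -> Char.code s.[i])

(* Number of characters in a well-formed UTF-8 string: count every byte
   that is not a continuation byte (continuation bytes look like 10xxxxxx). *)
let utf8_length s =
  let count = ref 0 in
  String.iter (fun c -> if Char.code c land 0xC0 <> 0x80 then incr count) s;
  !count

(* "AT" followed by the two bytes that encode the Cyrillic letter in UTF-8. *)
let cyrillic_string = "AT\208\175"
```

Here byte_codes cyrillic_string evaluates to [65; 84; 208; 175], String.length cyrillic_string to 4, and utf8_length cyrillic_string to 3.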
When working with strings, the following functions, implemented via the core machinery, will come in handy:
let trimmer =
  Core.String.strip ~drop:(fun c ->
      Core.List.mem ['\n'; ' '; '\r'] c ~equal:(fun a b -> equal_char a b))

let splitter s =
  String.split_on_chars ~on:['\n'; ' '; '\r'] s
  |> List.filter ~f:(fun s -> not @@ String.is_empty s)
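For comparison, roughly the same behaviour can be sketched with the standard library alone (assuming core is not opened in scope). The names is_separator, trimmer_stdlib and splitter_stdlib are hypothetical; this is an approximation of the core-based functions above, not their implementation.

```ocaml
(* The three characters treated as separators in the functions above. *)
let is_separator c = c = '\n' || c = ' ' || c = '\r'

(* Drop separator characters from both ends of the string. *)
let trimmer_stdlib s =
  let n = String.length s in
  let i = ref 0 in
  while !i < n && is_separator s.[!i] do incr i done;
  let j = ref (n - 1) in
  while !j >= !i && is_separator s.[!j] do decr j done;
  String.sub s !i (!j - !i + 1)

(* Split on any separator and discard the empty pieces. *)
let splitter_stdlib s =
  String.map (fun c -> if is_separator c then ' ' else c) s
  |> String.split_on_char ' '
  |> List.filter (fun w -> w <> "")
```

For example, trimmer_stdlib "  hello\r\n" evaluates to "hello", and splitter_stdlib "a b\r\nc" evaluates to ["a"; "b"; "c"].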