2. Week 02: SMR and Consensus Protocols

2.1. State-Machine Replication and Atomic Commit

State Machine Replication (SMR) is a general method for implementing a fault-tolerant service by replicating servers and coordinating client interactions with server replicas. The approach also provides a framework for understanding and designing replication management protocols.

Distributed software is often structured in terms of clients and services. Each service comprises one or more servers and exports operations (executed by the said state-machines) that clients invoke by making requests. Although using a single, centralised server is the simplest way to implement a service, the resulting service can only be as fault tolerant as the processor executing that server. If this level of fault tolerance is unacceptable, then multiple servers that fail independently can be used. Usually, replicas of a single server are executed on separate processors of a distributed system, and protocols are used to coordinate client interactions with these replicas. The physical and electrical isolation of processors in a distributed system ensures that server failures are independent, as required.

A particular scenario of implementing a family of replicated state machines is by making all of them to read from the same log of commands and execute them. This reduces the problem of reconciling replicated machines to reconciling the sequences of entries in the log. This can be done by means of performing an atomic commit, in which a number of replicas should collectively agree on a course of actions.

So how to we keep this log consistent across multiple machines?

A simplest protocol for achieving the atomic commit for the replicated log is the Two-Phase Commit protocol (TPC).

Mandatory Reading

The TPC protocol is pretty much folklore nowadays, so check out this Wikipedia article on Two-Phase Commit to get the idea of how it works. You can also read up the explanations in Section 4 of this paper.

A simple implementation of TPC along with tests implemented in Scala/Akka, can be found in this project, accompanying the module. Look for the classes starting with TwoPhaseCommit*.

2.2. Distributed Consensus and Paxos

A more general to implement SMR is by relying on a general notion of distributed consensus. In distributed computing consensus is a procedure for a number of processes to agree on a single proposal made by one of them. The key consensus properties are:

  • Uniformity: Only a single value is chosen
  • Non-triviality: Only a value that has been proposed may be chosen
  • Irrevocability: Once agreed on a value, the processes do not change their decision.

In the previous section we saw how to reach a consensus with Lamport’s logical clocks, under the condition that no replicas are faulty.

Now, let us see how to build a practical distributed consensus algorithm from the ground up, culminating with Lamport’s famous Paxos construction

Mandatory Reading

The slides on Paxos illustrate the progression towards the Paxos construction.

Mandatory Reading

The original paper by Lamport provides more intuition and details on the construction of the “single-decree” Paxos.

Self-check: Make sure you understand the roles of proposers, acceptors, and learners, as well as how ballots are handled by the acceptors.

A simple implementation of Single-Decree Paxos and tests for it can be found in the same project. For the implementation look into the file SimplePaxos.scala. Tests are located in the folder src/test/scala/org/protocols/paxos/simple.

The “full” Paxos, which works for a sequence of operations over the so-called “distributed log” is described in a varieary of papers, such as Lamport’s famous “The Part-Time Parliament”, as well as more recent (and much more readable) “Paxos Made Moderately Complex”.

The main critique of Paxos is usually concerned with its opaqueness, derived from its choice of the single-decree subset as its foundation. Single-decree Paxos is dense and subtle: it is divided into two stages that do not have simple intuitive explanations and cannot be understood independently. Because of this, it is difficult to develop intuitions about why the single-decree protocol works. The composition rules for Multi-Paxos add significant additional complexity and subtlety.

Optional Reading

The paper Paxos Consensus, Deconstructed and Abstracted proposes one way to dervie Multi-Paxos systematically from a single-decree via a series of semantics-preserving transformations. You can check out the implementation of those ideas in the accompanying consensus project on GitHub.

2.3. Bibliography