Specifying Liveness in TLA+¶
Specifying State Transition Systems in TLA+¶
Recommended Reading (Watching)
The pragmatics and basics of TLA+ are given in Videos 1-7 of the video course. Additional details can be found in the first chapters of the Hyperbook.
Highlights:
In TLA, systems are defined as state machines with their changes described via steps. An execution is represented as a sequence of states, and a step is a change from one state to the next one.
As in our previous studies, specifications for systems describe all possible behaviours by means of (a) providing an initial state of the system and (b) defining steps for the system to evolve.
The control flow in ordinary programs are modelled by means of introducing program counters.
Both states and steps are defined as predicates on variables. Steps use “primed” variables do denote the values in the next state.
The TLC model checker performs state space exploration for verifying the specification.
The accompanying code repository provides speicications and safety invariants for the following case studies:
Jug Filling Problem from Die Hard 3 (file
DieHard.tla).Transaction Commit Specification (
TCommit.tla); make sure you understand the invariantsTwo-Phase Commit (
TwoPhaseCommit.tla), as an instance ofTPCommit, but now with some implementation details, namely, messages.Single-Decree Paxos, as another instance of
TPCommit. This protocols’s state space explodes easily, so only model-checked for very small values. To ensure that the framework is useful, try removinga2from one of the quorums (this will also require adapting symmetry conditions).
Note
When checking the basic examples, such as SimpleProgram.tla, you might
need to disable the TLC deadlock checker by adding the line CHECK_DEADLOCK
FALSE to your model configuration file (*.cfg).
Temporal Reasoning, Stuttering, Liveness, and Fairness¶
TLA+ utilises temporal specifications for checking that an implementation (a low-level specification) implements a (high-level) specification. Make sure to check the Lecture 8 of TLA+ video course for details on how to define those.
Recommended Reading (Watching)
The principles of temporal reasoning in TLA+ are outlined in Videos 8-9 of the video course.
Highlights:
We revise the files
TCommit.tlaandTwoPhaseCommit.tla.Variables in TLA+ are global (i.e., not scoped to a module they are defined). This is a feature that allows to check preservation of properties defined in a different module. For instance, both
TwoPhaseCommitandTCommitdefine the variablermState, which is, thus, considered the same in the statement of the theoremTPSpec => TCSpec. That is, the implication works as the spec on the right restricts only a subset of variables of the spec on the left.Implication on specifications allows for stuttering, i.e., skipping the equivalent states in the (more refined) specification on the left.
Deadlock is identified as an infinite sequence of stuttering steps. All our systems so far allow for deadlocks as they admit stuttering steps when no involved variables change.
So far, our specs only tell what systems may do (hence the possible stuttering), but not what they must do.
TLA+ has normal lists with all usual operations. See
SeqSpec.tlafor an example. You can parse it and then runTLA+: Evaluate Expressionaction for an input such asRemove (3, <<1,2,3,4>>).For the fair definition, consider the specification in
ABSpec.tla.Liveness property can always be satisfied by a behaviour by completing it with a number of states.
Weak fairness of action A asserts of a behavior: If A ever remains continuously enabled, then an A step must eventually occur. It’s written as
WF_vars(A)in TLA+. Thevarssubscript ensures that this step is not stuttering, i.e., it will change some of the variables.When reasoning about fairness with TLC, we can’t use symmetry sets, as they introduce equivalence classes. These classes might make the checker might fail to detect unproductive loops (e.g.,
aswitched to an equivalentband back), leading to liveness violation. Therefore, the symmetry properties can provide false negatives: the checker can say that your spec satisfies the properties when it actually doesn’t.An implementation of an alternating bit protocol is given in
AB.tla. It utilises sequences to represent message buffers. Notice that, according to the spec,AandBcan keep sending messages faster than they get lost or received. There is no limit to how long the sequencesAtoBandBtoAcan be, so we might need to limit that using state constraints.Trying weak fairness in the example shows that it’s not enough to prove the temporal specification: the weak fairness doesn’t guarantee that receive-messages, once enabled, will remain enabled forever. Therefore, the protocol can make no progress, because it gets enabled when the message is sent, but is disabled once it’s lost. Weak fairness of
ARcvis true because it is never continuously enabled (hence the whole property is vacuous).Strong fairness of action A asserts of a behavior: If A ever remains repeatedly enabled, then an A step must eventually occur. It’s written as
SF_vars(A)in TLA+. Making sure thatARcvandBRcvare subject of strong liveness makes abstract temporal specification true.
Implementation Refinement¶
Recommended Reading (Watching)
The (2-part) Lecture 10 of the video course explains refinement proofs in TLA+.
Highlights:
The substitution principle for temporal formulas only holds if they are “persistent”, i.e., hold at any state/step, as in \((\square~(v = e)) \Longrightarrow (f = (f[e/v]))\).
The new module
AB2.tlais a modified version ofAB.tlawhere messages can be corrupted and detected as such.When we try to show that the fairness specification of
AB2.tlaimplies fairness in the style ofAB.tla, we notice that this is not true. The reason for that is explained in the comment right after the definition ofFairSpec. One way to fix it is to introduce auxiliary boolean variables that state which messages are going to be corrupted, and which won’t. An example of such extension is given in a derived moduleAB2P.tla(check the comments for more details). Notice that the constraints for model-checking had to be redefined for tractability.Importantly, the (low-level)
AB2protocol correctly implementAB(high-level), where theLoseStepofABis implemented byCorruptMsgofAB2. The transformation from the “richer” state ofAB2(as it contains bad messages) to the state ofABis obtained by simply “erasing” bad messages, which is done by the (recursive) functionRemoveBaddefined inAB2H.tla. This module defines a so-called refinement mapping between states ofAB2andAB.More specifically, in
AB2H.tla,AtoBandBtoAare imaginary (auxiliary) variables that are added as an impedance matched to relate the behaviours ofAB2to those ofAB.An alternative way to define the refinement mapping is to follow an example in
AB2.tlaandAB.tla, in which the correspondence between their statements is established by building a refinement mapping that removes corrupted messages from the state ofAB2as done in line:AB == INSTANCE AB WITH AtoB <- RemoveBad(AtoB2), BtoA <- RemoveBad(BtoA2)
The result of establishing the refinement mapping is a “proof” that
AB2produces behaviours equivalent to those ofABmodulo the corrupted messages.
Some Recent TLA+ Formalisations¶
Almost all major distributed protocols have been specified in TLA+: among those are Paxos, Disk Paxos, Flexible Paxos, BFT, Abstract by Guerraoui et al, and Raft. Here are some more modern developments (the list is courtesy of Jialin Li):
Just Say NO to Paxos Overhead: Replacing Consensus Overhead with Network Ordering
Eris: Coordination-Free Consistent Transactions Using In-Network Concurrency Control
Harmonia: Near-Linear Scalability for Replicated Storage with In-Network Conflict Detection
Pegasus: Tolerating Skewed Workloads in Distributed Storage with In-Network Coherence Directories