Specifying Liveness in TLA+¶
Specifying State Transition Systems in TLA+¶
Recommended Reading (Watching)
The pragmatics and basics of TLA+ are given in Videos 1-7 of the video course. Additional details can be found in the first chapters of the Hyperbook.
Highlights:
In TLA, systems are defined as state machines with their changes described via steps. An execution is represented as a sequence of states, and a step is a change from one state to the next one.
As in our previous studies, specifications for systems describe all possible behaviours by means of (a) providing an initial state of the system and (b) defining steps for the system to evolve.
The control flow in ordinary programs are modelled by means of introducing program counters.
Both states and steps are defined as predicates on variables. Steps use “primed” variables do denote the values in the next state.
The TLC model checker performs state space exploration for verifying the specification.
The accompanying code repository provides speicications and safety invariants for the following case studies:
Jug Filling Problem from Die Hard 3 (file
DieHard.tla
).Transaction Commit Specification (
TCommit.tla
); make sure you understand the invariantsTwo-Phase Commit (
TwoPhaseCommit.tla
), as an instance ofTPCommit
, but now with some implementation details, namely, messages.Single-Decree Paxos, as another instance of
TPCommit
. This protocols’s state space explodes easily, so only model-checked for very small values. To ensure that the framework is useful, try removinga2
from one of the quorums (this will also require adapting symmetry conditions).
Note
When checking the basic examples, such as SimpleProgram.tla
, you might
need to disable the TLC deadlock checker by adding the line CHECK_DEADLOCK
FALSE
to your model configuration file (*.cfg
).
Temporal Reasoning, Stuttering, Liveness, and Fairness¶
TLA+ utilises temporal specifications for checking that an implementation (a low-level specification) implements a (high-level) specification. Make sure to check the Lecture 8 of TLA+ video course for details on how to define those.
Recommended Reading (Watching)
The principles of temporal reasoning in TLA+ are outlined in Videos 8-9 of the video course.
Highlights:
We revise the files
TCommit.tla
andTwoPhaseCommit.tla
.Variables in TLA+ are global (i.e., not scoped to a module they are defined). This is a feature that allows to check preservation of properties defined in a different module. For instance, both
TwoPhaseCommit
andTCommit
define the variablermState
, which is, thus, considered the same in the statement of the theoremTPSpec => TCSpec
. That is, the implication works as the spec on the right restricts only a subset of variables of the spec on the left.Implication on specifications allows for stuttering, i.e., skipping the equivalent states in the (more refined) specification on the left.
Deadlock is identified as an infinite sequence of stuttering steps. All our systems so far allow for deadlocks as they admit stuttering steps when no involved variables change.
So far, our specs only tell what systems may do (hence the possible stuttering), but not what they must do.
TLA+ has normal lists with all usual operations. See
SeqSpec.tla
for an example. You can parse it and then runTLA+: Evaluate Expression
action for an input such asRemove (3, <<1,2,3,4>>)
.For the fair definition, consider the specification in
ABSpec.tla
.Liveness property can always be satisfied by a behaviour by completing it with a number of states.
Weak fairness of action A asserts of a behavior: If A ever remains continuously enabled, then an A step must eventually occur. It’s written as
WF_vars(A)
in TLA+. Thevars
subscript ensures that this step is not stuttering, i.e., it will change some of the variables.When reasoning about fairness with TLC, we can’t use symmetry sets, as they introduce equivalence classes. These classes might make the checker might fail to detect unproductive loops (e.g.,
a
switched to an equivalentb
and back), leading to liveness violation. Therefore, the symmetry properties can provide false negatives: the checker can say that your spec satisfies the properties when it actually doesn’t.An implementation of an alternating bit protocol is given in
AB.tla
. It utilises sequences to represent message buffers. Notice that, according to the spec,A
andB
can keep sending messages faster than they get lost or received. There is no limit to how long the sequencesAtoB
andBtoA
can be, so we might need to limit that using state constraints.Trying weak fairness in the example shows that it’s not enough to prove the temporal specification: the weak fairness doesn’t guarantee that receive-messages, once enabled, will remain enabled forever. Therefore, the protocol can make no progress, because it gets enabled when the message is sent, but is disabled once it’s lost. Weak fairness of
ARcv
is true because it is never continuously enabled (hence the whole property is vacuous).Strong fairness of action A asserts of a behavior: If A ever remains repeatedly enabled, then an A step must eventually occur. It’s written as
SF_vars(A)
in TLA+. Making sure thatARcv
andBRcv
are subject of strong liveness makes abstract temporal specification true.
Implementation Refinement¶
Recommended Reading (Watching)
The (2-part) Lecture 10 of the video course explains refinement proofs in TLA+.
Highlights:
The substitution principle for temporal formulas only holds if they are “persistent”, i.e., hold at any state/step, as in \((\square~(v = e)) \Longrightarrow (f = (f[e/v]))\).
The new module
AB2.tla
is a modified version ofAB.tla
where messages can be corrupted and detected as such.When we try to show that the fairness specification of
AB2.tla
implies fairness in the style ofAB.tla
, we notice that this is not true. The reason for that is explained in the comment right after the definition ofFairSpec
. One way to fix it is to introduce auxiliary boolean variables that state which messages are going to be corrupted, and which won’t. An example of such extension is given in a derived moduleAB2P.tla
(check the comments for more details). Notice that the constraints for model-checking had to be redefined for tractability.Importantly, the (low-level)
AB2
protocol correctly implementAB
(high-level), where theLoseStep
ofAB
is implemented byCorruptMsg
ofAB2
. The transformation from the “richer” state ofAB2
(as it contains bad messages) to the state ofAB
is obtained by simply “erasing” bad messages, which is done by the (recursive) functionRemoveBad
defined inAB2H.tla
. This module defines a so-called refinement mapping between states ofAB2
andAB
.More specifically, in
AB2H.tla
,AtoB
andBtoA
are imaginary (auxiliary) variables that are added as an impedance matched to relate the behaviours ofAB2
to those ofAB
.An alternative way to define the refinement mapping is to follow an example in
AB2.tla
andAB.tla
, in which the correspondence between their statements is established by building a refinement mapping that removes corrupted messages from the state ofAB2
as done in line:AB == INSTANCE AB WITH AtoB <- RemoveBad(AtoB2), BtoA <- RemoveBad(BtoA2)
The result of establishing the refinement mapping is a “proof” that
AB2
produces behaviours equivalent to those ofAB
modulo the corrupted messages.
Some Recent TLA+ Formalisations¶
Almost all major distributed protocols have been specified in TLA+: among those are Paxos, Disk Paxos, Flexible Paxos, BFT, Abstract by Guerraoui et al, and Raft. Here are some more modern developments (the list is courtesy of Jialin Li):
Just Say NO to Paxos Overhead: Replacing Consensus Overhead with Network Ordering
Eris: Coordination-Free Consistent Transactions Using In-Network Concurrency Control
Harmonia: Near-Linear Scalability for Replicated Storage with In-Network Conflict Detection
Pegasus: Tolerating Skewed Workloads in Distributed Storage with In-Network Coherence Directories