Special Topics in Distributed Computing (2021)¶

Formally Specifying and Verifying Distributed Systems¶

It is hard to overstate the significance and ubiquity of distributed services in many aspects of modern life, such as health care, online commerce, transportation, entertainment and cloud-based applications. Given the importance of distributed software and its complexity, stemming from its concurrent nature and the necessity to tolerate reordering and loss of packets, it is nowadays considered vital in industry to have a rigorous verification methodology for establishing its correctness properties, ensuring that, once a distributed system is up and running, it will never go wrong and will eventually complete its goals.

Why should we formally verify distributed systems? First, some of the widely-used systems come without fully-fledged correctness arguments, and conducting a formal proof for them is a great way to ensure that they contain no crucial design flaws. Second, even for the algorithms that have been proven correct “with paper and pencil”, mistakes can be introduced by the developers who translate them to the runnable implementations. Indeed, due to practical constraints and optimisations, no implementation can be a 100%-faithful translation of pseudocode.

In this module, we will be studying the state of the art in computer-aided verification of modern distributed systems. In the first half of this module we will (a) review the essential foundations of building distributed protocols (vector clocks, consensus, state-machine replication and Byzantine fault tolerance) and (b) study a number of methodologies and tools for computer-assisted reasoning about essential properties of distributed systems: IO-Automata, proof by refinement, TLA+/PlusCal, model checking, and deductive verification. The second half of the module will be dedicated to seminar-style discussions of recent papers on modern distributed protocols and proofs of their correctness. The assessment for the module will be done by via quizzes, a review of a classical paper, a coding assignment on system specification via TLA+/PlusCal, paper-based seminar presentations, and a research project (proposed independently by each student). For more details on the specific topics, please refer to the Course Syllabus.

Prerequisites¶

The module attendees should be familiar with basics of concurrent and distributed programming, covered in classes such as CS4231, CS5223, or YSC3248. Knowledge of first-order logic and machine-assisted reasoning and verification (e.g., in Isabelle/HOL or Coq) will be helpful.

Module Logistics¶

Grading Policy
Instructor: Ilya Sergey
Lectures: Mondays, 2:00-4:00pm. First lecture on January 11, 2021.
Location: COM1, VCRM, COM1-02-13 (next to the Student Lounge on COM1 Level 2)
Office hours: arranged on demand by appointment

Lecture Notes (Weeks 1-6)¶

The remaining lecture notes will be posted gradually as we progress through the course.

The course’s GitHub Organisation

Paper-Based Seminars (Weeks 8-13)¶

Each week will have two paper-based presentations, connected by the common theme and, ideally, accompanied both by a tool demonstrations.

The following arrangements are tentative and can change in due course of the module.

In case if several papers are listed for the talk, some of them are marked as Optional for peresentation; those of them required to read before the seminar for answering lightning quiz questions marked with Quiz.

Week 08: Testing and Invariant Inference
- Presenter 1: Andreea Costea
  - Compositional programming and testing of dynamic distributed systems
- Presenter 2: Darius Foo
  - I4: Incremental Inference of Inductive Invariants for Verification of Distributed Protocols
Week 09: Verification of Real-World Systems
- Presenter 1: Yunjeong Lee
  - IronFleet: Proving Practical Distributed Systems Correct
- Presenter 2: George Pîrlea
  - (Quiz) Verdi: A Framework for Implementing and Formally Verifying Distributed Systems
  - (Optional) Planning for Change in a Formal Verification of the Raft Consensus Protocol
Week 10: Custom logics
- Presenter 1: Jyoti Prakash
  - Reducing Liveness to Safety in First-Order Logic
- Presenter 2: George Pîrlea
  - (Quiz) Aneris: A Mechanised Logic for Modular Reasoning about Distributed Systems
  - (Optional) Distributed Causal Memory: Modular Specification and Verification in Higher-Order Distributed Separation Logic
Week 11: Reasoning about BFT
- Presenter 1: Yunjeong Lee
  - Survey on Parameterized Verification with Threshold Automata and the Byzantine Model Checker
- Presenter 2: Kiran Gopinathan
  - (Optional, Quiz) Velisarios: Byzantine Fault-Tolerant Protocols Powered by Coq
  - Asphalion: trustworthy shielding against Byzantine faults
Week 12: Verified Blockchain Protocols
- Presenter 1: Yasunari Watanabe
  - (Optional, Quiz) Mechanising Blockchain Consensus
  - (Optional) The Bitcoin Backbone Protocol: Analysis and Applications
  - Formalizing Nakamoto-Style Proof of Stake
- Presenter 2: Yasunari Watanabe
  - (Optional, Quiz) Fast and secure global payments with Stellar
  - On the Formal Verification of the Stellar Consensus Protocol
Week 13: Varia
- Presenter 1: Kiran Gopinathan
  - Mergeable Replicated Data Types
- Presenter 2: Darius Foo
  - An Empirical Study on the Correctness of Formally Verified Distributed Systems

For the full list of available options to present (grouped by common themes) check out the list of Papers to present at the seminars.

Grading Policy¶

The final grade is combined from the following components:

Quiz 1 (Week 4, Fundamentals of Distributed Computing): 10%
Quiz 2 (Week 7, Specification and Verification Techniques): 10%
Written review of a classical paper: 10%. Evaluated based on
- problem statement: 2%
- summary of contributions: 3%
- example: 3%
- a survey of the paper’s impact: 2%
Midterm Homework (TLA+ exercises): 10%
Random Paper-Based Lightning Quizzes (Weeks 8-13): 10%
Paper-Based Presentations (at least two): 20%
Participation in discussions: 5%
Research Projects (teams of one or two): 25%. This grade is combined from:
- Initial project proposal: 5% (why the problem is interesting, why feasible)
- Code quality: 5% (well-documented, corresponds to the text in the report)
- Written report: 10% (evaluated based on structure, clarity)
- Presentation: 5% (you should be able to explain your design choices)

Textbooks and Resources¶

Introduction to Reliable and Secure Distributed Programming by Christian Cachin et al.

More resources for seminar presentations will be added here.

Acknowledgements¶

This module is inspired by the following courses: