Special Topics in Distributed Computing (2021)

Formally Specifying and Verifying Distributed Systems

It is hard to overstate the significance and ubiquity of distributed services in many aspects of modern life, such as health care, online commerce, transportation, entertainment and cloud-based applications. Given the importance of distributed software and its complexity, stemming from its concurrent nature and the necessity to tolerate reordering and loss of packets, it is nowadays considered vital in industry to have a rigorous verification methodology for establishing its correctness properties, ensuring that, once a distributed system is up and running, it will never go wrong and will eventually complete its goals.

Why should we formally verify distributed systems? First, some of the widely-used systems come without fully-fledged correctness arguments, and conducting a formal proof for them is a great way to ensure that they contain no crucial design flaws. Second, even for the algorithms that have been proven correct “with paper and pencil”, mistakes can be introduced by the developers who translate them to the runnable implementations. Indeed, due to practical constraints and optimisations, no implementation can be a 100%-faithful translation of pseudocode.

In this module, we will be studying the state of the art in computer-aided verification of modern distributed systems. In the first half of this module we will (a) review the essential foundations of building distributed protocols (vector clocks, consensus, state-machine replication and Byzantine fault tolerance) and (b) study a number of methodologies and tools for computer-assisted reasoning about essential properties of distributed systems: IO-Automata, proof by refinement, TLA+/PlusCal, model checking, and deductive verification. The second half of the module will be dedicated to seminar-style discussions of recent papers on modern distributed protocols and proofs of their correctness. The assessment for the module will be done by via quizzes, a review of a classical paper, a coding assignment on system specification via TLA+/PlusCal, paper-based seminar presentations, and a research project (proposed independently by each student). For more details on the specific topics, please refer to the Course Syllabus.

Prerequisites

The module attendees should be familiar with basics of concurrent and distributed programming, covered in classes such as CS4231, CS5223, or YSC3248. Knowledge of first-order logic and machine-assisted reasoning and verification (e.g., in Isabelle/HOL or Coq) will be helpful.

Module Logistics

  • Grading Policy
  • Instructor: Ilya Sergey
  • Lectures: Mondays, 2:00-4:00pm. First lecture on January 11, 2021.
  • Location: COM1, VCRM, COM1-02-13 (next to the Student Lounge on COM1 Level 2)
  • Office hours: arranged on demand by appointment

Paper-Based Seminars (Weeks 8-13)

Each week will have two paper-based presentations, connected by the common theme and, ideally, accompanied both by a tool demonstrations.

The following arrangements are tentative and can change in due course of the module.

In case if several papers are listed for the talk, some of them are marked as Optional for peresentation; those of them required to read before the seminar for answering lightning quiz questions marked with Quiz.

For the full list of available options to present (grouped by common themes) check out the list of Papers to present at the seminars.

Grading Policy

The final grade is combined from the following components:

  • Quiz 1 (Week 4, Fundamentals of Distributed Computing): 10%
  • Quiz 2 (Week 7, Specification and Verification Techniques): 10%
  • Written review of a classical paper: 10%. Evaluated based on
    • problem statement: 2%
    • summary of contributions: 3%
    • example: 3%
    • a survey of the paper’s impact: 2%
  • Midterm Homework (TLA+ exercises): 10%
  • Random Paper-Based Lightning Quizzes (Weeks 8-13): 10%
  • Paper-Based Presentations (at least two): 20%
  • Participation in discussions: 5%
  • Research Projects (teams of one or two): 25%. This grade is combined from:
    • Initial project proposal: 5% (why the problem is interesting, why feasible)
    • Code quality: 5% (well-documented, corresponds to the text in the report)
    • Written report: 10% (evaluated based on structure, clarity)
    • Presentation: 5% (you should be able to explain your design choices)

Textbooks and Resources

More resources for seminar presentations will be added here.