In the last post, we described a protocol that should satisfy the requirements and invariants established in the first post. Today we will look at formal verification with TLA+.

Formal verification is just another (niche) tool in the toolbox. Some tools require more skill than others to use. Some tools are more expensive than others. It is up to the practitioner to decide if, when and how to use them.

The hard part is that you won't necessarily know whether a tool will help with a given problem unless you are already skilled in it. If a tool is very difficult to learn, you might never invest enough in it to be able to make that call. Or you might invest a lot of time in it, only to find it isn't a great match for your problem, at which point it gets stowed in your toolbox where it may or may not get used again. I expect many software engineers see learning formal methods as a difficult (it is) and high-risk venture.

So, given the above, my aim with this post is for software engineers without prior experience of TLA+ to be able to get the gist of the spec and see why it was useful for this project. Please let me know whether I succeeded.

I’ll split it up into three parts:

An intro to the syntax and data structures

An overview of the spec

The model checker and how it discovered a defect in my first version of the protocol.

For learning TLA+ and PlusCal (a pseudocode-like language that compiles to TLA+), I highly recommend Leslie Lamport’s video series and book. Hillel Wayne also has great content and a book, and Ron Pressler writes about TLA+ on his blog.

Section 1 - Introduction to TLA+

Warning: I will try to avoid abstract and overly mathematical language and keep closer to the language that programmers are used to. However, if you want to get serious about TLA+ then you’re going to have to embrace its strangeness and mathematical nature. I learned it by watching Leslie Lamport’s video series and reading his book Specifying Systems.

A TLA+ specification models a set of variables and constants which represent the state of the system we want to model. A state in TLA+ terms is the assignment of concrete values to all variables in the specification, within a single step.

A state represents one moment in time of the system, a snapshot of the system. The notion of time is not continuous but a sequence of discrete steps where variables are assigned values and invariants can be verified. Each new step means a new state. Each state assigns a value to every variable, though not every variable has to change from one state to the next. When we don’t modify a variable we must explicitly say that its value remains unchanged.
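To make the "explicitly unchanged" point concrete, here is a minimal sketch of a step that modifies one variable and leaves another alone. The module name TwoVars, the variables x and y, and the action name IncrementX are made up for illustration and are not part of the spec discussed in this series.

```tla
---- MODULE TwoVars ----
EXTENDS Naturals
VARIABLES x, y          \* hypothetical variables, for illustration only

\* Initial state: every variable gets a concrete value.
Init == x = 0 /\ y = 0

\* A step: x' is the value of x in the next state.
\* y is not modified, so we must say so with UNCHANGED.
IncrementX == x' = x + 1 /\ UNCHANGED y

Next == IncrementX
====
```

Writing UNCHANGED y is shorthand for y' = y. Without it, the formula would say nothing about y in the next state, so y could take any value.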

Let’s take a super simple example: a counter i, that is incremented forever. The specification is formed by formulae that either describe the initial state of the system, or the possible next states. This is what all TLA+ specs are based on: initial state and next state.

The initial state sets the variable i to 0. The next state formula will increment the value of i by one in the next state. The next state formula in this spec is “executed” again and again forever.
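The counter described above could be written roughly as follows. This is a sketch rather than the exact listing from the original spec: the module name Counter and the formula names Init, Next and Spec are assumptions that follow common TLA+ convention.

```tla
---- MODULE Counter ----
EXTENDS Naturals
VARIABLE i

\* Initial state: i starts at 0.
Init == i = 0

\* Next state: the value of i in the next state (i') is i plus one.
Next == i' = i + 1

\* The spec as a whole: start in Init, then every step is either a
\* Next step or a step that leaves i unchanged.
Spec == Init /\ [][Next]_i
====
```

Note that Next is not an assignment in the programming sense. It is a formula that is true or false of a pair of states (the current one and the next one), and it is true only for pairs where i' equals i + 1.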