This is part 3 in a four-part series. Part 1, Part 2, Part 4. A video of a 40-minute talk covering parts of this series is also available.

If you find TLA+ and formal methods in general interesting, I invite you to visit and participate in the new /r/tlaplus on Reddit.

In part 2 we learned how we specify data and operations on data. In this post we will finally learn how we specify and reason about processes, or dynamic systems — in particular, algorithms — in TLA+ and how we check their correctness. While there is only very little syntax left to learn, our study of the theory of representing programs as mathematical objects is now just beginning. Most of our discussion will not concern TLA+ specifically, but TLA, Lamport’s Temporal Logic of Actions.

This post is the longest in the series by far. Covering in depth a mathematical theory — even a relatively simple one — for reasoning about computation is not light reading. But I will repeat my warning from part 1: very little of the theory and its background presented here is necessary in order to use TLA+ to specify and verify real-world software.

A Bit of Historical Background

The need to mathematically reason about non-trivial programs was recognized very early in the history of computation. Possibly the earliest attempt at a method for mathematical reasoning about programs — the first formal method — was devised by Alan Turing in his paper Checking a Large Routine, presented in a talk in 1949. Interestingly, Turing used a notation similar to that of Lamport’s TLA: primed variables denoting a variable’s value following a program step. Unfortunately, it seems that the talk, like much of Turing’s work, was ignored at the time and probably did not influence later ideas. A very similar, though somewhat expanded, approach was invented by Robert Floyd in 1967 and then presented more formally by Tony Hoare in 1969 as what is now known as Floyd-Hoare logic (or just Hoare logic).

Turing’s paper begins thus:

How can one check a routine in the sense of making sure that it is right? In order that the man who checks may not have too difficult a task, the programmer should make a number of definite assertions which can be checked individually, and from which the correctness of the whole programme easily follows.

The method of Turing, Floyd and Hoare uses what’s known as “assertional reasoning”: its main idea is to list, at each point in the program, which facts — assertions — about the program’s state are true at that point.

Lamport, who in 1977 began working on expanding Floyd and Hoare’s techniques to concurrent programs, was an early believer in the idea of program verification. In 1979 he wrote a letter to the editor of Communications of the ACM regarding their recently introduced rules for publishing algorithms:

For years, we did not know any better way to check programs than by testing them to see if they worked… But the work of Floyd and others has given us another way. They taught us that a program is a mathematical object, so we can apply the reasoning methods of mathematics to deduce its properties… After Euclid, a theorem could no longer be accepted solely on the basis of evidence provided by drawing pictures. After Floyd, a program should no longer be accepted solely on the basis of how it works on a few test cases. A program with no demonstration of why it is correct is the same as a conjecture — a statement which we think may be a theorem. A conjecture must be exceptionally interesting to warrant publication. An unverified program should also have to be exceptional to be published… The ACM should require that programmers convince us of the correctness of the programs that they publish, just as mathematicians must convince one another of the correctness of their theorems. Mathematicians don’t do this by giving “a sufficient variety of test cases to exercise all the main features,” and neither should computer scientists.

One point of contention between Lamport and others who were at the time approaching reasoning about programs from a more linguistic point of view, like Tony Hoare and Robin Milner, was the representation of the program’s control state, namely the program counter and (if applicable) the call stack. Those who work on programming language theory are adamantly reluctant — to this day — to make the program’s control state explicit, while Lamport claims that doing so makes reasoning about concurrency significantly easier.

Amir Pnueli’s introduction of temporal logic into computer science was somewhat of a watershed moment for software verification, resulting in not one but two Turing Awards, one to Pnueli and one to Clarke, Emerson and Sifakis for the invention of temporal logic model checkers. Pnueli’s logic offered “a unified approach to program verification… which applies to both sequential and parallel programs.” Note that “parallel programs” also refers to nonterminating interactive programs, where it’s useful to consider the user as a concurrent process.

Lamport was impressed with temporal logic but became disillusioned with its practical application when he saw his colleagues “spending days trying to specify a simple FIFO queue — arguing over whether the properties they listed were sufficient. I realized that, despite its aesthetic appeal, writing a specification as a conjunction of temporal properties just didn’t work in practice.”

The Temporal Logic of Actions (TLA), which forms the core of TLA+, was invented in the late ’80s and is the culmination of Lamport’s ideas about how to reason about algorithms and software or hardware systems. TLA differs from earlier uses of temporal logics in two important ways. The first is that Lamport, a staunch believer in the power of ordinary math, made TLA restrict the need for temporal reasoning to a bare minimum:

TLA differs from other temporal logics because it is based on the principle that temporal logic is a necessary evil that should be avoided as much as possible. Temporal formulas tend to be harder to understand than formulas of ordinary first-order logic, and temporal logic reasoning is more complicated than ordinary mathematical … reasoning.

Another difference is that temporal logic is usually used to reason about programs written in some programming language, while TLA is a universal mathematical notation in which one writes both the algorithm and its claimed properties as formulas in the same logic. This makes TLA a program logic, and a particularly powerful one at that. Lamport writes:

Classical program verification, begun by Floyd and Hoare, employs two languages. The program is written in a programming language, and properties of the program are written in the language of formulas of some logic. Properties are derived from the program text by special proof rules, and from other properties by reasoning within the logic… A program logic expresses both programs and properties with a single language. Program $\Pi$ satisfies property $P$ if and only if $\Pi \implies P$ is a valid formula of the logic.

Algorithms and Programs

Because algorithms are described in TLA rather differently from what you may be used to in programming, this would be a good point to ask, what is an algorithm?

In that 1979 letter to the editor of CACM, Lamport explains how he sees the difference between an algorithm and a program:

The word “algorithm” usually means a general method for computing something, and “program” means code that can be executed on a computer.

Bob Harper, a programming language theorist, disagrees and writes that:

Algorithms are programs. The supposed distinction only arose because of ultra-crappy programming languages.

Let’s examine Harper’s assertion. Suppose we read on Wikipedia a description of Tony Hoare’s famous Quicksort algorithm — which we will formally specify later in this post — and implement it in, say, BASIC and Pascal. Do the two programs encode the same algorithm or not? If we say they do, then Harper’s statement is taken to mean “algorithms are programs modulo some equivalence,” but this requires us to define what it means for programs to be equivalent, or “essentially the same”, and it is then that equivalence — which is bound to be far from trivial — that captures the essence of what an algorithm is, rather than any specific program. Indeed, a paper I will discuss briefly in part 4 defines an algorithm precisely in this way, as an equivalence relation on programs, and shows that it is the resulting “quotient” that captures the more important essence of what we mean when we say algorithm. On the other hand, if we do not consider the two programs to be the same algorithm — perhaps because the two languages differ in some implementation details, concerning, say, how arrays are represented in memory, and we consider those differences significant — then the very same Lisp program could actually encode several different algorithms depending on which version of a compiler or interpreter we use to run it, as different versions may have implementation details that are just as different as those between Pascal and BASIC.

As an example, consider Euclid’s algorithm for computing the greatest common divisor of two natural numbers. I picked this example because the algorithm is one of the oldest known that is still in use (according to Wikipedia, it was published around 300 BC), it is very simple, and because it is a favorite example of Leslie Lamport’s in his TLA+ tutorials and talks. This is the algorithm: if the two numbers are equal, then that number is their GCD; otherwise, we instead consider the pair comprising the smaller of the two numbers and the difference between the two numbers, and repeat.
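
The description above can be rendered directly in code; here is one possible sketch in Python. Note that this particular rendering happens to be imperative, but that is a property of the rendering, not of the algorithm:

```python
def gcd(m, n):
    """Euclid's algorithm as described: while the two numbers differ,
    replace the larger with the difference of the two; when they are
    equal, that common value is the GCD."""
    x, y = m, n
    while x != y:
        if x > y:
            x = x - y
        else:
            y = y - x
    return x
```

For example, gcd(12, 18) passes through the pairs (12, 18), (12, 6), (6, 6) and returns 6.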

Now, here’s a simple question that echoes a favorite topic of discussion in some online programming forums: is Euclid’s algorithm imperative or pure-functional? We cannot answer this question because the algorithm doesn’t talk about specific memory operations, doesn’t say anything about the scope and mutation rules for variables, and doesn’t dictate or forbid the use of function abstractions when programming it. Nevertheless, it is no doubt an actual algorithm. We can even reason about it and prove its correctness — so we understand exactly what it means and why it works — yet it leaves unspecified this particular detail, which turns out to be irrelevant to understanding how and why the algorithm works. The lack of this detail is not a flaw, let alone one due to Euclid’s choice of an ultra-crappy language to describe his algorithm (although, personally, Greek would not be my first choice).

An algorithm provides some details and leaves others unspecified; we say that an algorithm works if it does what it claims to do no matter how we fill in the gaps in the description. A program, being a description of an algorithm in a (formal) programming language, is a description at a specific level of detail fully dictated by the language in which it is expressed. It can provide neither more nor less detail than that required by the language. A Python program cannot define precisely how its data is to be laid out in memory, and when writing a mergesort algorithm, it cannot leave out the detail of whether the sorting is to be done sequentially or in parallel (some languages may give more leeway, but even then they have some fixed range of specificity). When Euclid’s algorithm is implemented in some programming language, we usually have no trouble determining whether the program is imperative or pure-functional, yet, regardless of the answer to that, we also have no trouble determining whether or not the program actually implements Euclid’s algorithm. Qualities that seem so essential to discussions and endless debates about programming are irrelevant detail for the algorithms our programs are ultimately written to carry out.

To drive this point home, let’s consider Quicksort again. This is the algorithm according to Wikipedia:

1. Pick an element, called a pivot, from the array.
2. Partitioning: reorder the array so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
3. Recursively apply the above steps to the sub-array of elements with smaller values and separately to the sub-array of elements with greater values.

I claim that none of the steps of this algorithm is directly expressible as a program in virtually any well-known programming language. The first step requires picking an element in the array; but which element is it and how is it picked? There are many different choices. The second step requires reordering the elements of the array, but there are many different reorderings that satisfy the partitioning requirement. The third step calls for recursively applying the steps to two sub-arrays, but doesn’t mention in which order, or whether in any sequential order at all; maybe we can work on the two sub-arrays in parallel.

Without those details, we simply cannot write a program that implements Quicksort. However, those details are not missing from the algorithm, which is complete as given; they simply do not matter, or, in other words, their particulars are not assumed. The algorithm works no matter which element we pick as a pivot; it works no matter which particular reordering we choose as long as it’s a proper partition; it works no matter in what order we execute the recursive steps or even if we run them in parallel. In fact, any description that does fill in those details — and therefore any Quicksort program — cannot possibly fully describe Quicksort, but merely one of its many possible implementations.
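
To make this concrete, here is a sketch in Python of one possible implementation, with the pivot choice deliberately left as a parameter. The Lomuto-style partition and the sequential order of the recursive calls are likewise choices made by this implementation, not by the algorithm:

```python
def quicksort(arr, lo, hi, choose_pivot):
    """Sort arr[lo..hi] in place. choose_pivot(lo, hi) may return any
    index in [lo, hi]; the algorithm is correct for every such choice."""
    if lo >= hi:
        return
    p = choose_pivot(lo, hi)
    arr[p], arr[hi] = arr[hi], arr[p]   # move the chosen pivot out of the way
    pivot = arr[hi]
    i = lo
    for j in range(lo, hi):             # Lomuto-style partition: one of many
        if arr[j] < pivot:              # reorderings that satisfy the spec
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]   # the pivot lands in its final position
    quicksort(arr, lo, i - 1, choose_pivot)   # one possible order; the two
    quicksort(arr, i + 1, hi, choose_pivot)   # calls could just as well run in parallel
```

Whatever rule choose_pivot implements (first index, last index, a random one), the array ends up sorted; each rule simply yields a different implementation of the same algorithm.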

This is not a limitation of one specific programming language or another, ultra-crappy or not. It is a limitation of all programming languages, as those are formalisms designed to program deterministic computers, while algorithms are often best specified — as in the case of Quicksort — as nondeterministic. Various kinds of implicit nondeterminism in programming languages (e.g. concurrency) are not rich enough to express the nondeterminism we use to describe algorithms, and even programming languages that aim to express nondeterminism explicitly (like Prolog) fail at this, because they all must still describe a deterministic executable if they are to serve as programming languages, and it is impossible to simulate arbitrary nondeterminism on deterministic machines without changing at least one of the algorithm’s core properties, namely its computational complexity.

If we could describe the general Quicksort algorithm, we could verify that it is correct (i.e. that it sorts its input). Once we’ve done that, we no longer need to prove that a particular implementation of the algorithm is correct — we just need to prove that it is indeed an implementation of the algorithm. TLA+ allows precisely this form of verification, namely, verifying that some property holds for a high-level specification, and then verifying that a low-level specification implements the high-level one.

Now, you may think that as a programmer, rather than a researcher who publishes algorithms in scientific journals, you do not care about the distinction between an algorithm and a program because you’re only interested in programs, anyway. But you would be wrong. You are probably required to design large software systems, not just code isolated subroutines, and a system resembles an algorithm much more than a program. When you design the system, you don’t care about the details of which text search product will be used, which message queue will be picked, which database will store your data, and which MVC framework will be used on the client. You may very well be interested in some of their properties — for example, whether or not the message queue can drop messages, the consistency guarantees of the database, the performance of the search, etc. — but there are many more details you don’t care about. Nevertheless, you still need to know that the system will meet its requirements no matter which component is chosen (as long as it satisfies the requirements you do care about). You therefore need to test the correctness of a design that cannot be detailed enough to actually run, and it must not be: if it were too detailed, its behavior might accidentally depend on some database quirk that all of a sudden goes away in the next release. Even if your components are all picked in advance and you’re sure that they won’t change, as I said in part 1, verifying a specification that is very detailed, spanning everything from the code level to the high-level architecture, is currently infeasible in practice. The ability to specify and verify a high-level design without supplying low-level details is necessary for a formal method to scale.

TLA gives a quite satisfying answer to the question of what an algorithm is (and how algorithms at different levels of detail relate to one another): an algorithm is quite literally the collection of all its implementations. Another way to look at it is that an algorithm is the common element among all of its implementations; yet another is that an algorithm is an abstraction — a less detailed description — of its implementations and a refinement — a more detailed description — of the algorithms it itself implements, namely its own abstractions. This yields a partial-order relation on all algorithms, called the abstraction/refinement relation.

Some programming languages that include specification mechanisms (like Eiffel, SPARK and Clojure) also allow you to specify what an algorithm does at some language-specific level of detail, and those specifications also denote the set of all possible implementations, but they still make a very clear distinction between a specification and a program, the latter being the only thing that’s executable. The same is true in sophisticated research programming languages (like Agda and Idris) that make use of dependent types as a specification language. Even though the type definitions (like specification contracts) allow full use of the language’s syntax, there is a clear-cut, binary distinction between a specification — a type — which can be at some relatively flexible level of detail, and a program or a subroutine — a type inhabitant — whose level of detail is predetermined by the language’s particular semantics. Types and their inhabitants, even in dependently-typed languages, are two distinct semantic categories. You cannot execute a type, nor can a subroutine directly serve as a specification for another subroutine. TLA makes no such distinction; one algorithm can implement another, and “running” an algorithm means examining all of its possible behaviors, whether there are very few of them (as in a program) or very many (as in a non-detailed algorithm).

An algorithm is, then, something more general than a program. Every program is an algorithm, but not every algorithm is a program. Nevertheless, there is no theoretical distinction between a program and an algorithm; they differ in measure not in quality, and what constitutes sufficient detail to qualify as a program depends on the choice of a programming language. In fact, just as an algorithm can implement another algorithm by adding more detail, and a program can implement an algorithm, so too can a program implement another program, although that usually requires both programs to be written in different languages. For example, two different Lisp compilers produce two different machine code programs, both implementing the same Lisp program. I will therefore be using the two terms, algorithm and program, somewhat interchangeably unless it is clear from context that only one applies.

One of the nice things TLA+ teaches you is to think of an algorithm as an abstract concept. There is no computer (unless you want to simulate it), and no compilation — just a mathematical description, not unlike a mathematical model of a physical system. This lets you carefully think about what the algorithm is all about, or what it really does, and you can choose at what level of detail you want to describe it — what level of detail captures the essence of the algorithm. If you wish, you can say, “this algorithm sorts”, adding no additional information; if you wish, you can describe what’s required to happen all the way down to the logic gates in the processor and their power consumption. This provides us with a nice separation of concerns: you can think about how an algorithm or a system works separately from how to cleanly organize the code that implements it.

Despite this long introduction, I still haven’t said anything about what kind of thing an algorithm is. Even if we forget about the difficulty of expressing an algorithm as a program, and suppose that there were some other precise way of describing an algorithm, that still doesn’t tell us anything about what an algorithm is. Even though virtually every story could be translated from its original language to English and then printed in black ink on white paper, it would be very reductive to say that a story is black English letters printed on white paper. It is also unsatisfying to say that a story is the element common to printed stories, spoken stories and acted stories. A story has some essential elements, and so does an algorithm, so let’s start listing them. An algorithm is clearly not a function, as both bubble-sort and mergesort compute the same function, yet we consider them to be different algorithms. An algorithm is some series of steps – which we can call a process or a behavior – that take place through time. It doesn’t have to be actual physical time, but the steps depend on one another and are subject to causality, so we can call whatever dimension is related to causality “time”. But an algorithm does not necessarily describe a single process. For example, bubble-sorting two different lists would yield two different behaviors, or sequences of steps. So, if nothing else, one thing we can say for sure is that an algorithm is one or more behaviors. It turns out that this simple characterization can take us quite far.

Time in Computation

One of my goals in this series is to compare the mathematical theory of TLA+ with other mathematical theories of computation and programming. A much talked-about mathematical theory of programming, especially among programming-language researchers and enthusiasts, is the theory of functional programming. At its core, functional programming of the pure variety chooses to ignore the notion of time in computation, representing a program instead as a composition of functions. This is a major contrast with TLA, which represents the passage of (logical) time very explicitly. It is a direct result of this representation of time that programs with side effects, pure computations, batch programs, interactive programs, sequential programs, concurrent programs, parallel programs, distributed programs — any kind of program imaginable — are represented in exactly the same way, and reasoning about all kinds of programs is also done in the same way. In fact, it is impossible to formally tell the difference in TLA between a program that does pure calculation and one that has lots of side effects; likewise, it is impossible to formally tell sequential, concurrent and parallel programs apart. The difference between them lies entirely in how we choose to interpret TLA formulas.

I believe that the first thorough, formal treatment of time in the context of programming was the situation calculus, introduced by John McCarthy (Lisp’s inventor) in his 1963 paper, Situations, Actions, and Causal Laws, as part of his research of AI. He writes:

Intuitively, a situation is the complete state of affairs at some instant of time. The laws of motion of a system determine from a situation all future situations. Thus a situation corresponds to the notion in physics of a point in phase space. In physics, laws are expressed in the form of differential equations which give the complete motion of the point in phase space.

Around the same time, Arthur Prior was working on temporal logic, a simpler formalism than McCarthy’s which was later introduced into computer science by Amir Pnueli, in his paper The Temporal Logic of Programs. Pnueli describes his formalism as one “in which the time dependence of events is the basic concept”. TLA is simpler still.

While I pointed out in part 1 that preference for different formalisms is often a matter of aesthetic taste, the fact that all kinds of computation are expressed in TLA in one simple manner — because of its treatment of time — that questions of side effects or concurrency not only do not pose any serious difficulty but are entirely a non-issue, and that reasoning about different kinds of programs can be done in exactly the same straightforward way, at least suggests that TLA is a formalism that describes computation in a very “natural” way, meaning there is little friction between the formalism and what it seeks to model. This harmony, as we’ll see, comes at no cost in terms of our ability to talk about programs in very abstract or very concrete ways.

Introduction to TLA

TLA describes algorithms not as programs in some programming language but as mathematical objects, detached from any specific syntactic representation, just like the number four is the same mathematical object whether it is written as the Arabic numeral 4 or as the Roman numeral IV. TLA views algorithms and computations as discrete dynamical systems, and forms a general logical framework for reasoning about such discrete systems that is similar in some ways to how ordinary differential equations are used to model continuous dynamical systems, but with constructs that are of particular interest to computer scientists.

In The Temporal Logic of Actions, Lamport writes:

A[n]… algorithm is usually specified with a program. Correctness of the algorithm means that the program satisfies a desired property. We propose a simpler approach in which both the algorithm and the property are specified by formulas in a single logic. Correctness of the algorithm means that the formula specifying the algorithm implies the formula specifying the property, where implies is ordinary logical implication. We are motivated not by an abstract ideal of elegance, but by the practical problem of reasoning about real algorithms. Rigorous reasoning is the only way to avoid subtle errors… and we want to make reasoning as simple as possible by making the underlying formalism simple.

TLA itself does not specify how data and primitive operations on it are defined, but only concerns itself with describing the dynamic behavior of a program, assuming we have some way of defining data and data operations. TLA therefore requires a “data language” or a “data logic”. In part 2 we saw the data logic provided by TLA+.

Let’s now look at how Euclid’s algorithm is expressed in TLA+. Note that we could use Euclid’s algorithm to define a GCD operator as we learned in part 2, like so:
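
One possibility is a recursive operator definition along these lines (a sketch; the operator name and layout are my own):

```tla
RECURSIVE GCD(_, _)
GCD(m, n) == IF m = n
             THEN m
             ELSE IF m > n
                  THEN GCD(m - n, n)
                  ELSE GCD(m, n - m)
```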

But this does not describe an algorithm in TLA+; it is just another mathematical definition of the greatest common divisor, and not a very good one, as it is unclear that the greatest common divisor is the object being defined. Rather, this is Euclid’s algorithm in TLA+:
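
It takes roughly the following form (a sketch along the lines of Lamport’s own presentations; the constants $M$ and $N$ hold the algorithm’s input):

```tla
Init == /\ x = M
        /\ y = N

Next == \/ /\ x > y
           /\ x' = x - y
           /\ y' = y
        \/ /\ y > x
           /\ y' = y - x
           /\ x' = x
```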

assuming that we treat either $x$ or $y$ as the “output” once the algorithm terminates (there are some missing details here, like how $Init$ and $Next$ are combined to form a single formula, as well as an important detail of how termination is modeled, but this is “morally” correct, and, in fact, all you need in order to try the algorithm in the TLC model checker).

Your first objection may be that unlike my original English description of the algorithm, in this one you can tell that the algorithm is imperative because it contains the expression $x’ = x - y$, which looks like an imperative assignment. It is not. All it means is that the variable, i.e. the name, $x$ will refer to the value $x - y$ in the next program step. If you look at a pure functional implementation of the algorithm, you see that the same happens there:

gcd(x: Int, y: Int) =
  if x = y then x
  else if x > y then gcd(x - y, y)
  else gcd(x, y - x)

The variables x and y stand for different values at different times in the program’s execution, yet no imperative assignment is involved.

Another objection may be that, like a program, the above specification is only one particular description of the algorithm, where others could be chosen. For example, instead of two separate variables, $x$ and $y$, I could have used a single variable assigned a pair of values. However, that seemingly different description would actually be equivalent to the one above; it would still be the same algorithm in a very precise sense. TLA defines an equivalence relation under which two formulas that describe the same algorithm informally also describe the same algorithm formally. The specification above is Euclid’s algorithm, or, rather, it describes the mathematical object that is the algorithm; it is equivalent to all other descriptions of the algorithm, but not to different algorithms for computing the GCD (in part 4 we’ll learn how we can define exactly what it means for different algorithms to be the same or different).
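
For instance, the pair-variable variant just mentioned might be sketched like this (the variable name $v$ is mine; $\langle\langle a, b \rangle\rangle$ is TLA+’s tuple notation):

```tla
Init == v = <<M, N>>

Next == \/ /\ v[1] > v[2]
           /\ v' = <<v[1] - v[2], v[2]>>
        \/ /\ v[2] > v[1]
           /\ v' = <<v[1], v[2] - v[1]>>
```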

A property of an algorithm can be something like “returns a positive, even integer”, or “runs in $O(n\log n)$”, or “finds the greatest common divisor”. Properties are things that, in the programming languages I mentioned above, you’d normally describe in a contract (Eiffel, SPARK, Clojure) or as a type (Agda, Idris). But in TLA, just as one algorithm can specify another by being a more abstract, less detailed, description — or, in other words, an algorithm can be a “type” — so too can a type describe an algorithm; as we’ll see, there is no distinction in TLA between an algorithm and an algorithm property.

TLA describes sequential, concurrent, parallel and even quantum algorithms — and their properties — all using a simple formalism on top of the logic that specifies data and operations/relations on it (like the one we’ve seen in part 2), of which only equality, =, is strictly required by TLA. It has just four constructs — $’$, $\Box$, $\mathrm{ENABLED}$ and $\EE$ — only one of which, prime, is absolutely necessary in practice if all you want to do is specify programs and check their correctness in the TLC model-checker. This minimalism is a testament to the power and elegance of the logic.

The Standard Model, Safety and Liveness

We will describe TLA in two iterations: first a simple but naive version of TLA which Lamport calls “raw” TLA, or rTLA, and then, after describing the problems with rTLA, we’ll get to TLA, which fixes those problems.

TLA (and rTLA) is a logic, and a logic has both syntax and semantics. I will start with the semantics or, rather, with the conceptual framework that will form the semantics of the logic. This is a natural place to start because this was also the chronological order of development of the ideas behind TLA.

Computation As a Discrete Dynamical System

There are many ways to describe computation. One way is to describe a computation as a mathematical function. But this description is too imprecise. For example, a bubble-sort algorithm and a merge-sort algorithm both compute the same function (from a list of, say, integers, to a sorted list of the same integers), but if our formalism equated computation with functions, we would not be able to tell the two apart, and we know that the two have different properties — like their computational complexity — that we may be interested in, and that a universal formalism must be able to reason about. In addition, modeling important classes of algorithms, like interactive and concurrent systems, as functions is inconvenient, requiring a different treatment from sequential computations. For example, while even the most complex of compilers can be described as a single function from input to output, a trivial clock program that continuously displays the time cannot.

Instead, we will describe a computation as a discrete dynamical process. A dynamical process is some quantity that is a function of time. If time is continuous, we say that the dynamical process is continuous; if time is discrete (and so most likely denotes some logical time rather than physical time), the process is discrete. A continuous process can be expressed mathematically as a function of a positive real-valued time parameter. A discrete process is a function of a positive integer parameter or, put simply, a sequence, usually called a behavior or a trace.

But a sequence of what exactly? We have a few choices. We can have a sequence of what the program does at each step; that would make it a sequence of events (usually, the word action is used instead of event, but I will use event so as not to confuse it with the different concept of action in TLA). An event can be thought of as some output that the program emits or an input it consumes; it can be thought of as an effect. Another alternative is to describe a computation as a sequence of program states. Yet another is as a sequence of states and events (each event occurring at the transition between two consecutive states).

Process calculi (like the π-calculus or CSP) use a trace of events to describe a computation. This choice was made because process calculi use the language-based, or algebraic approach to specification, and programming languages don’t like to explicitly mention the program’s state, which includes control information like the program counter (the index of which instruction is to be executed next). But this choice has a major drawback on reasoning. Consider the following two flowcharts describing two programs (branching indicates nondeterministic choice):

[Two flowchart diagrams. In the first, A steps to B on event a, and B then branches nondeterministically: to C on b, or to D on c. In the second, A branches nondeterministically to B or to C, both on event a; B then steps to E on b, and C steps to D on c.]

Both programs generate the same two possible traces: $\set{a \to b, a \to c}$, so they are similar in some sense, yet they are clearly different in another. This means that an algorithm cannot be uniquely defined by its set of possible traces, and we will therefore need some other — more complicated — notion of equivalence. After all, a definition of any kind of mathematical object must include a notion of equivalence that defines when two objects of that kind are the same.

This leaves us with representing a computation as a sequence of states or a sequence of states and events. We note that a sequence of states and events $s_1 \xrightarrow{e_1} s_2 \xrightarrow{e_2} s_3…$ can be easily represented as this sequence of states alone: $\seq{s_1, -} \to \seq{s_2, e_1} \to \seq{s_3, e_2}\ldots$ So we’ll pick the simpler option, use a sequence of states. This is also nicely analogous to continuous systems, which are also functions from time to state.

The Standard Model

We define a computation — an execution of an algorithm — as a behavior. A behavior is an infinite sequence of states, and a state is an assignment, or mapping, of variables to values. What values a variable can take is defined separately by what we’ll call the data logic; in part 2 we’ve seen that in the data logic of TLA+, a value is any set of ZFC set theory. Behaviors are always infinite, but if a behavior reaches some terminal state in a finite prefix and then never changes, we call it a terminating behavior. An abstract system is a collection of behaviors. This is what Lamport calls the standard model.

In this post, I will use the term “set” loosely, and interchangeably with “collection”, but note that a collection of behaviors may or may not be a formal set in the set theory sense, depending on the data logic. For example, in TLA+, since a variable can take any set as a value, a collection of behaviors may be “too large to be a set” in ZFC, meaning it may not be a set that can be constructed using the ZFC axioms, but rather a proper class.

An algorithm is a kind of an abstract system because we can think of it as something that generates different behaviors, different executions. Note that not every abstract system, i.e., a set of behaviors, can be reasonably called an algorithm because we require of algorithms that they have some finite description, and by a simple cardinality argument, there are many more sets of behaviors than there are possible strings of finite length. In addition, there may be other limitations (such as computability) that depend on what operations our data logic allows. But this is only a starting point. We will gradually refine the notion of an algorithm so that it nicely coincides with our intuitions as well as other definitions.

How does an algorithm generate more than one behavior? Because an algorithm allows for nondeterminism, which simply means that the specification allows for more than one state at some step of the program. What is the source of the nondeterminism? Input from the environment is one such source. Notice that we didn’t define the algorithm in any parameterized way, like “a function of an integer argument”. If an algorithm is a deterministic subroutine with a single integer parameter called, say, x , then in our standard model the algorithm will be the set of all of the routine’s behaviors, one for each possible integer value of x . Another source of nondeterminism may be the behavior of the operating system as it schedules processes. At any step, the OS may execute an instruction of one thread of the application or of another. However, the most important source of nondeterminism is our own choice of level of detail. We may choose to specify an algorithm down to the level of machine instructions or we may leave some details unspecified. The most important insight is that all of these kinds of nondeterminism are actually all of the same kind — the last. When we specify anything, we describe only certain aspects of its behavior; nondeterminism is all the rest.
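This view of nondeterminism can be made concrete with a minimal sketch in Python (not TLA+; all names here are my own invention): a single, fixed step rule plus one unspecified input bit per step already generates exponentially many behaviors, and the "algorithm" is exactly that set of behaviors.

```python
# Sketch (not TLA+): a state is a dict of variable values; an abstract
# system is the set of behaviors its next-state relation can generate.
# The only nondeterminism here is an unspecified input bit at each step.

def successors(state):
    """All states that may follow `state`: the environment supplies 0 or 1."""
    return [{"x": state["x"] + bit} for bit in (0, 1)]

def prefixes(init, depth):
    """All behavior prefixes of length depth+1, starting from `init`."""
    result = [[init]]
    for _ in range(depth):
        result = [p + [t] for p in result for t in successors(p[-1])]
    return result

runs = prefixes({"x": 0}, 3)
# One deterministic rule plus one bit of input per step yields 2^3 = 8
# distinct length-4 prefixes: the system is a *set* of behaviors.
```

Leaving the input bit unspecified is exactly the "choice of level of detail" described above: had we fixed the bit, the set would collapse to a single behavior.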

A property of behaviors is any collection of behaviors. For example, “the x variable is a natural number between 3 and 5” is the set of behaviors whose x variable is always between 3 and 5.

Now, what is a property of an algorithm or a system? If a property of behaviors is a set of behaviors, then a property of systems should be a set of systems, namely a set of sets of behaviors. However — and this is a crucial point in the design and theory of TLA — if we are willing to restrict the kind of algorithm properties we allow, we can simplify things greatly. If we only allow discussing properties that are true of an algorithm if and only if they are true of all its behaviors — those that say what an algorithm must or must not do in every execution — then we can define a property of an algorithm to also be just a set of behaviors. The algorithm satisfies a property iff it — i.e. the set of its behaviors — is a subset of the property. There is no ambiguity.

How can we enforce this restriction? By making our logic first-order. As we saw in part 2, in a first-order logic, the model of any formula is a set of objects in the logic’s structure. What does this buy us? This means that properties of behaviors, algorithms, and properties of algorithms are all the same kind of objects — a collection of behaviors. This is an extremely powerful idea. One of the selling points of research programming languages with dependent types is that the algorithm properties they specify — their types — can make full use of the language syntax. Semantically, however, types and programs are completely distinct: you can’t run a type and you can’t use a program as a property (at least, not in the same sense as in TLA). This separation makes sense for a programming language, as it ensures that programs are specified at a level that’s fit for efficient execution, but unifying programs and program properties makes a formalism for reasoning simpler.

Safety and Liveness

Let us now define two kinds of properties: safety and liveness. We don’t need to say whether we’re talking about a property of behaviors or a property of algorithms, as the two are the same in our framework. A safety property, intuitively speaking, specifies what a behavior must never do. For example, “ x is always an even number”, is a safety property that is violated if x is ever not an even number. A liveness property says what a behavior must eventually do. For example, “ x will eventually be greater than 100”. Where sequential algorithms are concerned, partial correctness, meaning the assertion that if the program terminates then the result it produces will be the expected one, is a safety property. The only interesting liveness property of a sequential algorithm is termination, meaning the assertion that the program will eventually terminate. Interactive and concurrent programs, however, may have many interesting liveness properties, such as, “every request will eventually yield a response”, or “every runnable process will eventually run”. Worst-case computational complexity, of time or space, is a safety property, as it states that the algorithm must never consume more than some specific amount of time or memory. It may seem obvious that the safety property of worst-case time complexity implies the liveness property of termination, but actually, whether it does or not depends on the specific mathematical framework used (to see how it may not, consider that there may be different relationships between a state in a behavior and what constitutes a program step in the context of time complexity, or different ways to map the states in the behavior to time; but we’ll get to all that later).

The names “safety” and “liveness” in the context of software verification were coined by Lamport in his 1977 paper Proving the Correctness of Multiprocess Programs, and their interesting topological properties, which we will now explore, were discovered by Bowen Alpern and Fred Schneider in their 1987 paper, Recognizing Safety and Liveness. This part is among the least important parts of the theory of TLA+ for practical purposes, but I personally find the view of computation from the perspective we’ll now explore to be extremely interesting.

Notice the following characteristics of safety and liveness properties: because a safety property says that nothing “bad” must ever happen, a behavior violates it iff some finite prefix of it (remember, behaviors are always infinite) violates the property, as the “bad” thing must happen at some finite point in time in order for the property to be violated. In contrast, a liveness property — something “good” must eventually happen — has the quality that any finite prefix of a behavior could be extended to comply with the property by adding appropriate states that eventually fulfill the property.
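The prefix characterization can be illustrated with a small Python sketch (my own illustration of the definitions, not part of TLA+): a finite prefix alone can refute a safety property, while no finite prefix can ever refute a liveness property, since we can always extend it with the "good" state.

```python
# Sketch: safety is refutable by a finite prefix; liveness never is.
# A state here is just the integer value of a single variable x.

def violates_safety(prefix):
    """Safety property "x is always even": violated iff some state in the
    finite prefix is odd -- no knowledge of the future is needed."""
    return any(x % 2 != 0 for x in prefix)

def extend_for_liveness(prefix):
    """Liveness property "eventually x > 100": any finite prefix can be
    extended so the resulting behavior satisfies it, hence no finite
    prefix refutes it."""
    return prefix + [101]  # ...continue however we like: the "good" thing happened

# violates_safety([0, 2, 4, 7]) is settled by the prefix alone;
# violates_safety([0, 2, 4, 6]) being False proves nothing about the future.
```

Note the asymmetry: the safety check returns a verdict that no continuation can undo, while the liveness "check" can only ever be postponed.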

If $\sigma$ is some behavior, we will let $\sigma^n$ denote the prefix of length $n$ of the behavior. If we have an infinite sequence of behaviors, $\sigma_1, \sigma_2, …$ we’ll say that the behavior $\sigma$ is their limit, namely $\sigma = \lim_{i \to \infty}\sigma_i$ or just $\sigma = \lim \sigma_i$, iff, intuitively, the prefixes of the behaviors converge to $\sigma$. More precisely, $\sigma = \lim \sigma_i$ iff for every $n \geq 0$ there exists an $m \geq 0$ such that $i > m \implies \sigma_i^n = \sigma^n$. If $S$ is a set of behaviors, we say that the behavior $\sigma$ is a limit point of $S$, iff there are elements $\sigma_1, \sigma_2, …$ in $S$ such that $\sigma = \lim \sigma_i$.

This definition of a limit allows us to turn our universe of behaviors into a topological space, by defining a set $S$ of behaviors to be closed iff it contains all of its limit points. We then define the closure of any set $S$ of behaviors, denoted $\overline{S}$, as the set of all limit points of $S$. $\overline{S}$ is the smallest closed superset of $S$, and $S$ is closed iff $S = \overline{S}$. We will say that a set $S$ of behaviors is dense iff any behavior whatsoever in the space is a limit point of $S$.

Back to safety and liveness: Let $P$ be some safety property (i.e. a set of behaviors). If a behavior $\sigma$ does not satisfy $P$, i.e. $\sigma \notin P$, then the property is violated by some finite prefix of the behavior. Let’s say that the first state that violates the property is at index $n$. By the definition of convergence, if some sequence of behaviors $\sigma_1, \sigma_2, \ldots$ in $P$ were to converge to $\sigma$, then from some index on, all elements of the sequence would have to coincide with $\sigma$ on the prefix $\sigma^n$, but that would mean that all of the elements after that index would contain the violating prefix and so could not be in $P$, and we get a contradiction. This means that any $\sigma \notin P$ cannot be a limit point of $P$, which means that $P$ contains all its limit points; a safety property is therefore a closed set. In fact, this works in the other direction, too: any behavior outside a closed set can be distinguished from the set’s elements by a finite prefix, and so every closed set is a safety property. Now, let $L$ be some liveness property, and let $\sigma$ be any behavior whatsoever. As we said, any finite prefix of any behavior can be extended by adding more states to it so that the result is in $L$. This means that for every $i$, we can create a behavior $\hat{\sigma}_i$ such that $\hat{\sigma}_i \in L$ and the prefix of length $i$ of $\hat{\sigma}_i$ is equal to the corresponding prefix of $\sigma$, i.e. $\hat{\sigma}_i^{\,i} = \sigma^i$. This means that $\sigma = \lim \hat{\sigma}_i$. But as $\sigma$ is completely arbitrary, this means that every behavior in the entire space is a limit point of $L$; a liveness property is a dense set. This works in the other direction, too: if $L$ is some dense set, every finite prefix of any behavior can be extended by adding states so that the result lies in $L$; a dense set is a liveness property.

It is a theorem that every set in a topological space is equal to the intersection of some closed set and some dense set. This means that any system, and therefore any algorithm, is an intersection of a safety property, stating what “bad” things must never happen, and a liveness property, stating what “good” thing must eventually happen.

Temporal Formulas

Let us now turn our attention to the syntax of the logic. TLA (and its simplified version, rTLA, which we’re starting from) is a temporal logic, which is a kind of modal logic. Whereas in the ordinary, non-modal logic we saw in part 2, a model is an assignment of values to variables that makes the formula true, in modal logic there are multiple modalities, or worlds, in which variables can take different values; $x = 1$ in one world and $x = 2$ in another. A variable that can have different values in different modalities is called a flexible variable. A variable that must take the same value in all modalities is called a rigid variable. In temporal logic, the different modalities represent different points in time. $x = 1$ at one point in time and $x = 2$ in another.

There are two basic kinds of temporal logic. The first, called linear temporal logic, or LTL, views time as linear, and the time modalities are points on a line. The second, computation tree logic, or CTL, views time as branching towards many possible futures, and the time modalities are nodes of a tree. Interestingly, LTL and CTL are incomparable, meaning there are things that can be expressed in one and not the other. There is, however, a consensus that LTL is easier to use, and probably more useful, and TLA is a linear-time logic (although it only borrows one construct from LTL), as that fits with our view of computations as behaviors, which are sequences rather than trees.

I will call the flexible variables of our logic temporal variables, and like the rigid variables — declared with the keyword $\CONSTANT$ or introduced with the quantifiers $\A$ and $\E$ — temporal variables are declared with the keyword $\VARIABLE$ (or its synonym $\VARIABLES$), or introduced with a temporal quantifier. A temporal variable may have a different value at each state of a computation, i.e. it may change during a single behavior – an execution of an algorithm. It is crucial to remember, however, that while $\CONSTANT$s may not change during a behavior, they may be different for different behaviors.

Whereas a model for a formula in an ordinary logic is an assignment of values to variables that satisfies the formula, a model for a temporal formula is an assignment of values to variables in all modalities, i.e., all states of a behavior. The model, or meaning, or formal semantics of a temporal formula is, then, the set of behaviors that satisfy it. We will explain how a temporal formula is constructed by building it up from different kinds of simple expressions.

Our expressions have four syntactic levels. An expression that does not refer to any temporal variable, either directly or through some definition it refers to, is called a constant expression (level-0). An expression that refers to temporal variables, either directly or through a definition (and doesn’t also contain the operators we haven’t yet discussed) is called a state function (level-1), because it is some function of the state (remember, the state is the value of all temporal variables). For example, if $x$ and $y$ are temporal variables (declared with $\VARIABLE$), then the expression $x$ is a state function, as is $x * 2$, as is $x + y$. So is the expression $x < y$, but because that state function denotes a boolean value we will call it a state predicate (which is just a name for a boolean-valued state function).

Some TLA+ constructs, like $\ASSUME$ declarations, can only be constant expressions; we cannot $\ASSUME$ a state predicate or a formula of any of the higher levels we will now see.

Now things get more interesting. In TLA, we can write an expression that is not a function of a single state, but of two states. A state is an assignment of values to temporal variables, and when we want to talk about two different states, we need to consider two different assignments to any variable. We will refer to a variable in one state, say $x$, by denoting it as usual, and to its value in the second state with $x’$ (read, “$x$ prime”). We can now write expressions or predicates about two states, e.g. $x’ > x$, which says that the value of $x$ in the primed state is greater than the value of $x$ in the unprimed state. Why would we want to do that? Every TLA expression relates to a modality — a point in time — and by writing expressions relating two states, we can describe the relationship between the system’s states at two different points in time, and thus describe its evolution. The “current” state is always defined, and as our temporal modalities are arranged as an infinite sequence, a “next” state is also always defined. Unprimed variables refer to the current state, and primed variables refer to the next state. If the expression $e$ is a state function (or a state predicate), then $e’$ is equal to the expression $e$ with all the temporal variables in it — and in definitions it references — primed. So the value of the state function $e’$ is the value of the state function $e$ in the next state. An expression that contains primed variables or primed subexpressions is called a transition function or an action expression, or, if it is a predicate, a transition predicate (level-2). Transition predicates are better known as actions, the very same actions that give TLA its name and form its very core.

Only constant and state-function (i.e. level-0 and level-1) expressions can be primed. Priming an expression that already contains primed sub-expressions, or one containing the other temporal operators we’ll get to soon, is ill-formed, i.e. a syntax error. Of course, as a constant has the same value in all states, priming it has no effect.

So actions, or transition predicates, are just predicates of two states, one denoted with unprimed variables and the other with primed variables. If the predicate is true, then the primed state is a possible next-state of the current, unprimed state.

For example, the expression $x’ = 1$ is a transition predicate — an action — which states that the value of $x$ in the next state will be 1. The expression $x’ = x + 1$ is also an action, this time one that says that the value of $x$ will be incremented by 1, as is $x’ = x + y$, or $x’ \in Nat \land x’ < y’$, which says that the next value of $x$ will be some natural number which is less than the next value of $y$.

If $A$ is an action, $\ENABLED A$ is a state predicate stating that there exists some next state that satisfies $A$. For example, if:

$$A \defeq x \% 2 = 0 \land (x’ = x \div 2 \lor x’ = -(x \div 2))$$

then $A$ could be read as “if $x$ is even in the current state, then in the next state, $x$ could be either $\frac{x}{2}$ or $-\frac{x}{2}$”. $A$ specifies a next state only if $x$ is currently even, so, in this case, $\ENABLED A \equiv x \%2 =0$.
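The two-state reading of actions, and the meaning of $\ENABLED$, can be sketched in Python (my own illustration, not TLA+; the finite candidate universe standing in for "all possible next states" is an assumption of the sketch, not part of the logic):

```python
# Sketch: an action is just a predicate on a pair of states
# (unprimed, primed).  A is the example from the text: if x is even,
# the next x may be x/2 or -x/2.

def A(s, t):
    return s["x"] % 2 == 0 and (t["x"] == s["x"] // 2 or
                                t["x"] == -(s["x"] // 2))

def enabled(action, s, candidates):
    """ENABLED action: does *some* next state satisfy the action?
    Here we simply search a finite candidate universe."""
    return any(action(s, {"x": v}) for v in candidates)

universe = range(-10, 11)
# For x = 6 the action is enabled (successors 3 and -3);
# for x = 7 no candidate next state satisfies A.
```

The point to notice is that `A` never "computes" a next state; it only accepts or rejects a given pair of states, and `enabled` asks whether any acceptable pair exists.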

For the sake of completeness, I will mention the action composition operator, $\cdot$ , even though it is unsupported by any of the TLA+ tools, and its use is discouraged. If $A$ and $B$ are actions, then $A \cdot B$ is the action that is true for the transition $s \rightharpoonup t$ iff there exists a state $u$ such that $A$ is true for $s \rightharpoonup u$ and $B$ is true for $u \rightharpoonup t$; basically, it’s an $A$ step followed by $B$ step rolled into a single transition.
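The relational reading of $\cdot$ can be sketched the same way (again with states as plain values and a finite universe for the intermediate state, both assumptions of this illustration):

```python
# Sketch: A . B holds for s -> t iff some intermediate state u exists
# such that A holds for s -> u and B holds for u -> t.

def compose(A, B, universe):
    """Collapse an A step followed by a B step into a single action."""
    def AB(s, t):
        return any(A(s, u) and B(u, t) for u in universe)
    return AB

inc = lambda s, t: t == s + 1     # x' = x + 1
dbl = lambda s, t: t == 2 * s     # x' = 2x
inc_then_dbl = compose(inc, dbl, range(100))
# inc_then_dbl(3, 8) holds because 3 -> 4 -> 8.
```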

So far we’ve been talking about expressions, but now have the basic components to start talking about formulas. If $F$ is a formula and $\sigma$ is a behavior, we will write $\sigma \vDash F$ if $F$ is true for the behavior, or $\sigma$ satisfies F. The meaning, or semantics of $F$ (which we write as $\sem F$) is all behaviors $\sigma$ such that $\sigma \vDash F$. I will denote — unlike in the previous section — the (i+1)’th state of $\sigma$ as $\sigma_i$, so $\sigma = \sigma_0 \to \sigma_1 \to \sigma_2 \to \ldots$ Also, $\sigma^{+n} = \sigma_n \to \sigma_{n+1} \to \ldots$, namely the suffix of $\sigma$, with its first $n$ states removed.

We will define well-formed formulas recursively. If $F$ and $G$ are formulas, then so is $F \land G$, and a behavior $\sigma$ satisfies $F \land G$ iff it satisfies both $F$ and $G$ (formally $\sigma \vDash (F \land G) \equiv (\sigma \vDash F) \land (\sigma \vDash G)$). $\neg F$ is a formula, and is satisfied by a behavior $\sigma$ iff $\sigma$ does not satisfy $F$. Similarly, we can define the meaning of all other connectives, but let’s take a look at $\implies$: $\sigma \vDash (F \implies G) \equiv (\sigma \vDash F) \implies (\sigma \vDash G)$. This means that if $F$ and $G$ are formulas, $F \implies G$ holds for every behavior iff every behavior that satisfies $F$ also satisfies $G$, or, in other words, the set of behaviors defined by $F$ is a subset of the set defined by $G$. This relation forms the core of what it means for an algorithm to implement another or satisfy a property, and we’ll take a closer look at it later.

Now, if $P$ is a state predicate then it is also a formula, and $\sigma \vDash P$ iff the predicate is satisfied by the first state of the behavior. For example, if $x$ is a temporal variable, the formula $x = 3$ denotes all behaviors in which $x$ is 3 in the first state.

If $A$ is an action — namely a transition predicate, or a predicate of two states — then it can also serve as a formula, and $\sigma \vDash A$ iff the action is satisfied by the first two states of the behavior. For example, if $x$ is a temporal variable, the formula $x = 3 \land x’ = x+1$ denotes all behaviors in which $x$ is 3 in the first state and 4 in the second. The formula $x \in Nat \land x’ = x+1$ denotes all behaviors in which $x$ is some natural number in the first state and is incremented by 1 in the second.

Now lets introduce the temporal operator $\Box$, borrowed from LTL. If $F$ is a formula then $\Box F$ is also a formula, and $\sigma \vDash \Box F$ iff $\sigma^{+n} \vDash F$ for all $n$ (i.e. $\sigma \vDash \Box F \equiv \A n \in Nat : \sigma^{+n} \vDash F$).

Therefore, if $P$ is a state predicate, then $\Box P$ means that $P$ is true for every state, because $\sigma \vDash \Box P$ iff $\sigma^{+n} \vDash P$ for all $n$ and the first state of $\sigma^{+n}$ is $\sigma_n$. If $A$ is an action, then $\Box A$ is a formula that means that $A$ is true for every pair of consecutive states, $\seq{\sigma_i, \sigma_{i+1}}$, because $\sigma \vDash \Box A \equiv \A n \in Nat : \sigma^{+n} \vDash A$, and the first two states of $\sigma^{+n}$ are $\seq{\sigma_n, \sigma_{n+1}}$.

How the $\Box$ operator works on general formulas, and how to quickly understand temporal formulas, will become clear when we analyze this example, taken from Specifying Systems:

$$\Box((x = 1) \implies \Box(y > 0))$$

So a behavior satisfies $\Box((x = 1) \implies \Box(y>0))$ iff, if $x = 1$ at some state $n$, then $y > 0$ at state $n + m$ for all $m \in Nat$. The box operator therefore means “always” or “henceforth”, and the above formula means that if $x$ is ever 1, then $y > 0$ from then on.
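The analysis is mechanical: each step below just unpacks the definition of $\Box$ and the rules for connectives given above.

$$
\begin{aligned}
\sigma \vDash \Box((x = 1) \implies \Box(y > 0)) &\equiv \A n \in Nat : \sigma^{+n} \vDash ((x = 1) \implies \Box(y > 0)) \\
&\equiv \A n \in Nat : (\sigma^{+n} \vDash x = 1) \implies (\sigma^{+n} \vDash \Box(y > 0)) \\
&\equiv \A n \in Nat : (\sigma^{+n} \vDash x = 1) \implies \A m \in Nat : \sigma^{+n+m} \vDash (y > 0)
\end{aligned}
$$

Since a state predicate is evaluated at the first state of a behavior, the last line says: if $x = 1$ in state $\sigma_n$, then $y > 0$ in every state $\sigma_{n+m}$.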

If $F$ is some formula, we define the dual temporal operator $\Diamond$ (diamond) as: $\Diamond F \equiv \neg \Box \neg F$. $\Diamond F$ therefore means “not always not $F$”, or, more simply, eventually $F$.

The temporal operators can be combined in interesting ways, so, for example, $\Diamond \Box F$, or “eventually always F”, means that in all behaviors that satisfy this formula, $F$ will eventually be true forever. As termination is defined as a state that stutters forever, such a formula can specify termination, e.g., for a formula with some tuple of variables $v$, the proposition $\E t : F \implies \Diamond\Box (v=t)$ states that the behaviors of $F$ terminate. $\Box \Diamond F$, or “always eventually F” means that at any point in time $F$ will be true sometime in the future, or, in other words, $F$ will be true infinitely often. Or we can define an operator (built into TLA+) that says that “$F$ leads to $G$”, meaning that if $F$ is ever true, then $G$ must become true some time after that, like so:

$$F \leadsto G \defeq \Box(F \implies \Diamond G)$$
Because $\Box$ means “always” and $\Diamond$ means “eventually” you may — quite naturally — be tempted to think that $\Box$ defines only safety properties, while $\Diamond$ defines only liveness properties, but you would be wrong on both counts when the formula contains actions (and that’s a problem which will require addressing!), as we’ll see in the next section.
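If a behavior is ultimately periodic (a finite prefix followed by a cycle repeated forever, the "lasso" shape in which finite-state model checkers report counterexamples), then $\Diamond\Box$ and $\Box\Diamond$ reduce to simple checks on the cycle. A Python sketch of this reduction (my own illustration, not a TLA+ tool):

```python
# Sketch: an ultimately periodic ("lasso") behavior is a finite prefix
# followed by a cycle repeated forever.  For the combined operators,
# only the cycle matters.

def eventually_always(pred, prefix, cycle):
    """<>[]P: from some point on, P holds forever.  On a lasso this is
    exactly: P holds on every state of the cycle (the prefix is
    irrelevant, since the cycle eventually dominates)."""
    return all(pred(s) for s in cycle)

def always_eventually(pred, prefix, cycle):
    """[]<>P: P holds infinitely often.  On a lasso this is exactly:
    P holds on at least one state of the cycle."""
    return any(pred(s) for s in cycle)

# x counts 0, 1, 2 and then alternates between 3 and 4 forever:
prefix, cycle = [0, 1, 2], [3, 4]
big  = lambda x: x >= 3
even = lambda x: x % 2 == 0
# eventually_always(big,  prefix, cycle) is True:  both cycle states are >= 3
# always_eventually(even, prefix, cycle) is True:  4 recurs forever
# eventually_always(even, prefix, cycle) is False: 3 also recurs forever
```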

The following pairs of tautologies hold for the temporal operators:

$$\Box(F \land G) \equiv \Box F \land \Box G \qquad \Diamond(F \lor G) \equiv \Diamond F \lor \Diamond G$$

$$\Box \Box F \equiv \Box F \qquad \Diamond \Diamond F \equiv \Diamond F$$

The operators $\Box$ and $\Diamond$ are duals, and from any temporal tautology we can obtain another by substituting $\Diamond$ for $\Box$, $\Box$ for $\Diamond$, $\lor$ for $\land$ and $\land$ for $\lor$, and reversing the direction of all implications.

TLA has another, rarely used, temporal operator, $\whileop$, that cannot be directly defined in terms of the operators we’ve introduced and which we’ll cover in part 4.

Finally, just as in addition to declaring free ordinary (i.e. rigid, or constant) variables with a $\CONSTANT$ declaration, we can also introduce bound ordinary variables with the quantifiers $\A$ and $\E$, so too, in addition to declaring free temporal variables with $\VARIABLE$, we can introduce bound, or quantified, temporal variables with temporal quantifiers.

The existential temporal quantifier, $\EE$, is the temporal analog of regular existential quantification. $\EE x : F$ , where $x$ is a temporal variable, basically says, “there exists some temporal assignment of $x$ such that $F$”; instead of a single value for $x$, it asserts the existence of a value for $x$ in each state of the behavior.

For those of you interested in the finer details of formal logic, the existential temporal quantifier behaves like an existential quantifier because it satisfies the same introduction and elimination rules as the ordinary existential quantifier.

There is also the dual universal temporal quantifier, $\AA x : F \defeq \neg\EE x: \neg F$. The existential temporal quantifier is used to hide internal state; we will talk a lot about that in part 4. The universal temporal quantifier is hardly ever used.

As with the ordinary quantifiers, tuples of variables can also be used with a temporal quantifier, so I will usually write $\EE v : …$ to denote the general form of the existential temporal quantifier with some tuple of variables. Unlike ordinary quantifiers, however, the TLA+ syntax does not support bounded temporal quantification (e.g. $\EE x \in Int : F$ is illegal).

It is also possible to define a temporal operator in the same vein, but Lamport writes that he did not add it to TLA+ because it is not necessary for writing specifications.

Note that the four expression levels form a linear hierarchy. A constant predicate (level-0) is a degenerate form of a state predicate (level-1), which is a degenerate form of an action (level-2), one that states that if the predicate holds in the current state then any next state is possible, which is in turn a degenerate form of a temporal formula (level-3). TLA+ (TLA, really) enforces levels syntactically: applying an operator to an expression of a level for which it is not defined is a syntax error. This is the second kind of property TLA+ enforces at the syntax level similarly to typed languages (the first was the enforcement of operator arity we saw in part 2). So technically, TLA has exactly four types, arranged in a single linear hierarchy.

Also note that the operators $’$ and $\Box$ (and so $\Diamond$, too) are different from the “ordinary” logical operators we saw in part 2. If the value of the variable $x$ is 3 at some point in time, then $x’$ is not the same as $3’$ (which, 3 being constant, is equal to 3). Similarly, if the value of $A$ is $\TRUE$ at some point in time, then $\Box A$ is not the same as $\Box \TRUE$. This is a feature (called referential opacity) of modal logic, and, in fact, of a larger class of logics that modal logics belong to, called intensional logic. Intensional logics have great expressive power – i.e., they can say more things about their universe of discourse than ordinary, non-intensional (or extensional) logics can – in our case, being able to talk about systems at various points, of our choosing, in their behavior over time – but this power comes at the cost of certain inferences not being available (as we’ll see in the chapter, Invariants and Proofs).

Another Take on Algorithms

Actions and $\Box$ are combined to specify algorithms in the following way. Take a look at the formula (assuming $\VARIABLE x$),

$$(x = 0) \land \Box(x’ = x + 1)$$
This formula defines the following behavior: in the first state $x$ is 0 (recall how state predicates are interpreted), and then it is incremented by 1 at every step. We’ve defined what is clearly an algorithm by specifying an initial state, $x = 0$, and a transition, or a next state relation, $x’ = x + 1$. Using a formula of the form $Init \land \Box Next$ , where $Init$ is the initial condition and $Next$ is an action, we’ve defined a state machine! Now we are ready to refine the definition of an algorithm expressed in TLA: an algorithm is a state machine of the form above. This definition is still wrong, but we’ll fix it later.
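The operational reading of a formula of the form $Init \land \Box Next$ can be sketched in Python (my own illustration; the names and the finite candidate universe are assumptions of the sketch): pick any state satisfying $Init$, then repeatedly pick any state related to the current one by $Next$.

```python
# Sketch: Init /\ []Next as a generator of behaviors.  Init constrains
# the first state; Next, an action, relates each state to the next.
# For the counter spec Init == (x = 0), Next == (x' = x + 1), the
# next-state relation is deterministic, so there is a single behavior.

def init(s):
    return s["x"] == 0

def next_action(s, t):
    return t["x"] == s["x"] + 1

def run(init, next_action, candidates, steps):
    """Build one behavior prefix: pick a state satisfying Init, then
    repeatedly pick a successor satisfying Next (searching a finite
    candidate universe)."""
    state = next(s for s in candidates if init(s))
    behavior = [state]
    for _ in range(steps):
        state = next(t for t in candidates if next_action(state, t))
        behavior.append(state)
    return behavior

candidates = [{"x": n} for n in range(100)]
trace = run(init, next_action, candidates, 5)
# trace is the prefix x = 0, 1, 2, 3, 4, 5
```

A nondeterministic `next_action` (one satisfied by several successor states) would make `run` produce just one of many behaviors, which is why a state machine, like any abstract system, is a set of behaviors.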

We will look at state machines in much more detail and see what a powerful mechanism for defining algorithms they are, but it’s crucial to point out that the expression $x’ = x + 1$ is absolutely not an assignment in the imperative sense, meaning, it is not x := x + 1 . Rather, it is a relation on states called the next state relation, that we can denote with $\rightharpoonup$, such that $s \rightharpoonup t$ iff the state $t$ can follow the state $s$ in a behavior. This relation is written in TLA as an action, which is a predicate on two states given as two sets of variables: unprimed and primed. We could have just as well written $x’ - x = 1$ (this is not quite true in TLA+, as we’ll see in a moment, but it is “morally” true).

To drive this point home, let me give a few more examples. We can write an action, $x’ \in \set{-1, 1}$ that means that in the next state, the value of $x$ will be either 1 or -1, but we can write an equivalent action like so: $x’ \in Int \land (x^2)’ = 1$. This is because, as per our definition of the prime operator, $(x^2)’ = (x’)^2$. We can write the action $x’ = x + 1 \land y’ = -y$, but we can write it equivalently as $\seq{x, y}’ = \seq{x + 1, -y}$. It may be hard for people used to programming languages to grasp, but this works not because there is some kind of clever binding or destructuring or unification going on here, but merely because $\seq{x, y}’$ is the same as $\seq{x’, y’}$ and $\seq{x’, y’} = \seq{x + 1, -y}$ is just a predicate that is true iff $x’ = x + 1 \land y’ = -y$. It’s all just simple math.
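That $\seq{x, y}’ = \seq{x + 1, -y}$ is nothing but an ordinary equality between values, not destructuring, can be checked directly in a Python sketch (my own illustration):

```python
# Sketch: <<x, y>>' = <<x+1, -y>> is not pattern matching; it is plain
# equality between two tuples, hence equivalent to x' = x+1 /\ y' = -y.

def pairwise(s, t):           # x' = x + 1  /\  y' = -y
    return t["x"] == s["x"] + 1 and t["y"] == -s["y"]

def tupled(s, t):             # <<x, y>>' = <<x + 1, -y>>
    return (t["x"], t["y"]) == (s["x"] + 1, -s["y"])

states = [{"x": a, "y": b} for a in range(-3, 4) for b in range(-3, 4)]
# The two predicates agree on every pair of states in this small universe:
agree = all(pairwise(s, t) == tupled(s, t) for s in states for t in states)
```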

The reason $x’ - x = 1$ cannot quite be used instead of $x’ = x + 1$ in the paragraph before last lies in the untyped data logic of TLA+ (and so it is outside the scope of our present discussion, but I think it is worth an extra paragraph here, as it may be a source of confusion in real TLA+ specifications). It is the same reason that in ordinary arithmetic the equation $y = ax$ is not equivalent to $y/x = a$, even if we assume $x$, $y$ and $a$ are all real numbers – we cannot simply divide both sides by $x$, because $y/x$ is undefined when $x$ is 0. Similarly, in our case $x’ = x + 1$ is not equivalent to $x’ - x = 1$. In the first equation, equality can always be used because it is defined for all sets, and we can assume $x$ is an integer because it was 0 in the initial state, and it remains an integer by induction. However, we cannot subtract $x$ from both sides because $x’$ may not be an integer (remember, an action is a predicate on all possible pairs of states), and subtraction may not be defined for it. Another way of thinking about it is that $x’ = \str{hi}$ may be a solution to the equation (in the sense that it cannot be ruled out), because $\str{hi} -\; x$ is undefined, which means it is some unknown value, which means it could be 1. Therefore, to write a proper action using the second form, we must write $x’ \in Int \land x’ - x = 1$, which is equivalent to $x’ = x + 1$ (assuming the initial state predicate requires $x = 0$, or, more generally, $x \in Int$; otherwise it is equivalent to $x’ \in Int \land x’ = x + 1$). In short, an action must be defined for all potential next states (whether or not they are possible next states, i.e. next states for which the action is true), and to make sure it means what we want it to mean, we must ensure all operations used in the action are defined for all potential next states.
There is, indeed, an asymmetry here, as we can make do with the operations being defined for all current states that are actually possible in the specification.

That actions are transition predicates has an important consequence. What does an imperative assignment statement like x := x + 1 say about the next value of the variable y ? Such an assignment statement means that only the value of x changes and nothing else, and so y remains unchanged. But what does the action $x’ = x + 1$ say about the next value of $y$? Well, the transition predicate holds true of the state transition $[x: 1, y: 5] \rightharpoonup [x: 2, y: 5]$ (and not of $[x: 1, y: 5] \rightharpoonup [x: 3, y: 5]$), but it also holds true for $[x: 1, y: 5] \rightharpoonup [x: 2, y: -500]$ or $[x: 1, y: 5] \rightharpoonup [x: 2, y: \str{hi}]$. In other words, because it says nothing about the next value of $y$, any value is allowed. To say that $y$ doesn’t change, we can write $x’ = x + 1 \land y’ = y$. We can also specify that several variables don’t change by writing $\seq{x, y, z}’ = \seq{x, y, z}$, but TLA+’s $\UNCHANGED$ keyword lets us write instead $\UNCHANGED y$ or $\UNCHANGED \seq{x, y, z}$.

What if we want to specify a program that takes the initial value as an input and counts up from there? We could define our algorithm using a parameterized definition like so:
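The elided definition was presumably along these lines (a reconstruction; the operator name $Count$ is my own):

```latex
Count(init) \defeq x = init \land \Box(x' = x + 1)
```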

But it is better and more useful to model inputs from the environment as unknowns, or nondeterministic quantities in our algorithm, and instead write:
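Presumably something like:

```latex
x \in Nat \land \Box(x' = x + 1)
```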

This is still a single algorithm, but now it has many possible behaviors, one for each initial value, or input.

Now let’s see how we can say things about our algorithms. We’ll define the following algorithm that models an hour clock, and add another definition, that states that the hour is always a whole number between 1 and 12:
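The elided definitions were presumably along these lines (a reconstruction consistent with the surrounding discussion):

```latex
HourClock \defeq h \in 1..12 \land \Box(h' = (h \% 12) + 1)

WithinBounds \defeq \Box(h \in 1..12)
```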

We can now make the following claim:
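Namely, the claim spelled out in the next paragraph:

```latex
\vdash HourClock \implies WithinBounds
```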

That $HourClock$ implies $WithinBounds$ means that the $HourClock$ algorithm satisfies the $WithinBounds$ property, but you can also think of it in terms of behaviors. $HourClock$ is the collection of all behaviors where $h$ counts the hour, and $WithinBounds$ is the collection of all behaviors where $h$ takes any value between 1 and 12 at each step. Every behavior of $HourClock$ is also a behavior of the property, or $\sem{HourClock} \subseteq \sem{WithinBounds}$.

But remember that there is no real distinction between a property of an algorithm and an algorithm; both are just collections of behaviors. To help you see that more clearly, we’ll define the same property differently:
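A reconstruction of the elided definition, matching the description in the next paragraph:

```latex
CrazyClock \defeq h \in 1..12 \land \Box(h' \in 1..12)
```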

$CrazyClock$ and $WithinBounds$ are equivalent, i.e., $WithinBounds \equiv CrazyClock$ — they specify the exact same set of behaviors — but the way $CrazyClock$ is defined makes it easier for us to think of it in terms of a (nondeterministic) algorithm: it starts by nondeterministically picking a starting value between 1 and 12, and then, at every step, picks a new value between 1 and 12.

It is therefore true that $HourClock \implies CrazyClock$, but both of them are algorithms! We say that $HourClock$ implements, or is an instance of, $CrazyClock$, and that $CrazyClock$ is a specification of $HourClock$; or that $HourClock$ refines $CrazyClock$, and that $CrazyClock$ is an abstraction of $HourClock$. But it all means the same: all the behaviors of $HourClock$ are also behaviors of $CrazyClock$.

Logical implication in TLA corresponds with the notion of implementation. In part 4 we’ll explore this general notion in far greater detail, and see that implication can denote very sophisticated forms of implementation by manipulating the subformula on the right-hand side of the implication connective.

Two Serious Problems

The logic we’ve defined, rTLA, suffers from two problems. Suppose we want to write an algorithm for a clock that shows both the hour and the minute (remember that a conjunction list, like that appearing after the $\Box$ operator, is read as if it is enclosed in parentheses):
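The elided definition was presumably of this shape (a reconstruction; note the conjunction list after the $\Box$):

```latex
Clock \defeq h \in 1..12 \land m \in 0..59 \land \Box \begin{aligned}[t]
  &\land\; m' = (m + 1) \% 60 \\
  &\land\; h' = \IF m = 59 \THEN (h \% 12) + 1 \ELSE h
\end{aligned}
```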

We would very much like it to be the case that $\vdash Clock \implies HourClock$ because a clock that shows both the minute and the hour is, intuitively at least, an instance of a clock that shows the hour. Unfortunately, this isn’t true in rTLA, because while the behaviors of $HourClock$ change the value of $h$ at each step, in $Clock$’s behaviors, $h$ changes only once every 60 steps.

There is also a problem of a slightly more philosophical nature (although it, too, has pragmatic implications). The way we’ve identified the notion of an algorithm with formulas has a very unsatisfying consequence. To see it, let’s revisit our analogy between how continuous dynamical systems are specified with ODEs and how discrete dynamical systems are specified in TLA.

The following continuous dynamical system, $x(0) = 0 \land \dot{x} = 10$, specifies a system that begins at 0 and grows linearly with a slope of 10. It is analogous to this rTLA formula: $x = 0 \land \Box(x’ = x + 10)$. We could even make this clearer by defining the operator $d(v) \defeq v’ - v$ and rewriting the above formula as $x = 0 \land \Box(d(x) = 10)$.

However, ODEs can be given boundary conditions rather than initial conditions. This ODE specifies the same system: $\dot{x} = 10 \land x(10) = 100$. Could the same be done in rTLA? Absolutely. We’ll make time explicit ($\VARIABLES x, t$):
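Presumably something along these lines (a reconstruction; it matches the discussion of this example later in the post):

```latex
t = 0 \land x \in Nat \land \Box(t' = t + 1 \land x' = x + 10 \land (t = 10 \implies x = 100))
```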

Or, if you prefer, we can use the equivalence $\Box(A \land B) \equiv \Box A \land \Box B$ to rewrite the formula as:
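That is, presumably:

```latex
t = 0 \land x \in Nat \land \Box(t' = t + 1 \land x' = x + 10) \land \Box(t = 10 \implies x = 100)
```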

So, even though we don’t tell our “algorithm” where to start — the initial condition is $x \in Nat$ — it somehow knows that it must start at 0 because that’s the only way it will get to 100 at step 10.

While this may be fine for describing a behavior, it is not how we normally think of an algorithm (although this sort of nondeterminism — called “angelic” nondeterminism, as it helps the algorithm do the “right thing” — is frequently used in theoretical computer science to study complexity classes; in fact, NP problems are precisely those that can be solved in polynomial time with the help of angelic nondeterminism). When we think of algorithms we assume that it’s not a process that’s allowed to use information from the future.

We could get rid of nondeterminism altogether, but nondeterminism is what gives specifications their power, as it allows us to say exactly what we know or what we care about, and leave out things we don’t know or details we don’t care about. Also, there are kinds of nondeterminism that fit perfectly with our notion of algorithms, such as nondeterministic input, or nondeterministic scheduling by the OS.

Even the kind of nondeterminism we’ve seen only poses a problem in those formulas we’d like to consider algorithms rather than just general properties of behaviors. We would certainly like to use such nondeterminism in properties that we’d show our algorithms satisfy, i.e., we’d like to show that:

but it would be nice to be able to tell whether a formula is a “proper” algorithm or not.

We could separate algorithms and properties into two separate semantic and possibly syntactic categories. Indeed, this is precisely the road taken by specification languages such as Coq, which use dependent types for specification: arbitrary nondeterminism is allowed only in types. This is pretty much required from programming languages that need to be able to compile programs — but not specifications of their properties — into efficient machine code, so it makes sense for the syntax rules to impose this constraint on descriptions of algorithms (namely, the body of the program).

But for a specification language designed for reasoning, not programming, this would require a complicated formalism, whereas we want simplicity; it would also make the important ability of showing that one algorithm implements another much more complicated, if not nearly impossible.

This problem is related to why it is not true that $\Box$ specifies only safety properties and $\Diamond$ specifies only liveness properties. In fact, $\Box$ can imply liveness properties and $\Diamond$ can imply safety properties. For example, consider a specification whose action counts a timer $t$ down from 10 while letting $x$, initially 0, either stay put or grow by 1 at each step, and which also requires, with a $\Box$ state predicate, that $x = 10$ whenever $t$ reaches 0: because for $x$ to reach 10 by the time $t$ runs out, it must always increment, this formula, built from $\Box$ alone, implies the liveness property $\Diamond(x = 10)$.

Luckily, one simple solution solves all those problems.

Stuttering Invariance and Machine Closure

We’ll now see how TLA fixes the two problems we encountered with rTLA. Much of the analysis in this section is taken from the paper by Martín Abadi and Leslie Lamport, The Existence of Refinement Mappings, 1991, although the paper doesn’t talk about any formal logic, only about abstract behaviors.

As before, we’ll begin with the semantics.

Invariance Under Stuttering

For a behavior $\sigma$ we’ll define $\natural\sigma$ — the stutter-free form of $\sigma$ — obtained by replacing any finite subsequence of consecutively repeating states with just a single instance of the state. For example, $\natural \seq{a,b,b,b,c,d,d,…,d,…} = \seq{a,b,c,d,d,…,d,…}$. Note that the trailing repetition of an infinite sequence of d’s is not replaced. We now define an equivalence relation on behaviors: behaviors $\sigma$ and $\tau$ are equivalent under stuttering (we write $\sigma \simeq \tau$) iff $\natural\sigma = \natural\tau$. Furthermore, for a set $S$ of behaviors, $\Gamma(S)$ is the set of all behaviors that are stuttering-equivalent to those of $S$. We say that $S$ is closed under stuttering if $\Gamma(S) = S$, i.e., if for every behavior in $S$, all behaviors that are stuttering-equivalent to it are also in $S$ ($\sigma \in S \land \sigma \simeq \tau \implies \tau \in S$).
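The stutter-free form is easy to compute on finite prefixes of behaviors. Here is a small illustrative sketch in Python (my own, not part of the original text); it handles only finite sequences, so the untouched infinite trailing repetition in the example above is outside its scope:

```python
from itertools import groupby

def stutter_free(behavior):
    # Collapse every finite run of consecutively repeated states into a
    # single instance of that state -- the "natural" operator, restricted
    # to finite prefixes of behaviors.
    return [state for state, _run in groupby(behavior)]

def stuttering_equivalent(sigma, tau):
    # Two finite prefixes are equivalent under stuttering iff their
    # stutter-free forms are equal.
    return stutter_free(sigma) == stutter_free(tau)
```

For example, `stutter_free(['a', 'b', 'b', 'b', 'c', 'd', 'd'])` yields `['a', 'b', 'c', 'd']`.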

Now on to syntax. We say that a formula $F$ is invariant under stuttering iff its model — i.e. the collection of all behaviors that satisfy it — is closed under stuttering. A stuttering-invariant formula cannot distinguish between two stuttering-equivalent behaviors (meaning, it cannot be true for one and false for the other).

Because state functions/predicates only talk about one state at a time and are therefore trivially stuttering invariant, only actions can potentially break stuttering invariance. The formula $x = 1 \land \Box(x’ = x + 1)$ is not invariant under stuttering because it distinguishes between $\seq{1, 2, 3, 4, …}$ and $\seq{1, 1, 1, 2, 3, … }$, as it admits the first but not the second. However, the formula $x = 1 \land \Box(x’ = x + 1 \lor x’ = x)$ doesn’t distinguish between them, and is indeed stuttering invariant.

The following formula is also invariant under stuttering:
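Presumably a formula like (a reconstruction matching the explanation that follows):

```latex
x = 1 \land \Box(x' = x + 1 \lor x' = x) \land \Diamond(x' = x + 1)
```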

because all the added $\Diamond$ clause requires is that an increment of $x$ occurs at least once at some point, namely, that $x$ isn’t always 1. It still cannot distinguish between behaviors that are stuttering equivalent.

If $A$ is an action of the kind we’ve seen so far, and $e$ is some state function, we’ll define the actions:
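These are the standard TLA definitions:

```latex
[A]_e \defeq A \lor e' = e \qquad\qquad \seq{A}_e \defeq A \land e' \neq e
```

$[A]_e$ also allows stuttering steps that leave $e$ unchanged, while $\seq{A}_e$ additionally requires that the step actually changes $e$.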

In practice, $e$ is virtually always just a variable or a tuple of variables. It’s common in TLA+ specifications to define a tuple of all temporal variables, $vars \defeq \seq{x ,y, z}$, and write $[A]_{vars}$.

Now, instead of distinct categories for algorithms and properties that complicate matters considerably, all TLA does is enforce a syntax that ensures that all TLA formulas are invariant under stuttering, and all that takes are the following syntax rules:

When $\Box$ is immediately followed by an action, the action must be of the form $[A]_e$.
When $\Diamond$ is immediately followed by an action, the action must be of the form $\seq{A}_e$.

Additionally, an action is not a well-formed formula on its own; it must be immediately preceded by a temporal operator ($\Box$ or $\Diamond$).

So the formula $\Box(x’ = x + 1)$ is ill-formed in TLA (but is well-formed in rTLA) — it is a syntax error. Note that $\Box(x > 0)$ is still fine, as $x > 0$ is not an action but a state predicate (there are no instances of $x’$).

We similarly change the definition of $\EE$: $\EE x : F$ is true for a behavior $\sigma$ iff $F$ is true for some behavior $\tau$ obtained from $\sigma$ by changing the values of $x$ and adding or removing stuttering steps.

These rules ensure the following: every TLA formula is invariant under stuttering, and therefore cannot distinguish between two behaviors that are stuttering-equivalent.

This solves our first problem. Let’s revisit our clocks and fix their definition so that they’re valid TLA by placing the actions inside $[\ldots]_e$:
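A reconstruction of the fixed definitions (the helper action name $Tick$ is my own):

```latex
HourClock \defeq h \in 1..12 \land \Box[h' = (h \% 12) + 1]_h \land \Box\Diamond\seq{h' = (h \% 12) + 1}_h

Clock \defeq h \in 1..12 \land m \in 0..59 \land \Box[Tick]_{\seq{h, m}} \land \Box\Diamond\seq{Tick}_{\seq{h, m}}
\quad\text{where}\quad Tick \defeq m' = (m + 1) \% 60 \land h' = \IF m = 59 \THEN (h \% 12) + 1 \ELSE h
```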

Indeed, now $Clock \implies HourClock$ (we’ll later see how we can show similar facts using syntactic manipulation).

The precise significance of the conjuncts ($\Box\Diamond(…)$) will be explained later (and we’ll see a shortcut for writing them), but because $\Box[A]_e$ can mean that the action $A$ is never taken (i.e. the initial state stutters forever), those conjuncts ensure the liveness property that the clocks keep ticking an infinite number of times. They never get stuck forever.

Stuttering invariance buys us something else, too. If we think of each state in a behavior as a snapshot in a moment of time, and of the transition between the states as the ticking of some universal clock, when we specify the behavior of variable $x$ there is always room for some other variable to change any (finite) number of times between any changes to $x$ – as we’ve seen in the clock example. But this means that we can now view a TLA specification not as specifying a single system defined by a state of some given temporal variables, but as specifying all of them, and by all of them, I mean all discrete dynamic systems whatsoever. Every state in a behavior is an assignment of values to all variables, not just the variables we mention in our formula. Thanks to stuttering invariance, a TLA formula doesn’t impose any speed or pace on the algorithm or property relative to all others. A TLA formula mentions how just a small finite subset of an infinite set of temporal variables change, and essentially says “I don’t know anything about any other variables”; it really means that it allows any assignment of any value to any of the variables not mentioned, including changes to unmentioned variables that may occur between changes to those mentioned. All systems exist in the same logical universe, which allows us to compare and compose them in interesting ways. Every formula describes a tiny bit of the behavior of the entire universe. Everything else is part of the nondeterminism of the formula — anything that the formula doesn’t determine.

In fact, every TLA formula specifies the entire world, which includes both the system and its environment. Proving that a formula entails some property guarantees that if the world behaves according to the specification, then the behaviors it exhibits will have the property. A particular kind of TLA formulas that explicitly separates the environment from the system and explicitly specifies their interaction is called an “open system specification,” and will be covered in part 4, but all formulas specify both the system and the environment, whether or not they are explicitly separated in the specification. The necessity for invariance under stuttering when specifying systems – i.e. things that exist in the physical world – as opposed to purely abstract mathematical systems is nicely explained in this short note by Lamport.

Machine Closure and Fairness

Now, we’ve seen that a formula of the form $\Box[A]_e$ only specifies a safety property and cannot imply a liveness property (as it can stutter forever), but the converse — that $\Diamond F$ cannot imply a safety property — isn’t true, which brings us to the second problem.

When we rewrite our rTLA formula analogous to the ODE we saw above to get a well-formed TLA formula we get:
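Presumably (a reconstruction; the machine part of this formula is quoted verbatim further below):

```latex
t = 0 \land x \in Nat \land \Box[t' = t + 1 \land x' = x + 10]_{\seq{t, x}} \land \Box(t = 10 \implies x = 100)
```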

This seems to avoid the problem, because now, instead of a behavior that knows the future and picks the right starting value for $x$ , we get many behaviors, some starting with $x=0$ and hitting 100 when $t=10$ and then continuing on incrementing $x$ forever (or stuttering), while others begin with other values, and necessarily stutter forever at some point before the tenth step. Say we start at $x = 2$ and get $x = 102$ when $t = 10$. At that point, the predicate inside the $[]_{\seq{t, x}}$ will be false, but the brackets add an implicit $\lor \seq{t, x}’ = \seq{t, x}$, which means that the behavior will stay at $t = 10, x = 102$ forever.

It doesn’t matter even if we write our formula like so:

However, we can reclaim the power to see the future if we change the $\Box$ in the last conjunct, a safety property, into a $\Diamond$:
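That is:

```latex
t = 0 \land x \in Nat \land \Box[t' = t + 1 \land x' = x + 10]_{\seq{t, x}} \land \Diamond(t = 10 \land x = 100)
```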

We will call the initial condition and the transition relation (i.e. the $\Box[Action]$ part) — in this case, $t = 0 \land x \in Nat\land \Box[t’ = t +1 \land x’ = x + 10]_{\seq{t, x}}$ — the machine property of the specification, and any conjuncts starting with $\Diamond$, the liveness property. It is easy to see that the machine property is a safety property. The problem is this: while $\Diamond(t = 10 \land x = 100)$ in our example is indeed a liveness property (every finite prefix of any behavior can be extended so that it satisfies it), not every finite prefix of a behavior satisfying the machine property can be extended so that it satisfies the liveness property while still complying with the machine property. This particular liveness property interferes with the safety condition of the machine property and rules out even finite prefixes; in fact it rules out all behaviors that don’t begin with $x$ behaving like $\seq{0, 10, 20, 30}$. It implies the safety property that $x$ must be 0 in the initial state. When a liveness property, concerned with what will eventually happen, interferes in this way with a safety property, concerned with what must hold at every state, and rules out behaviors that comply with the safety property, information about the future is allowed to affect the present, and that is not how algorithms should be defined.

What we’d want is to make sure that the liveness property does not interfere with the machine property, which implies that every finite prefix of any behavior that satisfies the machine property can be extended so that it satisfies the liveness property. If we denote the set of behaviors that comprise the machine property as $M$ and the liveness property as $L$, we would like it so that $L$ would not specify any safety property (i.e. a closed set) not already implied by $M$. What we want is that $\overline{M \cap L} = M$. If that condition holds, we say that the pair, $\seq{M, L}$ is machine closed, or that $L$ is machine closed with respect to $M$ (or simply that $L$ is machine closed, if it is clear which machine property we’re talking about). A liveness property that is machine closed with respect to its machine property, is also called a fairness property.

A liveness property specifies what must eventually happen. A fairness property must do so in a way that does not dictate precisely what states are allowed at any specific time. The way this is achieved is by having the liveness property specify only that the algorithm moves forwards — makes progress — without specifying any targets it must progress towards. This can be intuitively thought about as specifying the job of the operating system and/or the CPU. Fairness can be thought of as the assumptions we make about how the OS schedules our program. Indeed, when we write a program, we operate under the assumption that it will be scheduled by an OS, but we also assume that the OS will treat our program fairly, and guarantee that it makes progress if it can. In TLA, if we care about liveness and want to prove what our program will eventually do, we must make this assumption explicit, and if our program is multi-threaded, we must state our assumptions about how multiple threads are scheduled (we will see how to model threads/processes in the next section). Both the algorithm and the scheduler can still make nondeterministic choices, but machine closure dictates that those choices can in no way be restricted by what has not yet happened, i.e., we allow nondeterminism, but not angelic nondeterminism. Fairness just guarantees progress.

Note that machine-closure is no longer a property of the model of an algorithm — a set of behaviors — but of how that algorithm is written as an intersection of safety and liveness properties. Indeed, two formulas can be equivalent, yet one would be machine-closed and the other wouldn’t. If we replace $x \in Nat$ with $x = 0$ in the initial condition of the specification above, we’ll get one that’s equivalent to it (i.e., admits the very same set of behaviors), yet is now machine-closed, as the liveness property no longer places additional restrictions on the safety property. This is to be expected, because the problem of angelic nondeterminism demonstrates that a set of behaviors is not enough to fully capture the concept of an algorithm.

But how do we know that our liveness property is a fairness property and that our specification is machine-closed? The answer is that in general we don’t, but TLA+ has two built-in fairness operators, and if our liveness property is a conjunction of applications of those operators, then we’re guaranteed that it’s machine-closed. The operators, $\WF$ for weak fairness and $\SF$ for strong fairness, take an action $A$ as an argument, and ensure that it will eventually occur if it is enabled “often enough”. They only differ in what “often enough” means.

Weak fairness will ensure $A$ eventually occurs if it remains enabled from some point onward; in other words, the weak fairness condition is violated if at some point $\ENABLED \seq{A}_e$ remains true without $\seq{A}_e$ ever occurring. Here is the definition of $\WF$:
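In TLA it is defined as follows:

```latex
\WF_e(A) \defeq \Box\Diamond\lnot\ENABLED \seq{A}_e \lor \Box\Diamond\seq{A}_e
```

Equivalently, $\Diamond\Box\ENABLED \seq{A}_e \implies \Box\Diamond\seq{A}_e$: if the action is eventually enabled forever, it occurs infinitely often.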

Strong fairness requires $\seq{A}_e$ to occur infinitely often if $\ENABLED \seq{A}_e$ is true infinitely often, although it need not remain enabled constantly:
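That is:

```latex
\SF_e(A) \defeq \Diamond\Box\lnot\ENABLED \seq{A}_e \lor \Box\Diamond\seq{A}_e
```

Equivalently, $\Box\Diamond\ENABLED \seq{A}_e \implies \Box\Diamond\seq{A}_e$.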

Strong fairness implies weak fairness, i.e., $\SF_e(A) \implies \WF_e(A)$.

When we specify a sequential algorithm and are interested in liveness, $Fairness$ can be just $\WF_{vars}(Next)$, where $vars$ is a tuple of all of our temporal variables. In the next section, when we learn how threads/processes are simulated, we’ll see why it may be necessary to have more than one $\WF/\SF$ conjunct. Intuitively, because the $Next$ action will result in a step of one or several processes, we would need a more fine-grained specification of how the scheduler schedules the different threads, namely, we would need to state that each thread/process is scheduled fairly, meaning that every thread makes progress.

We can now refine yet again our definition of an algorithm, and say that an algorithm in TLA — or, at least a realizable algorithm — is anything that can be specified by a formula of the form
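Namely, the standard TLA normal form:

```latex
\EE w : Init \land \Box[Next]_v \land Fairness
```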

where $v$ is a tuple of temporal variables, and $w$ is a tuple of a subset of those variables. Of course, each of the components can be absent (and often the temporal quantifier is absent). This is called the normal form of a TLA formula. Such a formula is guaranteed to be machine-closed, provided that $Fairness$ is a conjunction of a finite number of $\WF/\SF(A)$ forms, such that $A \implies Next$.

One can ask why we would want to allow non-machine-closed specifications at all. For example, in formulas that specify algorithms — namely those that contain actions rather than just state predicates — we could syntactically allow only the use of $\WF/\SF$ and no other liveness property. Lamport and others answer this question in a short note, and say that it is often beneficial and elegant to specify an algorithm at a high level that doesn’t need to be realizable in code, where, in order to show how the algorithm works, we may want to pretend that it can see into the future. Only lower-level specifications, those that could be directly translated into program code, need to be machine closed, and, of course, we may want to show how a low-level, machine-closed specification implements a high-level, non-machine-closed one (general relations between algorithms will be covered in part 4).

But if we want some way to tell syntactically whether or not we’re specifying realizable algorithms, at least we’ve now isolated our problems to liveness properties and offered a simple sufficient condition for realizability — use only $\WF/\SF$ as liveness conditions (and actions that imply $Next$, but that comes naturally in most cases, as we’ll see in the section about concurrency).

There is another quirk involving liveness with existential quantification, that I find interesting. Consider the following specification:
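The elided specification was presumably of this shape (a reconstruction consistent with the discussion that follows):

```latex
Spec \defeq \EE n : n \in Nat \land m = 0 \land \Box[m' = m + 1 \land m' \leq n \land n' = n]_{\seq{m, n}}
```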

Recall that formulas of the form $Init \land \Box[Next]_v$ are safety properties, and because they’re also stuttering invariant, they do not imply any liveness properties. However, with the existential quantifier we get a specification with the single free temporal variable $m$, which admits all behaviors that increment $m$ and eventually stop; in other words, it admits only terminating behaviors that increment $m$. Formally, $Spec \implies \E k : \Diamond\Box(m = k)$. But termination is clearly a liveness property, and so this specification is not equivalent to any machine-closed specification involving $m$ and no other variables. The specification $m = 0 \land \Box[m’ = m + 1]_m$ (with only a machine property and no liveness) is not equivalent to our specification as it admits a behavior that increments indefinitely, while the specification $m = 0 \land \Box[m’ = m + 1]_m \land \E k : \Diamond\Box(m = k)$, which is equivalent to $Spec$, is not machine-closed as the liveness property rules out behaviors allowed by the machine property (with $m$ incrementing indefinitely).

As it turns out, the cause of this phenomenon is the combination of the quantifier, which hides the $n$ variable, and what happens at the very first step of the algorithm. That single transition from $m=0 \to m=1$ hides an infinite nondeterministic choice of $n$. If, instead, the nondeterministic choice were finite, say between 0 and 1000, as in:
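That is, presumably:

```latex
Spec \defeq \EE n : n \in 0..1000 \land m = 0 \land \Box[m' = m + 1 \land m' \leq n \land n' = n]_{\seq{m, n}}
```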

Then $Spec$ would be equivalent to the following formula that only mentions $m$:
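Presumably (a reconstruction matching the explanation that follows):

```latex
m = 0 \land \Box[m < 1000 \land m' = m + 1]_m
```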

To see why, recall that equivalence on behaviors is stuttering-invariant, so a behavior that stutters and then resumes incrementing, e.g. $\seq{1, 2, 2, 2, 3, \ldots}$, is equivalent to one that doesn’t stutter, e.g. $\seq{1, 2, 3, \ldots}$, unless the stuttering goes on indefinitely. So the formula allows $m$ to be incremented as long as it’s < 1000, but doesn’t require it to; it may stop incrementing at any value less than 1000.

A specification that only allows hidden (i.e., made through a bound temporal variable) nondeterministic choice from a finite set at every step is said to have finite invisible nondeterminism, or FIN. Like machine closure, FIN is a necessary condition for an important result regarding the existence of something called a refinement mapping, which we’ll learn about in part 4.

The Unreasonable Power of Liveness

I will end this chapter with an example of how problematic, or powerful (depending on your point of view), liveness properties can be. Consider the definition:
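Presumably something like (a reconstruction consistent with the description below):

```latex
Halts(M, state) \defeq M \implies \E t : \Diamond\Box(state = t)
```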

where $M$ is an arbitrary formula, and $state$ is a tuple of variables. Note that we’re using an ordinary quantifier, not a temporal one, so $t$ is a constant. You may think that $Halts$ is suspicious because it tells you if an arbitrary algorithm halts and must therefore be undecidable, but there is nothing wrong with defining what it means to halt, and that is all the definition does; it does not claim that there exists an algorithm for deciding halting. Termination is an important property that we may wish to prove for some specific algorithms, and in order to prove it, we must be able to state it. That, in isolation, is an example of a perfectly good use of $\Diamond$.

You might fear that we can use $Halts$ to construct an algorithm to decide halting, as if it were a subroutine we could call, but that is not so simple. $Halts$ is a (parameterized) temporal formula, a level-3 expression, and so according to our syntax rules we can’t use it in an action, as in $answer’ = Halts(M, vars)$.

However, consider the specification of an “algorithm” that decides things (meaning, answers “yes” or “no”):
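The elided specification was presumably something like the following sketch (entirely my reconstruction from the description below, with $answer$ mapping each element of $S$ to $\str{?}$, $\str{yes}$ or $\str{no}$):

```latex
Decider(S, Is) \defeq \begin{aligned}[t]
  &\land\; answer = [x \in S \mapsto \str{?}] \\
  &\land\; \Box[\E x \in S : answer[x] = \str{?} \land \E r \in \set{\str{yes}, \str{no}} :
      answer' = [answer \;\EXCEPT\; ![x] = r]]_{answer} \\
  &\land\; \A x \in S : \Diamond(answer[x] \in \set{\str{yes}, \str{no}}) \\
  &\land\; \A x \in S : \Diamond(answer[x] = \str{yes}) \equiv Is(x)
\end{aligned}
```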

Our decider simply answers for each element in the given set $S$ either “yes” or “no”. It does so simply by using the supplied operator $Is$ which provides the decider with the answer. Seems innocuous enough, except that our decider doesn’t use $Is$ inside an action like so: $answer’ = \IF Is(x) \;\THEN \str{yes}\ELSE \str{no}$, but, instead, uses it in a liveness condition that is very much not machine closed — it fully determines the action’s nondeterministic choice. The final two conjuncts say this: $answer$ will eventually be “yes” or “no”, and it will be “yes” iff $Is$ is true. In the action, we just pick “yes” or “no” nondeterministically, but in the liveness property we “pull” the answer in the right direction. This allows $Is$ to be a temporal formula, bypassing the restriction of using temporal operators in actions.

Using the decider we could define:

Now $H$ “decides” — i.e., forms an “algorithm” for deciding halting. Note that we have not used any non-computable operations in the data logic. The only source of non-computability here is the use of $\Diamond$ in the decider in a way that isn’t machine-closed.

Issues of fairness, however, truly matter only for concurrency experts. When writing real TLA+ specifications, you’ll find that you rarely use the $\Diamond$ operator (in 1000 lines of specifications, it is common to see $\Diamond$ or $\leadsto$ appear just once or twice), and when you do, it will be to specify very simple and very natural liveness properties (such as: “when we receive a request, we will eventually send a response”). If you only care about safety properties, you can ignore fairness altogether. Indeed, when we use the TLC model checker to check a formula $F$ in normal form, if we only ask it to check for safety properties (i.e. that $\vDash F \implies \Box P$), it will ignore any fairness conjuncts in $F$ as they can have no effect on safety.

It is entirely possible and reasonable to write complete specifications of large real-world systems without once worrying about liveness. In fact, the entire discussion of temporal formulas and liveness appears only in section 2, “More Advanced Topics”, of Lamport’s book, Specifying Systems. The book even has a short section entitled “The Unimportance of Liveness”, which I’ll reproduce here:

[I]n practice the liveness property [of a specification]… is not as important as the safety part. The ultimate purpose of writing a specification is to avoid errors. Experience shows that most of the benefit from writing and using a specification comes from the safety part. On the other hand, the liveness property is usually easy enough to write. It typically constitutes less than five percent of a specification. So, you might as well write the liveness part. However, when looking for errors, most of your effort should be devoted to examining the safety part.

Liveness can indeed be complicated. Lamport et al. end the note about the importance of allowing non-machine-closed specifications, which I mentioned above, with these words:

[A]rbitrary liveness properties are “problematic”. However, the problem lies in the nature of liveness, not in its definition. One cannot avoid complexity by definition. — Stephen Jay Gould

We have now completed defining TLA, and explored its trickier nuances (all but one, actually, which we’ll save for part 4). For those of you who may be interested in a technical discussion of this logic’s expressivity, see Alexander Rabinovich, Expressive completeness of temporal logic of action (sic), 1998, and Stephan Merz, A More Complete TLA, 1999.

State Machines

Let us now emerge from the depths of TLA, and turn to more mundane things. In a section called “Temporal Logic Considered Confusing” of Specifying Systems, Lamport writes:

Temporal logic is quite expressive, and one can combine its operators in all sorts of ways to express a wide variety of properties. This suggests the following approach to writing a specification: express each property that the system must satisfy with a temporal formula, and then conjoin all these formulas… This approach is philosophically appealing. It has just one problem: it’s practical for only the very simplest of specifications — and even for them, it seldom works well. The unbridled use of temporal logic produces formulas that are hard to understand. Conjoining several of these formulas produces a specification that is impossible to understand.

Algorithms As State Machines

In practice, we write our TLA+ specifications as state machines, and check or prove either that they satisfy properties written as simple temporal formulas, or that they implement other, higher-level specifications that are also written as state machines (we will cover the implementation relation in part 4).

In Computation and State Machines (2008), Lamport writes:

State machines provide a framework for much of computer science. They can be described and manipulated with ordinary, everyday mathematics—that is, with sets, functions, and simple logic. State machines therefore provide a uniform way to describe computation with simple mathematics.
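To make this concrete, here is a minimal state machine written as a TLA+ specification in the standard $Init \land \Box[Next]_{vars}$ normal form (my own toy example, not taken from Lamport’s paper): a counter that starts at zero and may increment forever.

```tla
---- MODULE Counter ----
EXTENDS Naturals
VARIABLE x

Init == x = 0               \* the initial condition
Next == x' = x + 1          \* the next-state action

Spec == Init /\ [][Next]_x  \* the state machine in normal form

TypeOK == x \in Nat         \* a simple safety (invariance) property
THEOREM Spec => []TypeOK
====
```

Everything here is ordinary mathematics: $Init$ is a predicate on states, $Next$ a predicate on pairs of states, and the safety property $\Box TypeOK$ is exactly the kind of simple temporal formula we check against such specifications.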

A state machine — and, note, w