Contributed articles Turing Lecture: The Computer Science of Concurrency: The Early Years

Credit: Richard Morgenstein

To Edsger Dijkstra

It is insufficiently considered that men more often require to be reminded than informed.Samuel Johnson

I don't know if concurrency is a science, but it is a field of computer science. What I call concurrency has gone by many names, including parallel computing, concurrent programming, and multiprogramming. I regard distributed computing to be part of the more general topic of concurrency. I also use the name algorithm for what were once usually called programs and were generally written in pseudo-code.

This is a personal view of the first dozen years of the history of the field of concurrencya view from today, based on 40 years of hindsight. It reflects my biased perspective, so despite covering only the very beginning of what was then an esoteric field, it is far from complete. The geneses of my own contributions are described in comments in my publications web page.

The omission that would have seemed most striking to someone reading this history in 1977 is the absence of any discussion of programming languages. In the late 1960s and early 1970s, most papers considered to be about concurrency were about language constructs for concurrent programs. A problem such as mutual exclusion was considered to be solved by introducing a language construct that made its solution trivial. This article is not about concurrent programming; it is about concurrent algorithms and their underlying principles.

Back to Top

The Beginning: Mutual Exclusion

The Problem. While concurrent program execution had been considered for years, the computer science of concurrency began with Edsger Dijkstra's seminal 1965 paper that introduced the mutual exclusion problem.5 He posed the problem of synchronizing N processes, each with a section of code called its critical section, so that the following properties are satisfied:

Mutual Exclusion. No two critical sections are executed concurrently. (Like many problems in concurrency, the goal of mutual exclusion is to eliminate concurrency, allowing us to at least pretend that everything happens sequentially.)

No two critical sections are executed concurrently. (Like many problems in concurrency, the goal of mutual exclusion is to eliminate concurrency, allowing us to at least pretend that everything happens sequentially.) Livelock Freedom. If some process is waiting to execute its critical section, then some process will eventually execute its critical section.

Mutual exclusion is an example of what is now called a safety property, and live-lock freedom is called a liveness property. Intuitively, a safety property asserts that something bad never happens; a liveness property asserts that something good must eventually happen. Safety and liveness were defined formally in 1985.1

Dijkstra required a solution to allow any computer to halt outside its critical section and associated synchronizing code. This is a crucial requirement that rules out simple, uninteresting solutionsfor example, ones in which processes take turns entering their critical sections. The 1-buffer case of the producer-consumer synchronization algorithm given below essentially is such a solution for N = 2.

Dijkstra also permitted no real-time assumption. The only progress property that could be assumed was process fairness, which requires every process that has not halted to eventually take a step. In those days, concurrency was obtained by having multiple processes share a single processor. One process could execute thousands of steps while all other processes did nothing. Process fairness was all one could reasonably assume.

Dijkstra was aware from the beginning of how subtle concurrent algorithms are and how easy it is to get them wrong. He wrote a careful proof of his algorithm. The computational model implicit in his reasoning is that an execution is represented as a sequence of states, where a state consists of an assignment of values to the algorithm's variables plus other necessary information such as the control state of each process (what code it will execute next). I have found this to be the most generally useful model of computationfor example, it underlies a Turing machine. I like to call it the standard model.

The need for careful proofs should have become evident a few months later, when the second published mutual exclusion algorithm9 was shown to be incorrect.10 However, incorrect concurrent algorithms are still being published and will no doubt continue to be for a long time, despite modern tools for catching errors that require little effortin particular, model checkers.

The First "Real" Solution. Although of little if any practical use, the bakery algorithm11 has become a popular example of a mutual exclusion algorithm. It is based on a protocol sometimes used in retail shops: customers take successively numbered tickets from a machine, and the lowest-numbered waiting customer is served next. A literal implementation of this approach would require a ticket-machine process that never halts, violating Dijkstra's requirements. Instead, an entering process computes its own ticket number by reading the numbers of all other synchronizing processes and choosing a number greater than any that it sees.

A problem with this algorithm is that ticket numbers can grow without bound. This should not be a practical problem. If each process chooses a number at most one greater than one that was previously chosen, then numbers should remain well below 2128. However, a ticket number might have to occupy more than one memory word, and it was generally assumed that a process could atomically read or write at most one word.

The proof of correctness of the algorithm revealed that the read or write of an entire number need not be atomic. The bakery algorithm is correct as long as reading a number returns the correct value if the number is not concurrently being written. It doesn't matter what value is returned by a read that overlaps a write. The algorithm is correct even if reading a number while it is changing from 9 to 10 obtains the value 2496.

This amazing property of the bakery algorithm means that it implements mutual exclusion without assuming that processes have mutually exclusive access to their ticket numbers. It was the first algorithm to implement mutual exclusion without assuming any lower-level mutual exclusion. In 1973, this was considered impossible.4(p88) Even in 1990, experts still thought it was impossible.21(question 28)

One problem remained: How can we maintain a reasonable bound on the values of ticket numbers if a read concurrent with a write could obtain any value? For example, what if reading a number while it changes from 9 to 10 can obtain the value 22496? A closely related problem is to implement a system clock that provides the current time in nanoseconds if reads and writes of only a single byte are atomic, where a read must return a time that was correct sometime during the read operation. Even trickier is to implement a cyclic clock. I recommend these problems as challenging exercises. Solutions have been published.12

A Rigorous Proof of Mutual Exclusion. Previous correctness proofs were based on the standard model, in which an execution is represented as a sequence of states. This model assumes atomic transitions between states, so it does not provide a natural model of the bakery algorithm with its non-atomic reads and writes of numbers.

Before I discuss a more suitable model, consider the following conundrum. A fundamental problem of interprocess synchronization is to ensure that an operation executed by one process precedes an operation executed by another process. For example, mutual exclusion requires that if two processes both execute their critical sections, then one of those operation executions precedes the other. Many modern multiprocessor computers provide a Memory Barrier (MB) instruction for implementing interprocess synchronization. Executing an instruction A then an MB then instruction B in a single process ensures that the execution of A precedes that of B. Here is the puzzle: An MB instruction enforces an ordering of two operations performed by the same process. Why is that useful for implementing interprocess synchronization, which requires ordering operations performed by different processes? The reader should contemplate this puzzle before reading the following description of the two-arrow model.

In the two-arrow model, an execution of the algorithm is represented by a set of operation executions that are considered to have a finite duration with starting and stopping times. The relations and on this set are defined as follows, for arbitrary operation executions A and B:

It is easy to check that these relations satisfy the following properties, for any operation executions A, B, C, and D:

The model abstracts away the explicit concept of time and assumes only a set of operation executions and relations and on it satisfying A1A4. (An additional property is needed to reason about liveness, which I ignore here.)

Proving correctness of the bakery algorithm requires some additional assumptions:

All the operation executions within a single process are totally ordered by .

For any read R and write W of the same variable, either R W or W R holds.

Each variable in the algorithm is written by only a single process, so all writes to that variable are ordered by . We assume that a read that does not overlap a write obtains the correct value. More precisely, if a read R of a variable satisfies R W or W R for every write W of the variable, then R obtains the value written by the latest write W with W R.

With these assumptions, the two-arrow formalism provides the most elegant proof of the bakery algorithm that I know of. I published such a proof of a variant of the algorithm.14

The conundrum of the MB command described at the beginning of this section is easily explained in terms of the two-arrow formalism. Suppose we want to ensure that an operation execution A in process p precedes an operation execution D in a different process qthat is, to ensure A D. Interprocess communication by accessing shared registers can reveal only that an operation execution C in q sees the effect of an operation execution B in p, which implies B C. The only way to deduce a relation from a relation is with A4. It allows us to deduce A D from B C if A B and C D. The latter two relations can be ensured by using MB instructions, which enforces relations between operation executions by the same process.

Back to Top

Producer-Consumer Synchronization

The FIFO Queue. The second fundamental concurrent programming problem to be studied was producer-consumer synchronization. This form of synchronization was used at the hardware level in the earliest computers, but it was first identified as a concurrency problem by Dijkstra in 1965, though not published in this formulation until 1968.6 Here, I consider an equivalent form of the problem: a bounded FIFO (first-in-first-out) queue. It can be described as an algorithm that reads inputs into an N-element buffer and then outputs them. The algorithm uses three variables:

in The infinite sequence of unread input values.

The infinite sequence of unread input values. buf A buffer that can hold up to N values.

A buffer that can hold up to values. out The sequence of values output so far.

A Producer process moves values from in to buf, and a Consumer process moves them from buf to out. In 1965 the algorithm would have been written in pseudo-code. Today, we can write it in the PlusCal algorithm language15 as algorithm PC of Figure 1. The initial value of the variable in is the constant Input, which is assumed to be an infinite sequence of values; variables buf and out initially equal the empty sequence. The processes Producer and Consumer are given the identifiers 0 and 1. In PlusCal, an operation execution consists of execution of the code from one label to the next. Hence, the entire body of each process's while loop is executed atomically. The await statements assert enabling conditions of the actions. The keywords fair specify process fairness.

Figure 2 shows the first four states of an execution of the algorithm represented in the standard model. The letter P or C atop an arrow indicates which process's atomic step is executed to reach the next state.

Algorithm PC is a specification; a bounded FIFO queue must implement that specification. A specification is a definition, and it makes no formal sense to ask if a definition is correct. However, we can gain confidence that this algorithm does specify a bounded FIFO queue by proving properties of it. The most important class of properties one proves about an algorithm are invariance properties. A state predicate is an invariant iff it is true in every state of every execution. The following invariant of algorithm PC suggests that it is a correct specification of an N-element bounded queue:

where Len(buf) is the length of the sequence buf and º is sequence concatenation.

The basic method for proving that a predicate Inv is an invariant of a concurrent algorithm was introduced by Edward Ashcroft in 1975.2 We find a suitable predicate I (the inductive invariant) and prove that (i) I is true in every initial state, (ii) I is left true by every step of the algorithm, and (iii) I implies Inv. It is easy to prove that the state predicate above is an invariant of algorithm PC. The appropriate inductive invariant I is the conjunction of this invariant with a predicate asserting that each variable has a "type-correct" value. (PlusCal is an untyped language.)

Another Way of Looking at a FIFO Queue. The FIFO queue specification allows only a single initial state, and executing either process's action can produce only a single next state. Hence the execution of Figure 2 is completely determined by the sequence P C P P ... of atomic-action executions. For N = 3, all such sequences are described by the graph in Figure 3. The nodes of the graph are called events, each event representing an atomic execution of the algorithm step with which the event is labeled. The graph defines an irreflexive partial order on the events, where e f iff e f and there is a path through the graph from event e to event f. For want of a standard term for it, I will call such a partially ordered set of events, in which events are labeled with atomic steps, an event history.

This event history describes all sequences of states that represent executions of algorithm PC in the standard model. Such a sequence of states is described by a sequence of infinitely many P and C eventsthat is, by a total ordering of the events in the event history. A total ordering of these events describes a possible execution of algorithm PC iff it is consistent with the partial order . To see this, observe that the downward pointing diagonal arrows imply that the ith P event (which moves the ith input to the buffer) must precede the ith C event (which moves that input from the buffer to the output). The upward pointing diagonal arrows indicate that the ith C event must precede the (i + 3)rd P event, which is necessary to ensure that there is room for the (i + 3)rd input value in the buffer, which can hold at most three elements.

We can view the event history of the figure to be the single "real" execution of algorithm PC. The infinitely many different executions in the standard model are artifacts of the model; they are not inherently different. Two events not ordered by the relationfor example, the second C event and the fourth P eventrepresent operations that can be executed concurrently. However, the standard model requires concurrent executions of the two operations to be modeled as occurring in some order.

Mutual Exclusion vs. Producer-Consumer Synchronization. Producer-consumer synchronization is inherently deterministic. On the other hand, mutual exclusion synchronization is inherently nondeterministic. It has an inherent race condition: two processes can compete to enter the critical section, and either might win.

Resolving a race requires an arbiter, a device that decides which of two events happens first.3 An arbiter can take arbitrarily long to make its decision. (A well-designed arbiter has an infinitesimal probability of taking very long.) Any mutual exclusion algorithm can therefore, in principle, take arbitrarily long to allow some waiting process to enter its critical section. This is not an artifact of any model. It appears to be a law of nature.

Producer-consumer synchronization has no inherent nondeterminism, hence no race condition. It can be implemented without an arbiter, so each operation can be executed in bounded time. It is a fundamentally different class of problem than mutual exclusion.

The FIFO Queue as an N-Process System. The graph in Figure 3 is drawn with two rows, each containing the events corresponding to actions of one of the two processes. Figure 4 is the same graph drawn with three rows. We can consider the three rows to be three separate processes. If we number these rows 0, 1, and 2 and we number the elements in the Input sequence starting from 0, then the events corresponding to the reading and outputting of element i of Input are in row i mod 3. We can consider each of those rows to be a process, making the FIFO queue a 3-process system for N = 3, and an N-process system in general. If we were to implement the variable buf with an N-element cyclic buffer, each of these processes would correspond to a separate buffer element.

In the event history model, any totally ordered subset of events can be considered a process. The standard model has no inherent notion of processes. In that model, an execution is just a sequence of states. Processes are an artifact of the way the sequence of states is represented. The set of executions of algorithm PC can also be described by an N-process PlusCal algorithm.

Generalized Producer-Consumer Synchronization. The generalization of producer-consumer synchronization is marked-graph synchronization. Marked graphs were introduced by Holt and Commoner in 1970.8 A marked graph is a directed graph together with a marking that assigns a finite set of indistinguishable tokens to each arc. A node is fired in a marking by removing one token from each of its input arcs and adding one token to each of its output arcs (producing a new marking). A firing sequence of a marked graph is a sequence of firings that can end only with a marking in which no node may be fired. (By definition of firing, a node can be fired iff it has at least one token on each input arc.)

A marked graph synchronization problem is described by labeling the nodes of a marked graph with the names of atomic operations. This specifies that a sequence of atomic operation executions is permitted iff it is the sequence of labels of the nodes in a possible firing sequence of the marked graph. For example, the following marked graph describes the FIFO queue for N = 3.

A token on the top arc represents a value in the buffer, and a token on the bottom arc represents space for one value in the buffer. Observe that the number of tokens on this marked graph remains constant throughout a firing sequence. The generalization of this observation to arbitrary marked graphs is that the number of tokens on any cycle remains constant.

All executions of a marked graph synchronization algorithm are described by a single event history. Marked graph synchronization can be implemented without an arbiter, so each operation can be executed in a bounded length of time.

Marked graphs can be viewed as a special class of Petri nets.18 Petri nets are a model of concurrent computation especially well-suited for expressing the need for arbitration. Although simple and elegant, Petri nets are not expressive enough to formally describe most interesting concurrent algorithms. Petri nets have been used successfully to model some aspects of real systems, and they have been generalized to more expressive languages. But to my knowledge, neither Petri nets nor their generalizations have significantly influenced the field of concurrent algorithms.

The Two-Arrow Formalism Revisited. Let be an event history with partial order . Suppose we partition into nonempty disjoint subsets called operation executions. We can define two relations and on the set of operation executions as follows, for any operation executions A and B:

It is straightforward to see that these definitions (and the assumption that is an irreflexive partial order) imply properties A1A4 on page 72. Thus, we can obtain a two-arrow representation of the execution of an algorithm with non-atomic operations from an event history whose events are the atomic events that make up the operation executions. The event history does not have to be discrete. Its events could be points in a space-time continuum, where is the causality relation introduced by Minkowski.17

Back to Top

Distributed Algorithms

Pictures of event histories were first used to describe distributed systems. Figure 5 is an event history that I once used as an illustration.13 The events come from three processes, with time moving upwards. A diagonal arc joining events from two different processes represents the causality relation requiring that a message must be sent before it is received. For example, the arc from q 4 to r 3 indicates that event q 4 of the second process sent a message that was received by event r 3 of the third process.

In general, executions of such a distributed system can produce different event histories. For example, in addition to the history of Figure 5, there might be an event history in which the message sent by event q 1 is received before the message sent by event q 4 . In such a case, there is true nondeterminism and the system requires arbitration.

Let a consistent cut of an event history consist of a set C of events such that for every two events c and d, if event c is in C and d c, then d is in C. For example, {p 1 , q 1 , q 2 , r 1 , r 2 } is a consistent cut of the event history of Figure 5. Every consistent cut defines a global state of the system during some execution in the standard modelthe state after executing the steps associated with the events in the consistent cut.

An event history like that of Figure 5 allows incompatible consistent cutsthat is two consistent cuts, neither of which is a subset of the other. They describe possible global states that, in the standard model, may not occur in the same execution. This shows that there is no meaningful concept of a unique global state at an instant. For example, there are different consistent cuts containing only events q 1 and q 2 of the second process. They represent different possible global states immediately after the process has executed event q 2 . There is no reason to distinguish any of those global states as the global state at that instant.

Because the standard model refers to global states, it has been argued that the model should not be used for reasoning about distributed algorithms and systems. While this argument sounds plausible, it is wrong. An invariant of a global system is a meaningful concept because it is a state predicate that is true for all possible global states, and so does not depend on any preferred global states. The problem of implementing a distributed system can often be viewed as that of maintaining a global invariant even though different processes may have incompatible views of what the current state is at any instant.

Thinking is useful, and multiple ways of thinking can be even more useful. However, while event histories may be especially useful for helping us understand distributed systems, the best way to reason about these systems is usually in terms of global invariants. The standard model provides the most practical way to reason about invariance.

Back to Top

Afterwards

After distributed systems, the next major step in concurrent algorithms was the study of fault tolerance. The first scientific examination of fault tolerance was Dijkstra's seminal 1974 paper on self-stabilization.7 However, as sometimes happens with work that is ahead of its time, that paper received little attention and was essentially forgotten for a decade. A survey of fault tolerance published in 197820 does not mention a single algorithm, showing that fault tolerance was still the province of computer engineering, not of computer science.

At about the same time that the study of fault-tolerant algorithms began in earnest, the study of models of concurrency blossomed. Arguably, the most influential of this work was Milner's CCS.16 These models were generally event-based, and avoided the use of state. They did not easily describe algorithms or the usual way of thinking about them based on the standard model. As a result, the study of concurrent algorithms and the study of formal models of concurrency split into two fields. A number of formalisms based on the standard model were introduced for describing and reasoning about concurrent algorithms. Notable among them is temporal logic, introduced by Amir Pnueli in 1977.19

The ensuing decades have seen a huge growth of interest in concurrencyparticularly in distributed systems. Looking back at the origins of the field, what stands out is the fundamental role played by Edsger Dijkstra, to whom this history is dedicated.

Back to Top

References 1. Alpern, B., Schneider, F.B. Defining liveness. Inform. Process. Lett. 21, 4 (October 1985), 181185. 2. Ashcroft, E.A. Proving assertions about parallel programs. J. Comput. Syst. Sci. 10 (February 1975), 110135. 3. Barros, J.C., Johnson, B.W. Equivalence of the arbiter, the synchronizer, the latch, and the inertial delay. IEEE Trans. Comput. C-32, 7 (July 1983), 603614. 4. Hansen, P.B. Operating System Principles. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1973. 5. Dijkstra, E.W. Solution of a problem in concurrent programming control. Commun. ACM 8, 9 (September 1965), 569. 6. Dijkstra, E.W. Cooperating sequential processes. In Programming Languages. F. Genuys, ed. Academic Press, New York, 1968, 43112. Originally appeared as EWD123 (1965). 7. Dijkstra, E.W. Self-stabilizing systems in spite of distributed control. Commun. ACM 17, 11 (November 1974), 643644. 8. Holt, A., Commoner, F. Events and conditions. In Record of the Project MAC Conference on Concurrent Systems and Parallel Computation. Project MAC, June 1970, 352. 9. Hyman, H. Comments on a problem in concurrent programming control. Commun. ACM 9, 1 (Jan. 1966). 10. Knuth, D.E. Additional commments on a problem in concurrent program control. Commun. ACM 9, 5 (May 1966), 321322. 11. Lamport, L. A new solution of Dijkstra's concurrent programming problem. Commun. ACM 17, 8 (August 1974), 453455. 12. Lamport, L. Concurrent reading and writing. Commun. ACM 20, 11 (November 1977), 806811. 13. Lamport, L. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7 (July 1978), 558565. 14. Lamport, L. A new approach to proving the correctness of multi-process programs. ACM Trans. Program. Lang. Syst. 1, 1 (July 1979), 8497. 15. Lamport, L. The PlusCal algorithm language. In Theoretical Aspects of Computing, ICTAC 2009, Volume 5684 of Lecture Notes in Computer Science. M. Leucker and C. Morgan, eds. Springer-Verlag, 2009, 3660. 16. Milner, R. A Calculus of Communicating Systems, Volume 92 of Lecture Notes in Computer Science. Springer-Verlag, Berlin, Heidelberg, N.Y., 1980. 17. Minkowski, H. Space and time. In The Principle of Relativity. Dover, 1952, 7391. 18. Petri, C.A. Fundamentals of a theory of asynchronous information flow. In Information Processing 1962, Proceedings of IFIP Congress 62. C.M. Popplewell, ed. North-Holland, 1962, 386390. 19. Pnueli, A. The temporal logic of programs. In Proceedings of the 18th Annual Symposium on the Foundations of Computer Science. IEEE, November 1977, 4657. 20. Randell, B., Lee, P.A., Treleaven, P.C. Reliability issues in computing system design. Comput. Surv. 10, 2 (June 1978), 123165. 21. Rudolph, B.A. Self-assessment procedure xxi. Commun. ACM 33, 5 (May 1990), 563575.

Back to Top

Figures Figure 1. Producer-consumer synchronization. Figure 2. An execution of the FIFO queue. Figure 3. An event history for the FIFO queue with = 3. Figure 4. Another view of the FIFO queue for N = 3. Figure 5. An event history for a distributed system.

Back to top

Copyright held by author. Publication rights licensed to ACM.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2015 ACM, Inc.