Imagine the 1960s, when programmers were first exposed to the structured control flow primitives we now take for granted: while loops, if/then conditional blocks, and function calls. Prior to that, computer languages imposed almost no structure on code. Programmers applied a single mechanism to every control flow situation: the goto statement or jump machine instruction. Once the higher-level control flow primitives arrived and people weaned themselves off goto, the unleashed productivity was enormous — all due to having code that was more readable, less error-prone, and easier to diagnose.

In “Go To Statement Considered Harmful”, Edsger W. Dijkstra attributed this to: “[shortening] the conceptual gap between the static program and the dynamic process, to make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible”. However, the narrowing of this gap didn’t end there. With structured control flow as a foundation, language designers added modern exception handling on top. (The “catch any error in scope and re-throw if necessary” construct we are familiar with today was created in 1990 by William Mitch Bradley for the Forth language.) Now programs could ensure correct propagation of errors without explicitly passing error codes up the program stack and diligently checking their results. (I’m aware of the trend to disinvent exceptions — subject for another article perhaps. For the reader who doesn’t like the word “exception”, think instead “language feature which ensures that errors don’t get dropped on the floor”.)

With these fundamental advancements in place, programmers were then able to be productive and manage a large amount of complexity — as long as the program dealt with only a single thread of control. Once concurrency is put into the mix, all that structure flies out the window. The primary difficulty comes from a new rift between program and process: managing the lifetimes of concurrent tasks coming into and out of existence, communicating between tasks, dealing with errors in other threads of control, and ensuring graceful shutdown. Counting somewhat arbitrarily from the time of Bradley's CATCH and THROW, this conceptual gap has been the bane of software productivity for almost 30 years.

Well, if you're a programmer now, you're in luck. A software renaissance of similar magnitude to the goto shakeup is taking place with regard to the authoring of concurrent programs. To Dijkstra's continued credit, most of the solution involves applying the same principle of alignment between program text and process time that he described in 1968. I believe the resulting impact on productivity is so large that no modern language will be able to ignore this shift — just as no language after the 1960s was able to ignore structured function calls, conditional blocks, and loops.

In this article I’d like to enumerate the three ingredients of “concurrency made easy” and uncover where production-ready incarnations of it are available today. (While I use the term “concurrency made easy”, or CME, for these principles, it’s not intended as a formal definition. Furthermore, it’s not to be confused with the formal research effort by ETH Zurich under the same name, nor the recent Go language keynote presentation.)

Structured concurrency

For humans writing code, much of the complexity of concurrent programs stems from not being able to see the lifetimes and hierarchy of concurrent tasks as part of the code’s structure. Traditionally in such programs, a concurrent task can be spawned from one block of code and live until some other, completely unrelated, code location where the task is explicitly cancelled or otherwise terminates. Given this lack of structure, it’s very difficult to conceptualize the interaction among several such tasks. It’s also hard to manage trapping task errors, considering that their lifetime is not tied to the stack.

Within a relatively short span of two or three years, several people were independently thinking about this problem and landing on a nearly identical solution. Martin Sústrik’s article “Structured Concurrency” and libdill C library (2016), the Trio asynchronous I/O library for Python (2017), and the Kotlin language’s experimental coroutine library (2018) separately introduced the notion of tying task lifetime to lexical scopes and the stack in a deliberate way while providing sane error propagation and cancellation semantics.

Nathaniel J. Smith's article "Notes on structured concurrency, or: Go statement considered harmful" (2018) pieced things together the most completely. It made the analogy between ad hoc spawning of tasks and goto, introduced a control flow primitive providing the necessary structured concurrency, and offered a cross-platform implementation — namely the Trio library for Python. The control flow primitive, which the article calls a "nursery", is a scoped container for concurrent tasks. A synopsis (in the style of Python):

try:
    async with open_nursery() as nursery:
        nursery.start_task(a)
        nursery.start_task(b)
        nursery.start_task(c)
    # end of nursery scope
    # at this point tasks a, b, and c must be completed
    ...
except FooError:  # e.g. some exception raised by task "b"
    ...

The point is that tasks started in the container's scope will not live beyond that scope. Scope exit will block until all tasks are complete (normally or by cancellation). Should one or more tasks raise an exception, all other tasks in the container are cancelled, and the resulting exceptions are aggregated and raised at the stack frame where the nursery was opened.

What this means is that task lifetimes can be readily understood by looking at the structure of a program. They are tied to scoped blocks of code, and those scoped blocks are in turn bound to the program call stack. Hence, groups of parallel tasks can be nested in a hierarchy mirroring lexical scopes and the call stack, and we’re free to use the language’s normal exception catching mechanism to trap task errors. Furthermore, shutdown works just as you’d want it to, regardless of the task hierarchy’s complexity, because no tasks are leaked and cancel order is deterministic (child tasks, then parent).

(To be clear: inklings of structured concurrency appeared prior to the cited articles and libraries. For example, in 2014, the Go language’s context library contained some relevant constructs. However, it was this recent wave of library authors who framed the problem clearly, proposed a coherent solution including cancellation and error propagation semantics, and argued for avoiding ad hoc spawned tasks.)

Concurrency honoring imperative constructs

What to do when you need to represent concurrency within a single thread? Use callbacks! This resulted in the infamous “callback hell” of JavaScript, Python’s Twisted framework, etc. The problem is that callbacks decouple threads of control from the program stack. Nested function definitions or lambda functions are then employed to make things appear somewhat imperative, but this only masks the problem. Function return statements are broken, exception handling is broken, looping is broken. A synopsis (in the style of JavaScript):

function do_request(url) {
    request(url, (err, res, body) => {
        if (err) {  // sorry, can't use try/catch!
            ...
        } else {
            let value = JSON.parse(body).value;
            if (some_condition) return null;  // not what you think!
            ...
        }
    });
}

The promise pattern was then introduced to reduce the mess of nesting and chaining such callbacks, but doesn’t address the broken control flow.

Authors of programs having only a single thread of control have been blissfully productive for almost three decades. What’s needed is a representation of concurrency which doesn’t interfere with the proven, imperative constructs we know and love. Fortunately, solutions have existed for some time (C# introduced async/await in 2012). They typically involve the use of coroutines, since allocating a full OS thread to each flow of control just to allow blocking is prohibitively expensive. Back to the example in JavaScript, where the async/await syntactic feature can be employed:

async function do_request(url) {
    try {
        let body = await request(url);
        body = JSON.parse(body).value;
        if (some_condition) return null;
        ...
    } catch (err) {
        ...
    }
}

When async/await and equivalent solutions came about, it was as if the clouds had finally parted. The callbacks and promises employed up until then inherently violated structured control flow and obscured the more subtle concurrency management issues beneath. Once light was shed on this area, we saw several parallel efforts address the structured concurrency problem in the span of just a few years (as covered in the previous section).

There is a strong synergy between structured concurrency and honoring imperative constructs. In fact, they can combine to make programs simpler than they would be in classic sequential programming. Martin Sústrik concluded a pair of articles entitled “Getting rid of state machines” as follows:

“If a language has efficient green threads and built in support for cancellation, every program can be decomposed into set of simple functions containing no implicit or explicit state machines.”

While I can’t speak for the Kotlin case, I have direct experience with Trio and async/await in Python and can confirm it’s possible to write complex programs without resorting to state machines (and yes, without a single callback). It suggests that a traditional sequential program consisting of one or more state machines could be rendered in a form which is more readable and less error prone by employing this well-reasoned concurrency. And that is truly turning the concurrency “problem” on its head.
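As a small illustration of Sústrik's point, consider parsing a length-prefixed framing protocol (a format made up for this example) with Python's asyncio streams. The sequential await-based code below stands in for what would otherwise be an explicit WAIT_HEADER/WAIT_BODY state machine; the "state" is simply the current position in the function:

```python
import asyncio

async def read_frames(reader):
    # Each frame: a 1-byte length prefix, then that many payload bytes.
    # A zero-length prefix terminates the stream.
    frames = []
    while True:
        header = await reader.readexactly(1)
        length = header[0]
        if length == 0:
            return frames
        frames.append(await reader.readexactly(length))

async def demo():
    reader = asyncio.StreamReader()
    reader.feed_data(b"\x02hi\x03foo\x00")  # two frames, then terminator
    reader.feed_eof()
    return await read_frames(reader)

frames = asyncio.run(demo())
print(frames)
```

If the input arrives in arbitrary chunks, nothing changes: readexactly simply suspends the coroutine until enough bytes exist, which is exactly the bookkeeping an explicit state machine would otherwise have to do by hand.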

Explicit, cooperative multitasking

The third stage of the CME rocket is a little more loose and controversial than the others. However, it elevates concurrent programming to a very productive and accessible level.

Up until now, I’ve said nothing about how the concurrency is implemented. Is it by OS threads, multiprocessing, coroutines? If coroutines, is the context switching implicit or explicit? The reason for this lack of stance is that neither structured concurrency nor the honoring of imperative constructs is dependent on how concurrency is implemented. (While it’s true that async/await and similar solutions employ coroutines, the tasks are free to wrap OS threads or other processes as needed.)

In the article “Unyielding” (2014), the author of Python’s Twisted event-driven programming framework argued strongly against shared-state concurrency combined with implicit context switching:

“Threads make local reasoning difficult, and local reasoning is perhaps the most important thing in software development.”

That leaves only coroutines/green threads/fibers having an explicit context switch as meeting the ideal. Consider this simple example (in the style of Python):

async def foo(self):
    await a()
    self.balance += 100  # modify shared state
    self.b()             # modify even more shared state
    await self.c()

Looking at this code, we know that a context switch could only happen at the await calls. Everything in between can be considered atomic. This makes it much easier to reason about the code, significantly lowers the chance of being surprised by some race condition, and makes debugging much easier should one exist. Programmers can also take advantage of this to reduce the need for explicit locking of shared state. (Note that in the case of the Go language, any function call is a potential context switch. This foils the ability to trivially encapsulate atomic operations given a single-threaded context. Kotlin straddles a gray area in this regard, since it distinguishes async functions, yet doesn't have the equivalent of await to mark call sites which may switch context.)
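A sketch of the payoff, in the style of Python's asyncio (the deposit function is hypothetical): because there is no await between reading and writing the shared balance, the update is atomic with respect to every other coroutine on the event loop, and no lock is required:

```python
import asyncio

balance = 0

async def deposit(amount):
    global balance
    current = balance           # read shared state
    balance = current + amount  # write it back; no await in between,
                                # so this read-modify-write is atomic
    await asyncio.sleep(0)      # a context switch can happen only here

async def main():
    # 100 concurrent deposits, interleaved only at await points
    await asyncio.gather(*(deposit(1) for _ in range(100)))

asyncio.run(main())
print(balance)  # deterministically 100: no race despite shared state
```

Run the same read-modify-write from 100 preemptive OS threads without a lock and the result would be unpredictable; here it is correct by construction.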

I’m not saying that a concurrent program must be limited to a single thread. Only that the majority of a program should be in the context of a single thread in order to take advantage of the greatly reduced cognitive load on the programmer. Individual threads coded in such a way may then be combined with message passing, channels, etc. to achieve parallelism. Kotlin’s coroutine library supports this seamlessly by allowing threading granularity to be specified per task container.
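In Python's asyncio, one common way to keep the bulk of a program single-threaded while still escaping it where necessary is asyncio.to_thread (Python 3.9+), which delegates a blocking call to a worker thread and awaits its result. A minimal sketch, with a made-up blocking function:

```python
import asyncio
import time

def blocking_io():
    # Stands in for a blocking call (file I/O, a C extension, etc.);
    # asyncio.to_thread runs it on a worker thread.
    time.sleep(0.01)
    return "done"

async def main():
    # The event loop stays free to run other coroutines while waiting;
    # only this one call leaves the single-threaded world.
    result = await asyncio.to_thread(blocking_io)
    return result

print(asyncio.run(main()))  # prints "done"
```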

For the case of Python, luckily single-threading happens to be a sweet spot anyway. OS threads are a terrible value due to the GIL (global interpreter lock). This makes Trio very compelling, as all three ingredients of CME are hit, and the utmost can be made of the one CPU core to which Python is constrained.

Readers may argue that the shared state programming style can and should be avoided in the first place, so there’s no need to accept this bias towards cooperative multitasking. However, there is no denying the comfort and accessibility of this style. What makes Python, for example, a great language to learn to program or “get things done” is now directly applicable to concurrent programming.