If you want to make programs go faster on parallel hardware, then you need some kind of concurrency. Right?

In this article I’d like to explain why the above statement is false, and why we should be very clear about the distinction between concurrency and parallelism. I should stress that these ideas are not mine, and are by no means new, but I think it’s important that this issue is well understood if we’re to find a way to enable everyday programmers to use multicore CPUs. I was moved to write about this after reading Tim Bray’s articles on Concur.next: while I agree with a lot of what’s said there, particularly statements like

Exposing real pre-emptive threading with shared mutable data structures to application programmers is wrong

it seems that parallelism and concurrency are still being conflated. Yes we need concurrency in our languages, but if all we want to do is make programs run faster on a multicore, concurrency should be a last resort.

First, I’ll try to establish the terminology.

A concurrent program is one with multiple threads of control. Each thread of control has effects on the world, and those threads are interleaved in some arbitrary way by the scheduler. We say that a concurrent programming language is non-deterministic, because the total effect of the program may depend on the particular interleaving at runtime. The programmer has the tricky task of controlling this non-determinism using synchronisation, to make sure that the program ends up doing what it was supposed to do regardless of the scheduling order. And that’s no mean feat, because there’s no reasonable way to test that you have covered all the cases. This is regardless of what synchronisation technology you’re using: yes, STM is better than locks, and message passing has its advantages, but all of these are just ways to communicate between threads in a non-deterministic language.

A parallel program, on the other hand, is one that merely runs on multiple processors, with the goal of hopefully running faster than it would on a single CPU.

So where did this dangerous assumption that Parallelism == Concurrency come from? It’s a natural consequence of languages with side-effects: when your language has side-effects everywhere, then any time you try to do more than one thing at a time you essentially have non-determinism caused by the interleaving of the effects from each operation. So in side-effecty languages, the only way to get parallelism is concurrency; it’s therefore not surprising that we often see the two conflated.

However, in a side-effect-free language, you are free to run different parts of the program at the same time without observing any difference in the result. This is one reason that our salvation lies in programming languages with controlled side-effects. The way forward for those side-effecty languages is to start being more explicit about the effects, so that the effect-free parts can be identified and exploited.

It pains me to see Haskell’s concurrency compared against the concurrency support in other languages, when the goal is simply to make use of multicore CPUs (Edit: Ted followed up with a clarification). It’s missing the point: yes of course Haskell has the best concurrency support :-), but for this problem domain it has something even better: deterministic parallelism. In Haskell you can use multicore CPUs without getting your hands dirty with concurrency and non-determinism, without having to get the synchronisation right, and with a guarantee that the parallel program gives the same answer every time, just more quickly.

There are two facets to Haskell’s determinstic parallelism support:

par/pseq and Strategies. These give you a way to add parallelism to an existing program, usually without requiring much restructuring. For instance, there’s a parallel version of ‘map’. Support for this kind of parallelism is maturing with the soon to be released GHC 6.12.1, where we made some significant performance improvements over previous versions.

Nested Data Parallelism. This is for taking advantage of parallelism in algorithms that are best expressed by composing operations on (possibly nested) arrays. The compiler takes care of flattening the array structure, fusing array operations, and dividing the work amongst the available CPUs. Data-Parallel Haskell will let us take advantage of GPUs and many-core machines for large-scale data-parallelism in the future. Right now, DPH support in GHC is experimental, but work on it continues.

That’s not to say that concurrency doesn’t have its place. So when should you use concurrency? Concurrency is most useful as a method for structuring a program that needs to communicate with multiple external clients simultaneously, or respond to multiple asynchronous inputs. It’s perfect for a GUI that needs to respond to user input while talking to a database and updating the display at the same time, for a network application that talks to multiple clients simultaneously, or a program that communicates with multiple hardware devices, for example. Concurrency lets you structure the program as if each individual communication is a sequential task, or a thread, and in these kinds of settings it’s often the ideal abstraction. STM is vitally important for making this kind of programming more tractable.

As luck would have it, we can run concurrent programs in parallel without changing their semantics. However, concurrent programs are often not compute-bound, so there’s not a great deal to be gained by actually running them in parallel, except perhaps for lower latency.

Having said all this, there is some overlap between concurrency and parallelism. Some algorithms use multiple threads for parallelism deliberately; for example, search-type problems in which multiple threads search branches of a problem space, where knowledge gained in one branch may be exploited in other concurrent searches. SAT-solvers and game-playing algorithms are good examples. An open problem is how to incorporate this kind of non-deterministic parallelism in a safe way: in Haskell these algorithms would end up in the IO monad, despite the fact that the result could be deterministic. Still, I believe these kinds of problems are in the minority, and we can get a long way with purely deterministic parallelism.

You’ll be glad to know that with GHC you can freely mix parallelism and concurrency on multicore CPUs to your heart’s content. Knock yourself out 🙂