Michael McCool is an engineer with Intel.

Parallel programming is challenging for a number of reasons. In addition to all the challenges of serial computation, parallel programs can also suffer from race conditions and deadlock, and even if correct may be non-deterministic, which complicates testing. Achieving high performance with a parallel program also requires minimization of contention for limited resources such as communication and memory bandwidth. Failure to properly account for these factors can lead to programs which dramatically underperform.

This document discusses and advocates a structured approach to parallel programming. This approach is based on a core set of common and composable patterns of parallel computation and data management with an emphasis on determinism and scalability. By using these patterns and also paying attention to a small number of factors in algorithm design (such as data locality), programs built using this approach have the potential to perform and scale well on a variety of different parallel computer architectures.

The structured approach discussed here has also been called the "algorithmic skeleton" approach. The general idea is that specific combinations of computation and data access recur in many different algorithms. A system that supports the specification and composition of "good" patterns can guide the developer towards the construction of well-structured and reliable, yet high-performance programs. Patterns can be domain-specific, but there are also general-purpose patterns that occur in algorithms from many different domains. A system only has to support a small number of patterns in order to be universal, that is, capable of implementing any algorithm. However, it may be useful for efficiency to support more than the minimal universal subset, since different patterns can also expose different types of data coherency that can be used to optimize performance. It may also be useful to support specific combinations of "fused" patterns.

We will first survey previous work in parallel patterns, algorithmic skeletons, and some systems based on these. The use of patterns in parallel programming bears a strong resemblance to the use of structured control flow in serial programming. Both for reasons of analogy and because serial computation is an important sub-component of parallel computation, some basic patterns for supporting serial computation will be presented and discussed, along with some serial programming models based on universal subsets of these patterns. A useful set of structured and deterministic parallel patterns will then be presented and discussed.

Background

The concept of "algorithmic skeleton" was introduced by Cole [1989,2004] and elaborated by Skillicorn [1998]. It is similar to the modern idea of design pattern [Gamma 1994, Mattson 2004], and so we will use the term "parallel pattern". We will define a parallel pattern as specific recurring configuration of computation and data access. In the View from Berkeley [Asanovic 2006] some characteristic workloads called dwarves or motifs are identified. These are workloads that consist primarily of one type of pattern. In most applications, however, a variety of patterns are composed in complicated ways. Programming systems can be based entirely on composition of pattern-oriented operators [Bromling 2002, Tan 2003, Sérot 2002] .

In the 1970s, it was noted that serial programs could be made easier to understand and reason about if they were built by composing their control flow out of only a small number of specific control flow patterns: sequence, selection, and iteration (and/or recursion). This structured programming approach has led to the elimination of goto from most programs, although not without a lot of controversy [Dijkstra1968]. Now, however, the use of structured control flow is so widely accepted that goto is either omitted from or deprecated in most modern programming languages.

In the same way, structured parallel patterns can eliminate the need for explicit threading and synchronization while making programs easier to understand. In particular, one desirable property that structured parallel patterns should possess is deterministic semantics that are consistent with a specific serial ordering of the program. In contrast, threads share a strong resemblance to goto in their flexibility but also their lack of structure [Lee 2006]. Like goto, use of threads (at least when combined with global random access to data) can make it difficult to do local reasoning about a program, since it becomes impossible to isolate the effect of one part of a program's code from another.

One alternative is to use functional programming. However, most mainstream programming languages are not functional, and some algorithms, especially graph and matrix problems, are difficult to express efficiently in purely functional languages. However, we can interpret functional programming as one instance of our structured parallel programming, just using a restricted set of patterns.

There is also a large class of collective languages that express parallelism explicitly through operations over entire collections of data. Collective languages can be imperative or functional, and collective operations are often available through libraries or as built-in operations in mainstream languages. NESL is an example of a collective pure functional language [Blelloch 1990, 1993, 1996], while FORTRAN 2003 is an example of an imperative language with built-in collective operations. RapidMind [McCool 2006] and Ct are examples of imperative languages based on collective operations. Deterministic imperative collective languages can be given a consistent sequential interpretation that makes it straightforward to reason about their behavior.

We do not have space to discuss the implementation or specification of patterns in programming models in detail. However, patterns can be implemented or supported in a variety of ways: as conventions; using code generators [Herrington 2003]; supported explicitly in new language designs; or implemented in libraries. The "library" approach can be made as efficient as a compiled language if dynamic code generation is used to support optimization and fusion [McCool 2002, McCool 2006] .

In the following I will first discuss serial patterns. This is important for three reasons: