The Design of Parallel Programming Languages

Richard P. Gabriel

published in 1991

Prelude

"Show me." John held up the chalk, holding it by the top, the bottom pointed at his feet. His smile was slight but visible, visible to everyone in the lecture hall; his smile was aimed at me. Who was more scared, the students, or me - the TA? That’s something no one can ever know, but there were 50 of them and one of me; if my answer is wrong, their answers would have been wrong.

Three months earlier, in my first quarter as graduate student at Stanford, in the Lab at the top of the hill, just before a volleyball game, I asked John McCarthy - the John McCarthy - whether I could have an office at the Lab and be supported by it. The Lab being SAIL, the Stanford AI Lab, and I having been assigned elsewhere, in that dusty summer of 1975, with the sharp fragrance of volleyball-courtside tarweed splattered on a background of eucalyptus; hot and dry. Hazy and dusty.

"Sure," if I TAed 206, the Lisp course. OK, I knew Lisp, had learned it from the blue book - the Lisp 1.5 Programmer’s Manual. His book. His language.

Some call his classes "Uncle John’s Mystery Hour," in which John McCarthy can and will lecture on the last thing he thought of before rushing late through the door and down the stairs to the front of the lecture hall.

But this class started out like anything but a mystery hour: John was reviewing the answers to the midterm. The first few problems were routine, but this one maybe not, and he said, "Ah, this problem was intended to show you that there are some problems that Lisp is not good for."

I can hear the worried thoughts behind me: "Er, but multiplying two polynomials seems so easy. I musta got it wrong." And I, maybe I had graded it wrong.

John: "It turns out you need to pass global information, and the control structure is not regular. Here are the six functions and two globals you need."

So now you’re thinking that the problem was stated funny. Well, here it is: Let a_0 + a_1*x + a_2*x^2 + ... + a_n*x^n be represented by the list (a0 a1 ... an); write a function that multiplies two such polynomials. The solution is easy too: if add-poly adds two such polynomials, and constant-multiply-poly multiplies a polynomial by a constant, then this is just:

(defun multiply-poly (p1 p2)
  (cond ((null p1) nil)
        (t (add-poly (constant-multiply-poly (car p1) p2)
                     (cons 0 (multiply-poly (cdr p1) p2))))))

And those two helper functions are an obvious one or two lines of code.
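For completeness, here is one plausible sketch of those helpers. The names come from the text above; padding the shorter polynomial by returning the longer one's tail is just one reasonable choice:

```lisp
;; Add two polynomials represented as coefficient lists (a0 a1 ... an).
(defun add-poly (p1 p2)
  (cond ((null p1) p2)
        ((null p2) p1)
        (t (cons (+ (car p1) (car p2))
                 (add-poly (cdr p1) (cdr p2))))))

;; Multiply every coefficient of a polynomial by a constant.
(defun constant-multiply-poly (c p)
  (mapcar #'(lambda (a) (* c a)) p))
```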

John finishes his complicated explanation to a wash of silence.

I, lowly first-year grad student TA, peep: "I think Lisp can do better than that." And hope I haven’t blown it. "Show me," the smile begins with his eyes and spreads downward.

John’s world is a world of ideas, a world in which ideas don’t belong to anyone, and when an idea is wrong, just the idea - not the person - is wrong. A world in which ideas are like young birds, and we catch them and proudly show them to our friends. The bird’s beauty and the hunter’s are distinct.

And when John catches a bird on the way to class, we see it, and just the way he does.

So I write the few lines of code on the board, and John wonders at it: "Well, it looks like you’re right, but you see, I was working on this other problem, talking to Gosper, and it threw me off track on this one...."

Sometimes a bird flying past in a blaze of pretty song will distract a bird-catcher and maybe confuse him, too.

"That’s OK John," I say, my own smile flicking, "I would have given you partial credit."

"I would expect that you’d be able to see my solution is correct and give me full credit."

"Nah. I would have taken off 5 points for poor programming style."

Some people won’t show you the birds they’ve caught until they are sure, certain, positive that they - the birds, or themselves - are gorgeous, or rare, or remarkable. When your mind can separate yourself from your bird, you will share it sooner, and the beauty of the bird will be sooner enjoyed. And what is a bird but for being enjoyed?

And if John’s mystery hour is because he found a new bird, well, heck, let’s just love him for it.

Ever since that day I’ve patterned myself after him, and we’ve had a long, strange road together - catching birds. Ha, that’s it!

Introduction

In 1984, John McCarthy and I proposed a parallel dialect of Lisp, which we called Qlambda - later, the name was changed to Qlisp. A number of papers have been written about Qlisp, several implementations have been made, and generally Qlisp is considered by researchers to be one of the standard parallel Lisp languages.

The initial motivation for the language was to study medium-grained parallelism within the context of a relatively small multiprocessor (up to 16 processors) with a shared memory system (a single address space). Two of the key problems of parallel programming we wished to solve were controlling excessive parallelism - spawning too many processes and thereby wasting resources - and determining reasonable semantics for Lisp functions that were now operating in a parallel context.
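As a concrete illustration of the first problem, Qlisp's parallel constructs take a propositional parameter that says whether to spawn processes at all. The sketch below is hedged: psum and the depth cutoff are hypothetical, and the point is only that parallelism can be shut off dynamically to avoid spawning processes deep in a recursion:

```lisp
;; Sketch, not from the original text: qlet's propositional parameter
;; controls process creation. Near the root of the tree (< depth 3) the
;; two recursive calls run as separate processes; below that, qlet
;; degenerates to an ordinary let and no processes are spawned.
(defun psum (tree depth)
  (if (atom tree)
      tree
      (qlet (< depth 3)
            ((a (psum (car tree) (1+ depth)))
             (b (psum (cdr tree) (1+ depth))))
        (+ a b))))
```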

Qlisp also defined a number of constructs for speculative computation. Speculative computation is the practice of spawning a number of processes to explore a set of possible approaches to computing a result, and the process that succeeds in computing the result deactivates or kills the others.

As Qlisp evolved, a number of higher-level constructs were defined and other language features were adopted and modified from other parallel dialects. Examples of adopted features are futures and locks. Examples of higher-level constructs are heavyweight futures and partially, multiply invoked functions.

Nevertheless, the problem has remained that it is difficult to write parallel programs in Qlisp and parallel languages like it. Devising efficient means of execution appears an easier task than devising intuitive parallel languages; that is, despite the several higher-level constructs added to Qlisp, Qlisp remains a low-level language, as discovered by programmers trying to use it.

It is here that another, very much more obscure bit of John’s work has provided inspiration for new directions - samefringe.

Samefringe

In 1977, in the pages of the ACM SIGART Newsletter, a debate raged over whether samefringe is the simplest problem that requires multiprocessing or coroutines to easily solve. A number of people contributed solutions - including John McCarthy. John’s solution was startlingly simple. The samefringe problem is this: two binary trees have the same fringe if they have exactly the same leaves reading from left to right.

(defun leaves (tree)
  (cond ((atom tree) (list tree))
        (t (append (leaves (car tree))
                   (leaves (cdr tree))))))

(defun samefringe (tree1 tree2)
  (equal (leaves tree1) (leaves tree2)))

The problem, as stated in 1976, was to write a samefringe program that does not consume a lot of storage, which the above solution will.

John’s solution was this:

(defun samefringe (tree1 tree2)
  (or (eq tree1 tree2)
      (and (not (atom tree1))
           (not (atom tree2))
           (same (gopher tree1) (gopher tree2)))))

(defun same (x y)
  (and (eq (car x) (car y))
       (samefringe (cdr x) (cdr y))))

(defun gopher (tree)
  (cond ((atom (car tree)) tree)
        (t (gopher (cons (caar tree)
                         (cons (cdar tree) (cdr tree)))))))

A key ingredient to this solution is the fact that gopher creates a small amount of additional storage, assuming that the Lisp has tail recursion removal; but this solution is difficult to understand, and it seems as though it should be possible to write a parallel version of this program that is transparent to understand, even though it might not consume minimal storage. Such a solution would have three processes, one each to traverse the trees and a third to do the comparisons.
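To see why gopher is cheap, watch it rotate a tree until the leftmost leaf surfaces as the car; each rotation conses only along the left spine. This trace is an illustration added here, not from the original text:

```lisp
;; gopher rotates left-heavy structure rightward:
(gopher '((a . b) . c))
;; => (gopher '(a b . c))   ; one rotation: (caar . (cdar . cdr))
;; => (a b . c)             ; the car is now the atom a, so stop
```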

That is, a solution should look like this:

(defun samefringe (tree1 tree2)
  (qflet t ((compare (x y)
              (unless (eq x y)
                (return-from samefringe nil))))
    (labels ((traverse (tree)
               (cond ((atom tree) <report>)
                     (t (traverse (car tree))
                        (traverse (cdr tree))))))
      (qprogn t
        (traverse tree1)
        (traverse tree2))
      <test>)))

where <report> represents a mechanism that reports the leaf tree to compare as one of its arguments, and <test> tests whether the same number of leaves was reported.

The program below, for contrast, is the simplest Qlisp program that is structured as envisioned. This solution is actually quite simple, except that the control and synchronization mechanisms are mingled with the meat of the program to such an extent that it is nearly impossible to understand these essential parts of the program; this illustrates why Qlisp is a low-level language.

(defun samefringe (tree1 tree2)
  (let ((s1 (make-semaphore :count 0))
        (s2 (make-semaphore :count 1))
        (sentinel (list nil))
        (v nil))
    (flet ((compare (leaf1 leaf2)
             (unless (eq leaf1 leaf2)
               (return-from samefringe nil))))
      (qlabels (:parallelp t)
               ((accept-leaf1 (leaf)
                  (wait-semaphore s2)
                  (setq v leaf)
                  (signal-semaphore s1))
                (accept-leaf2 (leaf)
                  (wait-semaphore s1)
                  (compare v leaf)
                  (signal-semaphore s2)))
        (labels ((traverse (tree report)
                   (cond ((atom tree) (funcall report tree))
                         (t (traverse (car tree) report)
                            (traverse (cdr tree) report))))
                 (traverse-tree (tree report)
                   (traverse tree report)
                   (funcall report sentinel)))
          (qlet t ((x (traverse-tree tree1 #'accept-leaf1))
                   (y (traverse-tree tree2 #'accept-leaf2)))
            (declare (ignore x y))
            t))))))

Design

This problem has inspired my research on the design of parallel programming languages. Almost everyone who designs parallel languages is focused on the problem of writing programs that run fast, and the bulk of research in parallelism is aimed at the efficient use of processors.

The primary language feature that addresses ease of programming is the future, but this feature does not go very far towards rendering problems like samefringe easily solved.

The remaining sections introduce some new programming language concepts, each designed to help write the samefringe program.

Inherited Brands

The first concept is called inherited brands. A brand is like a class, but instances have no brand-defined structure (local storage). Any object can be made an instance of a given brand, unless there would be a conflict. Brands are thus like opaque types. Brands plus structure definition yields classes. All classes are brands.

Brands are organized into domains, where each domain can have its own notions of inheritance and method applicability. Furthermore, some domains - the ones most interesting to us - permit objects to acquire and shed brands during execution; such domains are called dynamic domains. Domains determine brand conflicts.

The primary use of brands is to provide a means to dynamically tag objects during execution, and objects will generally accrete brands. As we will see, it is useful to be able to tag an object with the process that produced it. Processes are instances of brands in the process domain, and an object is an instance of a brand representing a process if that object was created or manipulated by that process.

Brands can be used to select methods exactly as classes can.

We will use the syntax x:c to indicate that the object named by x has the brand c . The syntax x:c&d indicates that the object named by x has the brands c and d .

Process Environments

A process environment is a set of processes, each of which is an instance of the brand named process . The set of these processes is first class. Subsets of the processes can be distinguished by their brands, and control code can be written whose behavior depends on the state of the set of processes or the states of any of the processes in a process environment. When certain important states are achieved or events occur within the environment, specific generic functions are invoked, and these generic functions may be specialized. For example, when a process terminates, the generic function terminate is invoked by the system; its default method returns t . It is possible to distinguish various brands of termination: the brand completed indicates the process terminated by completion, killed means it was killed prematurely, and dead is a super-brand of completed and killed . An important sub-brand of completed is quiescent . A process is quiescent only when all processes spawned by it are also complete, including any spawned by implicit calls to terminate .

The following code creates a process environment:

(brand-let ((p1 (process))
            (p2 (process)))
  (spawn p1 (traverse tree1))
  (spawn p2 (traverse tree2)))

This code fragment creates two sub-brands of process , which is a class. The variables p1 and p2 have lexical scope, and the processes they represent have indefinite extent. spawn takes a brand and an expression, and creates an instance of the brand that will execute the expression.

A value returned from a process environment is branded by the process that originated it.

Furthermore, if an object passes through a process - by being bound to a variable within that process, for example - the object is also branded by that process. The effect is that an object retains the history of its creation and manipulation by retaining its brands.

Functions also brand objects exactly as if they were processes. Super-brands can be created to categorize functions, so that, for example, a scheduler can be written that will guarantee that no two functions of the same brand can be executed at the same time.

In some domains, a function brand is retained exactly during the dynamic extent of its invocation, and, in other domains, a function brand is permanently retained.

Improved PMI Functions

The basic idea of partially, multiply invoked functions (PMI functions) is to separate the process of coordinating the arrival of arguments from the process of executing the function on those arguments.

All calls to a function go through an interface to the actual implementation. The implementation of the function receives arguments by position, while the interface accepts only named arguments, provides for all defaulting, and coordinates the arrival of arguments for the function from multiple sources.

This technique is similar to currying functions, but because all arguments to the interface are named, one does not need to curry in any particular order. One could term it dynamic currying. All calls to a PMI function that supply arguments to the same invocation receive the same future as their value. A future is returned whenever some required arguments to the function have not been supplied to the interface by a function call. If a particular invocation of a function has returned a future, the value returned when all required arguments have been supplied is a realized future. This preserves the identity of all values returned for a particular invocation.

The improvement is to use brands and classes to distinguish arguments instead of explicit names. The following is an example:

(defbrand p1 (process) ...) (defbrand p2 (process) ...)

(pmi-qdefun add-up (x:p1 y:p2) (+ x y))

(add-up (spawn p1 1)) => <future>
(add-up (spawn p2 2)) => <the same future> = 3

The first call to add-up supplies the argument of brand p1 , and the second the argument of brand p2 . These two are added to produce the result.

Discriminating on Callers

In CLOS, method applicability is determined by the classes of the (required) arguments. As we will see, it can be convenient to discriminate on the class or brand of the calling function or process. The following program prints p1 when the traversal of tree1 terminates, and p2 when the traversal of tree2 terminates. Note that this uses the automatic invocation of terminate in process environments.

(brand-let ((p1 (process))
            (p2 (process)))
  (gf-labels ((terminate:p1 () (print 'p1))
              (terminate:p2 () (print 'p2)))
    (spawn p1 (traverse tree1))
    (spawn p2 (traverse tree2))))

There are two methods for terminate , one invoked when called by p1 and one when called by p2 . gf-labels defines methods and is similar in spirit to generic-labels in CLOS.

Waiting for Methods to Be Applicable

We have seen it is possible to write methods that are invoked if arguments are instances of process or function brands. In a multiprocessing environment, it can happen that a generic function might be invoked that will have an applicable method at some point, but not at the time of invocation.

In CLOS, a method is applicable if the arguments to the generic function are of the right classes. In the domain of processes - in which the brand of a process changes dynamically - we extend the concept of applicability as follows. Suppose a generic function is invoked that is made up of methods, some of which are specialized on brands within dynamic domains. If no method is applicable at invocation time, the generic function will wait until some method is applicable. When a method becomes applicable, the processes that changed the brands of the arguments making the method applicable are paused until the generic function terminates. Method combination provides a mechanism to define complex behavior based on the changing states of processes. For example, one type of method combination would wait for all the arguments to be of the right brand at the same time, while a second type would freeze each argument as it becomes the right brand.

Here is an example:

(brand-let ((p1 (process))
            (p2 (process)))
  (gf-labels ((complete :wait (p:dead q:dead)
                (print 'both-dead)))
    (complete (spawn p1 (traverse tree1))
              (spawn p2 (traverse tree2)))))

When a process terminates, it becomes an instance of the brand dead (and possibly also an instance of one of its sub-brands). Here the method for complete will eventually be applicable, but possibly not at the moment it is invoked.

Now we can write a complete solution to samefringe, which is shown below. The special form gf-flet is similar to gf-labels .

 1 (defun samefringe (t1 t2)
 2   (let ((unique (list nil)))
 3     (brand-let ((p1 (process))
 4                 (p2 (process)))
 5       (pmi-qflet ((compare (x:p1 y:p2)
 6                     (unless (eq x y) (return-from samefringe nil))))
 7         (gf-flet ((terminate:process () (compare unique))
 8                   (complete :wait (p:quiescent q:quiescent) t))
 9           (labels ((traverse (tree)
10                      (cond ((atom tree) (compare tree))
11                            (t (traverse (car tree))
12                               (traverse (cdr tree))))))
13             (complete
14              (spawn p1 (traverse t1))
15              (spawn p2 (traverse t2)))))))))

Because there are two trees being traversed, there are two sub-brands of process , one for each tree, called p1 and p2 . The function compare is a PMI function that takes one argument from each. When each process terminates, terminate provides a unique end marker to compare .

The main program spawns two processes, one an instance of p1 and the other an instance of p2 . Then the main program applies complete to those two processes, where the only method for complete is applicable when both its arguments are quiescent processes. Therefore, complete and hence samefringe wait until both processes have terminated - including the calls to terminate and compare - unless the fringes are different.

Line 2 defines the unique object. Lines 3-4 define two distinct sub-brands of process . These sub-brands only exist during the dynamic extent of the brand-let .

Lines 5-6 define the comparison function, which compares an object produced by p1 to an object produced by p2 . Line 7 defines the termination procedure for the processes. When a process of brand p1 terminates, the system invokes terminate as if by p1 and then p1 becomes an instance of quiescent . The characteristic of being of brand p1 is inherited by this invocation of terminate , which will pass on that brand to unique when it invokes compare .

Line 8 defines a generic function that returns t when invoked on two terminated processes. Furthermore, if this function is applied to two processes, either one of which is not terminated, invocation will be blocked until they both are terminated.

Lines 9-12 are the simple traversal routine.

Lines 13-15 are the main body of the function. Lines 14-15 spawn two processes, the first an instance of p1 and the second an instance of p2 . The function complete will be invoked when the two processes terminate, and this function will return t in that event. However, recall that compare might terminate the processes early.

Triggered Functions

Sometimes we wish to write some code that will run when certain events take place, but we wish the description of these events to be more than a simple interrupt or trap.

Such a facility will prove useful when modifying existing code. For example, suppose there is an existing function called traverse that will visit all the nodes in a tree in some particular order, and we wish to compute samefringe on the leaves of two trees in the order traversed by traverse . Here is an example traversal function:

(defun traverse (tree)
  (operate tree)
  (unless (atom tree)
    (traverse (car tree))
    (traverse (cdr tree))))

This code traverses trees exactly as required for classical samefringe, but it invokes operate at every node whether internal or leaf. We can modify this program without touching operate by using triggered functions.

A triggered function continually waits for the original program to achieve a complex state:

(defbrand atomic-tree (predicated-class) ()
  :predicate #'atom)

(defgeneric watch-visit
  :method-combination :continuous)

(defmethod watch-visit (p:(operate tree:atomic-tree))
  (compare tree))

Here the method combination type indicates continuous invocation. The function watch-visit is invoked on the tree-traversal process, and the method shown is applicable when that process is executing operate on an atomic tree. The code below illustrates its use.

(defun samefringe (traverse t1 t2)
  (let ((unique (list nil)))
    (brand-let ((p1 (process))
                (p2 (process)))
      (pmi-qflet ((compare (x:p1 y:p2)
                    (unless (eq x y) (return-from samefringe nil))))
        (gf-flet ((terminate:process () (compare unique))
                  (complete :wait (p:quiescent q:quiescent) t))
          (complete
           (watch-visit (spawn p1 (traverse t1)))
           (watch-visit (spawn p2 (traverse t2)))))))))

By "continually" we mean that when watch-visit is applicable, it is invoked and then immediately reinvoked; however, there is only one invocation of watch-visit per triggering event. That is, if an event invokes watch-visit , watch-visit cannot be reinvoked by that same event - the event must terminate first, at which point an event exactly like the first one can invoke watch-visit .

The brand domain that makes this work is one that retains brands only during the dynamic extent of an invocation, which is the default domain.

This behavior can be implemented as follows. The process that caused it to be applicable is paused and watch-visit is completed. Once watch-visit is completed, the previously paused process is continued until the method would no longer be applicable. At that point watch-visit is reinvoked. For example, the order of events is this: process p1 begins to invoke operate on an atomic tree; p1 is paused and watch-visit runs. When watch-visit ends, p1 is continued until it exits operate . At that point watch-visit is re-invoked (and p1 continues as well).

The odd syntax means this: watch-visit will be called when the process p is within operate with a first argument of brand atomic-tree . The argument to operate is available in the watch-visit method, as shown.

watch-visit could have been written without using continuous method combination as follows:

(defmethod watch-visit (p:(operate tree:atomic-tree))
  (compare tree)
  (watch-visit p))

Samefringe could have been written without explicit invocation of watch-visit by writing watch-visit as a generic function that will be triggered when operate is invoked with a first argument of brand atomic-tree , as follows:

(defmethod watch-visit:(operate tree:atomic-tree) :triggered ()
  (compare tree))

samefringe would be exactly as above except there is no explicit invocation of watch-visit .

Triggered functions are but abstractions: we identified a common pattern of behavior and defined a stylized means to express that pattern.

Naming Variable Names

When piecing together programs from existing parts, we need to be able to refer to the variables of one program from another program. This requires a means to additionally name variables to distinguish them from variables of the same name in other programs. We can implement program-specific variables by qualifying their names with the process in which the program defining them is being executed. Similarly if the need arose to introduce new variables, the same mechanism could be used. Here is an example:

(brand-let ((p1 (process))
            (p2 (process)))
  (let ((n:p1 0)
        (n:p2 0))
    (flet ((count (l)
             (dolist (i l)
               (incf n))))
      (spawn p1 (count ...))
      (spawn p2 (count ...))
      ...)))

This program computes the length of two lists into two copies of the variable named n .

Improved Monotonic Variables

The concept of monotonic variables can be improved by using brands. A monotonic variable is one whose value always increases or always decreases. There is a brand monotonic and two sub-brands, monotonic-increasing and monotonic-decreasing .

Here are a few methods for an interesting generic function:

(defmethod = (x:dead y:dead)
  (= (unbrand x) (unbrand y)))

(defmethod = (x:dead y:monotonic-increasing)
  (if (< x y) nil (= x y)))

(defmethod = (x:monotonic-increasing y:dead)
  (if (< y x) nil (= x y)))

The relation = can be decided before both values are computed if the process computing one is dead and the other is monotonic increasing. This can then be used to terminate processes early. We can complete the definition of = by defining similar methods for other combinations of dead , monotonic-increasing , and monotonic-decreasing .
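For instance, one of the remaining methods might look like this. This is a sketch in the same hypothetical brand syntax; its dual, with the arguments exchanged, follows the same pattern:

```lisp
;; If x's process is dead and y only ever decreases, then once y has
;; fallen below x the two can never become equal, so answer nil early.
(defmethod = (x:dead y:monotonic-decreasing)
  (if (> x y) nil (= x y)))
```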

We can write a different version of samefringe by using monotonic variables:

(defun samefringe (t1 t2)
  (brand-let ((p1 (process))
              (p2 (process)))
    (pmi-qflet ((compare (x:p1 y:p2)
                  (unless (eq x y) (return-from samefringe nil))))
      (let ((n:p1&monotonic-increasing 0)
            (n:p2&monotonic-increasing 0))
        (gf-flet ((count:compare :trigger () (incf n)))
          (labels ((traverse (tree)
                     (cond ((atom tree) (compare tree))
                           (t (traverse (car tree))
                              (traverse (cdr tree))))))
            (spawn p1 (traverse t1))
            (spawn p2 (traverse t2))
            (= n:p1 n:p2)))))))

Here the function count is a triggered function: when some process becomes an instance of an invocation of compare , that process is paused while the body of count is executed. That invocation of count retains the brands of the process that triggered it, so that incf increases the right variable.

Temporal or Historical Abstraction

Some of the problems in programming have to do with modifying a program to keep track of the history of a data structure or to answer questions about what things happened at what time. We believe there is a new type of abstraction, called temporal or historical, that can make such changes simpler. Again, this basic idea originated with John McCarthy.

The idea is to be able to refer to the computational past. Here is some code suggested by some remarks by Vladimir Lifschitz:

(defun f (n)
  (let ((x:0 nil))
    (dotimes (i:0 n)
      (setq x:0 (read))))
  (dotimes (i n)
    (print x:0[when (= i:0 (- n i))])))

There are two named variables, x:0 and i:0 . They are qualified names because later code will refer to them. This code is read as follows: Create a variable x (indexed by 0). Run a loop for i (indexed by 0) from 0 to n-1, assigning x to the value of read . Run another loop from 0 to n-1. In this loop print the value of x:0 when the value of i:0 was (- n i) ; that is, print the values that were read, backwards. No intermediate data structure needs to be created by the programmer.

Brand-based description mechanisms are a step in the direction of temporal abstractions. For example, a triggered function defines a temporal or historical abstraction, though in a primitive way: we can define a pattern of function invocation and argument spectrum such that instances of the pattern cause recordkeeping actions.

A better approach would be to specify a language of temporal patterns, so that historical events can be succinctly described.

Correspondences

A good use of historical abstractions is the correspondence. The key problem in samefringe is to determine whether there is a 1-1 correspondence between the leaves of two trees such that corresponding leaves are the same.

A correspondence is a mapping between two processes. Each process defines a set of objects; we use variable naming to define those things that correspond. The correspondence takes an element in the one set to the corresponding element of the second set. A correspondence also has a failure function, which is invoked whenever a failure of the one-to-one property is detected.

In the code shown below, map-correspondence takes a function and a correspondence c:D->R. It enumerates the elements of D, passing them to the function.

(defun samefringe (t1 t2)
  (with-names (node)
    (labels ((traverse (x)
               (cond ((atom x) x[named node])
                     (t (traverse (left x))
                        (traverse (right x))))))
      (brand-let ((p1 (process) ())
                  (p2 (process) ()))
        (spawn p1 (traverse t1))
        (spawn p2 (traverse t2))
        (let ((c (correspondence
                  (-> node:p1 node:p2)      ;domain, range
                  #'(lambda ()              ;fail function
                      (return-from samefringe nil)))))
          (map-correspondence
           #'(lambda (node)                 ;map function
               (unless (eq node (c node))
                 (return-from samefringe nil)))
           c)
          t)))))

Conclusions

Too large a proportion of effort has gone into designing programming languages so that programs written in them run fast. A much more precious commodity - if it can even be called a commodity - is the human, who must devise, design, code, debug, and maintain those programs.

Programming language designers argue that a small set of orthogonal features is best, yet the version of samefringe written in raw Qlisp using simple, orthogonal features is nearly incomprehensible. It would seem to make sense that as our sophistication increases with programming and software developments, so should the sophistication of our programming languages.

Bibliography