Coroutines

by

M. Douglas McIlroy

Bell Telephone Laboratories, Incorporated

Murray Hill, New Jersey

Abstract

Unlike subroutines, coroutines may be connected, and reconnected, in nonhierarchical arrangements. Coroutines are particularly useful for generating and processing data streams. Semantics for coroutines are developed and examples are given.

1. INTRODUCTION

Coroutines were christened by Conway in 1963 [1], but they have been used in various special applications at least since 1959. Well known as the idea has become [2, 3, 4], it remains to be clothed in generality and espoused by general purpose programming languages.(1)

The essential novelty of coroutines arises from discarding the hierarchical relationship that holds among subroutines. Whereas members of a linked pair of subroutines stand to each other as superior and inferior, the ``caller'' and the ``called,'' coroutines may be linked in a more flexible manner out of which may be constructed democratic as well as hierarchic, and rearrangeable as well as fixed, interrelationships.

More abstractly, coroutines decouple scope, connection and flow of control. The lifetimes of an identifiable ``generation'' of a coroutine and its local data are not arbitrarily bounded between ``entry to'' and ``return from'' the routine, and communication of parameters between coroutines may be established otherwise than by invocation. List processing has already decoupled data lifetimes from flow of control; coroutines do the same for procedures.

Coroutines clarify some obscure areas of programming, especially stream processes such as input-output or sequence generation. As coroutine arrangements of any complexity are extremely awkward to mimic by traditional methods, coroutines deserve to become a basic feature of general purpose programming languages. This paper has been written to bring a coherent development of coroutines to the attention of language designers.

Although the paper does not propose coroutines for any specific language, it uses a rough formalism for the sake of example. The lexical style is that of ALGOL, but the syntax is more nearly that of PL/I, whose list processing and declarative facilities need little extension to encompass coroutines. In the interest of clarity the style deviates from pure PL/I [5] in self-evident ways:

The assignment operator is written `:='

A ` repeat ;... end ; ' grouping replaces ` do while ('1'b);... end ; '

One-bit constants `true' and `false' replace '1'b and '0'b.

' and ` ' replace and .

Procedures of no arguments have an empty argument list.

Data attributes that can reasonably be inferred from context or are irrelevant to the presentation are often left undeclared.

Ambiguous partial qualification is resolved by appropriateness of attributes in context.

DATA STREAMS

Many computing processes manipulate streams of data. Items are obtained in sequence from one or more input streams, some information is gleaned, and one or more output streams are constructed. A stream may pass through a cascade of processes, each of which transforms it further.

A program acting on a data stream may be likened to a bucket brigade. A bucketeer in the middle of the line sees neither neighbor as dominant. He simply minds his own pile of buckets, calling upon his left neighbor to replenish it and on his right neighbor to dispose of it. The only special discipline is that at any instant action is restricted to one bucketeer, who may subsequently pass control left or right as he wishes. There are no parallel operations. Once the stream has been established, our man in the middle cannot discern whether the stream is driven by an overflowing fountain at one end or by an unfillable well at the other. In programming parlance, the bucketeer is behaving as a coroutine.

Merely by rearranging the connections among a set of coroutines, one may adjust the order of processing a data stream. In particular it is easy to insert new processes into an existing cascade simply by opening connections and dropping them in. Equally easily a stream may be diverted through alternative processes and switched back without disrupting the overall chain in the slightest.
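The rearrangement of a cascade can be sketched with Python generators, which behave roughly as one-sided coroutines: each stage keeps its own state and control moves only when a neighbor asks for the next item. This is a minimal illustration and not the formalism of this paper; the names cascade, upper_case and drop_blank are hypothetical, chosen only for the example.

```python
def upper_case(stream):
    """One stage of the cascade: transform each item and pass it on."""
    for item in stream:
        yield item.upper()


def drop_blank(stream):
    """Another stage: absorb blank items, pass the rest on unchanged."""
    for item in stream:
        if item.strip():
            yield item


def cascade(stream, *stages):
    """Connect stages left to right; each stage reads from the one before."""
    for stage in stages:
        stream = stage(stream)
    return stream


lines = ["alpha", "", "beta"]
# The original cascade, then the same cascade with a new stage dropped in.
print(list(cascade(iter(lines), upper_case)))
print(list(cascade(iter(lines), drop_blank, upper_case)))
```

Dropping drop_blank into the cascade requires no change to upper_case, which is the separability the text describes. Unlike the bucketeer, a Python generator is driven only from its right-hand neighbor.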

Input-output programming, the type example of stream processing, is customarily force-fit into the subroutine mold. A computational main routine is pictured as dominant, calling occasionally on a read routine to deliver some input or a print routine to absorb some output. On the other hand, the read routine, especially if it does buffering, steadily accepts input from the outside world in a cycle that depends more on the form of the data than on the sequence of calls from the main program. From the standpoint of the read routine control is seen passing to the main routine whenever a sufficient quantity of input has been accumulated. Indeed, it makes sense to regard neither the main routine nor the read routine as boss. Each follows its own relatively autonomous path, suspending occasionally while the other works, but always preserving its internal state while control is with its partner.

When several input streams simultaneously traverse the ``same'' read routine, it becomes even less satisfactory to regard it as a single subroutine. But thinking of the read routine as a coroutine existing in separate generations for each stream, one can understand exactly how it works and how it keeps book independently on the state of each stream, and one can explain the curious operations of ``opening'' and ``closing'' input streams as activation and deactivation of separate generations of the read routine (see below).

2. LINKAGE

Subroutine linkage is probably the most complex ``primitive'' notion in standard programming languages, and must be dissected into a combination of lesser primitives for coroutines. To this end, let us partition the mechanism three ways, into

Activation

Connection

Passing control

ACTIVATION

In ALGOL or PL/I a routine is said to be active while its local data remains accessible, or in the terminology of PL/I, while its generation of automatic storage persists. This definition of activity has been chosen instead of the customary definition in terms of control remaining inside a block or its descendants, so that it may be carried across to coroutines. The physical manifestation of a single continuous period of activity, or activation, will be called a generation of a routine. Where no ambiguity can arise, this may be shortened to ``generation.''

Allocation and initialization of automatic storage occurs immediately upon activation, regardless of whether control is to continue directly in the activated generation. Parameters upon which allocation and initialization depend must be supplied at this time. Other parameters, which are intended to be fixed for the entire activation, may also be passed. The prologue (allocation and initialization phase) of a coroutine thus behaves as a subroutine executed as part of activation. Moreover the entire generation should behave as a subroutine with respect to name or reference parameters and nonlocal variables used elsewhere than in prologue,(2) for it should not outlive these quantities.

Formalisms for dynamic storage allocation as in PL/I's list processing or Wirth and Hoare's proposal [6] may be adapted for activation of coroutines. Extending PL/I terminology, a coroutine is a based routine, a generation of which is a based structure. The based routine, or strictly speaking its prologue, will be understood to be a pointer-valued function. Activation will be specified by a simple function reference that returns a pointer to the new generation:

pointer := procedure(arguments);
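Activation as a pointer-valued function reference has a loose analogue in Python, where calling a generator function returns a handle to a new generation. The sketch below assumes a hypothetical helper, activate, that also runs the prologue up to the first suspension, since Python by itself defers all execution until the first resumption. The routine running_total is likewise invented for illustration.

```python
def activate(routine, *arguments):
    """Create a generation and run its prologue up to the first suspension."""
    generation = routine(*arguments)
    next(generation)  # the value yielded by the prologue is discarded here
    return generation


def running_total(start):
    total = start  # prologue: local data allocated and initialized
    while True:
        amount = yield total  # suspend; local data persists meanwhile
        total += amount


# Two generations of the same routine, each with its own local data.
p = activate(running_total, 0)
q = activate(running_total, 100)
```

Each handle designates a separate generation, so resuming p leaves the total held by q untouched.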

CONNECTION

A connection is the bond that exists between two generations analogous to the bond between a subroutine and its caller. A connection is a communication path with a dual purpose: passage of control and passage of arguments.

The period of connection in subroutine linkage coincides with the period of activity of the called generation and with the interval that control is with that generation or its descendants. Coroutines may be connected more freely, subject only to the restrictions that activation precede connection, and connection precede passing control or accessing formal parameters. There is no preferred direction of passing control (i.e. no distinguishable caller) and no implication of hierarchy nor nesting of generations. Coroutines may exist in a comparatively anarchic arrangement; whatever discipline obtains among a particular collection of coroutines must be imposed by their author; it is not built in.

In general two pieces of information are required at the near end of a connection, with complementary information at the far end. A resumption label tells where execution must start when control passes in through the connection. An argument list gives the actual parameters to which correspond formals of the far end.

CONNECTORS

Connectors are the portals of routines. Connection is made by plugging two connectors together, and broken by pulling them apart. Connectors hold all the information pertinent to a connection.

For definiteness let us suppose a connector to be like (in the technical sense of PL/I's like attribute) the following structure defined in pseudo-PL/I notation:

1 Connector connector,
  2 Resumption label,
    3 Generation pointer,
    3 Address pointer,
  2 Mate pointer,
  2 Arguments

Connector.Generation

Connector.Mate

A connector declaration specifies parameters of the connector. If the connector is to be accessed from outside its own scope (in particular to cause connection) the declaration must also specify its place in the generation structure. Notations such as the following modifications of PL/I procedure statements and entry declarations suffice:

a: procedure(x) based
     2 b connector(y),
     2 c connector(z);

declare 1 a entry based,
          2 b connector,
          2 c connector;

A MECHANICAL PICTURE

Coroutine generations and connectors may be visualized as in Fig. 1. A generation appears as a box, and connectors as protruding plugs. In the figure two generations have been connected. The left hand generation has three connectors and the right, two.

Fig. 1. Two connected generations

Conway emphasized the separability of coroutines, that is the possibility of breaking and redirecting established connections without disturbing the generations at either end. For instance, to change input code conversion, one might interpose a new code converter between input and processing simply by opening a connector and dropping the converter in.

OPERATIONS

Certain operations and accessing specifications apply to connectors:

(a) Connection
(b) Initializing the argument list
(c) Otherwise setting the argument list
(d) Initializing the resumption label
(e) Otherwise setting the resumption label
(f) Passing control via the connection
(g) Indirectly addressing the mated connection
(h) Indirectly addressing the mated generation
(i) Connector assignment

In the examples connection, (a), will be specified by

connect Connector1 to Connector2;

Connector1.Mate := Addr(Connector2); Connector2.Mate := Addr(Connector1);
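The symmetric setting of Mate pointers can be sketched as a small Python data structure. The sketch is an assumption-laden illustration only: the class Connector keeps just a generation name, a mate and an argument list, and the helpers disconnect and interpose are hypothetical names that do not appear in this paper.

```python
class Connector:
    """Hypothetical stand-in for the connector structure described above."""

    def __init__(self, generation, name):
        self.generation = generation  # the generation that owns this connector
        self.name = name
        self.mate = None              # the connector plugged into this one
        self.arguments = ()           # argument list kept at this end


def disconnect(connector):
    """Pull a connector apart from its mate, if it has one."""
    mate = connector.mate
    if mate is not None:
        mate.mate = None
        connector.mate = None


def connect(first, second):
    """Plug two connectors together, breaking any earlier connections."""
    disconnect(first)
    disconnect(second)
    first.mate = second
    second.mate = first


def interpose(upstream, new_in, new_out):
    """Open the connection at upstream and drop a new generation in."""
    downstream = upstream.mate
    if downstream is None:
        raise ValueError("upstream connector is not connected")
    connect(upstream, new_in)
    connect(new_out, downstream)
```

Because all information about a connection lives in the connectors, interpose touches nothing inside the generations at either end.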

Resumption is a compound of operations (c), (e) and (f), whose syntax is parallel to that of a call:

resume connector(arguments);

resume

Resumption may also be specified in the style of a function reference:

variable := connector(arguments);

As an illustration, suppose that L is the resumption label of g:

declare f(x) connector, g(y) connector;
connect f to g;
a := f(1);
b := x;
stop;
L: c := y+1;
resume g(c,3);

Upon reaching stop, a = 3, b = 2 and c = 2.
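The flow of control in this small example can be traced with a Python generator standing in for the code at label L. This is a sketch under stated assumptions: the far end of the connection is modeled as a separate generator named far_side, the pair (argument, function value) is passed back as a tuple, and the dictionary trace exists only to expose the value of c. None of these names come from the paper.

```python
def far_side(trace):
    """Plays the code at label L, reached through connector g."""
    y = yield                # starting point L: wait until control arrives
    c = y + 1                # L: c := y+1
    trace["c"] = c
    yield (c, 3)             # resume g(c,3): argument c and function value 3


def run_example():
    trace = {}
    g = far_side(trace)
    next(g)                  # advance the generation to its starting point
    x, a = g.send(1)         # a := f(1); on return x is the argument sent back
    b = x                    # b := x
    return a, b, trace["c"]


print(run_example())
```

The argument 1 sent through f arrives as the formal y, and the pair sent back through g supplies both the formal x and the function value assigned to a.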

To make syntactic distinction between connector access (as in connector assignment) and resumption of a function connector having no arguments, resumption without arguments will be indicated by an empty argument list in parentheses.

3. COROUTINES IN USE

PROBLEMS OF INITIALIZATION

Example 1 shows the body of a simple coroutine. The routine has two connectors and operates in an unending loop, accepting input via connector Left, transforming it according to the function Map, and sending data on via Right. The declarations state that Left has one formal parameter In; it has no actual parameters as shown by usage in the first resume. An opposite situation holds for connector Right. Indeed Left and Right of two generations of this routine could be connected to produce the product transform Map(Map(...)).

Example 1

declare Left connector(In), Right connector();
repeat;
R: resume Left();
L: Out := Map(In);
   resume Right(Out);
end;
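The two labels R and L of Example 1 mark the two places at which the loop can be entered. Python generators are one-sided, so a sketch has to choose a side in advance: pull_stage below is entered from the right and begins by asking its left neighbor for input (as at R), while push_stage is entered from the left and begins by transforming what it was given (as at L). The names pull_stage, push_stage, collector and double are hypothetical.

```python
def double(value):
    """Stands in for the function Map of Example 1."""
    return 2 * value


def pull_stage(left, mapping):
    """Driven from the right: first obtain input from Left, then send on."""
    for item in left:
        yield mapping(item)


def push_stage(right, mapping):
    """Driven from the left: transform what arrives and resume Right."""
    while True:
        item = yield
        right.send(mapping(item))


def collector(output):
    """A final consumer that records whatever is pushed into it."""
    while True:
        output.append((yield))


def primed(generator):
    """Advance a push-style generator to its first waiting point."""
    next(generator)
    return generator


# Two generations of the same stage connected in sequence: Map(Map(...)).
pulled = list(pull_stage(pull_stage(iter([1, 2, 3]), double), double))

pushed = []
head = primed(push_stage(primed(push_stage(primed(collector(pushed)), double)), double))
for value in [1, 2, 3]:
    head.send(value)
```

Both arrangements compute the same product transform; they differ only in which end drives the stream, a choice that the two-sided coroutine of Example 1 leaves open.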

If control were to come in first through Left, then processing should begin at L; if through Right, then at R. With only one appearance of each connector, this is self-evident, but starting points would have to be specified if there were more. Further difficulty arises upon writing the main loop this way:

repeat; resume Right(Map(Left())); end;

Left

Right

resume ^Right(Map(^Left()));

1+Left()

Another aspect of initialization is illustrated by Example 2, where a stream of values x comes from Left with a repetition factor n applied to each. The process sends the same stream on to Right with a repetition count whose magnitude is limited to 2 or less. Should control first come in from Right, then the very next quantity to be accessed (in the while clause) would be the formal parameter n from Left. In order for this parameter to have meaning there must be an argument list available from Left even though control had never come in from there. Suppose for simplicity that another generation B of this limiting routine had been connected at Left. An argument list (m,y) would accordingly have to be available at connector Right in B immediately upon connection. It is necessary therefore to have argument lists at the starting points initialized as part of the activating prologue.

Example 2

declare Left connector(n,x), Right connector(), m initial(0);
Loop: repeat;
   resume ^Left();
   y := x;
   do while(n!=0);
      m := Min(2,n);
      n := n-m;
      resume ^Right(m,y);
end Loop;
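The limiting process of Example 2 can be sketched as a Python generator that is always driven from the right. The sketch assumes the input stream is an iterable of (n, x) pairs with nonnegative n; the name limit_repeats and the parameter cap are hypothetical. Because Python performs no work before the first resumption, the question of initialized argument lists does not arise here.

```python
def limit_repeats(left, cap=2):
    """Pass on each value x with repetition counts no greater than cap."""
    for n, x in left:
        while n != 0:
            m = min(cap, n)   # m := Min(2,n)
            n -= m            # n := n-m
            yield (m, x)      # send (m, x) on to the right


stream = [(5, "a"), (1, "b"), (0, "c")]
print(list(limit_repeats(stream)))
```

A value with repetition factor 5 is sent on as counts 2, 2 and 1, and a value with factor 0 produces nothing.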

In some cases argument list initialization may be infeasible. Such a case occurs in the previous example

resume ^Right(Map(^Left()));

Right

Left

Left

COMPLETE COROUTINES

We are now in a position to write an entire program in coroutines. Example 3 is a prime number sieve that works on a novel principle. The sieve is a pile of ``meshes,'' where each mesh is a coroutine that screens out multiples of a particular value. If an integer dumped in at the top of the pile falls through to the bottom, it is relatively prime to the divisor in each mesh. If integers are dropped in sequence, and if a new mesh is added whenever an integer gets through the whole sieve, then this scheme becomes a prime number sieve.

Figure 3 shows the sieve in operation. At the top is ``dump,'' where integers are poured in:

dump: do j := 2 by 1; resume source(j); end;

i

m

repeat; resume ^top; if (i mod m) != 0 then resume bottom(i); end;

repeat; resume sink; put list (i); p := mesh(i); connect p->mesh.top to sink.mate->mesh.bottom; connect p->mesh.bottom to sink; end

Fig. 3

Example 3

sieve: procedure based,
      2 source connector,
      2 sink connector(i);
   mesh: procedure(n) based,
         2 top connector(i),
         2 bottom connector;
      declare m initial(n);
      repeat;
         resume top;
         if (i mod m) != 0 then resume bottom(i);
      end;
   end mesh;
start: connect source to sink;
dump: do j := 2 by 1;
      resume source(j);
   end;
hopper: repeat;
      resume sink;
      put list(i);
      p := mesh(i);
      connect p->mesh.top to sink.mate->mesh.bottom;
      connect p->mesh.bottom to sink;
   end;
end sieve;
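The same scheme can be sketched with Python generators, each mesh being a generation that screens out multiples of one value. This is an approximation of the program above, not a transcription: the stream is driven from the bottom rather than from dump, and the names integers_from and primes are hypothetical.

```python
from itertools import islice


def integers_from(start):
    """The dump: pours integers in at the top of the pile."""
    j = start
    while True:
        yield j
        j += 1


def mesh(m, top):
    """One mesh: lets through only integers that are not multiples of m."""
    for i in top:
        if i % m != 0:
            yield i


def primes():
    """The hopper: whatever falls through is prime; add a mesh for it."""
    bottom = integers_from(2)
    while True:
        i = next(bottom)
        yield i
        bottom = mesh(i, bottom)   # new mesh connected at the bottom


print(list(islice(primes(), 10)))
```

Rebinding bottom plays the part of the two connect statements in hopper: the new mesh takes its input from the old bottom of the pile and becomes the new bottom.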

COROUTINE-SUBROUTINE MIXTURES

Subroutine automatic storage is usually allocated from a last-in-first-out stack, but coroutine automatic storage need not be nested and so must be allocated independently for each generation. When coroutines and subroutines are intermixed, so that coroutines call subroutines which in turn resume coroutines, each coroutine generation must effectively head a private stack for dynamically descendant subroutine generations.(5)

To see that such mixtures of routines can actually be useful, consider Example 4, a permutation generator (for illustrative use only, not recommended for efficiency). At each resumption, the generator will construct another permutation and pass it back. A true-false switch tells when the job is done. Each level of recursion works on an initial portion of the array A and produces all permutations of that subarray having each possible last element.

The generator might be used this way

declare Next connector(A) bit(1),
        1 Perm entry based,
          2 Next connector,
        A(*);
p := Perm(5);
connect Next to p->Perm.Next;
do while (Next());
   ... process permutation A ...
end;

Example 4

Perm: procedure(n) based,
      2 Next connector;
   declare A(n);
   resume ^Next();
   do i := 1 to n;
      A(i) := i;
   end;
   call Perm1(n);
   resume Next(A, false);
   Perm1: procedure(n);
      declare i;
      if n=1 then resume Next(A, true);
      else do i := 1 to n;
         call Exchange(A(i), A(n));
         call Perm1(n-1);
         call Exchange(A(n), A(i));
end Perm;
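A sketch of the same generator in Python follows. It keeps the recursive exchange scheme of Example 4, with the recursive subroutine suspending from arbitrary depth by means of yield from. Two liberties are taken and should be read as assumptions: exhaustion of the generator replaces the true-false switch, and each permutation is passed back as a tuple copy rather than as the shared array. The names permutations_of and perm1 are hypothetical.

```python
def permutations_of(n):
    """Yield every permutation of 1..n, one per resumption."""
    a = list(range(1, n + 1))

    def perm1(k):
        # Permute the initial portion a[0:k], trying each possible last element.
        if k == 1:
            yield tuple(a)
        else:
            for i in range(k):
                a[i], a[k - 1] = a[k - 1], a[i]   # call Exchange(A(i), A(n))
                yield from perm1(k - 1)           # call Perm1(n-1)
                a[k - 1], a[i] = a[i], a[k - 1]   # call Exchange(A(n), A(i))

    yield from perm1(n)


for permutation in permutations_of(3):
    print(permutation)
```

The pending recursive calls of perm1 survive between resumptions, which is the private stack headed by the coroutine generation.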

OTHER TECHNIQUES

Referring to Example 4 let us examine methods that might have been used to render a stream generator in present day ALGOL or PL/I.

Method I. Supply a processing routine as a parameter of the generator, to be called for every new permutation. Can be done in ALGOL or PL/I.

Method II. Maintain all memory required to get from one permutation to the next outside the routine and pass the remembered quantities as arguments at each invocation of the generator. Can be done in ALGOL or PL/I.

Method III. Allocate memory dynamically and keep a pointer to it outside the routine as in method II. Can be done in PL/I only.

Method IV. Maintain internal memory in own storage. Can be done in ALGOL only.

Method V. A functional assignment method shown to me by D. M. R. Park needs detailed discussion and will be put off to the following section. It works in neither ALGOL nor PL/I.

Each method has its drawbacks.

Method I fails if interleaved streams from two generators are required. Interleaved streams of permutations may seem unlikely, but interleaved streams of random numbers have been used.
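The difficulty with Method I can be seen in a small Python sketch. In the callback style the generator owns the loop, so a second generator cannot take turns with it; with suspendable generators the consumer decides when each stream advances. The routines each_square, squares, cubes and interleave are hypothetical examples, not taken from the paper.

```python
def each_square(limit, process):
    """Method I: the generator runs the loop and calls process per item."""
    for k in range(limit):
        process(k * k)


def squares(limit):
    """The same stream as a suspendable generator."""
    for k in range(limit):
        yield k * k


def cubes(limit):
    """A second, independent stream."""
    for k in range(limit):
        yield k ** 3


def interleave(first, second):
    """Take items alternately from two streams until either runs out."""
    for x, y in zip(first, second):
        yield x
        yield y


collected = []
each_square(3, collected.append)       # runs to completion before returning
mixed = list(interleave(squares(3), cubes(3)))
```

The call of each_square cannot be paused midway to let another stream contribute, whereas squares and cubes advance one item at a time under control of interleave.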

Method II only works well when stream elements are produced at the outermost block level; otherwise block nesting records (upon which Example 4 depends) will have to be simulated by some unwieldy circumlocution. Moreover, because its memory is outside, a Method II routine cannot be transparent to algorithm changes that alter memory requirements.

Method III shares the first objection to Method II, that results ought to be obtained at the top block level. In PL/I, but not in the record proposal for ALGOL [6], a change in algorithm would not be apparent outside, since a single pointer can designate any sort of memory. This transparency is not above criticism, for it depends on a property of PL/I often considered unhygienic.

Method IV shares in objections to Methods I and II. Only one stream can be accommodated by own variables, and the technique is difficult if results naturally appear in other than the top block. The needed dynamic own facility is not often implemented in ALGOL.

None of the above mentioned deficiencies is exhibited by coroutines. The price of coroutines lies in the extra complexity of nonnested storage allocation. This price applies also to Methods II and IV, but not so seriously, as they require only one dynamic allocation per stream whose extent is thereafter unchanged, whereas coroutines need a new stack, whose extent is unknown by definition.

FUNCTIONAL ASSIGNMENT

Own variables (Method IV) may be dispensed with by a mechanism of functional assignment abetted by certain novel rules concerning persistence of variables. For instance this random number generator:

Random: procedure();
   declare n static initial(1);
   n := (n*r) mod s;
   return(n);
end Random;

can be recast so that each call of Newrandom yields a fresh generator:

Newrandom: procedure() entry;
   declare n initial(1);
   Random: procedure();
      n := (n*r) mod s;
      return(n);
   end Random;
   return(Random);
end Newrandom;

A generator is then obtained and used thus:

Gen := Newrandom();
x := Gen();
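Languages with closures support this style directly. The Python sketch below mirrors Newrandom: the variable n persists because the returned inner procedure still refers to it. The paper leaves r and s unspecified, so they are made parameters here, and the values 3 and 7 in the usage lines are arbitrary assumptions chosen to keep the arithmetic easy to follow.

```python
def new_random(r, s):
    """Return a generator procedure with its own persistent variable n."""
    n = 1

    def random():
        nonlocal n
        n = (n * r) % s     # n := (n*r) mod s
        return n

    return random


gen = new_random(3, 7)
other = new_random(3, 7)    # a second, independent stream
first_three = [gen(), gen(), gen()]
```

Each call of new_random produces a separate n, so the two streams gen and other may be interleaved freely.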

Many stream generators may be implemented with this style of procedure assignment. However, I know of no way to imitate two-sided coroutines (see section 5). The permutation generator of Example 4 can be so rendered, but it is too complicated to show here.

4. LOCALIZATION

Argument lists and resumption labels have been said to belong at the ``transmitting'' end of a connection, in direct opposition to many implementations of subroutines, where these items are kept with the called generation. Separability makes ours the natural choice -- argument lists and resumption labels localized in the generation to which they refer need no adjustment in response to reconnection.

A function value has been said to be a hidden parameter that appears as an argument when resuming from the function coroutine to the receiver of the value. This convention opposes an implementation practice prevailing in PL/I, where a function value is taken instead to be a hidden formal of the function. The convention of the present paper disposes of the rather repugnant idea that all functions set their value as a side effect, but more importantly makes it possible to use an initialized function value even before resuming the function.

To their disfavor, the conventions about function values and argument lists rule out ``suicides'' among function coroutines, contrary to customary practice with function subroutines, which deactivate themselves upon return. A function coroutine must survive, even after control has left for good, until its value has been used.

Own variables, whose ambivalent nature as locally known global variables has caused much heated discussion, can be supplanted by coroutines. What had been implemented as own variables may become automatic variables and be kept where they belong with a persistent generation. For straightforward programs, the idea of static storage à la PL/I should probably be kept, but ALGOL's thicket of dynamic own could be swept away in favor of coroutines.

5. CONTROL DISCIPLINE

Coroutines may be connected in innumerable configurations; even looped arrangements are possible. However, a certain discipline must be observed on passing control around a loop. Envision a string unwinding behind as control passes from generation to generation, and being rolled up again as control comes back. If string is ever laid along a connector through which string already passes, then the old string must be broken there and can no longer be traced back across. One might forbid laying string into a generation through which string already passes, but that seems overly restrictive. FORTRAN users have long enjoyed the convenience of a set of routines that call each other in a circle; also one can easily envision uses of connectors mated right back to their own generation. Given a system where both are possible, we might as well permit such things.

In terms of control discipline two species of coroutines can be distinguished -- one-sided and two-sided. In a one-sided routine each connector has a preferred direction; the control string is always unwinding when control crosses that way and winding up in the opposite direction. No such preference exists in two-sided coroutines. To my knowledge there have been no implementations of two-sided coroutines, but we have seen (cf. Example 1) that the concept is reasonable and useful.

6. SUMMARY

Coroutine generations may be activated with no requirement for nesting. The fundamental operations necessary for coroutines are activation (allocation), connection and passing control. Connection between coroutines is symmetrical in all respects -- passage of control, passage of parameters, and passage of function values can occur in either direction or both. Coroutines are particularly useful in processing data streams. Stream processes can be redirected simply by reconnecting coroutines. Since certain straightforward coroutine applications cannot be cleanly accomplished with traditional language facilities, coroutines are indeed a novel feature for programming languages.

Oxford, May 1968

References

Footnotes

(1)

(2)

(3)

(4)

sink.mate->mesh.bottom

(5)