Introduction

Cilk is an algorithmic multithreaded language. The philosophy behind Cilk is that a programmer should concentrate on structuring the program to expose parallelism and exploit locality, leaving Cilk's runtime system with the responsibility of scheduling the computation to run efficiently on a given platform. Thus, the Cilk runtime system takes care of details like load balancing, paging, and communication protocols. Unlike other multithreaded languages, however, Cilk is algorithmic in that the runtime system guarantees efficient and predictable performance.

To give you an idea of how simple it is to write parallel programs in Cilk, here is a Cilk implementation of the familiar recursive Fibonacci program in which the recursive calls are executed in parallel:

    cilk int fib (int n)
    {
        if (n < 2) return n;
        else {
            int x, y;
            x = spawn fib (n-1);
            y = spawn fib (n-2);
            sync;
            return (x+y);
        }
    }

Notice that if you elide the three Cilk keywords (cilk, spawn, and sync), you obtain a C program, called the serial elision or C elision of the Cilk program. Cilk is a faithful extension of C in that the C elision of any Cilk program is always a valid implementation of the semantics of the Cilk program.
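As a minimal sketch of that elision, deleting the keywords cilk, spawn, and sync from the fib procedure leaves the following ordinary C function. Nothing here is specific to the Cilk distribution; it compiles with any C compiler and computes the same result as the parallel version, only serially.

```c
/* C elision of the Cilk fib procedure: the keywords cilk, spawn,
   and sync have been removed, and nothing else has changed. */
int fib (int n)
{
    if (n < 2) return n;
    else {
        int x, y;
        x = fib (n-1);   /* was: x = spawn fib (n-1); */
        y = fib (n-2);   /* was: y = spawn fib (n-2); */
        /* was: sync; */
        return (x+y);
    }
}
```

Calling fib (10) from a C main function returns 55, the same value the Cilk program produces.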

The current Cilk release is Cilk-5.3. This release is designed for symmetric multiprocessors (SMPs) and can be configured for several multiprocessor platforms, including Sun Microsystems Ultra SMPs, SGI Origin 2000, and Linux PCs. Earlier Cilk prototypes ran on a wider variety of platforms, including massively parallel computers, symmetric multiprocessors, and networks of workstations. The Cilk distribution contains a runtime system and the cilk2c compiler, a type-checking preprocessor that translates Cilk into C. The cilk2c compiler accepts the Cilk language (a superset of ANSI C) and generates portable C code with hooks for the runtime system. The runtime system should be fairly easy to port to most SMPs. An effort to design a distributed version of Cilk that spans clusters of SMPs has led to a prototype implementation of distributed Cilk-5.1.

A highly successful Cilk application is the Cilkchess computer chess program, which won first prize in the Dutch Open Computer Chess Championship in November 1996 and almost defended its title in 1997. Its predecessor, the *Socrates parallel chess program, placed second in the ICCA 8th Computer Chess World Championship in Hong Kong in May 1995.

Cilk grew out of work in both theory and implementation. The theoretical input to Cilk comes from a study of scheduling multithreaded computations, and especially of the performance of work-stealing, which provided a scheduling model that has since been the central theme of Cilk development. These results led to the development of a performance model that accurately predicts the efficiency of a Cilk program using two simple parameters: work and critical-path length. More recent research has included page faults as a measure of locality. An overview of the Cilk model of computation and of its theory can be found in "Scheduling Multithreaded Computations by Work Stealing", by Robert D. Blumofe and Charles E. Leiserson. Experimental results and details on the implementation can be found in "Cilk: An Efficient Multithreaded Runtime System", by Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou.
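To make the two parameters concrete, here is a rough sketch that computes the work (T_1, the total running time on one processor) and the critical-path length (T_inf, the longest chain of dependent calls) for the fib computation, and then estimates the running time on P processors as T_1/P + T_inf. That estimate is the commonly cited form of the work-stealing bound with its constant factors dropped, not a formula stated in this introduction. The code also assumes that every call to fib costs one time unit, and the names dag_cost, fib_cost, and estimate_time are hypothetical and not part of Cilk.

```c
#include <assert.h>

/* Cost of a computation dag under the unit-cost assumption. */
typedef struct {
    long work;  /* T_1: total number of fib calls */
    long span;  /* T_inf: calls on the longest dependency chain */
} dag_cost;

/* Work and critical-path length of fib(n), assuming one time unit per call. */
dag_cost fib_cost (int n)
{
    dag_cost c = {1, 1};                /* the call itself */
    if (n >= 2) {
        dag_cost a = fib_cost (n-1);
        dag_cost b = fib_cost (n-2);
        /* Work adds up over both spawned children. */
        c.work = 1 + a.work + b.work;
        /* The children run in parallel, so only the longer one
           extends the critical path. */
        c.span = 1 + (a.span > b.span ? a.span : b.span);
    }
    return c;
}

/* Rough estimate of the running time on p processors: T_1/p + T_inf. */
double estimate_time (dag_cost c, int p)
{
    assert (p > 0);
    return (double) c.work / p + (double) c.span;
}
```

For n = 10 this gives a work of 177 and a critical-path length of 10, so the estimate is 187 time units on one processor and 54.25 on four. The ratio of work to critical-path length, here 17.7, indicates roughly how many processors the computation can keep busy.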

The earlier Cilk-3.0 release featured a novel coherence model for shared memory called "dag consistency". In dag consistency, the memory model is defined only in terms of the computation dag, rather than of actions of physical processors (which are not part of Cilk's model). Dag consistency and its implementation are described in "Dag-Consistent Distributed Shared Memory", by Robert D. Blumofe, Matteo Frigo, Christopher F. Joerg, Charles E. Leiserson, and Keith H. Randall. A paper describing the theory of dag consistency appeared in the 1996 ACM Symposium on Parallel Algorithms and Architectures (SPAA). A generalization of dag consistency has led to the theory of computation-centric memory models.

For the current release, Cilk-5, we have completely reimplemented the runtime system for speed and portability. A paper describing the Cilk-5 implementation appears in the 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). The overhead of spawning a parallel thread in Cilk-5 is typically about 4 times the cost of an ordinary procedure call, so Cilk programs "scale down" to run on one processor with nearly the efficiency of analogous C programs.

Cilk features a novel debugging tool called the "Nondeterminator", which finds data races in program executions. The Nondeterminator is unique in that it is guaranteed to find such bugs quickly and efficiently. A paper describing the original Nondeterminator appears in the 1997 ACM Symposium on Parallel Algorithms and Architectures (SPAA). The more recent Nondeterminator-2 finds race bugs in Cilk programs that use locks.