[email protected]

I've always been fascinated by stack-oriented languages because of their simplicity and elegance. Instructions take input off of the stack, do something with it, and push the output back onto the stack. Thus, there are no named variables or arguments, and the order of execution is left-to-right without any need for parentheses to denote precedence. As a result, it doesn't take long for people to learn the basics of programming in a stack language.

Another appeal of stack languages is that it is relatively easy to create reasonably efficient implementations for them, especially where memory considerations are at a premium. Because of this, we find stack languages in multicore processors (SeaForth, www.dspdesignline.com/188701424), virtual machines (Java Virtual Machine language), and embedded devices.

In general, when we think of stack languages, we often think of imperative languages: Each instruction does something to a shared stack. An alternative and equally accurate view of stack languages is that each instruction is a function that takes a stack as input and returns a stack as output. This is the principle upon which Manfred von Thun based the Joy language (www.latrobe.edu.au/philosophy/phimvt/joy.html).

The language that Joy most closely resembles is PostScript in that explicit control structures (branches, gotos, jumps) are replaced with higher order instructions (instructions that execute other instructions). Joy, however, introduced explicit function literals (PostScript uses a delayed execution operator) and eliminated the concept of the definition dictionary; in other words, you can't dynamically define or redefine new operations. This yielded a new breed of stack language that shares the advantages of pure functional languages (for example, it is expressive and easy to reason about and manipulate formally). It is interesting to note that despite its similarities to other stack-based languages, Joy evolved independently from Schoenfinkel and Curry's combinatory logic and the FP language by John Backus (www.vector.org.uk/archive/v203/vonthun203.htm).

My interest in Joy was primarily motivated by my search for an intermediate language that could be easily targeted by imperative and functional languages, could be easily optimized, and could be statically verified. Joy relies heavily on dynamic checking, so I created a more restricted, statically typed language based on Joy and called it "Cat."

Introducing Cat

In the Cat specification, instructions are referred to as "functions," regardless of whether they have side effects. New functions are defined using the define keyword, and have global scope. Functions cannot be redefined, and are only visible after they are defined. Example 1 presents some simple examples.

define myFirstCatProgram { "what is your name?" writeln readln "Hello " swap strcat writeln } define addone : (int -> int) { 1 + } define swapd : ('a 'b 'c -> 'b 'a 'c) { [swap] dip } define dupd : ('a 'b -> 'a 'a 'b) { [dup] dip } define bury : ('a 'b 'c -> 'c 'a 'b) { swap swapd } define fib // compute fibonnacci { dup 1 <= [dup dec fib swap fib +] [] if } define fact // compute factorial { dup <= 1 [dup dec fact *] [pop 1] if }

At the core of the Cat language are primitive instructions for:

Mathematical operations: add, mul, sub, div, mod, rem, inc, dec.

Logical operations: and, or, not, xor.

Function construction: c ompose, papply, quote.

Executing functions: apply, dip, if, while.

Manipulating the stack: swap, dup, pop.

Dealing with lists: list, cons, uncons, head, tail, count, nth.

Comparing values: eq, lt, gt, lteq, gteq, neq.

In Cat, a literal (for example, 42, 'q', "Hello Christopher

", 3.14) pushes a value onto the stack. Additionally, you can also write literal functions, by enclosing an expression in square braces ([1 +]). This has the effect of pushing a function onto the stack without executing it. In Joy parlance, this is called a "quotation," but I think of it as an anonymous function. Anonymous functions are first class values: They can be constructed dynamically and treated like any other primitive value (you can dup them, swap them, pop them, and so on). You can execute anonymous functions using higher order functions such as apply, if, and while.

Closures (functions bound to values in the local environment) can be constructed in Cat using the papply primitive instruction. This has the effect of binding the top value on the stack to the function, a process known as "partial application" (partial application is frequently mislabeled as "currying"). An example of partial application is that if you wrote 1 [<=] papply, it has the same effect as if you wrote [1 <=].

The quicksort algorithm in Example 2 is a more sophisticated example of Cat. The algorithm relies on a binary recursion instruction (bin_rec) that provides a general implementation of a binary recursive process (also called "tree recursion"). Example 3 is a possible definition of bin_rec.

define qsort : (list -> list) {{ desc: This is a simple implementation of a quick-sort algorithm. test: in: [5 4 2 1 3 2 4] list qsort out: [5 4 4 3 2 2 1] list }} { // Does list have 0 or 1 elements? [small] // Base case do nothing [] // Argument-relation // Split the list using the head as a pivot // storing the pivot under for later use [uncons under [lt] papply split] // Append the pivot to the first list // then concatenate the two lists. [[swap cons] dip cat] bin_rec }

define binrec(input, cond, term, unfold, fold) { input cond [term] // termination transformation [input unfold // split input [cond term unfold fold binrec] // self call with same arguments app2 // call function twice: // once with second value removed // once with top value removed ] if }

The bin_rec function is an example of a hylomorphism (see citeseer.ist.psu.edu/meijer91functional.html)the composition of an anamorphism (an unfolding function) and a catamorphism (a folding function). Hylomorphisms are interesting because they can be used to eliminate the construction of intermediate data structures (citeseer.ist.psu.edu/launchbury95warm.html).

The quicksort algorithm in Example 2 also demonstrates an extended feature of Cat called "metadata"a form of structured comment that can associate additional data with a Cat function that can be used by tools. For example, metadata can be used to document functions and perform automatic unit tests. The format is based on YAML (Yet Another Markup Language) and uses significant whitespace to denote hierarchical structure.

To demonstrate how compact Cat can be, Example 4 includes an implementation of the Google MapReduce algorithm (labs.google.com/papers/mapreduce.html). The general idea of the Google MapReduce algorithm is to define a task in terms of subtasks that can be executed (in this example, counting instances of words), and a function to combine the results of the subtasks (called the "reduce function"). While my implementations of Cat do not execute MapReduce concurrently, you can easily develop an implementation of Cat that automatically executes map and self_join to take advantage of available parallelism in the executing environment. This leads to an important point about Cat: As an implementor, you decide how to implement the primitive instructions and whether to implement standard library functions as library functions or built-in functions. This opens lots of opportunities for high-performance implementations.