Analogically, when a function is total, we can say that it produces a valid result for all possible inputs. For example, the function not is obviously defined for both True and False , which are the only possible values of the bool type. Total functions allow a programmer to reason about programs using the types alone; a function with the type A→ B implies that it is always possible to get a B when you have an A.

1.2. Predicates

A predicate is a function whose codomain (set of all possible outputs) consists only of truth values (i.e., truth or falsehood). A common convention is to denote truth with t and falsehood with f (in the original paper McCarthy used uppercase letters). Simply put, a predicate must return a Boolean.

1.3. Pure functions

One of the most important advantages of purely functional approach over imperative one (and by extension OOP) is referential transparency, which guarantees that for a specific set of arguments a function (or generally, an expression) will always return the same result (it is entirely predictable). In other words, an evaluation of an expression can have no effect other than to compute its value. As a result, one can freely replace expressions by their values. Functions with this property are called pure. Almost all functions in imperative environments are impure.

1.4. Conditional Expressions

The function above is a well-known absolute value. Because its value depends on the condition (whether x is greater-or-equal to zero), we can say that it is conditional in its behavior. However, this notation is not suited for expressing symbolically the dependence of quantities on truth values — it has to rely on English words.

A better notation for representing conditional expressions was proposed by McCarthy (now referred to as McCarthy formalism):

(p₁ → e₁, … , pₙ → eₙ)

Here, the p’s are propositional expressions and the e’s are expressions of any kind. It may be read as:

If p₁ then e₁ otherwise if p₂ then e₂, … , otherwise, if pₙ then eₙ.

Notice how we are assuming here the evaluation order — we consider pᵢ from left to right. With this notation under our belt, we can express the absolute value function simply as:

(x ≥ 0 → x, x < 0 → -x)

More importantly, however, this notation is sufficient to express any if-then statement. For example:

if n % 3 == 0 and n % 5 == 0:

return 'FizzBuzz'

elif n % 3 == 0:

return 'Fizz'

elif n % 5 == 0:

return 'Buzz'

else:

return n

…can be expressed as:

(3 ∣ n ∧ 5 ∣ n → 'Fizzbuzz',

3 ∣ n → 'Fizz',

5 ∣ n → 'Buzz',

t → n)

m | n means that n is divisible by m , or m divides n . ∧ is standard logical conjunction, or AND in Boolean algebra. Notice how by exploiting the left-to-right evaluation order we can encode the else clause using the t value.

Conditionals were introduced by Lisp. In effect, if statements, as we know them today, are thanks to McCarthy. As of 1958, Fortran only had a conditional goto which was based on the branch instruction in the underlying hardware. McCarthy also got conditionals into Algol 60, whence they spread to most other languages⁸.

1.5. Recursive functions and label

Thanks to conditional expressions given above, we can now define functions recursively. For example, we can define factorial as:

n! = (n = 0 → 1, t → n · (n − 1)!)

Note, however, that we are making here one very important assumption — the body of the function can reference the function itself by its name (or symbol — ! in this case).

In most actual languages this is not a problem, but remember — we do not have any language yet!

In general, we can solve this problem with something called Y combinator, but that’s a topic for another time. As McCarthy most likely was not aware of Y combinator at the time of writing his paper, he introduced his notation instead.

Fig 1. McCarthy in his AI laboratory at Stanford

label(a, e) denotes the expression e , with the special property that all occurrences of a within e are to be interpreted as referring to the expression e itself. Therefore, by combining label with conditional expressions, we can define Euclid’s algorithm for calculating the greatest common divisor as follows:

label(gcd, (m > n → gcd(n, m),

m | n → m,

t → gcd(n % m, m)))

It is important to note that although recursion existed as a mathematical concept before Lisp, Lisp was the first programming language to support it.

2. S-expressions

First, we define a symbolic expression (s-expression, or just expression for short). An s-expression is either:

an atom, which is a sequence of characters, e.g. spam , ham , eggs or f*&o$o . An atom cannot start with ( , which is reserved for lists.

, , or . An atom cannot start with , which is reserved for lists. a list of zero or more s-expressions which are separated by whitespace and enclosed by parentheses. For example: (abc) , () , (a (bc)) .

In simplified extended Backus-Naur form (commonly used for describing the syntax of a language), s-expressions can be represented as:

Fig. 2 Simplified EBNF for s-expressions

— which is the exact restatement of the definition given above (except that atoms are assumed to be alphanumeric).

The above definition is recursive, as we are referring to s-expressions while defining lists of s-expressions. Some examples of valid s-expressions include:

()

foo

(bar)

(foo bar)

((spam) (ham ()) eggs ())

— but also:

(+ 1 2 3 5 8 13)

(> 24 512 12)

(and true false true)

3. Abstract machine

“The most practical way to think about the meaning of a program is what it does — when we run the program, what do we expect to happen? How do different constructs in the programming language behave at run time, and what effect do they have when they’re plugged together to make larger programs? This is the basis of operational semantics, a way of capturing the meaning of a programming language by defining rules for how its programs execute on some kind of device. This device is often an abstract machine: an imaginary, idealized computer that is designed for the specific purpose of explaining how the language’s programs will execute. Different kinds of a programming language will usually require different designs of an abstract machine in order to neatly capture their runtime behavior.”⁷ In order to construct Lisp, we first construct an imaginary machine on which it will operate.

Fig. 3 ST µA741 op-amp, or: imaginary Lisp machine

3.1. Elementary functions

We start by defining seven (The Magnificent Seven, in fact) elementary functions on s-expressions. We call them elementary because we assume they exist on the abstract machine (just like we assume axioms to be true in some mathematical system). A function on s-expression receives an s-expression and returns a new one. By convention, we will refer to such functions as operators.

To keep our notation simple, we represent operators using atoms and lists — if an s-expression is a list, the first element is an operator, and remaining ones are arguments. This means that functions on s-expressions are encoded using s-expressions themselves. For example, the car operator expects one argument, and so its invocation will take the form:

(car x)

Atom “ car " on its own refers to the function itself. If car expected an argument which is a function (it doesn’t), we could write (car car) .

atom

atom is a very simple operator; it just returns t if the argument passed is an atom, and otherwise.

> (atom f)

t

> (atom foo)

t

> (atom atom)

t

quote

quote is even simpler, it returns whatever was passed into it.

> (quote x)

x > (quote (atom x))

(atom x) > (quote (1 2 3 4 5))

(1 2 3 4 5)

It is crucial to understand how quote behaves and why is useful. The second example — (quote (atom x)) —will reduce to the list containing atom and x . (atom x) is not “evaluated” further (at this point we do not know what evaluation even is, but let us ignore that for a second). In essence, quote is used to stop the evaluation — normally in (1 2 3) , 1 would be assumed to be an operator but we want to treat the list as a regular list. This is why we would use quote . It is usually abbreviated with ' , which gives us:

> '(1 2 3)

(1 2 3)

Now that we have quote , we can go back to atom for one more example:

> (atom '(foo bar baz))

f

If the code would eventually be evaluated by some kind of program capable of doing that, quote would prevent evaluation of any s-expression given as its argument, which is very important when passing data around.

eq

eq returns t if the values of its arguments are the same atom and f otherwise.

Some languages, such as Scheme, use eq? . Clojure uses = .

car

car expects a list as an argument and returns its first element. It's name, similar to cdr , is a historical legacy.

> (car '(1 2 3 4))

1

Certain languages, such as Clojure, rename car with something more intuitive, e.g. first . The “car” and “cdr” names are derived from the machine architecture that Lisp was designed for and nowadays languages use more familiar names.

cdr

cdr expects a list as an argument and returns its back without the first element. For example:

> (cdr '(1 2 3 4))

(2 3 4)

In Clojure, cdr behavior is provided via rest .

cons

cons (construct) takes two arguments, second of which must be a list, and prepends the list with the first argument:

> (cons 1 '(2 3))

(1 2 3)

Why would we prepend to lists instead of appending them is a topic on its own. In short, lists constructed as the head and rest are simpler to manipulate recursively, which is a huge benefit for a language such as Lisp.

cond

We will now translate conditional expressions into s-expressions using cond . cond has the following form:

(cond (p₁ e₁) ... (pₙ eₙ))

The value is evaluated as follows. We evaluate p expressions from left to right. If any predicate evaluates to t , the value of the cond expression is taken to be the value of the corresponding expression e . For example:

> (cond (t 1) (f 2))

1

Moreover, because we evaluate from left to right:

> (cond (t 1) (t 2))

1

3.2. Denoting non-elementary functions

With basic functionality in place, we define a general scheme for defining new functions on the abstract machine. A function is expressed as

(lambda (p₁ … pₙ) e)

where p₁ -pₙ are called parameters and e is an expression. If you ever tinkered with lambda calculus (or with lambdas/anonymous functions in pretty much any modern programming language), you will recognize this notation.

An expression whose first element is a lambda, i.e., is of the form:

— is called a function call and its value is computed as follows. Each expression aᵢ is evaluated. Then e is evaluated. During the evaluation of e, the value of any occurrence of one of the pᵢ is the value of the corresponding aᵢ in the most recent function call.

Do you remember label notation? We can now translate it to s-expression form and use it on our abstract machine to obtain a general method for defining functions. The label s-expression will have the following form:

(label name lambda)

— where name is an atom representing the name of the function and lambda is the function body, also called to form.

Using label and the Magnificent Seven our expressive power is limitless.

For example, we can define the Boolean operator NOT:

(label not (lambda (x)

(cond (x f) (t t)))) > (not t)

f > (not f)

t

As you can see, NOT expects one argument — x. In the body of the function, we perform a conditional check of x. If the value is equal to t , we return f and otherwise t ( t in the last positional of the cond acts as the else clause in modern languages, since the condition is always true).

Next, we can define a bit more involved Boolean AND:

(label and (lambda (x y)

(cond (x (cond (y t) (t f)))

(t f)))) > (and t f)

f > (and t t)

t

This body can be summarized as: If x is true, check if y is true and if so, return t . If y is not true, or x is not true, return f (we have two else clauses).

We can write functions to manipulate lists, Boolean expressions, pair elements, rewrite expressions, and so on. For example:

(not x) returns t if its argument returns f and t otherwise.

returns if its argument returns and otherwise. (pair x y) takes two lists of the same length and returns a third one containing two-element lists of corresponding elements in argument lits. Nowadays we would more likely call this function zip .

takes two lists of the same length and returns a third one containing two-element lists of corresponding elements in argument lits. Nowadays we would more likely call this function . (append x y) takes two lists and concatenates them.

takes two lists and concatenates them. (assoc x y) takes an atom x and a y of the form return by pair and returns a second-element from a pair in y whose first element is x . This function emulates a lookup into a dictionary (alternatively called a hash table or associative array).

It is important to note that in Lisp, functions are first class objects — they’re data, just like numbers and strings. Therefore, functions can be stored in variables, passed around as arguments, and so on. The idea of a first-class function was pioneered by Lisp.

3.3. Evaluation

More importantly, however, we can write a function that takes arbitrary Lisp expression and returns its value:

That is a bit of mouthful. If you want a step by step description, which is beyond the scope of this article, I recommend reading Recursive Functions… by McCarthy³ and Paul Graham’s The Roots of Lisp², which take a thorough look at eval ‘s inner-workings.

The important thing is this: eval is a Lisp interpreter, and thus, implements the language. Any valid Lisp code can be passed to it for evaluation and we don’t need any other infrastructure. With only seven functions and a bit of notation, we were able to implement the whole language and allow it to define new functions. That’s neat and very elegant, but so what? Now that we understand how Lisp operates, let’s step back and consider what it means.

If you’re curious how Lisp can be implemented in a few hundred lines of code, check out my toy Lisp implementation: Kite.

4. Chemical X: Homoiconicity

In Lisp, as we saw, both code and data are represented in the same form: as s-expressions (forming trees) which are available for unrestricted mutation and execution. This means that the Lisp code is first-class. It can be explicitly constructed from scratch, passed around, modified and executed. In other words, in Lisp everything is data. A language in which the internal «representation matches the external representation is called homoiconic (from the Greek words homo meaning “the same” and icon meaning “representation”). Other examples of a language with this property include Prolog (logic programming) and Julia (numerical analysis and computational science). One of the immediate consequences of homoiconicity is that by using just a few functions: quote , atom , eq , car , cdr , cons , and cond , we can define eval , that fully implements the language, and with label and eval we can define any additional function we need. In fact, it is a famous fact that the core of the Lisp can be written in about 20–30 lines of Lisp. In (an unfair) comparison, the average person who writes a C compiler or interpreter requires about 20,000 lines of C to do so and must be moderately expert about compilers or interpreters (of course, we rely on meta-circularity in doing so, but that’s a topic for another post). At this point, you may be asking yourself — what’s the big deal?

The big deal, as it turns out, that when the representation of the program (code) is the same as the representation for that data, you can manipulate code like you would manipulate data. This allows, for example, for syntactic extensibility, which is the ability to extend (or completely) change the syntax of the language. For example, in a normal language, if you wished that you had the for-each construct, you would have to wait until creators of the language add it in the next version. In Lisp, you can write a macro in a matter of minutes. In many Lisp languages, program transformation is an explicit part of the evaluation strategy of the language.

Moreover, in Lisp, programs composed of expressions. Lisp programs are trees of expressions, each of which returns a value. At the time, this was a revolutionary idea, in contrast to languages such as Fortran which distinguish between expressions and statements. This was largely dictated by the limitations of the hardware available in the late 1950s.

Finally, what McCarthy discovered is not just a programming language but a very elegant model of computation, just a notch above lambda calculus. We can say that Lisp was discovered, and not invented, because McCarthy built it from axioms so fundamental that it would be very hard to make an argument for the otherwise.

Core Lisp corresponds to an untyped, call-by-value lambda calculus extended with constants. As it turned out, lambda calculus* corresponds to natural deduction in logic. Specifically, by Curry-Howard isomorphism, types in a program are propositions, programs are proofs and evaluation of programs is the simplification of proofs. This discovery is phenomenal. Among others, it means that we can reason about programs using logic and that we get theorems in logic for free in programming (and vice versa).

The big difference between Lips and lambda calculus is that lambda calculus doesn’t contain any non-functional data. In lambda calculus, everything is a function, even a number (if you’re curious how this is achieved see Church encodings or Mogensen-Scott encodings). Lisp, on the other hand, operates on atoms and numbers (hence, constants). This is most likely what makes Lisp readable to mortal humans — a property that lambda calculus lacks.

The adherence to mathematically sound principles of computation allows Lisp to reap the benefits in the form of simple semantics, clarity, consistency, and other nice things such as referential transparency (original Lisp formulation is completely value-based). In contrast, almost all imperative languages are “just” tools (sometimes very good tools nonetheless) with undefined formal semantics, which makes reasoning about them rigorously very hard or almost impossible.