First published Wed Dec 12, 2012; substantive revision Wed Mar 21, 2018

The \(\lambda\)-calculus is, at heart, a simple notation for functions and application. The main ideas are applying a function to an argument and forming functions by abstraction. The syntax of basic \(\lambda\)-calculus is quite sparse, making it an elegant, focused notation for representing functions. Functions and arguments are on a par with one another. The result is a non-extensional theory of functions as rules of computation, contrasting with an extensional theory of functions as sets of ordered pairs. Despite its sparse syntax, the expressiveness and flexibility of the \(\lambda\)-calculus make it a cornucopia of logic and mathematics. This entry develops some of the central highlights of the field and prepares the reader for further study of the subject and its applications in philosophy, linguistics, computer science, and logic.

The \(\lambda\)-calculus is an elegant notation for working with applications of functions to arguments. To take a mathematical example, suppose we are given a simple polynomial such as \(x^2 -2\cdot x+5\). What is the value of this expression when \(x = 2\)? We compute this by ‘plugging in’ 2 for \(x\) in the expression: we get \(2^2 -2\cdot 2+5\), which we can further reduce to get the answer 5. To use the \(\lambda\)-calculus to represent the situation, we start with the \(\lambda\)-term

\[ \lambda x[x^2 -2\cdot x+5]. \]

The \(\lambda\) operators allows us to abstract over \(x\). One can intuitively read ‘\(\lambda x[x^2 -2\cdot x+5]\)’ as an expression that is waiting for a value \(a\) for the variable \(x\). When given such a value \(a\) (such as the number 2), the value of the expression is \(a^2 -2\cdot a+5\). The ‘\(\lambda\)’ on its own has no significance; it merely binds the variable \(x\), guarding it, as it were, from outside interference. The terminology in \(\lambda\)-calculus is that we want to apply this expression to an argument, and get a value. We write ‘\(Ma\)’ to denote the application of the function \(M\) to the argument \(a\). Continuing with the example, we get:

\[\begin{align} (\lambda x[x^2 -2\cdot x+5])2 \rhd 2^2& -2\cdot 2+5 &\langle \text{Substitute 2 for } x\rangle \\ &= 4-4+5 &\langle\text{Arithmetic}\rangle \\ &= 5 &\langle\text{Arithmetic}\rangle \end{align}\]

The first step of this calculation, plugging in ‘2’ for occurrences of \(x\) in the expression ‘\(x^2 - 2\cdot x + 5\)’, is the passage from an abstraction term to another term by the operation of substitution. The remaining equalities are justified by computing with natural numbers.

This example suggests the central principle of the \(\lambda\)-calculus, called \(\beta\)-reduction, which is also sometimes called \(\beta\)-conversion:

\[ \tag{\(\beta\)} (\lambda x[M])N \rhd M[x := N] \]

The understanding is that we can reduce or contract \((\rhd)\) an application \((\lambda xM)N\) of an abstraction term (the left-hand side, \(\lambda xM)\) to something (the right-hand side, \(N)\) by simply plugging in \(N\) for the occurrences of \(x\) inside \(M\) (that’s what the notation ‘\(M[x := N]\)’ expresses). \(\beta\)-reduction, or \(\beta\)-conversion, is the heart of the \(\lambda\)-calculus. When one actually applies \(\beta\)-reduction to reduce a term, there is an important proviso that has to be observed. But this will be described in Section 2.1, when we discuss bound and free variables.

What about functions of multiple arguments? Can the \(\lambda\)-calculus represent operations such as computing the length of the hypotenuse of a right triangle:

Hypotenuse of a right triangle with legs of length \(x\) and \(y \Rightarrow \sqrt{x^2 + y^2}\).

The length-of-hypotenuse operation maps two positive real numbers \(x\) and \(y\) and to another positive real number. One can represent such multiple-arity operations using the apparatus of the \(\lambda\)-calculus by viewing the operation as taking one input at a time. Thus, the operation can be seen as taking one input, \(x\), a positive real number, and producing as its value not a number, but an operation: namely, the operation that takes a positive real number \(y\) as input and produces as output the positive real number \(\sqrt{x^2 + y^2}\). One could summarize the discussion by saying that the operation, hypotenuse-length, that computes the length of the hypotenuse of a right triangle given the lengths \(a\) and \(b\) of its legs, is:

hypotenuse-length \(:= \lambda a[\lambda b[\sqrt{a^2 + b^2}]]\)

By the principle of \(\beta\)-reduction, we have, for example, that hypotenuse-length 3, the application of hypotenuse-length to 3, is \(\lambda b[\sqrt{3^2 + b^2}]\), which is a function of that is ‘waiting’ for another argument. The \(\lambda\)-term hypotenuse-length 3 can be viewed as a function that computes the length of the hypotenuse of a right triangle one of whose legs has length 3. We find, finally, that (hypotenuse-length 3)4—the application of hypotenuse-length to 3 and then to 4—is 5, as expected.

Another way to understand the reduction of many-place functions to one-place functions is to imagine a machine \(M\) that initially starts out by loading the first \(a\) of multiple arguments \(a, b,\ldots\) into memory. If one then suspends the machine after it has loaded the first argument into memory, one can view the result as another machine M\(_a\) that is awaiting one fewer input; the first argument is now fixed.

An important philosophical issue concerning the \(\lambda\)-calculus is the question of its underlying concept of functions. In set theory, a function is standardly understood as a set of argument-value pairs. More specifically, a function is understood as a set \(f\) of ordered pairs satisfying the property that \((x,y) \in f\) and \((x,z) \in f\) implies \(y = z\). If \(f\) is a function and \((x,y) \in f\), this means that the function f assigns the value \(y\) to the argument \(x\). This is the concept of functions-as-sets. Consequently, the notion of equality of functions-as-sets is equality qua sets, which, under the standard principle of extensionality, entails that two functions are equal precisely when they contain the same ordered pairs. In other words, two functions are identical if and only if they assign the same values to the same arguments. In this sense, functions-as-sets are extensional objects.

In contrast, the notion of a function at work in \(\lambda\)-calculus is one where functions are understood as rules: a function is given by a rule for how to determine its values from its arguments. More specifically, we can view a \(\lambda\)-term \(\lambda x[M]\) as a description of an operation that, given \(x\), produces \(M\); the body \(M\) of the abstraction term is, essentially, a rule for what to do with \(x\). This is the conception of functions-as-rules. Intuitively, given rules \(M\) and \(N\), we cannot in general decide whether \(\lambda x[M]\) is equal to \(\lambda x[N]\). The two terms might ‘behave’ the same (have the same value given the same arguments), but it may not be clear what resources are needed for showing the equality of the terms. In this sense, functions-as-rules are non-extensional objects.

To distinguish the extensional concept of functions-as-sets from the non-extensional concept of functions-as-rules, the latter is often referred to as an ‘intensional’ function concept, in part because of the ostensibly intensional concept of a rule involved. This terminology is particularly predominant in the community of mathematical logicians and philosophers of mathematics working on the foundations of mathematics. But from the perspective of the philosophy of language, the terminology can be somewhat misleading, since in this context, the extensional-intensional distinction has a slightly different meaning.

In the standard possible-worlds framework of philosophical semantics, we would distinguish between an extensional and an intensional function concept as follows. Let us say that that two functions are extensionally equivalent at a world if and only if they assign the same values to the same arguments at that world. And let us say that two functions are intensionally equivalent if and only if they assign the same values to the same arguments at every possible-world. To illustrate, consider the functions highest-mountain-on-earth and highest-mountain-in-the-Himalayas, where highest-mountain-on-earth assigns the highest mountain on earth as the value to every argument and highest-mountain-in-the-Himalayas assigns the highest mountain in the Himalayas as the value to every argument. The two functions are extensionally equivalent (at the actual world), but not intensionally so. At the actual world, the two functions assign the same value to every argument, namely Mt. Everest. Now consider a world where Mt. Everest is not the highest mountain on earth, but say, Mt. Rushmore is. Suppose further that this is so, just because Mt. Rushmore is 30.000 feet/9.100 m higher than it is at the actual world, while Mt. Everest, with its roughly 29.000 feet/8.800 m, is still the highest mountain in the Himalayas. At that world, highest-mountain-on-earth now assigns Mt. Rushmore as the value to every argument, while highest-mountain-in-the-Himalayas still assigns Mt. Everest to every object. In other words, highest-mountain-on-earth and highest-mountain-in-the-Himalayas are extensionally equivalent (at the actual world) but not intensionally equivalent.

A function concept may now be called extensional if and only if it requires functions that are extensionally equivalent at the actual world to be identical. And a function concept may be classified as intensional if and only if it requires intensionally equivalent functions to be identical. Note that these classifications are conceptually different from the distinctions commonly used in the foundations of mathematics. On the terminology used in the foundations of mathematics, functions-as-sets are classified as extensional since they use the axiom of extensionality as their criterion of identity, and functions-as-rules are classified as intensional because they rely on the ostensibly intensional concept of a rule. In the present possible-worlds terminology, function concepts are classified as extensional or intensional based of their behavior at possible-worlds.

An issue from which conceptual confusion might arise is that the two terminologies potentially pass different verdicts on the function concept at work in the \(\lambda\)-calculus. To see this, consider the following two functions:

\[\begin{align} \addone &:= \lambda x[x+1] \\ \addtwosubtractone &:= \lambda x[[x+2]-1] \end{align}\]

These two functions are clearly extensionally equivalent: they assign the same value to the same input at the actual world. Moreover, given standard assumptions in possible worlds semantics, the two functions are also intensionally equivalent. If we assume that mathematical facts, like facts about addition and subtraction, are necessary in the sense that they are the same at every possible world, then we get that the two functions give the same value to the arguments at every possible world. So, an intensional function concept would require the two functions to be identical. In the \(\lambda\)-calculus, however, it’s not clear at all that we should identify the two functions. Formally speaking, without the help of other principle, we cannot show that the two \(\lambda\)-terms denote the same function. Moreover, informally speaking, on the conception of functions-as-rules, it’s not even clear that we should identify them: the two terms involve genuinely different rules, and so we might be tempted to say that they denote different functions.

A function concept that allows for intensionally equivalent functions to be distinct is called hyperintensional. The point is that in possible-worlds terminology, the function concept at work in the \(\lambda\)-calculus may be regarded not as intentional but hyperintensional—in contrast to what the terminology common in the foundations of mathematics says. Note that it’s unclear how an intensional semantic framework, like the possible-worlds framework, could even in principle account for a non-intensional function concept. On the semantics of the \(\lambda\)-calculus, see section 7. The point here was simply to clarify any conceptual confusions that might arise from different terminologies at play in philosophical discourse.

The hyperintensionality of the \(\lambda\)-calculus is particularly important when it comes to its applications as a theory of not only functions, but more generally \(n\)-ary relations. On this, see section 9.3. It is effectively the hyperintensionality of the \(\lambda\)-calculus that makes it an attractive tool in this context. It should be noted, however, that the \(\lambda\)-calculus can be made extensional (as well as intensional) by postulating additional laws concerning the equality of \(\lambda\)-terms. On this, see section 5.

The official syntax of the \(\lambda\)-calculus is quite simple; it is contained in the next definition.

Definition For the alphabet of the language of the \(\lambda\)-calculus we take the left and right parentheses, left and right square brackets, the symbol ‘\(\lambda\)’, and an infinite set of variables. The class of \(\lambda\)-terms is defined inductively as follows: Every variable is a \(\lambda\)-term. If \(M\) and \(N\) are \(\lambda\)-terms, then so is \((MN)\). If \(M\) is a \(\lambda\)-term and \(x\) is a variable, then \((\lambda x[M])\) is a \(\lambda\)-term. By ‘term’ we always mean ‘\(\lambda\)-term’. Terms formed according to rule (2) are called application terms. Terms formed according to rule (3) are called abstraction terms.

As is common when dealing with formal languages that have grouping symbols (the left and right parenthesis, in our case), some parentheses will be omitted when it is safe to do so (that is, when they can be reintroduced in only one sensible way). Juxtaposing more than two \(\lambda\)-terms is, strictly speaking, illegal. To avoid the tedium of always writing all needed parentheses, we adopt the following convention:

Convention (association to the left): When more than two terms \(M_1 M_2 M_3 \ldots M_n\) are juxtaposed we can recover the missing parentheses by associating to the left: reading from left to right, group \(M_1\) and \(M_2\) together, yielding \((M_1 M_2)M_3 \ldots M_n\); then group \((M_1 M_2)\) with \(M_3\): \(((M_1 M_2)M_3)\ldots M_n\), and so forth.

The convention thus gives a unique reading to any sequence of \(\lambda\)-terms whose length is greater than 2.

The function of \(\lambda\) in an abstraction term \((\lambda x[M]\)) is that it binds the variable appearing immediately after it in the term \(M\). Thus \(\lambda\) is analogous to the universal and existential quantifiers \(\forall\) and \(\exists\) of first-order logic. One can define, analogously, the notions of free and bound variable in the expected way, as follows.

Definition The syntactic functions \(\mathbf{FV}\) and \(\mathbf{BV}\) (for ‘free variable’ and ‘bound variable’, respectively) are defined on the set of \(\lambda\)-terms by structural induction thus: For every variable \(x\), term \(M\), and term \(N\): \[\begin{array}{lll} &\text{Free} &\text{Bound} \\ (1) &\mathbf{FV}(x) = \{ x \} \quad &\mathbf{BV}(x) = \{ \varnothing \} \\ (2) &\mathbf{FV}(MN) = \mathbf{FV}(M) \cup \mathbf{FV}(N) &\mathbf{BV}(MN) = \mathbf{BV}(M) \cup \mathbf{BV}(N) \\ (3) &\mathbf{FV}(\lambda x[M]) = \mathbf{FV}(M) - \{ x \} &\mathbf{BV}(\lambda x[M]) = \mathbf{BV}(M) \cup \{ x \} \end{array}\] \[\begin{array}{lll} &\text{Free} &\text{Bound} \\ (1) &\mathbf{FV}(x) = \{ x \} \quad &\mathbf{BV}(x) = \{ \varnothing \} \\ (2) &\mathbf{FV}(MN) = \mathbf{FV}(M) \cup \mathbf{FV}(N) &\mathbf{BV}(MN) = \mathbf{BV}(M) \cup \mathbf{BV}(N) \\ (3) &\mathbf{FV}(\lambda x[M]) = \mathbf{FV}(M) - \{ x \} &\mathbf{BV}(\lambda x[M]) = \mathbf{BV}(M) \cup \{ x \} \end{array}\] If \(\mathbf{FV}(M) = \varnothing\) then \(M\) is called a combinator.

Clause (3) in the two definitions supports the intention that \(\lambda\) binds variables (ensures that they are not free). Note the difference between \(\mathbf{BV}\) and \(\mathbf{FV}\) for variables.

As is typical in other subjects where the concepts appear, such as first-order logic, one needs to be careful about the issue; a casual attitude about substitution can lead to syntactic difficulties.[1] We can defend a casual attitude by adopting the convention that we are interested not in terms themselves, but in a certain equivalence class of terms. We now define substitution, and then lay down a convention that allows us to avoid such difficulties.

Definition (substitution) We write ‘\(M[x := N]\)’ to denote the substitution of \(N\) for the free occurrences of \(x\) in \(M\). A precise definition[2] by recursion on the set of \(\lambda\)-terms is as follows: for all terms \(A\), \(B\), and \(M\), and for all variables \(x\) and \(y\), we define \(x[x := M] \equiv M\) \(y[x := M] \equiv y\) (\(y\) distinct from \(x)\) \((AB)[x := M] \equiv A[x := M]B[x := M]\) \((\lambda x[A])[x := M] \equiv \lambda x[A]\) \((\lambda y[A])[x := M] \equiv \lambda y[A[x := M]]\) (\(y\) distinct from \(x)\)

Clause (1) of the definition simply says that if we are to substitute \(M\) for \(x\) and we are dealing simply with \(x\), then the result is just \(M\). Clause (2) says that nothing happens when we are dealing (only) with a variable different from \(x\) but we are to substitute something for \(x\). Clause (3) tells us that substitution unconditionally distributes over applications. Clauses (4) and (5) concern abstraction terms and parallel clauses (1) and (2) (or rather, clauses (2) and (1), in opposite order): If the bound variable \(z\) of the abstraction term \(\lambda z[A]\) is identical to the variable \(x\) for which we are to do a substitution, then we do not perform any substitution (that is, substitution “stops”). This coheres with the intention that \(M[x := N]\) is supposed to denote the substitution of \(N\) for the free occurrences of \(x\) in \(M\). If \(M\) is an abstraction term \(\lambda x[A]\) whose bound variable is \(x\), then \(x\) does not occurr freely in \(M\), so there is nothing to do. This explains clause 4. Clause (5), finally, says that if the bound variable of an abstraction term differs from \(x\), then at least \(x\) has the “chance ” to occur freely in the abstraction term, and substitution continues into the body of the abstraction term.

Definition (change of bound variables, \(\alpha\)-convertibility). The term \(N\) is obtained from the term \(M\) by a change of bound variables if, roughly, any abstraction term \(\lambda x[A]\) inside \(M\) has been replaced by \(\lambda y[A[x := y]]\). Let us say that terms \(M\) and \(N\) are \(\alpha\)-convertible if there is a sequence of changes of bound variables starting from \(M\) and ending at \(N\). Axiom. \(\beta\)-conversion (stated with a a no-capture proviso): \( (\lambda x[M])N \rhd M[x := N]\),

provided no variable that occurrs free in \(N\) becomes bound after its substitution into \(M\).

Roughly, we need to adhere to the principle that free variables ought to remain free; when an occurrence of a variable is threatened to become bound by a substitution, simply perform enough \(\alpha\)-conversions to sidestep the problem. If we keep this in mind, we can work with \(\lambda\)-calculus without worrying about these nettlesome syntactic difficulties. So, for example, we can't apply the function \(\lambda x[\lambda y[x(y-5)]]\) to the argument \(2y\) because upon substitution of “\(2y\)” for “\(x\)”, the “\(y\)” in “\(2y\)” would be captured by the variable-binding operator “\(\lambda y\)”. Such a substitution would yield a function different from the one intended. However, we can first transform \(\lambda x[\lambda y[x(y-5)]]\) to \(\lambda x[\lambda z[x(z-5)]]\) by \(\alpha\)-conversion, and then apply this latter function to the argument \(2y\). So whereas the following is not a valid use of \(\beta\)-conversion: \[ (\lambda x[\lambda y[x(y-5)]])2y \rhd \lambda y[2y(y-5)]\] we can validly use \(\beta\)-conversion to conclude: \[ (\lambda x[\lambda z[x(z-5)]])2y \rhd \lambda z[2y(z-5)]\] This example helps one to see why the proviso to \(\beta\)-conversion is so important. The proviso is really no different from the one used in the statement of an axiom of the predicate calculus, namely: \(\forall x\phi \to \phi^{\tau}_x\), provided no variable that is free in the term \(\tau\) before the substitution becomes bound after the substitution.

The syntax of \(\lambda\)-calculus is quite flexible. One can form all sorts of terms, even self-applications such as \(xx\). Such terms appear at first blush to be suspicious; one might suspect that using such terms could lead to inconsistency, and in any case one might find oneself reaching for a tool with which to forbid such terms. If one were to view functions and sets of ordered pairs of a certain kind, then the \(x\) in \(xx\) would be a function (set of ordered pairs) that contains as an element a pair \((x,y)\) whose first element would be \(x\) itself. But no set can contain itself in this way, lest the axiom of foundation (or regularity) be violated. Thus, from a set theoretical perspective such terms are clearly dubious. Below one can find a brief sketch of one such tool, type theory. But in fact such terms do not lead to inconsistency and serve a useful purpose in the context of \(\lambda\)-calculus. Moreover, forbidding such terms, as in type theory, does not come for free (e.g., some of the expressiveness of untyped \(\lambda\)-calculus is lost).

As defined earlier, a combinator is a \(\lambda\)-term with no free variables. One can intuitively understand combinators as ‘completely specified’ operations, since they have no free variables. There are a handful of combinators that have proven useful in the history of \(\lambda\)-calculus; the next table highlights some of these special combinators. Many more could be given (and obviously there are infinitely many combinators), but the following have concise definitions and have proved their utility. Below is a table of some standard \(\lambda\)-terms and combinators.

Name Definition & Comments \(\bS\) \(\lambda x[\lambda y[\lambda z[xz(yz)]]]\)

Keep in mind that ‘\(xz(yz)\)’ is to be understood as the application \((xz)(yz)\) of \(xz\) to \(yz. \bS\) can thus be understood as a substitute-and-apply operator: \(z\) ‘intervenes’ between \(x\) and \(y\): instead of applying \(x\) to \(y\), we apply \(xz\) to \(yz\). \(\mathbf{K}\) \(\lambda x[\lambda y[x]]\)

The value of \(\mathbf{K}M\) is the constant function whose value for any argument is simply \(M.\) \(\mathbf{I}\) \(\lambda x[x]\)

The identity function. \(\mathbf{B}\) \(\lambda x[\lambda y[\lambda z[x(yz)]]]\)

Recall that ‘\(xyz\)’ is to be understood as \((xy)z\), so this combinator is not a trivial identity function. \(\mathbf{C}\) \(\lambda x[\lambda y[\lambda z[xzy]]]\)

Swaps an argument. \(\mathbf{T}\) \(\lambda x[\lambda y[x]]\)

Truth value true. Identical to \(\mathbf{K}\). We shall see later how these representations of truth values plays a role in the blending of logic and \(\lambda\)-calculus. \(\mathbf{F}\) \(\lambda x[\lambda y[y]]\)

Truth value false. \(\boldsymbol{\omega}\) \(\lambda x[xx]\)

Self-application combinator \(\boldsymbol{\Omega}\) \(\boldsymbol{\omega \omega}\)

Self-application of the self-application combinator. Reduces to itself. \(\mathbf{Y}\) \(\lambda f[(\lambda x[f(xx)])(\lambda x[f(xx)]\))]

Curry’s paradoxical combinator. For every \(\lambda\)-term \(X\), we have: \[\begin{align} \mathbf{Y}X &\rhd (\lambda x[X(xx)])(\lambda x[X(xx)]) \\ &\rhd X((\lambda x[X(xx)])(\lambda x[X(xx)])) \end{align}\] The first step in the reduction shows that \(\mathbf{Y}\)X reduces to the application term \((\lambda x[X(xx)])(\lambda x[X(xx)]\)), which is recurring in the third step. Thus, \(\mathbf{Y}\) has the curious property that \(\mathbf{Y}\)X and X\((\mathbf{Y}\)X) reduce to a common term. \(\boldsymbol{\Theta}\) \((\lambda x[\lambda f[f(xxf)]])(\lambda x[\lambda f[f(xxf)]]\))

Turing’s fixed-point combinator. For every \(\lambda\)-term \(X\), \(\boldsymbol{\Theta}X\) reduces to \(X(\boldsymbol{\Theta}X)\), which one can confirm by hand. (Curry’s paradoxical combinator \(\mathbf{Y}\) does not have this property.)

Below is a table of notational conventions employed in this entry.

Notation Reading & Comments \(MN\) The application of the function \(M\) to the argument \(N\). Usually, parentheses are used to separate the function from the argument, like so: ‘\(M(N)\)’. However, in \(\lambda\)-calculus and kindred subjects the parentheses are used as grouping symbols. Thus, it is safe to write the function and the argument adjacent to one other. \(PQR\) The application of the function \(PQ\)—which is itself the application of the function \(P\) to the argument \(Q\)—to \(R\). If we do not use parentheses to separate function and argument, how are we to disambiguate expressions that involve three or more terms, such as ‘\(PQR\)’? Recall our convention that we are to understand such officially illegal expressions by working from left to right, always putting parentheses around adjacent terms. Thus, ‘\(PQR\)’ is to be understood as \((PQ)R\). ‘\(PQRS\)’ is \(((PQ)R)S\). The expression ‘\((PQ)R\)’ is disambiguated; by our convention, it is identical to \(PQR\). The expression ‘\(P(QR)\)’ is also explicitly disambiguated; it is distinct from \(PQR\) because it is the application of \(P\) to the argument \(QR\) (which is itself the application of the function \(Q\) to the argument \(R)\). \((\lambda x[M])\) The \(\lambda\) term that binds the variable \(x\) in the \(\boldsymbol{body}\) term \(M\). The official vocabulary of the \(\lambda\)-calculus consists of the symbol ‘\(\lambda\)’, left ‘(’and right ‘)’ parentheses, and a set of variables (assumed to be distinct from the three symbols ‘\(\lambda\)’, ‘(’, and ‘)’ lest we have syntactic chaos). Alternative notation. It is not necessary to include two kinds of grouping symbols (parentheses and square brackets) in the syntax. Parentheses or square brackets alone would obviously suffice. The two kinds of brackets are employed in this entry for the sake of readability. Given the two kinds of grouping symbols, we could economize further and omit the parentheses from abstraction terms, so that ‘\((\lambda x[M]\))’ would be written as ‘\(\lambda x[M]\)’. Some authors write ‘\(\lambda x.M\)’ or ‘\(\lambda x\cdot M\)’, with a full stop or a centered dot separating the bound variable from the body of the abstraction term. As with the square brackets, these devices are intended to assist reading \(\lambda\)-terms; they are usually not part of the official syntax. (One sees this device used in earlier works of logic, such as Principia Mathematica, where the function of the symbol . in expressions such as ‘\(\forall x\).\(\phi\)’ is to get us to read the whole of the formula \(\phi\) as under the scope of the \(\forall x\).) Some authors write abstraction terms without any device separating the bound variable from the body: such terms are crisply written as, e.g., ‘\(\lambda xx\)’, ‘\(\lambda yx\)’. The practice is not without its merits: it is about as concise as one can ask for, and permits an even simpler official syntax of the \(\lambda\)-calculus. But this practice is not flawless. In ‘\(\lambda xyz\)’, is the bound variable \(x\) or is it \(xy\)? Usually the names of variables are single letters, and theoretically this is clearly sufficient. But it seems unduly restrictive to forbid the practice of giving longer names to variables; indeed, such constructions arise naturally in computer programming languages. For the sake of uniformity, we will adopt the square bracket notation in this entry. (Incidentally, this notation is used in (Turing, 1937).) \(M[x := A]\) The \(\lambda\)-term that is obtained by substituting the \(\lambda\)-term A for all free occurrences of \(x\) inside \(M\). A bewildering array of notations to represent substitution can be found in the literature on \(\lambda\)-calculus and kindred subjects: \[ M[x/A], M[A/x], M_{x}^A, M_{A}^x, [x/A]M,\ldots \] Which notation to use for substitution seems to be a personal matter. In this entry we use a linear notation, eschewing superscripts and subscripts. The practice of representing substitution with ‘:=’ comes from computer science, where ‘:=’ is read in some programming languages as assigning a value to a variable. As with the square brackets employed to write abstraction terms, the square brackets employed to write substitution are not officially part of the syntax of the \(\lambda\)-calculus. \(M\) and A are terms, \(x\) is a variable; \(M[x := A]\) is another term. \(M \equiv N\) The \(\lambda\)-terms \(M\) and \(N\) are identical: understood as sequences of symbols, \(M\) and \(N\) have the same length and corresponding symbols of the sequences are identical. The syntactic identity relation \(\equiv\) is not part of the official syntax of \(\lambda\)-calculus; this relation between \(\lambda\)-terms belongs to the metatheory of \(\lambda\)-calculus. It is clearly a rather strict notion of equality between \(\lambda\)-terms. Thus, it is not the case (if \(x\) and \(y\) are distinct variables) that \(\lambda x[x] \equiv \lambda y[y]\), even though these two terms clearly ‘behave’ in the same way in the sense that both are expressions of the identity operation \(x \Rightarrow x\). Later we will develop formal theories of equality of \(\lambda\)-terms with the aim of capturing this intuitive equality of \(\lambda x[x]\) and \(\lambda y[y]\).

\(\lambda\)-calculus arose from the study of functions as rules. Already the essential ingredients of the subject can be found in Frege’s pioneering work (Frege, 1893). Frege observed, as we did above, that in the study of functions it is sufficient to focus on unary functions (i.e., functions that take exactly one argument). (The procedure of viewing a multiple-arity operation as a sequence of abstractions that yield an equivalent unary operation is called currying the operation. Perhaps it would be more historically accurate to call the operation fregeing, but there are often miscarriages of justice in the appellation of mathematical ideas.) In the 1920s, the mathematician Moses Schönfinkel took the subject further with his study of so-called combinators. As was common in the early days of the subject, Schönfinkel was interested in the kinds of transformations that one sees in formal logic, and his combinators were intended to be a contribution to the foundations of formal logic. By analogy with the reduction that one sees in classical propositional logic with the Sheffer stroke, Schöfinkel established the astonishing result that the all functions (in the sense of all transformations) could be given in terms of the combinators \(\mathbf{K}\) and \(\bS\); later we will see the definition of these combinators.

Theorem For every term \(M\) made up of \(\mathbf{K}\) and \(\bS\) and the variable \(x\), there exists a term \(F\) (built only from \(\mathbf{K}\) and \(\bS)\) such that we can derive \(Fx = M\).

(The proof that these two suffice to represent all functions is beyond the scope of this entry. For further discussion, see the entry on combinatory logic.) One can prove the theorem constructively: there is an algorithm that, given \(M\), produces the required \(F\). Church called this \(F\) ‘\(\lambda x[M]\)’ (Church, 1932).[3] From this perspective, the \(\beta\)-rule can be justified: if ‘\(\lambda x[M]\)’ is to be a function \(F\) satisfying \(Fx = M\), then \(\lambda x[M]\)x should transform to \(M\). This is just a special case of the more general principle that for all \(N, (\lambda x[M])N\) should transform to \(M[x := N]\).

Although today we have more clearly delimited systems of abstraction and rewriting, in its early days \(\lambda\)-calculus and combinatory logic (à la Schönfinkel) were bound up with investigations of foundations of mathematics. In the hands of Curry, Church, Kleene, and Rosser (some of the pioneers in the subject) the focus was on defining mathematical objects and carrying out logical reasoning inside the these new systems. It turned out that these early attempts at so-called illative \(\lambda\)-calculus and combinatory logic were inconsistent. Curry isolated and polished the inconsistency; the result is now known as Curry’s paradox. See the entry on Curry’s paradox and appendix B of (Barendregt, 1985).

The \(\lambda\)-calculus earns a special place in the history of logic because it was the source of the first undecidable problem. The problem is: given \(\lambda\)-terms \(M\) and \(N\), determine whether \(M = N\). (A theory of equational reasoning about \(\lambda\)-terms has not yet been defined; the definition will come later.) This problem was shown to be undecidable.

Another early problem in the \(\lambda\)-calculus was whether it is consistent at all. In this context, inconsistency means that all terms are equal: one can reduce any \(\lambda\)-term \(M\) to any other \(\lambda\)-term \(N\). That this is not the case is an early result of \(\lambda\)-calculus. Initially one had results showing that certain terms were not interconvertible (e.g., \(\mathbf{K}\) and \(\bS)\); later, a much more powerful result, the so-called Church-Rosser theorem, helped shed more light on \(\beta\)-conversion and could be used to give quick proofs of the non-inter-convertibility of whole classes of \(\lambda\)-terms. See below for more detailed discussion of consistency.

The \(\lambda\)-calculus was a somewhat obscure formalism until the 1960s, when, at last, a ‘mathematical’ semantics was found. Its relation to programming languages was also clarified. Till then the only models of \(\lambda\)-calculus were ‘syntactic’, that is, were generated in the style of Henkin and consisted of equivalence classes of \(\lambda\)-terms (for suitable notions of equivalence). Applications in the semantics of natural language, thanks to developments by Montague and other linguists, helped to ‘spread the word’ about the subject. Since then the \(\lambda\)-calculus enjoys a respectable place in mathematical logic, computer science, linguistics (see, e.g., Heim and Kratzer 1998), and kindred fields.

Various notions of reduction for \(\lambda\)-terms are available, but the principal one is \(\beta\)-reduction, which we have already seen earlier. Earlier we used the notation ‘\(\rhd\)’; we can be more precise. In this section we discuss \(\beta\)-reduction and some extensions.

Definition (one-step \(\beta\)-reduction \(\rhd_{\beta ,1})\) For \(\lambda\)-terms \(A\) and \(B\), we say that \(A\) \(\beta\)-reduces in one step to \(B\), written \(A \rhd_{\beta ,1} B\), just in case there exists an (occurrence of a) subterm \(C\) of \(A\), a variable \(x\), and \(\lambda\)-terms \(M\) and \(N\) such that \(C \equiv(\lambda x[M])N\) and \(B\) is \(A\) except that the occurrence of \(C\) in \(A\) is replaced by \(M[x := N]\).

Here are some examples of \(\beta\)-reduction:

The variable \(x\) does not \(\beta\)-reduce to anything. (It does not have the right shape: it is simply a variable, not an application term whose left-hand side is an abstraction term.) \((\lambda x[x])a \rhd_{\beta ,1} a\). If \(x\) and \(y\) are distinct variables, then \((\lambda x[y])a \rhd_{\beta ,1} y\). The \(\lambda\)-term \((\lambda x[(\lambda y[xy])a])b]\) \(\beta\)-reduces in one step to two different \(\lambda\)-terms: \[ (\lambda x[(\lambda y[xy])a])b \rhd_{\beta ,1} (\lambda y[by])a \] and \[ (\lambda x[(\lambda y[xy])a])b \rhd_{\beta ,1} (\lambda x[xa])b \] Moreover, one can check that these two terms \(\beta\)-reduce in one step to a common term: \(ba\). We thus have: \((\lambda y[by])a\) \(

earrow\) \(\searrow\) \((\lambda x[(\lambda y[xy])a])b\) \(ba\) \(\searrow\) \(

earrow\) \((\lambda x[xa])b\)

As with any binary relation, one can ask many questions about the relation \(\rhd_{\beta ,1}\) holding between \(\lambda\)-terms, and one can define various derived notions in terms of \(\rhd_{\beta ,1}\).

Definition A \(\beta\)-reduction sequence from a \(\lambda\)-term \(A\) to a \(\lambda\)-term \(B\) is a finite sequence \(s_1 , \ldots s_n\) of \(\lambda\)-terms starting with \(A\), ending with \(B\), and whose adjacent terms \((s_k,s_{k+1})\) satisfy the property that \(s_k \rhd_{\beta ,1} s_{k+1}\). More generally, any sequence \(s\)—finite or infinite—starting with a \(\lambda\)-term \(A\) is said to be a \(\beta\)-reduction sequence commencing with \(A\) provided that the adjacent terms \((s_k,s_{k+1})\) of \(s\) satisfy the property that \(s_k \rhd_{\beta ,1} s_{k+1}\).

Continuing with \(\beta\)-reduction Example 1, there are no \(\beta\)-reduction sequences at all commencing with the variable \(x\). Continuing with \(\beta\)-reduction Example 2, the two-term sequence \[ (\lambda x[x])a, a \] is a \(\beta\)-reduction sequence from \((\lambda x[x])a\) to \(a\). If \(a\) is a variable, then this \(\beta\)-reduction sequence cannot be prolonged, and there are no other \(\beta\)-reduction sequences commencing with \((\lambda x[x])a\); thus, the set of \(\beta\)-reduction sequences commencing with \((\lambda x[x])a\) is finite and contains no infinite sequences. The combinator \(\boldsymbol{\Omega}\) has the curious property that \(\Omega \rhd_{\beta ,1} \Omega\). Every term of every \(\beta\)-reduction sequence commencing with \(\boldsymbol{\Omega}\) (finite or infinite) is equal to \(\boldsymbol{\Omega}\). Consider the term \(\mathbf{K}a\boldsymbol{\Omega}\). There are infinitely many reduction sequences commencing with this term: \(\bK a\boldsymbol{\Omega} \rhd_{\beta ,1} a\)

\(\bK a\boldsymbol{\Omega} \rhd_{\beta ,1} \bK a\boldsymbol{\Omega} \rhd_{\beta ,1} a\)

\(\bK a\boldsymbol{\Omega} \rhd_{\beta ,1} \bK a\boldsymbol{\Omega} \rhd_{\beta ,1} \bK a\boldsymbol{\Omega} \rhd_{\beta ,1} a\)

\(\bK a\boldsymbol{\Omega} \rhd_{\beta ,1} \bK a\boldsymbol{\Omega} \rhd_{\beta ,1} \bK a\boldsymbol{\Omega} \ldots\) If \(a\) is a variable, one can see that all finite reduction sequences commencing with \(\bK a\boldsymbol{\Omega}\) end at \(a\), and there is exactly one infinite reduction sequence.

Definition A \(\beta\)-redex of a \(\lambda\)-term \(M\) is (an occurrence of) a subterm of \(M\) of the form \((\lambda x[P])Q\). (‘redex’ comes from ‘reducible expression.) A \(\beta\)-redex is simply a candidate for an application of \(\beta\)-reduction. Doing so, one contracts the \(\beta\)-redex. A term is said to be in \(\beta\)-normal form if it has no \(\beta\)-redexes.

(Can a term have multiple \(\beta\)-normal forms? The answer is literally ‘yes’, but substantially the answer is ‘no’: If a \(M\) and \(M'\) are \(\beta\)-normal forms of some term, then \(M\) is \(\alpha\)-convertible to \(M'\) Thus, \(\beta\)-normal forms are unique up to changes of bound variables.)

So far we have focused only on one step of \(\beta\)-reduction. One can combine multiple \(\beta\)-reduction steps into one by taking the transitive closure of the relation \(\rhd_{\beta ,1}\).

Definition For \(\lambda\)-terms \(A\) and \(B\), one says that \(A\) \(\beta\)-reduces to \(B\), written \(A \rhd_{\beta} B\), if either \(A \equiv B\) or there exists a finite \(\beta\)-reduction sequence from \(A\) to \(B\). Definition A term \(M\) has a \(\beta\)-normal form if there exists a term \(N\) such that \(N\) is in \(\beta\)-normal form an \(M \rhd_{\beta} N\).

Reducibility as defined is a one-way relation: it is generally not true that if \(A \rhd_{\beta} B\), then \(B \rhd_{\beta} A\). However, depending on one’s purposes, one may wish to treat \(A\) and \(B\) as equivalent if either \(A\) reduces to \(B\) or \(B\) reduces to \(A\). Doing so amounts to considering the reflexive, symmetric, and transitive closure of the relation \(\rhd_{\beta ,1,}\).

Definition For \(\lambda\)-terms \(A\) and \(B\), we say that \(A =_{\beta} B\) if either \(A \equiv B\) or there exists a sequence \(s_1 , \ldots s_n\) starting with \(A\), ending with \(B\), and whose adjacent terms \((s_k,s_{k+1})\) are such that either \(s_k \rhd_{\beta ,1} s_{k+1}\) or \(s_{k+1} \rhd_{\beta ,1} s_k\).

We have thus far developed the theory of \(\beta\)-reduction. This is by no means the only notion of reduction available in the \(\lambda\)-calculus. In addition to \(\beta\)-reduction, a standard relation between \(\lambda\)-terms is that of \(\eta\)-reduction:

Definition (one-step \(\eta\)-reduction) For \(\lambda\)-terms \(A\) and \(B\), we say that \(A \beta \eta\)-reduces in one step to \(B\), written \(A \rhd_{\beta \eta ,1} B\), just in case there exists an (occurrence of a) subterm \(C\) of \(A\), a variable \(x\), and \(\lambda\)-terms \(M\) and \(N\) such that either \(C \equiv(\lambda x[M])N\) and \(B\) is \(A\) except that the occurrence of \(C\) in \(A\) is replaced by \(M[x := N]\) or \(C \equiv(\lambda x[Mx]\)) and \(B\) is \(A\) except that the occurrence of \(C\) in \(A\) is replaced by \(M\).

The first clause in the definition of \(\rhd_{\beta \eta ,1}\) ensures that the relation extends the relation of one-step \(\beta\)-reduction. As we did for the relation of one-step \(\beta\)-reduction, we can replay the development for \(\eta\)-reduction. Thus, one has the notion of an \(\eta\)-redex, and from \(\rhd_{\eta ,1}\) one can define the relation \(\rhd_{\eta}\) between \(\lambda\)-terms as the symmetric and transitive closure of \(\rhd_{\eta ,1}\), which captures zero-or-more-steps of \(\eta\)-reduction. Then one defines \(=_{\eta}\) as the symmetric and transitive closure of \(\rhd_{\eta}\).

If \(A \rhd_{\eta ,1} B\), then the length of \(B\) is strictly smaller than that of \(A\). Thus, there can be no infinite \(\eta\)-reductions. This is not the case of \(\beta\)-reduction, as we saw above in \(\beta\)-reduction sequence examples 3 and 4.

One can combine notions of reduction. One useful combination is to blend \(\beta\)- and \(\eta\)-reduction.

Definition (one-step \(\beta \eta\)-reduction) \(\lambda x[Mx] \rhd_{\beta \eta ,1} M\) and \((\lambda x[M]N)) \rhd_{\beta \eta ,1} M[x := N]\). A \(\lambda\)-term \(A\) \(\beta \eta\)-reduces in one step to a \(\lambda\)-term \(B\) just in case either \(A\) \(\beta\)-reduces to \(B\) in one step or \(A\) \(\eta\)-reduces to \(B\) in one step.

Again, one can replay the basic concepts of reduction, as we did for \(\beta\)-reduction, for this new notion of reduction \(\beta \eta\).

Recall that a term is said to be in \(\beta\)-normal form if it has no \(\beta\)-redexes, that is, subterms of the shape \((\lambda x[M]\))N. A term has a \(\beta\)-normal form if it can be reduced to a term in \(\beta\)-normal form. It should be intuitively clear that if a term has a \(\beta\)-normal form, then we can find one by exhaustively contracting all all \(\beta\)-redexes of the term, then exhaustively contracting all \(\beta\)-redexes of all resulting terms, and so forth. To say that a term has a \(\beta\)-normal form amounts to saying that this blind search for one will eventually terminate.

Blind search for \(\beta\)-normal forms is not satisfactory. In addition to be aesthetically unpleasant, it can be quite inefficient: there may not be any need to exhaustively contract all \(\beta\)-redexes. What is wanted is a strategy—preferably, a computable one—for finding a \(\beta\)-normal form. The problem is to effectively decide, if there are multiple \(\beta\)-redexes of a term, which ought to be reduced.

Definition A \(\beta\)-reduction strategy is a function whose domain is the set of all \(\lambda\)-terms and whose value on a term \(M\) not in \(\beta\)-normal form is a redex subterm of \(M\), and whose value on all terms M in \(\beta\)-normal form is simply \(M\).

In other words, a \(\beta\)-reduction strategy selects, whenever a term has multiple \(\beta\)-redexes, which one should be contracted. (If a term is in \(\beta\)-normal form, then nothing is to be done, which is why we require in the definition of \(\beta\)-reduction strategy that it does not change any term in \(\beta\)-normal form.) One can represent a strategy \(S\) as a relation \(\rhd_S\) on \(\lambda\)-terms, with the understanding that \(M \rhd_S N\) provided that \(M\) is obtained from \(M\) in one step by adhering to the strategy S. When viewed as relations, strategies constitute a subrelation of \(\rhd_{\beta ,1}\).

A \(\beta\)-reduction strategy may or may not have the property that adhering to the strategy will ensure that we (eventually) reach a \(\beta\)-normal form, if one exists.

Definition A \(\beta\)-reduction strategy \(S\) is normalizing if for all \(\lambda\)-terms \(M\), if \(M\) has a \(\beta\)-normal form \(N\), then the sequence \(M, S(M), S(S(M)),\ldots\) terminates at \(N\).

Some \(\beta\)-reduction strategies are normalizing, but others are not.

The rightmost strategy , whereby we always choose to reduce the rightmost \(\beta\)-redex (if there are any \(\beta\)-redexes) is not normalizing. Consider, for example, the term KI\(\omega\) . This term has two \(\beta\)-redexes: itself, and \(\omega\) (which, recall, is the term \((\lambda\)x[\(xx])(\lambda\)x[\(xx]\))). By working with left-hand \(\beta\)-redexes, we can \(\beta\)-reduce KI\(\omega\) to \(\mathbf{I}\) in two steps. If we insist on working with the rightmost \(\beta\)-redex \(\omega\) we reduce KI\(\omega\) to \(\mathbf{KI}\)( \(\omega \omega\) ), then \(\mathbf{KI}\)( \(\omega \omega \omega\) ), ….

, whereby we always choose to reduce the rightmost \(\beta\)-redex (if there are any \(\beta\)-redexes) is not normalizing. Consider, for example, the term . This term has two \(\beta\)-redexes: itself, and (which, recall, is the term \((\lambda\)x[\(xx])(\lambda\)x[\(xx]\))). By working with left-hand \(\beta\)-redexes, we can \(\beta\)-reduce to \(\mathbf{I}\) in two steps. If we insist on working with the rightmost \(\beta\)-redex we reduce to \(\mathbf{KI}\)( ), then \(\mathbf{KI}\)( ), …. The leftmost strategy, whereby we always choose to reduce the leftmost \(\beta\)-redex (if there are any \(\beta\)-redexes) is normalizing. The proof of this fact is beyond the scope of this entry; see (Barendregt, 1985, section 13.2) for details.

Once we have defined a reduction strategy, it is natural to ask whether one can improve it. If a term has a \(\beta\)-normal form, then a strategy will discover a normal form; but might there be a shorter \(\beta\)-reduction sequence that reaches the same normal form (or a term that is \(\alpha\)-convertible to that normal form)? This is the question of optimality. Defining optimal strategies and showing that they are optimal is generally considerably more difficult than simply defining a strategy. For more discussion, see (Barendregt, 1984 chapter 10).

For the sake of concreteness, we have discussed only \(\beta\)-reduction strategies. But in the definitions above the notion of reduction \(\beta\) is but one possibility. For any notion \(R\) of reduction we have the associated theory of \(R\)-reduction strategies, and one can replay the problems of normalizability, optimality, etc., for \(R\).

We discussed earlier how the \(\lambda\)-calculus is a non-extensional theory of functions. If, in the non-extensional spirit, we understand \(\lambda\)-terms as descriptions, how should we treat equality of \(\lambda\)-terms? Various approaches are available. In this section, let us treat the equality relation = as a primitive, undefined relation holding between two \(\lambda\)-terms, and try to axiomatize the properties that equality should have. The task is to identity axioms and formulate suitable rules of inference concerning the equality of \(\lambda\)-terms.

Some obvious properties of equality, having nothing to do with \(\lambda\)-calculus, are as follows:

\[\tag{Reflexivity} \frac{}{X=X} \] \[\tag{Symmetry} \frac{X=Y}{Y=X} \] \[\tag{Transitivity} \frac{X=Y \quad Y=Z}{X=Z} \]

As is standard in proof theory, the way to read these rules of inference is that above the horizontal rule \(\frac{}{\phantom{X=X}}\) are the premises of the rule (which are equations) and the equation below the horizontal rule is the conclusion of the rule of inference. In the case of the reflexivity rule, nothing is written above the horizontal rule. We understand such a case as saying that, for all terms \(X\), we may infer the equation \(X = X\) from no premises.

The three rules of inference listed in the previous section governing equality have nothing to do with the \(\lambda\)-calculus. The following lists rules of inference that relate the undefined notion of equality and the two term-building operations of the \(\lambda\)-calculus, application and abstraction.

\[ \frac{M=N}{AM=AN} \quad \frac{M=N}{MA=NA} \quad \frac{M=N}{\lambda x[M] = \lambda x[N]} \]

These rules of inference say that = is a congruence relation on the set of \(\lambda\)-terms: it ‘preserves’ both the application and abstraction term-building operations

The final rule of inference, \(\beta\)-conversion, is the most important:

\[\tag{\(\boldsymbol{\beta}\)} \frac{}{(\lambda x[M])A = M[x := A]} \]

As before with the reflexivity rule, the rule \(\boldsymbol{\beta}\) has no premises: for any variable \(x\) and any terms \(M\) and \(A\), one can infer the equation \((\lambda x[M])A = M[x := A]\) at any point in a formal derivation in the theory \(\boldsymbol{\lambda}\).

A number of extensions to \(\boldsymbol{\lambda}\) are available. Consider, for example, the rule (\(\boldsymbol{\eta}\)), which expresses the principle of \(\eta\)-reduction as a rule of inference:

\[\tag{\(\boldsymbol{\eta}\)} \frac{}{\lambda x[Mx] = M} \text{ provided } x

ot\in \mathbf{FV}(M) \]

Rule \(\boldsymbol{\eta}\) tells us that a certain kind of abstraction is otiose: it is safe to identify \(M\) with the function that, given an argument \(x\), applies \(M\) to \(x\). Through this rule we can also see that all terms are effectively functions. One can intuitively justify this rule using the principle of \(\beta\)-reduction.

\[\tag{\(\mathbf{Ext}\)} \frac{Mx=Nx}{M=N}\text{ provided } x

ot\in \mathbf{FV}(M) \cup \mathbf{FV}(N) \]

One can view rule \(\mathbf{Ext}\) as a kind of generalization principle. If we have derived that \(Mx = Nx\), but \(x\) figures in neither \(M\) nor \(N\), then we have effectively shown that \(M\) and \(N\) are alike. Compare this principle to the principle of universal generalization in first-order logic: if we have derived \(\phi(x)\) from a set \(\Gamma\) of hypotheses in which \(x\) is not free, then we can conclude that \(\Gamma\) derives \(\forall x\phi\).

Another productive principle in the \(\lambda\)-calculus permits us to identify terms that ‘act’ the same:

\[\tag{\(\boldsymbol{\omega}\)} \frac{\text{For all term }x, Mx=Nx}{M=N} \]

The rule \(\boldsymbol{\omega}\) has infinitely many hypotheses: on the assumption that \(Mx = Nx\), no matter what \(x\) may be, then we can conclude that \(M = N\). The \(\boldsymbol{\omega}\) rule is an analogue in the \(\lambda\)-calculus of the rule of inference under the same name in formal number theory, according to which one can conclude the universal formula \(\forall x\phi\) provided one has proofs for \(\phi(x := \mathbf{0}), \phi(x := \mathbf{1}),\ldots\) . Note that unlike the rule \(\mathbf{Ext}\), the condition that \(x\) not occur freely in \(M\) or \(N\) does not arise.

Is the \(\lambda\)-calculus consistent? The question might not be well-posed. The \(\lambda\)-calculus is not a logic for reasoning about propositions; there is no apparent notion of contradiction \((\bot)\) or a method of forming absurd propositions (e.g., \(p \wedge

eg p)\). Thus ‘inconsistency’ of the \(\lambda\)-calculus cannot mean that \(\bot\), or some formula tantamount to \(\bot\), is derivable. A suitable notion of ‘consistent’ is, however, available. Intuitively, a logic is inconsistent if it permits us to derive too much. The theory \(\lambda\) is a theory of equations. We can thus take inconsistency of \(\lambda\) to mean: all equations are derivable. Such a property, if it were true of \(\lambda\), would clearly show that \(\lambda\) is of little use as a formal theory.

Early formulations of the idea of \(\lambda\)-calculus by A. Church were indeed inconsistent; see (Barendregt, 1985, appendix 2) or (Rosser, 1985) for a discussion. To take a concrete problem: how do we know that the equation \(\bK = \mathbf{I}\) is not a theorem of \(\lambda\)? The two terms are obviously intuitively distinct. \(\bK\) is a function of two arguments, whereas \(\mathbf{I}\) is a function of one argument. If we could show that \(\bK = \mathbf{I}\), then we could show that \(\mathbf{KK} = \mathbf{IK}\), whence \(\mathbf{KK} = \bK\) would be a theorem of \(\lambda\), along with many other equations that strike us as intuitively unacceptable. But when we’re investigating a formal theory such as \(\lambda\), intuitive unacceptability by no means implies underivability. What is missing is a deeper understanding of \(\beta\)-reduction.

An early result that gave such an understanding is known as the Church-Rosser theorem:

Theorem (Church-Rosser) If \(P \rhd_{\beta} Q\) and \(P \rhd_{\beta}\) R, then there exists a term \(S\) such that both \(Q \rhd_{\beta} S\) and \(R \rhd_{\beta} S\).

(The proof of this theorem is quite non-trivial and is well-beyond the scope of this entry.) The result is a deep fact about \(\beta\)-reduction. It says that no matter how we diverge from \(P\) by \(\beta\)-reductions, we can always converge again to a common term.

The Church-Rosser theorem gives us, among other things, that the plain \(\lambda\)-calculus—that is, the theory \(\lambda\) of equations between \(\lambda\)-terms—is consistent, in the sense that not all equations are derivable.

As an illustration, we can use the Church-Rosser theorem to solve the earlier problem of showing that the two terms \(\bK\) and \(\mathbf{I}\) are not identified by \(\lambda\). The two terms are in \(\beta\)-normal form, so from them there are no \(\beta\)-reduction sequences at all. If \(\bK = \mathbf{I}\) were a theorem of \(\lambda\), then there would be a term \(M\) from which there is a \(\beta\)-reduction path to both \(\mathbf{I}\) and \(\bK\). The Church-Rosser theorem then implies the two paths diverging from \(M\) can be merged. But this is impossible, since \(\bK\) and \(\mathbf{I}\) are distinct \(\beta\)-normal forms.

The Church-Rosser theorem implies the existence of \(\beta\)-reduction sequences commencing from \(\bK\) and from \(\mathbf{I}\) that end at a common term. But there are no \(\beta\)-reduction sequences at all commencing from \(\mathbf{I}\), because it is in \(\beta\)-normal form, and likewise for \(\bK\).

Theorem \(\lambda\) is consistent, in the sense that not every equation is a theorem.

To prove the theorem, it is sufficient to produce one underivable equation. We have already worked through an example: we used the Church-Rosser theorem to show that the equation \(\bK = \mathbf{I}\) is not a theorem of \(\lambda\). Of course, there’s nothing special about these two terms. A significant generalization of this result is available: if \(M\) and \(N\) in \(\beta\)-normal form but \(M\) is distinct from \(N\), then the equation \(M = N\) is not a theorem of \(\lambda\). (This simple condition for underivability does not generally hold if we add additional rules of inference to \(\lambda\).)

The theories \(\lambda \eta\) and \(\lambda \omega\) are likewise consistent. One can prove these consistency results along the lines of the consistency proof for \(\lambda\) by extending the Church-Rosser theorem to the wider senses of derivability of these theories.

Although the \(\lambda\)-calculus is ‘about’ calculating with functions by substituting values for arguments, this simple point of view cannot support a semantics for the (untyped) \(\lambda\)-calculus if by ‘function’ we understand, as is standard in set theory, a relation \(R\) such that for every pair \((x,y)\) and \((x,z)\) in \(R\) with the same first component \(x\) we have \(y = z\). For sets \(X\) and \(Y\), let \(X^Y\) denote the set of functions whose domain is \(Y\) and with values in \(X\). Intuitively, if \(X\) is the domain of an interpretation of \(\lambda\)-calculus, then \(X\) should be, in some sense, isomorphic to \(X^X\) because the domain should be closed under abstraction (as well as application). Taken literally, though, this isomorphism is impossible, because the cardinality of \(X\) always is of strictly smaller than the cardinality of \(X^X\).

If one is interested in simply the existence of some kind of model of the \(\lambda\)-calculus—one whose domain not necessarily consist of functions—one can find them by various well-known ‘syntactic’ constructions involving the theory \(\lambda\), not unlike the well-known Henkin constructions . These so-called term models, though, are an unsatisfactory solution to the question of whether there are ‘mathematical’ models of the \(\lambda\)-calculus.

The cardinality argument shows that if we are to have a semantics for \(\lambda\)-calculus, the interpretation of \(\lambda\)-terms cannot simply be functions in the set-theoretic sense of the term. There are, however, interpretations of \(\lambda\)-calculus. The first model, \(D_{\infty}\), was found by D. Scott; other models were found later. These models solve the cardinality problem by restricting the domain \(X\) of interpretation, so that, in them, \(X\) is in a suitable sense isomorphic to the ‘function space’ \(X^X\).

One of the advantages of having different interpretations is that one sees different aspects of equality: each of these models takes a different view on what \(\lambda\)-terms get identified. The definitions of \(D_{\infty}\) and other interpretations, the verifications that they are indeed models of \(\lambda\)-calculus, and the characterizations of the \(\lambda\)-theories of these models, are beyond the scope of this entry; see (Barendregt, 1985, chapter 18) or (Meyer 1982) for details. In recent years, there is a renewed interest in the models of \(\lambda\)-calculus from the perspective of category theory and categorical logic, focusing mainly on typed \(\lambda\)-calculus (see sections 8.2 and 9.1.2 below) but also dealing with category theoretic models of the untyped \(\lambda\)-calculus discussed in this article. See, for example, (Hyland, 2017) for details.

A sister formalism of the \(\lambda\)-calculus, developed slightly earlier, deals with variable-free combinations. Combinatory logic is indeed even simpler than the \(\lambda\)-calculus, since it lacks a notion of variable binding.

The language of combinatory logic is built up from combinators and variables. There is some flexibility in precisely which combinators are chosen as basic, but some standard ones are \(\mathbf{I}, \bK , \bS, \mathbf{B}\) and \(\mathbf{C}\). (The names are not arbitrary.)

As with the \(\lambda\)-calculus, with combinatory logic one is interested in reducibility and provability. The principal reduction relations are:

Combinator Reduction Axiom \(\bI\) \(\bI x = x\) \(\bK\) \(\bK xy = x\) \(\bS\) \(\bS xyz = xz(yz)\) \(\bB\) \(\bB xyz = x(yz)\) \(\bC\) \(\bC xyz = xzy\)

There is a passage from \(\lambda\)-calculus to combinatory logic via translation. It turns out that although combinatory logic lacks a notion of abstraction, one can define such a notion and thereby simulate the \(\lambda\)-calculus in combinatory logic. Here is one translation; it is defined recursively.

Rule Expression Translation Condition 1 \(x\) \(x\) (unconditional) 2 \(MN\) M\(^*\)N\(^*\) (unconditional) 3 \(\lambda x[M]\) \(\bK\)M \(x\) does not occur freely in M 4 \(\lambda x[x]\) \(\bI\) (unconditional) 5 \(\lambda x[Mx]\) M \(x\) does not occur freely in M 6 \(\lambda x[MN]\) \(\bB M(\lambda x[N)]^*\) \(x\) does not occur freely in M 7 \(\lambda x[MN]\) \(\bC (\lambda x[M])^*\)N \(x\) does not occur freely in \(N\) 8 \(\lambda x[MN]\) \(\bS M^*N^*\) \(x\) occurs freely in both \(M\) and \(N\)

This translation works inside-out, rather than outside-in. To illustrate:

The translation of the term \(\lambda y[y]\), a representative of the identity function, is mapped by this translation to the identity combinator \(\bI\) (because of Rule 4), as expected. The \(\lambda\)-term \(\lambda x[\lambda y[x]]\) that we have been calling ‘\(\bK\)’is mapped by this translation to: \[\begin{align} \lambda x[\lambda y[x]] &\equiv \lambda x[\bK x] &\langle \text{Rule 1}\rangle \\ &\equiv \bK &\langle \text{Rule 3} \rangle \end{align}\] The \(\lambda\)-term \(\lambda x[\lambda y[yx]]\) that switches its two arguments is mapped by this translation to: \[\begin{align} \lambda x[\lambda y[yx]] &\equiv \lambda x[\bC(\lambda y[y])^* x] &\langle\text{Rule 8}\rangle \\ &\equiv \lambda x[\bC\bI x] &\langle\lambda y[y] \equiv \bI,\text{ by Rule 4}\rangle \\ &\equiv \bB\bC\bI)(\lambda x[x])^* &\langle\text{Rule 7}\rangle \\ &\equiv \bB(\bC\bI)\bI &\langle(\lambda x[x])^* \equiv \bI,\text{ by Rule 4}\rangle \end{align}\] We can confirm that the \(\lambda\)-term \(\lambda x[\lambda y[yx]]\) and the translated combinatory logic term \(\bB(\bC\bI)\bI\) have analogous applicative behavior: for all \(\lambda\)-terms \(P\) and \(Q\) we have \[ (\lambda x[\lambda y[yx]])PQ \rhd (\lambda y[yP]) \rhd QP; \] likewise, for all combinatory logic terms \(P\) and \(Q\) we have \[ \bB(\bC\bI)\bI PQ \rhd (\bC\bI)(\bI P)Q \rhd \bI Q(\bI P) \rhd Q(\bI P) \rhd QP \]

We can give but a glimpse of combinatory logic; for more on the subject, consult the entry on combinatory logic. Many of the issues discussed here for \(\lambda\)-calculus have analogues in combinatory logic, and vice versa.

In many contexts of reasoning and computing it is natural to distinguish between different kinds of objects. The way this distinction is introduced is by requiring that certain formulas, functions, or relations accept arguments or permit substitution only of some kinds of objects rather than others. We might require, for example, that addition + take numbers as arguments. The effect of this restriction is to forbid, say, the addition of 5 and the identity function \(\lambda x.x\).(4). Regimenting objects into types is also the idea behind the passage from (unsorted, or one-sorted) first-order logic to many-sorted first-order logic. (See (Enderton, 2001) and (Manzano, 2005) for more about many-sorted first-order logic.) As it stands, the \(\lambda\)-calculus does not support this kind of discrimination; any term can be applied to any other term.

It is straightforward to extend the untyped \(\lambda\)-calculus so that it discriminates between different kinds of objects. This entry limits itself to the type-free \(\lambda\)-calculus. See the entries on type theory and Church’s type theory for a detailed discussion of the extensions of \(\lambda\)-calculus that we get when we add types, and see (Barendregt, Dekkers, Statman, 2013) for a book length treatment of the subject.

Here are two senses in which \(\lambda\)-calculus is connected with logic.

9.1.1 Terms as logical constants

In the table of combinators above, we defined combinators \(\bT\) and \(\bF\) and said that they serve as representations in the \(\lambda\)-calculus of the truth values true and false, respectively. How do these terms function as truth values?

It turns out that when one is treating \(\lambda\)-calculus as a kind of programming language, one can write conditional statements “If \(P\) then \(A\) else \(B\)” simply as \(PAB\), where of course \(P, A\), and \(B\) are understood as \(\lambda\)-terms. If \(P \rhd \bT\), that is, P is ‘true’, then we have

\[ \text{if-}P\text{-then-}A\text{-else-}B := PAB \rhd \bT AB \rhd A, \]

(recall that, by definition, \(\bT \equiv \bK\)) and if \(P \rhd \bF\), that is, \(P\) is ‘false’, then

\[ \text{if-}P\text{-then-}A\text{-else-}B := PAB \rhd \bF AB \rhd B, \]

(recall that, by definition, \(\mathbf{F} \equiv \mathbf{KI})\) which is just what we expect from a notion of if-then-else. If \(P\) reduces neither to \(\mathbf{T}\) nor \(\mathbf{F}\), then we cannot in general say what \(\text{if-}P\text{-then-}A\text{-else-}B\) is.

The encoding we’ve just sketched of some of the familiar truth values and logical connectives of classical truth-table logic does not show that \(\lambda\)-calculus and classical logic are intimately related. The encoding shows little more than embeddibility of the rules of computation of classical truth-table logic in \(\lambda\)-calculus. Logics other than classical truth-table logic can likewise be represented in the \(\lambda\)-calculus, if one has sufficient computable ingredients for the logic in question (e.g., if the logical consequence relation is computable, or if a derivability relation is computable, etc.). For more on computing with \(\lambda\)-calculus, see section 9.2 below. A more intrinsic relationship between logic and \(\lambda\)-calculus is discussed in the next section.

The correspondence to be descried here between logic and the \(\lambda\)-calculus is seen with the help of an apparatus known as types. This section sketches the beginnings of the development of the subject known as type theory. We are interested in developing type theory only so far as to make the so-called Curry-Howard-de Bruijn correspondence visible. A more detailed treatment can be found in the entry on type theory; see also (Hindley, 1997) and (Barendregt, Dekkers, Statman, 2013).

Type theory enriches the untyped \(\lambda\)-calculus by requiring that terms be given types. In the untyped \(\lambda\)-calculus, the application \(MN\) is a legal term regardless of what \(M\) and \(N\) are. Such freedom permits one to form such suspicious terms as \(xx\), and thence terms such as the paradoxical combinator \(\mathbf{Y}\). One might wish to exclude terms like \(xx\) on the grounds that \(x\) is serving both as a function (on the left-hand side of the application) and as an argument (on the right-hand side of the application). Type theory gives us the resources for making this intuitive argument more precise.

Assigning types to terms The language of type theory begins with an (infinite) set of type variables (which is assumed to be disjoint from the set of variables of the \(\lambda\)-calculus and from the symbol ‘\(\lambda\)’ itself). The set of types is made up of type variables and the operation \(\sigma \rightarrow \tau\). Variables in type theory now come with a type annotation (unlike the unadorned term variables of untyped \(\lambda\)-calculus). Typed variables are rendered ‘\(x : \sigma\)’; the intuitive reading is ‘the variable \(x\) has the type \(\sigma\)’. The intuitive reading of the judgment ‘\(t : \sigma \rightarrow \tau\)’ is that the term \(t\) is a function that transforms arguments of type \(\sigma\) into arguments of type \(\tau\). Given an assignment of types to term variables, one has the typing rules: \[ (M : \sigma \rightarrow \tau)(N : \sigma) : \tau \] \[ (M : \sigma \rightarrow \tau)(N : \sigma) : \tau \] and \[ (\lambda x : \sigma[M : \tau]) : \sigma \rightarrow \tau \] \[ (\lambda x : \sigma[M : \tau]) : \sigma \rightarrow \tau \] The above two rules define the assignment of types to applications and to abstraction terms. The set of terms of type theory is the set of terms built up according to these formation rules.

The above definition of the set of terms of type theory is sufficient to rule out terms such as \(xx\). Of course, ‘\(xx\)’ is not a typed term at all for the simple reason that no types have been assigned to it. What is meant is that there is no type \(\sigma\) that could be assigned to \(x\) such that ‘\(xx\)’ could be annotated in a legal way to make a typed term. We cannot assign to \(x\) a type variable, because then the type of the left-hand \(x\) would fail to be a function type (i.e., a type of the shape ‘\(\sigma \rightarrow \tau\)’). Moreover, we cannot assign to \(x\) a function type \(\sigma \rightarrow \tau\), because then then \(\sigma\) would be equal to \(\sigma \rightarrow \tau\), which is impossible.

As a leading example, consider the types that are assigned to the combinators \(\bI\), \(\bK\), and \(\bS\):

Combinator Type[5] \(\bI\) \(a \rightarrow a\) \(\bK\) \(a \rightarrow(b \rightarrow a)\) \(\bS\) \(a \rightarrow(b \rightarrow c) \rightarrow((a \rightarrow b) \rightarrow(a \rightarrow c))\)

(See Hindley (1997) Table of principal types for a more extensive listing.) If we read ‘\(\rightarrow\)’ as implication and type variables as propositional variables, then we recognize three familiar tautologies in the right-hand column of the table. The language used is meager: there are only propositional variables and implication; there are no other connectives.

The table suggests an interesting correspondence between \(\lambda\)-calculus. Could it really be that the types assigned to formulas, when understood as logical formulas, are valid? Yes, though ‘validity’ needs to understood not as classical validity:

Theorem If \(\tau\) is the type of some \(\lambda\)-term, then \(\tau\) is intuitionistically valid.

The converse of this theorem holds as well:

Theorem If \(\phi\) is an intuitionistically valid logical formula whose only connective is implication \((\rightarrow)\), then \(\phi\) is the type of some \(\lambda\)-term.

The correspondence can be seen when one identifies intuitionistic validity with derivability in a certain natural deduction formalism. For a proof of these two theorems, see (Hindley, 1997, chapter 6).

The correspondence expressed by the previous two theorems between intuitionistic validity and typability is known as the Curry-Howard-de Bruijn correspondence, after three logicians who noticed it independently. The correspondence, as stated, is between only propositional intuitionistic logic, restricted to the fragment containing only the implication connective \(\rightarrow\). One can extend the correspondence to other connectives and to quantifiers, too, but the most crisp correspondence is at the level of the implication-only fragment. For details, see (Howard, 1980).

One can represent natural numbers in a simple way, as follows:

Definition (ordered tuples, natural numbers) The ordered tuple \(\langle a_0,\ldots a_n\rangle\) of \(\lambda\)-terms is defined as \(\lambda x[x a_0\ldots a_n]\). One then defines the \(\lambda\)-term \(\ulcorner n\urcorner\) corresponding to the natural number \(n\) as: \(\ulcorner 0\urcorner = \mathbf{I}\) and, for every \(k\), \(\ulcorner k + 1\urcorner = \langle \mathbf{F}, \ulcorner k\urcorner\rangle\).

The \(\lambda\)-term corresponding to the number 1, on this representation, is: \[\begin{align} \ulcorner 1 \urcorner &\equiv \langle\bF,\ulcorner 0\urcorner\rangle \\ &\equiv \langle\bF,\bI\rangle \\ &\equiv \lambda x[x\mathbf{FI}] \end{align}\] The \(\lambda\)-term corresponding to the number 2, on this representation, is: \[\begin{align} \ulcorner 2 \urcorner &\equiv \langle\bF,\ulcorner 1\urcorner\rangle \\ &\equiv \lambda x[x\mathbf{F}\lambda x[x\mathbf{FI}]] \end{align}\] Similarly, \(\ulcorner 3\urcorner\) is \(\lambda x[x\mathbf{F}\lambda x[x\mathbf{F}\lambda x[x\mathbf{FI}]]]\).

Various representations of natural numbers are available; this representation is but one.[6]

Using the ingredients provided by the \(\lambda\)-calculus, one can represent all recursive functions. This shows that the model is exactly as expressive as other models of computing, such as Turing machines and register machines. Priority goes to Turing’s definition of his machine, but Church’s proposal of the \(\lambda\)-calculus was developed at almost exactly the same time.

Theorem For every recursive function \(f\) of arity \(n\), there exists a \(\lambda\)-term \(f^*\) such that for all natural numbers \(a_1,\ldots a_n\): \(f(a_1,\ldots a_n) = y\) iff \(\boldsymbol{\lambda} \vdash f^*\langle \bar{a}_1,\ldots,\bar{a}_n\rangle = \bar{y}\)

For a proof, see the appendix.

Since the class of recursive functions is an adequate representation of the class of all computable (number-theoretic) functions, thanks to the work above we find that all computable (number-theoretic) functions can be faithfully represented in the \(\lambda\)-calculus.

The motivation for the \(\lambda\)-calculus given at the beginning of the entry was based on reading \(\lambda\)-expressions as descriptions of functions. Thus, we have understood ‘\(\lambda x[M]\)’ to be a (or the) function that, given \(x\), gives \(M\) (which generally, though not necessarily, involves x). But it is not necessary to read \(\lambda\)-terms as functions. One could understand \(\lambda\)-terms as denoting relations, and read an abstraction term ‘\(\lambda x[M]\)’ as the unary relation (or property) \(R\) that holds of an argument \(x\) just in case \(M\) does (see Carnap 1947, p. 3). On the relational reading, we can understand an application term \(MN\) as a form of predication. One can make sense of these terms using the principle of \(\beta\)-conversion:

\[ (\lambda x[M])a = M[x := A], \]

which says that the abstraction relation \(\lambda x[M]\), predicated of A, is the relation obtained by plugging in A for all free occurrences of \(x\) inside \(M\).

As a concrete example of this kind of approach to \(\lambda\)-calculus, consider an extension of first-order logic where one can form new atomic formulas using \(\lambda\)-terms, in the following way:

Syntax: For any formula \(\phi\) and any finite sequence \(x_1 , \ldots ,x_n\) of variables, the expression ‘\(\lambda x_1 \ldots x_n [\phi]\)’ is a predicate symbol of arity n. Extend the notion of free and bound variables (using the functions \(\mathbf{FV}\) and \(\mathbf{BV})\) in such a way that \[ \mathbf{FV}(\lambda x_1 \ldots x_n [\phi]) = \mathbf{FV}(\phi) - \{ x_1 , \ldots x_n \} \] \[ \mathbf{FV}(\lambda x_1 \ldots x_n [\phi]) = \mathbf{FV}(\phi) - \{ x_1 , \ldots x_n \} \] and \[ \mathbf{BV}(\lambda x_1 \ldots x_n [\phi]) = \mathbf{BV}(\phi) \cup \{ x_1 , \ldots x_n \} \] \[ \mathbf{BV}(\lambda x_1 \ldots x_n [\phi]) = \mathbf{BV}(\phi) \cup \{ x_1 , \ldots x_n \} \] Deduction Assume as axioms the universal closures of all equivalences \[ \lambda x_1 \ldots x_n [\phi](t_1 ,\ldots t_n) \leftrightarrow \phi[x_1 ,\ldots x_n := t_1,\ldots t_n] \] \[ \lambda x_1 \ldots x_n [\phi](t_1 ,\ldots t_n) \leftrightarrow \phi[x_1 ,\ldots x_n := t_1,\ldots t_n] \] where \(\phi[x_1 ,\ldots x_n := t_1,\ldots t_n]\) denotes the simultaneous substitution of the terms \(t_k\) for the variables \(x_k\) \((1 \le k \le n)\). Semantics For a first-order structure \(A\) and an assignment \(s\) of elements of \(A\) to variables, define \[\begin{align} A \vDash &\lambda x_1 \ldots x_n [\phi](t_1 ,\ldots t_n) [s] \text{ iff } \\ &A \vDash \phi[x_1 ,\ldots x_n := t_1,\ldots t_n] [s] \end{align}\] \[\begin{align} A \vDash &\lambda x_1 \ldots x_n [\phi](t_1 ,\ldots t_n) [s] \text{ iff } \\ &A \vDash \phi[x_1 ,\ldots x_n := t_1,\ldots t_n] [s] \end{align}\]

According to this approach, one can use a \(\lambda\) to treat essentially any formula, even complex ones, as if they were atomic. We see the principle of \(\beta\)-reduction in the deductive and semantic parts. That this approach adheres to the relational reading of \(\lambda\) terms can be seen clearly in the semantics: according to the standard Tarski-style semantics for first-order logic, the interpretation of a formula (possibly with free variables) denotes a set of tuples of elements of the structure, as we vary the variable assignment that assigns elements of the structure to the variables.

One can ‘internalize’ this functional approach. This is done in the case of various property theories, formal theories for reasoning about properties as metaphysical objects (Bealer 1982, Zalta 1983, Menzel 1986, 1993, and Turner 1987). This kind of theory is employed in certain metaphysical investigations where properties are metaphysical entities to be investigated. In these theories, metaphysical relations are (or are among) the objects of interest; just as we add term-building symbols + and \(\times\) in formal theories of arithmetic to build numbers, \(\lambda\) is used in property theory to build relations. This approach contrasts with the approach above. There, \(\lambda\) was added to the grammar of first-order logic by making it a recipe for building atomic formulas; it was a new formula-building operator, like \(\vee\) or \(\rightarrow\) or the other connectives. In the case of property theories, the \(\lambda\) plays a role more like + and \(\times\) in formal theories of arithmetic: it is used to construct relations (which, in this setting, are to be understood as a kind of metaphysical object). Unlike + and \(\times\), though, the \(\lambda\) binds variables.

To give an illustration of how \(\lambda\) is used in this setting, let us inspect the grammar of a typical application (McMichael and Zalta, 1980). One typically has a predication operator (or, more precisely, a family of predication operators) \(p_k (k \ge 0)\). In a language where we have terms \(\mary\) and \(\john\) and a binary relation loves, we can formally express:

John loves Mary: \(\loves(\john ,\mary)\)

The property that John loves Mary: \(\lambda[\loves(\john ,\mary)]\) (note that the \(\lambda\) is binding no variables; we might call this ‘vacuous binding’. Such properties can be understood as propositions.)

The property of an object \(x\) that John loves it: \(\lambda x [\loves(\john,x)]\).

The property that Mary is loved by something: \(\lambda[\exists x(\loves(x,\mary))]\) (another instance of vacuous binding, viz., proposition)

The predication of the property of \(x\) that John loves \(x\) to Mary: \(p_1 (\lambda x[\loves(\john,x)],\mary)\).

The (0-ary) predication of the property that John loves Mary: \(p_0 (\lambda x[\loves(\john,\mary)])\).

The property of objects \(x\) and \(y\) that \(x\) loves \(y\): \(\lambda xy[\loves(x,y)]\).

The property of an objects \(x\) that \(x\) loves itself: \(\lambda x[\loves(x,x)]\).

The predication of the property of objects \(x\) and \(y\) that \(x\) loves \(y\) to John and Mary (in that order): \(p_2 (\lambda xy[\loves(x,y)],\john,\mary)\).

We reason with these \(\lambda\)-terms using a \(\beta\)-conversion principle such as:

\[\begin{align} p_n (\lambda x_1,&\ldots x_n [A], t_1 , \ldots ,t_n) \leftrightarrow \\ &A[x_1 ,\ldots x_n := t_1,\ldots, t_n] \end{align}\]

Formally, the predication operator p\(_k\) is a \((k+1)\)-ary predicate symbol. The first argument is intended to be a \(\lambda\)-term of \(k\) arguments, and the rest of the arguments are intended to be the arguments of the body of the \(\lambda\)-term. The \(\beta\)-principle above says that the predication of an \(n\)-ary \(\lambda\)-term \(L\) to \(n\) terms holds precisely when the body of \(L\) holds of those terms.

It turns out that in these theories, we may or may not be able to be fully committed to the principle of \(\beta\)-conversion. Indeed, in some property theories, the full principle of \(\beta\)-conversion leads to paradox, because one can replay a Russell-style argument when the full principle of \(\beta\)-conversion is in place. In such settings, one restricts the formation of \(\lambda\)-formulas by requiring that the body of a \(\lambda\)-term not contain further \(\lambda\)-terms or quantifiers. For further discussion, see (Orilia, 2000).

One of the reasons why property theories formulated in the \(\lambda\)-calculus are of a particular philosophical importance is the hyperintensional nature of the calculus (see section 1.2). A property concept may be called ‘hyperintensional’ if and only if it does not identify necessarily coextensional properties, i.e. properties that are instanciated by exactly the same objects at every possible world. The properties and relations described by the theories of Bealer, Zalta, Menzel, and Turner have exactly this characteristic. In other words, the theories are hyperintensional property theories. Recent years have seen a significant rise of interest in hyperintensional concepts of properties in metaphysics (Nolan 2014), and correspondingly property theories formulated in the \(\lambda\)-calculus will likely experience a rise of interest as well.

In the context of the foundations of mathematics, Zalta and Oppenheimer (2011) argue for the conceptual priority of the relational interpretation of \(\lambda\)-terms over the functional one.