First published Tue May 21, 1996; substantive revision Wed Mar 27, 2019

This entry briefly describes the history and significance of Alfred North Whitehead and Bertrand Russell’s monumental but little-read classic of symbolic logic, Principia Mathematica (PM), first published in 1910–1913. The content of PM is described in a section by section synopsis, stated in modernized logical notation and described following the introductory notes from each of the three volumes. The original notation is presented in a companion article of this Encyclopedia, The Notation of Principia Mathematica. The content of PM is described so as to facilitate a comparison with Gottlob Frege’s Fundamental Laws of Arithmetic, which was subject to Russell’s Paradox. To avoid the paradox Whitehead and Russell introduced a complex system now called “the ramified theory of types”. After the introduction of a theory of sets, or “classes”, early in the first volume, however, the system of PM can be compared with both Frege and the early development of set theory and found to contain rival accounts, free of contradiction, but differing from the now standard theories in as yet understudied ways.

1. Overview

Principia Mathematica, the landmark work in formal logic written by Alfred North Whitehead and Bertrand Russell, was first published in three volumes in 1910, 1912 and 1913. A second edition appeared in 1925 (Volume I) and 1927 (Volumes II and III). In 1962 an abbreviated issue (containing only the first 56 chapters) appeared in paperback.

Written as a defense of logicism (the thesis that mathematics is in some significant sense reducible to logic), the book was instrumental in developing and popularizing modern mathematical logic. It also served as a major impetus for research in the foundations of mathematics throughout the twentieth century. Along with Aristotle’s Organon and Gottlob Frege’s Grundgesetze der Arithmetik, it remains one of the most influential books on logic ever written.

This entry includes a presentation of the main definitions and theorems used in the development of the logicist project in PM. The entry indicates a path through the whole work, presenting the basic results proved in PM in a somewhat more contemporary notation, so as to make it easy to compare the system of Whitehead and Russell with that of Frege, the other most prominent advocate of logicism in the foundations of mathematics. The aim of that program, as described by Russell in the opening lines of the preface to his 1903 book The Principles of Mathematics, was to define mathematical notions in terms of logical notions, and to derive mathematical principles, so defined, from logical principles alone:

The present work has two main objects. One of these, the proof that all pure mathematics deals exclusively with concepts definable in terms of a very small number of fundamental concepts, and that all its propositions are deducible from a very small number of fundamental logical principles, is undertaken in Parts II–VII of this work, and will be established by strict symbolic reasoning in Volume II.…The other object of this work, which occupies Part I., is the explanation of the fundamental concepts which mathematics accepts as indefinable. This is a purely philosophical task…. (1903: xv)

Though Frege’s system was subject to Russell’s Paradox, subsequent examination of his system shows how much of the development of arithmetic is possible independently of the paradoxical elements of the system. In particular, recent interest in Frege’s system has led to the isolation of what is called “Frege’s Theorem”, a result provable in a consistent fragment of Frege’s original system that achieves the goal of deriving arithmetic, as formalized in Peano’s Postulates. See the entry Frege’s theorem and foundations for arithmetic, which presents this aspect of Frege’s system in contemporary notation.

Russell had written The Principles of Mathematics (PoM), which presents the basic elements of his logicist program, before discovering Frege’s similar work in Foundations of Arithmetic and The Fundamental Laws of Arithmetic, in June of 1902. As he describes in the Preface, Russell intended a formal presentation of his account in a “Volume II” of PoM. In 1903 he enlisted Alfred North Whitehead to join him in the writing of this second volume, but soon the project turned into a new work, Principia Mathematica, a massive three-volume work that was not to be published until 1910 (Volume I), 1912 (Volume II) and 1913 (Volume III).

The system of PM differed significantly from Frege’s system, in large part because of the introduction of the theory of types, whose purpose was to avoid, in a principled fashion, the paradox that had affected Frege. A second important difference from Frege’s system is that PM is based on a logic of relations of various numbers of arguments, whereas Frege’s system was based on the notions of function and object, with even his distinctively logical concepts being seen as functions (from a number of objects to the truth values T and F, which are also objects in Frege’s system). So, it might be said that PM is based on a theory of ramified types of relations, in contrast to Frege’s second order predicate calculus with concepts. The most important step is to define set expressions in terms of higher-order functions. Thus the paradoxical “Russell set”, the set of all sets which are not members of themselves, \(\{ x \mid x \notin x\}\), is defined by an expression involving functions that will violate the theory of types. The expression for the offending class is ruled out on the basis of the theory of types, as is its seemingly innocuous complement, the set of all sets that are members of themselves, \(\{ x \mid x \in x\}\). In contemporary set theory \(\{ x \mid x \notin x\}\) is the universe of sets, which is not itself a set, and because no set is an element of itself, \(\{ x \mid x \in x\}\) is just the empty set.

An additional cost of this method is that while for Frege sets are objects of the lowest type, sets in PM fall within a simple theory of types, which distinguishes individuals, sets of individuals, sets of sets of individuals, etc. Even to derive a hierarchy of sets in the simple theory, the axiom of reducibility is needed to guarantee that more complex “impredicative” definitions pick out sets of the same simple type. Thus the “least upper bound” of a closed interval of real numbers will identify a member of that set of a higher order in the ramified theory. That it will be of the same simple type requires the axiom of reducibility.

The cost of adopting the theory of types to avoid the paradox extends to difficulties in constructing the natural numbers. While Russell follows Frege in many important details, in particular in using Frege’s notion of the ancestral of the successor relation to define the natural numbers, other parts of the construction are importantly different. Frege was able to define the successor of a number by using the set of its predecessors. The number 2 is the set containing 0 and 1, and thus it has two members. They will, however, be of different types in the hierarchy of simple types, and so the whole set of natural numbers cannot be defined within the theory of simple types. Since each step from 0 to 1, to 2, etc., raises the simple type from 0 to 1 to 2, there will be no simple type of all the natural numbers, so defined. Instead PM adopts the axiom of infinity, which assures the existence of an infinite number of individuals, allowing for the construction of the natural numbers for each type above a lower bound of 3 or so (as numbers will be sets of equinumerous sets of individuals…).
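The type restriction on membership, and the reason a set such as {0, 1} has no simple type when 0 and 1 sit at different levels, can be illustrated with a small sketch. This is only a toy model of a simple (not ramified) hierarchy; the names `Term`, `membership_well_typed` and `set_level` are hypothetical and are not drawn from PM or from any library.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Term:
    """A variable or constant tagged with a simple type level (hypothetical encoding).

    Level 0 = individuals, 1 = sets of individuals, 2 = sets of such sets, ...
    """
    name: str
    level: int


def membership_well_typed(x: Term, y: Term) -> bool:
    """'x ∈ y' is well formed only when y sits exactly one level above x."""
    return y.level == x.level + 1


def set_level(member_levels: Iterable[int]) -> Optional[int]:
    """Level of a set whose members have the given levels.

    Returns None when the members do not share one level (a mixed set has no
    simple type). An empty collection also yields None, since the level of an
    empty set would have to be declared rather than computed.
    """
    levels = set(member_levels)
    if len(levels) != 1:
        return None
    return levels.pop() + 1


# 'x ∈ x' (and so 'x ∉ x') is ill formed at every level.
x = Term("x", 3)
print(membership_well_typed(x, x))   # False

# Following the text's numbering: 0 at level 0, 1 = {0} at level 1.
print(set_level([0]))                # 1
print(set_level([0, 1]))             # None: {0, 1} mixes two levels
```

On this toy model the expression for the Russell set cannot even be formed, and the set of predecessors of 2 fails to receive a level, which is the difficulty the axiom of infinity is introduced to work around.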

With this turn to the ramified theory of types, along with the extra axioms of reducibility and infinity, it is possible for PM to define a version of Frege’s construction of the Natural Numbers so that the “Peano axioms” can be proved from logic alone. This takes up to section ∗120, well into Volume II. At this point the alternative to “Frege’s Theorem” is completed, in the sense that we are presented with a consistent development of the natural numbers, based on a theory of higher-order logic with a number of additional axioms. Philosophers soon followed Ludwig Wittgenstein (1922) and disputed the idea that these additional axioms, the axioms of reducibility and infinity, are really logical truths, and so denied that the logicist program of reducing arithmetic to logic was any more successful than Frege’s attempt had been.

The survey of PM will proceed through the remainder of Volume II and through Volume III, where the theories of rational and real numbers are developed. The contrast intended here is not with Frege’s theories of rational and real numbers, which are present in Grundgesetze but are not seen as a natural extension of the theory of natural numbers. Instead the contrast is with the contemporary account, in which the theory of natural numbers and real numbers is seen as an elementary extension of axiomatic Zermelo-Fraenkel set theory. A contemporary textbook in axiomatic set theory, such as Enderton (1977) or Suppes (1960), shows how to construct rational numbers (and negative integers) as pairs of natural numbers; thus 3/4 is constructed as the pair \(\langle 3, 4 \rangle\), with the operations of addition and multiplication defined as operations on pairs; thus \(1/2 + 1/3 = 10/12 = 5/6\). These positive rational numbers are extended to the whole set by adding negative integers, and then real numbers are defined as Dedekind cuts in the rational numbers, i.e., the set of partitions of sets of rational numbers. The arithmetic of real numbers is then defined for these constructions, and so with sets of real numbers the whole of analysis can be reduced to arithmetic. PM, however, avoids this “arithmetization” of analysis, and instead defines rational, real and in fact a huge class of “relation numbers” as sets of isomorphic sets of relations. Russell says later that he regrets that the theory of relation numbers was not picked up by later set theorists, even though this was some of his most original work in PM. The brief summary of these later topics that we include below can therefore be seen as a summary of the interesting consequences of taking a different route to the definition of natural numbers based on a logic of relations and properties, rather than the set theory of contemporary foundations of mathematics.

This entry is thus aimed at an explication of the unusual order of presentation of these results, in comparison with both Frege and contemporary set theory, and to illustrate these aspects of the theory of relations that are not investigated by contemporary researchers.
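The textbook construction of positive rationals as pairs of natural numbers, which is the contemporary set-theoretic route that PM does not take, can be sketched as follows. The function names (`same_rational`, `lowest_terms`, `add`, `mul`) are hypothetical, and the sketch assumes the usual cross-multiplication rules rather than reproducing any particular textbook's definitions.

```python
from math import gcd
from typing import Tuple

# A positive rational is modelled as a pair (numerator, denominator)
# of natural numbers with a nonzero denominator.
Pair = Tuple[int, int]


def same_rational(p: Pair, q: Pair) -> bool:
    """Two pairs stand for the same rational when they cross-multiply equally."""
    (a, b), (c, d) = p, q
    return a * d == c * b


def lowest_terms(p: Pair) -> Pair:
    """Pick the canonical representative of a pair by dividing out the gcd."""
    a, b = p
    g = gcd(a, b)
    return (a // g, b // g)


def add(p: Pair, q: Pair) -> Pair:
    """Addition on pairs: a/b + c/d = (ad + cb) / bd."""
    (a, b), (c, d) = p, q
    return lowest_terms((a * d + c * b, b * d))


def mul(p: Pair, q: Pair) -> Pair:
    """Multiplication on pairs: a/b * c/d = ac / bd."""
    (a, b), (c, d) = p, q
    return lowest_terms((a * c, b * d))


print(add((1, 2), (1, 3)))               # (5, 6)
print(same_rational((10, 12), (5, 6)))   # True
```

The point of the comparison is that here rationals are built arithmetically out of natural numbers, whereas PM defines them within its general theory of relation numbers.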

2. History and Significance of Principia Mathematica

2.1 History of Principia Mathematica

Logicism is the view that (some or all of) mathematics can be reduced to (formal) logic. It is often explained as a two-part thesis. First, it consists of the claim that all mathematical truths can be translated into logical truths or, in other words, that the vocabulary of mathematics constitutes a proper subset of the vocabulary of logic. Second, it consists of the claim that all mathematical proofs can be recast as logical proofs or, in other words, that the theorems of mathematics constitute a proper subset of the theorems of logic. As Russell writes, it is the logicist’s goal “to show that all pure mathematics follows from purely logical premises and uses only concepts definable in logical terms” (1959: 74).

The logicist thesis appears to have been first advocated in the late seventeenth century by Gottfried Leibniz. Later, the idea was defended in much greater detail by Gottlob Frege. During the critical movement of the 1820s, mathematicians such as Bernard Bolzano, Niels Abel, Louis Cauchy, and Karl Weierstrass succeeded in eliminating much of the vagueness and many of the contradictions present in the mathematics of their day. By the mid- to late-1800s, William Hamilton had gone on to introduce ordered couples of reals as the first step in supplying a logical basis for the complex numbers and Karl Weierstrass, Richard Dedekind, and Georg Cantor had all developed methods for founding the irrationals in terms of the rationals. Using work done by H.G. Grassmann and Richard Dedekind, Giuseppe Peano had then gone on to develop a theory of the rationals based on his now famous axioms for the natural numbers. By Frege’s day, it was thus generally recognized that large parts of mathematics could be derived from a relatively small set of primitive notions.

Even so, it was not until 1879, when Frege developed the necessary logical apparatus, that logicism could finally be said to have become technically plausible. After another five years’ work, Frege arrived at the definitions necessary for logicising arithmetic and during the 1890s he worked on many of the essential derivations. However, with the discovery of paradoxes such as Russell’s paradox at the turn of the century, it appeared that additional resources would need to be developed if logicism were to succeed.

By 1902, both Whitehead and Russell had reached this same conclusion. Both men were in the initial stages of preparing second volumes to their earlier books on related topics: Whitehead’s 1898 A Treatise on Universal Algebra and Russell’s 1903 The Principles of Mathematics. Since their research overlapped considerably, they began collaborating on what would eventually become Principia Mathematica. By agreement, Russell worked primarily on the philosophical parts of the project, including the book’s philosophically rich Introduction, the theory of descriptions, and the no-class theory (in which set or class terms become meaningful only when placed in well-defined contexts), all of which can still be read fruitfully even by non-specialists. The two men then collaborated on the technical derivations. As Russell writes,

As for the mathematical problems, Whitehead invented most of the notation, except in so far as it was taken over from Peano; I did most of the work concerned with series and Whitehead did most of the rest. But this only applies to first drafts. Every part was done three times over. When one of us had produced a first draft, he would send it to the other, who would usually modify it considerably. After which, the one who had made the first draft would put it into final form. There is hardly a line in all the three volumes which is not a joint product. (1959: 74)

Initially, it was thought that the project might take a year to complete. Unfortunately, after almost a decade of difficult work on the part of the two men, Cambridge University Press concluded that publishing Principia would result in an estimated loss of 600 pounds. Although the press agreed to assume half this amount and the Royal Society agreed to donate another 200 pounds, this still left a 100-pound deficit. Only by each contributing 50 pounds were the authors able to see their work through to publication. (Whitehead, Russell, & James 1910)

Publication involved the enormous job of type-setting all three volumes by hand. In 1911, the printing of the second volume was interrupted when Whitehead discovered a difficulty with the symbolism. The result was the insertion (on roman numeral pages) of a long “Prefatory Statement of Symbolic Conventions” at the beginning of Volume II.

The initial print run of 750 copies of Volume I and 500 copies of each of Volumes II and III from Cambridge University Press had been sold by 1922, when Rudolf Carnap wrote to Russell asking for a copy. Russell responded by sending Carnap a 35-page handwritten summary of the definitions and some important theorems in the work (Linsky 2011: 14–15). As no plates were available for a second printing, Russell began the work of preparing a second edition that appeared in 1925–27. The first volume was reset along with a new introduction and three appendices, and Volume II was reset as well. Volume III was reproduced by a photographic process, and so the page numbers from the first edition are the same in this volume. Principia Mathematica is still in print with Cambridge University Press.

As with many works in mathematics, later progress in symbolic logic led to numerous developments that went beyond it. The school of logic started by David Hilbert at Göttingen and the Polish school of logicians led by S. Leśniewski and J. Łukasiewicz were occupied in their early days with responding to what they saw as defects and gaps in PM. This criticism was immediate, begun by Chwistek (1912) after only the first volume had been published. A series of important new presentations of mathematical logic, in particular Hilbert and Ackermann (1928), Hilbert and Bernays (1934), and Kleene (1952), were adopted as text books by successive generations of logicians. As pointed out in Urquhart (2013), this led to a slow decline in the number of references to PM in technical work in logic, as well as its gradual replacement by other texts for the Introduction to Symbolic Logic courses that soon became a staple offering of university departments of philosophy. By the 1950s PM was no longer used as a textbook, even in graduate courses.

Its influence, then, was enormous from 1910 to 1950, and it now has the status of a recognized classic that is unfamiliar to students of logic, and even unreadable because of its superseded notation. This entry, together with the entry on the notation in Principia Mathematica, is intended to make the contributions of this monumental work available, and to enable further research on some of the ideas hidden in those three long volumes.

2.2 Significance of Principia Mathematica

Achieving Principia’s main goal proved to be a challenge. An initial response among mathematicians and logicians in Germany and Poland was to decry the decline in standards of formal rigor set by Frege. This complaint was voiced by Frege himself, in a letter to Philip Jourdain in 1912:

…I do not understand the English language well enough to be able to say definitely that Russell’s theory (Principia Mathematica I, 54ff) agrees with my theory of functions of the first, second, etc. levels. It does seem so. But I do not understand all of it. It is not quite clear to me what Russell intends with his designation \(\phi \,!\, \hat{x}\). I never know for sure whether he is speaking of a sign or of its content. (Frege 1980: 78)

This claim that the notion of “propositional function” is subject to use-mention confusions has persisted to this day. This entry will present a modernized version of the syntax of PM, combined with an account of the notation for types in the works of Alonzo Church (1974, 1976). Modern theories of types allow for a coherent syntax for higher-order languages which many find adequate to meet these objections. The complaint about the formulation of the syntax of PM was repeated, and a further difficulty was expressed, by Gödel (1944) in his influential survey of PM:

It is to be regretted that this first comprehensive and thorough-going presentation of a mathematical logic and the derivation of mathematics from it [is] so greatly lacking in formal precision in the foundations (contained in ∗1–∗21 of Principia) that it presents in this respect a considerable step backwards as compared with Frege. What is missing, above all, is a precise statement of the syntax of the formalism. Syntactical considerations are omitted even in cases where they are necessary for the cogency of the proofs, in particular in connection with the “incomplete symbols”. These are introduced not by explicit definition, but by rules describing how sentences containing them are to be translated into sentences not containing them. To be sure, however, that (or for what expressions) this translation is possible and uniquely determined and that (or to what extent) the rules of inference apply to the new kind of expressions, it is necessary to have a survey of all possible expressions, and this can be furnished only by syntactical considerations. (Gödel 1944 [1951: 126])

The issue with respect to defined expressions, including the “incomplete symbols” for classes and definite descriptions which are explained below, is still problematic for interpreting PM. The difficulty is that certain defined expressions, such as the notation for definite descriptions, class abstracts and even the identity symbol ‘\(=\)’, are not specified in the initial description of the syntax of the theory, nor are they shown to be validly used as instances of the axioms with their apparent syntax. The method of “contextual definition” used in PM is difficult to formulate rigorously and is not used in contemporary logical theories. The modern presentation of PM in this entry includes the symbols for descriptions and classes, thus differing from the completely rigorous presentations of Church (1976), for example, who avoids both definite descriptions and class expressions, and takes identity as an undefined primitive.

Despite these reactions to the rigor of the presentation, PM nevertheless was studied carefully by those interested in the new symbolic logic, including David Hilbert and those in his school in Göttingen (see Ewald & Sieg 2013: 3 and Chwistek 1912). Primarily at issue were the kinds of assumptions Whitehead and Russell needed to complete their project. Although Principia succeeded in providing detailed derivations of many major theorems in finite and transfinite arithmetic, set theory, and elementary measure theory, three axioms in particular were arguably non-logical in character: the axioms of infinity, reducibility and the “multiplicative axiom” or Axiom of Choice. The axiom of infinity in effect states that there exists an infinite number of objects. Arguably it makes the kind of assumption generally thought to be empirical rather than logical in nature. The multiplicative axiom, later added to Zermelo’s axioms as the Axiom of Choice, asserts the existence of a certain set containing one element from each member of a given set. Russell objected that without a rule guiding the choice, such an axiom was not a logical principle. The axiom of reducibility was introduced as a means of overcoming the not completely satisfactory effects of the theory of types, the mechanism Russell and Whitehead used to restrict the notion of a well-formed expression, thereby avoiding Russell’s paradox. Although the axiom was technically feasible, many critics concluded that it was simply too ad hoc to be justified philosophically. Initially at least, Leon Chwistek (1912) believed that it led to a contradiction. Kanamori sums up the sentiment of many readers:

In traumatic reaction to his paradox Russell had built a complex system of orders and types only to collapse it with his Axiom of Reducibility, a fearful symmetry imposed by an artful dodger. (2009: 411)

In the minds of many, the issue of whether mathematics could be reduced to logic, or whether it could be reduced only to set theory, thus remained open.

In response, Whitehead and Russell argued that these axioms were defensible on inductive grounds. As they tell us in the Introduction to the first volume of Principia,

self-evidence is never more than a part of the reason for accepting an axiom, and is never indispensable. The reason for accepting an axiom, as for accepting any other proposition, is always largely inductive, namely that many propositions which are nearly indubitable can be deduced from it, and that no equally plausible way is known by which these propositions could be true if the axiom were false, and nothing which is probably false can be deduced from it. If the axiom is apparently self-evident, that only means, practically, that it is nearly indubitable; for things have been thought to be self-evident and have yet turned out to be false. And if the axiom itself is nearly indubitable, that merely adds to the inductive evidence derived from the fact that its consequences are nearly indubitable: it does not provide new evidence of a radically different kind. Infallibility is never attainable, and therefore some element of doubt should always attach to every axiom and to all its consequences. In formal logic, the element of doubt is less than in most sciences, but it is not absent, as appears from the fact that the paradoxes followed from premisses which were not previously known to require limitations. (1910: 62 [1925: 59])

Whitehead and Russell were also disappointed by the book’s largely indifferent reception on the part of many working mathematicians. As Russell writes,

Both Whitehead and I were disappointed that Principia Mathematica was only viewed from a philosophical standpoint. People were interested in what was said about the contradictions and in the question whether ordinary mathematics had been validly deduced from purely logical premisses, but they were not interested in the mathematical techniques developed in the course of the work.…Even those who were working on exactly the same subjects did not think it worth while to find out what Principia Mathematica had to say on them. I will give two illustrations: Mathematische Annalen published about ten years after the publication of Principia a long article giving some of the results which (unknown to the author) we had worked out in Part IV of our book. This article fell into certain inaccuracies which we had avoided, but contained nothing valid which we had not already published. The author was obviously totally unaware that he had been anticipated. The second example occurred when I was a colleague of Reichenbach at the University of California. He told me that he had invented an extension of mathematical induction which he called ‘transfinite induction’. I told him that this subject was fully treated in the third volume of the Principia. When I saw him a week later, he told me that he had verified this. (1959: 86)

Despite such concerns, PM proved to be remarkably influential in at least three ways. First, it popularized modern mathematical logic to an extent undreamt of by its authors. By using a notation more accessible than that used by Frege, Whitehead and Russell managed to convey the remarkable expressive power of modern predicate logic in a way that previous writers had been unable to achieve. Second, by exhibiting so clearly the deductive power of the new logic, Whitehead and Russell were able to show how powerful the idea of a modern formal system could be, thus opening up new work in what soon was to be called metalogic. Third, Principia Mathematica re-affirmed clear and interesting connections between logicism and two of the main branches of traditional philosophy, namely metaphysics and epistemology, thereby initiating new and interesting work in both of these areas.

As a result, not only did Principia introduce a wide range of philosophically rich notions (including propositional function, logical construction, and type theory), it also set the stage for the discovery of crucial metatheoretic results (including those of Kurt Gödel, Alonzo Church, Alan Turing and others). Just as importantly, it initiated a tradition of common technical work in fields as diverse as philosophy, mathematics, linguistics, economics and computer science.

Today a lack of agreement remains over the ultimate philosophical contribution of Principia, with some authors holding that, with the appropriate modifications, logicism remains a feasible project. Others hold that the philosophical and technical underpinnings of the project remain too weak or too confused to be of great use to the logicist. (For more detailed discussion, readers should consult Quine 1963, 1966a, 1966b; Landini 1998, 2011; Linsky 1999, 2011; Hale and Wright 2001; Burgess 2005; Hintikka 2009; and Gandon 2012.)

There is also a lack of agreement over the importance of the second edition of the book, which appeared in 1925 (Volume I) and 1927 (Volumes II and III, which were directly reprinted from the first edition). The revisions were done by Russell, although Whitehead was given the opportunity to advise. In addition to the correction of minor errors throughout the original text, changes to the new edition included a new Introduction and three new appendices. (The appendices discuss the theory of quantification, mathematical induction and the axiom of reducibility, and the principle of extensionality respectively.) The book itself was reset more compactly, making page references to the first edition obsolete. Russell continued to make corrections as late as 1949 for the 1950 printing, the year he and Whitehead’s widow finally began to receive royalties.

Today there is still debate over the ultimate value, or even the correct interpretation, of some of the revisions, revisions that were motivated in large part by the work of some of Russell’s brightest students, including Ludwig Wittgenstein and Frank Ramsey. Appendix B has been notoriously problematic. The appendix purports to show how mathematical induction can be justified without use of the axiom of reducibility; but as Alasdair Urquhart reports,

The first indication that something was seriously wrong appeared in Gödel’s well known essay of 1944, “Russell’s Mathematical Logic”. There, Gödel points out that line (3) of the demonstration of Russell’s proposition ∗89·16 is an elementary logical blunder, while the crucial ∗89·12 also appears to be highly questionable. It still remained to be seen whether anything of Russell’s proof could be salvaged, in spite of the errors, but John Myhill provided strong evidence of a negative verdict by providing a model-theoretic proof in 1974 that no such proof as Russell’s can be given in the ramified theory of types without the axiom of reducibility. (Urquhart 2012)

Linsky (2011) provides a discussion, both of the Appendix itself and of the suggestion that by 1925 Russell may have been out of touch with recent developments in the quickly changing field of mathematical logic. He also addresses the suggestion, made by some commentators, that Whitehead may have been opposed to the revisions, or at least indifferent to them, concluding that both charges are likely without foundation. (Whitehead’s own comments, published in 1926 in Mind, shed little light on the issue.)

3. Contents of Principia Mathematica

Principia Mathematica originally appeared in three volumes.

Together, the three volumes are divided into six parts. The commentary below goes through the sections in order, indicating in the early parts where a reader can skip ahead to study the unique features of the development of mathematics in the PM system as contrasted with that of Frege and contemporary set theory.

3.1 Volume I

Volume I is divided into a lengthy Introduction containing three sections, followed by two major Parts, I (divided into Sections A–E) and II (also divided into Sections A–E):

Introduction:
Preliminary Explanations of Ideas and Notations
The Theory of Logical Types
Incomplete Symbols

Part I: Mathematical Logic
A. The Theory of Deduction ∗1–∗5
B. Theory of Apparent Variables ∗9–∗14
C. Classes and Relations ∗20–∗25
D. Logic of Relations ∗30–∗38
E. Products and Sums of Classes ∗40–∗43

Part II: Prolegomena to Cardinal Arithmetic
A. Unit Classes and Couples ∗50–∗56
B. Sub-Classes, Sub-Relations, and Relative Types ∗60–∗65
C. One-Many, Many-One and One-One Relations ∗70–∗74
D. Selections ∗80–∗88
E. Inductive Relations ∗90–∗97



3.2 Volume II

Volume II begins with a preliminary section on notational conventions followed by Parts III (divided into Sections A–C), IV (divided into Sections A–D), and the first half of Part V (Sections A–C):

Prefatory Statement of Symbolic Conventions

Part III: Cardinal Arithmetic
A. Definition and Logical Properties of Cardinal Numbers ∗100–∗106
B. Addition, Multiplication and Exponentiation ∗110–∗117
C. Finite and Infinite ∗118–∗126

Part IV: Relation-Arithmetic
A. Ordinal Similarity and Relation-Numbers ∗150–∗155
B. Addition of Relations, and the Product of Two Relations ∗160–∗166
C. The Principle of First Differences, and the Multiplication and Exponentiation of Relations ∗170–∗177
D. Arithmetic of Relation-Numbers ∗180–∗186

Part V: Series
A. General Theory of Series ∗200–∗208
B. On Sections, Segments, Stretches, and Derivatives ∗210–∗217
C. On Convergence, and the Limits of Functions ∗230–∗234



3.3 Volume III

Volume III contains the remainder of Part V (Sections D–F) and concludes with Part VI (divided into Sections A–D):

Part V: Series (continued)
D. Well-Ordered Series ∗250–∗259
E. Finite and Infinite Series and Ordinals ∗260–∗265
F. Compact Series, Rational Series, and Continuous Series ∗270–∗276

Part VI: Quantity
A. Generalization of Number ∗300–∗314
B. Vector-Families ∗330–∗337
C. Measurement ∗350–∗359
D. Cyclic Families ∗370–∗375



A fourth volume on geometry was begun but never completed (Russell 1959: 99).

Overall, the three volumes not only represent a major leap forward with regard to modern logic, they are also rich in early twentieth-century mathematical developments. To give one example, Whitehead and Russell were the first to define a series as a set of terms having the properties of being asymmetrical, transitive and connected (1912 [1927: 497]). To give another, it is in Principia that we find the first detailed development of a generalized version of Cantor’s transfinite ordinals, which the authors call “relation-numbers”. The resulting “relation-arithmetic” in turn led to significant improvements in our understanding of the general notion of structure (1912: Part IV).

As T.S. Eliot points out, the book also did a great deal to promote clarity in the use of ordinary language in the early part of the twentieth century:

how much the work of logicians has done to make of English a language in which it is possible to think clearly and exactly on any subject. The Principia Mathematica are perhaps a greater contribution to our language than they are to mathematics. (1927: 291)

The book is also not without some self-deprecating humour. As Blackwell points out (2011: 158, 160), the authors twice poke fun at the length and tedium of the project’s many logical derivations. In Volume I, the authors explain that one cannot list all the non-intensional functions of \(\phi \bang \hat{z}\) “because life is too short” (1910 [1925: 73]); and in Volume III, after over 1,800 pages of dense symbolism, the authors end Part VI, Section D, on Cyclic Families, with the comment,

We have given proofs rather shortly in this Section, particularly in the case of purely arithmetical lemmas, of which the proofs are perfectly straightforward, but tedious if written out at length. (1913 [1927: 461])

Evidence that the humour originates more with Russell than with Whitehead is perhaps found in not dissimilar remarks that appear in Russell’s other writings. Russell’s comment when discussing the axiom of choice, to the effect that given a collection of sets, it is possible to “pick out a representative arbitrarily from each of them, as is done in a General Election” (1959: 92), is perhaps a case in point.

Readers today (i.e., those who have learned logic in the last few decades of the twentieth century or later) will find the book’s notation somewhat antiquated. Readers wanting assistance are advised to consult the entry on the notation in Principia Mathematica. Even so, the book remains one of the great scientific documents of the twentieth century.

4. Volume I

4.1 Part I: Mathematical Logic

4.1.1 Propositional Logic in PM

The system of propositional logic of PM can be seen as a system of sentential logic consisting of a language and rules of inference. PM contains the first presentation of symbolic logic that deals with propositional logic as a separate theory. Frege had involved quantification from the beginning, while Peano’s system was interpretable as about propositions and classes, with some different principles holding for each interpretation. The propositional logic of PM is unusual for modern readers, for various reasons having to do with its origins in Russell’s earlier work on logic. One is that the axioms of propositional logic are not stated using only the primitive connectives of the logic, which are \(\lnot\) and \(\lor\), but instead use only \(\lor\) and \(\supset\), the latter being a defined connective.

In this section we will use A, B, etc. as meta-linguistic variables for formulas. The formulas constructed from atomic propositions with the connectives are said to express elementary propositions, to distinguish them from propositions involving quantifiers and propositional functions. The system is organized axiomatically: the axioms, called “primitive propositions” or “Pp”, are presented with the characteristic ‘\(\supset\)’ of material implication, which is defined with \(\lnot\) and \(\lor\). The connectives \(\lamp\) and \(\equiv\) are also defined, but are not needed in the statements of the axioms. This peculiarity has its origins in Russell’s view from 1903 that

The propositional calculus is characterized by the fact that all its propositions have as hypothesis and as consequent the assertion of a material implication. (1903: 13)

All of the “primitive propositions” of The Principles of Mathematics (PoM) are stated with only material implication as a primitive connective. The connectives \(\lamp\), \(\lor\) and \(\equiv\) are defined as might be expected. The notion of negation, expressed by \(\lnot\), is defined using a notion of quantification over propositions (\(\lnot A\) means that A implies all propositions). By 1906 Russell had decided to use \(\lnot\) as a primitive connective, and no longer used propositional quantifiers, allowing \(\supset\) to be defined, while the primitive propositions were still stated with \(\supset\) and \(\lor\). That the system of propositional logic in PM resulted from a series of changes in the choice of primitives is mirrored in the choice of theorems that are proved in the first chapters. While most are proved because they will be used later in PM, some remain simply as remnants of the earlier systems. In particular PM contains several theorems that were primitive propositions in earlier systems, though not used in what follows. In fact one primitive proposition of PoM, known as “Peirce’s Law” (\([(p \supset q )\supset p ] \supset p\)), appears to have been proved in an early version of PM as ∗2·7 but then deleted (and its number not reassigned to another theorem) simply to save space (see Linsky 2016).

The notion of truth-functional semantics for propositional logic, using the familiar truth tables, and the notion of completeness of an axiom system, were not developed until soon after the publication of PM, by Bernays (1926). As a result there is no attempt to find a short list of axioms that will be complete, and so at later stages of the work there is no simple appeal to “tautological consequences” which might be easily justified by semantic considerations.

The vocabulary of the language of propositional logic in PM consists of:

Atomic proposition variables: p, q, r, \(p_1\), … (There are no proposition constants.)

Sentential connectives. Primitive: \(\lnot\) and \(\lor\). Defined: \(\supset\), \(\lamp\), \(\equiv\).

Punctuation: \((\), \()\), \([\), \(]\), \(\{\), \(\}\), etc.

The well formed formulas (wffs) are defined as follows:

Atomic proposition variables are wffs.

If A and B are wffs then so are: \(\lnot A\) and \(A \lor B\)

The other familiar connectives are defined:

Definitions \[\begin{align} \tag*{∗1·01} A \supset B & \eqdf \lnot A \lor B\\ \tag*{∗3·01} A \lamp B & \eqdf \lnot(\lnot A \lor \lnot B)\\ \tag*{∗4·01} A \equiv B & \eqdf (A \supset B) \lamp (B \supset A)\\ \end{align} \]

Axioms \[\begin{align} \tag*{Pp ∗1·2} (p \lor p ) & \supset p\\ \tag*{Pp ∗1·3} q & \supset (p \lor q )\\ \tag*{Pp ∗1·4} (p \lor q ) & \supset (q \lor p)\\ \tag*{Pp ∗1·5} [p \lor ( q \lor r )] & \supset [q \lor ( p \lor r )]\\ \tag*{Pp ∗1·6} ( q \supset r ) & \supset [ (p \lor q ) \supset (p \lor r ) ]\\ \end{align}\]

In 1926 Paul Bernays showed that this list could be reduced by one, as the fourth axiom (∗1·5) can be proved from the others.

Rules of inference:

Modus ponens (∗1·1): From \(\vdash A \supset B\) and \(\vdash A\), derive \(\vdash B\).

Substitution: From \(\vdash A\) derive \(\vdash A'\), where \(A'\) is the result of substituting some formula B uniformly for any atomic proposition variable that occurs in A.
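The definitions and primitive propositions above can be checked semantically with truth tables. The following is only an illustrative sketch in Python: truth tables belong to the later, semantic approach and are not how PM itself proceeds, and the function names (`lnot`, `lor`, `implies`, `is_tautology`) are hypothetical labels chosen for this example. It confirms that each of the five primitive propositions, and Peirce’s Law, comes out true under every assignment when \(\supset\) is unpacked by ∗1·01.

```python
from itertools import product

# Primitive connectives of PM: negation and disjunction.
def lnot(a):
    return not a

def lor(a, b):
    return a or b

# Defined connective, following *1.01: A > B  =df  not-A or B.
def implies(a, b):
    return lor(lnot(a), b)

# The five primitive propositions as Boolean functions of p, q, r.
AXIOMS = {
    "*1.2": lambda p, q, r: implies(lor(p, p), p),
    "*1.3": lambda p, q, r: implies(q, lor(p, q)),
    "*1.4": lambda p, q, r: implies(lor(p, q), lor(q, p)),
    "*1.5": lambda p, q, r: implies(lor(p, lor(q, r)), lor(q, lor(p, r))),
    "*1.6": lambda p, q, r: implies(implies(q, r),
                                    implies(lor(p, q), lor(p, r))),
}

def is_tautology(formula, arity=3):
    """True if the formula holds under every assignment of truth values."""
    return all(formula(*vals) for vals in product([False, True], repeat=arity))

# Semantic check of each primitive proposition.
results = {name: is_tautology(f) for name, f in AXIOMS.items()}

# Peirce's Law, [(p > q) > p] > p, is likewise truth-functionally valid.
peirce = is_tautology(lambda p, q, r: implies(implies(implies(p, q), p), p))
```

Since modus ponens and uniform substitution preserve truth-functional validity, a check of this kind covers the soundness half of the completeness claim; it says nothing about which formulas are derivable.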



There is no explicit statement of a rule of substitution in PM. The free variables in the propositional logic of PM may be interpreted as schematic letters, in which case the system will require a rule of substitution of formulas. Alternatively, they may be interpreted as real variables ranging over propositions, in which case instances would be derived by instantiation from generalizations over all propositions. The announcement in the Introduction that propositions are not necessary in what follows and so will be avoided suggests the schematic interpretation of the variables. We follow the variable interpretation in this article, however, in part to allow our notation to follow PM, with p’s and q’s rather than a new vocabulary of schematic letters A, B, etc. This interpretation of the letters as variables will also assist in the presentation of quantificational logic in PM below.

As is standard for an axiomatic formulation of logic, a derivation of a formula of sentential logic in PM will consist of lines each of which is an instance of one of the five axioms listed above, the result of a substitution in a preceding line, or the result of applying modus ponens to two preceding lines. Theorems of PM will be proved in order, allowing the use of (instances of) preceding theorems as lines in later derivations.

The resulting system is complete, in the sense that all and only truth-functionally valid sentences are derivable in the system. This despite the seeming defects of the system by modern standards, including the redundancy of one of the axioms, the use of defined symbols in expressions to which the rules of inference apply, and the use of defined symbols in the axioms. The derivations in ∗2 to ∗5 are abbreviated, but with an indication on the side of each line of what justifies it, and how any abbreviation can be undone. Theorems are proved primarily as needed in later numbers, but some were axioms, or important theorems of earlier versions of propositional logic, going back to The Principles of Mathematics. Aside from historical interest in their actual choices, however, the system of PM can be viewed as based on any standard system of propositional logic.

4.1.2 The “Ramified” Theory of Types

The theory of types in the initial chapters of PM is ramified, so that within a given type, of propositions, or of functions of individuals, and functions of functions of individuals, there will be finer subdivisions. This ramification is necessary for the application of the logic of PM to what are called “epistemological” paradoxes in the Introduction to PM. The most prominent of these is the (propositional) Liar paradox created by the proposition that all propositions of a certain sort, say those asserted by Epimenides, are false, when that very proposition is of that sort, that is, when it is the only proposition that Epimenides asserted. The solution in the ramified theory of types requires that a proposition about a sort of first level propositions, say that they are all false, will itself be of the next order.

The paradoxes of the theory of sets are resolved by reducing assertions about sets to assertions about propositional functions. The restriction that a function of one type cannot apply to a function of the same type is enough to block the paradoxes. Thus the distinction between individuals, functions of individuals, and functions of such functions, categorized by what came to be called the “simple theory of types”, is enough for the purposes of reducing mathematics to classes, and so to logic. The idea that the full theory of types was not needed to resolve the mathematical or set theoretical paradoxes was proposed by Chwistek (1921) and Ramsey (1931), and led to the later introduction of the terms “ramified theory of types” and “simple theory of types” that will be used in this entry.

In the Introduction to PM terminology is introduced for the two ways that variables may appear in formulas. The “apparent variables” are bound variables, whereas “real variables” are free variables. The proper interpretation of higher-order variables in PM is the subject of contemporary dispute among scholars of PM. Landini (1998) and Linsky (1999) offer two rival accounts. Landini holds that higher-order free variables should be interpreted as schematic letters, replaceable by formulas, and that the bound variables are to be interpreted “substitutionally”. The logic of the theory of types in PM can be seen as an extension of a theory of a standard first order logic developed in ∗10. Then the more distinctive notions of PM that depend on the theory of types can be explained. These include the Axiom of Reducibility, in ∗12, which underlies the so-called ramification of the theory of types, the division into orders of predicates true of a single type of argument. The Axiom of Reducibility asserts that for an arbitrary function of any order there is an equivalent predicative function, that is, one true of exactly the same range of arguments. Identity is defined, in ∗13, with a version of Leibniz’ notion of the identity of indiscernibles that is consistent with the theory of types. Replacing Leibniz’ notion that x and y are identical just in case they share the same properties, in PM, x and y are identical if and only if they share the same predicative functions. Then using the notion of identity so defined, PM presents Russell’s theory of definite descriptions, precisely as it was defined in “On Denoting” (1905). This article will use the notation for “r-types” due to Alonzo Church (1976), which is explained in the accompanying article “The Notation of Principia Mathematica” in this Encyclopedia.

Although PM does not single out first order logic from the whole ramified theory of types, the actual deductive apparatus on the page looks exactly like a system of first order logic, and the complications of the logic of higher types can be expressed with an additional apparatus of type indices. In what follows we will use the system of r-types in Church (1976) for type indices, and the use of lambda operators for propositional functions.

Church’s (1976) formulation of the logic of PM with r-types

The language of the higher-order quantificational logic of PM is called ramified type theory, and the system of types, following Church (1976), will be called r-types. Note that there are two kinds of variables, but they are all assigned to an r-type. Individual variables behave as a special case of propositional function variables.

(argument) variables: \(x_{\tau}\), \(y_{\tau}\), \(z_{\tau}\), … for each type \(\tau\)

n-place propositional function variables: \(\phi_{\tau}^{n}, \psi_{\tau}^{n}\), … (\(n \geq 1\)), where \(\tau\) is a type symbol. (\(R^n,S^n, \ldots\) (\(n \geq 1\)) for relations in extension.) \(\chi\) is used for a higher-order function of functions, as in \(\chi (\phi)\), and \(\Phi\) for the next order, as in \(\Phi (\chi)\)

connectives: \(\lnot\), \(\lor\)

punctuation: \((\), \()\), \([\), \(]\), \(\{\), \(\}\), etc.

the quantifier symbols: \(\forall\) and \(\exists\)

the lambda symbol: \(\lambda\)

The system of symbols for r-types and the assignment of r-types to variables for different entities (individuals and functions) is as follows:

\(\iota\) is the r-type for an individual.

Where \(\tau_1, \ldots, \tau_m\) are any r-types, \((\tau_1, \ldots, \tau_m) / n\) is the r-type of a propositional function of level n; this is the r-type of any m-ary propositional function of level n which has arguments of r-types \(\tau_1, \ldots, \tau_m\), respectively.

The order of an entity is defined as follows:

the order of an individual (of r-type \(\iota\)) is 0

the order of a function of r-type \((\tau_1, \ldots, \tau_m) / n\) is \(n+N\), where N is the greatest of the orders of the arguments \(\tau_1, \ldots, \tau_m\)

There are no predicate or individual names in this language. There are, however, complex terms for propositional functions, defined together with formulas (with the usual notion of bound and free variables): Let \(\phi_{\tau}\) be a variable ranging over propositional functions of type \(\tau\). We read \(x_{\tau}\) as a metalinguistic variable ranging over variables of r-type \(\tau\). The subscript \(\tau\) will be indicated only with the initial quantifier which governs the variable. We then can define the well formed formulas (wffs) and terms of quantificational logic as follows:

Variables (for individuals and propositional functions) are terms.

If \(\phi_{\tau}^{n}\) is an n-place propositional function variable of r-type \((\tau_1, \ldots, \tau_n) /k\) and \(x_1, \ldots, x_n\) are terms of r-types \(\tau_1, \ldots, \tau_n\), respectively, then \(\phi (x_1, \ldots, x_n)\) is a wff. (The variables \(x_1, \ldots, x_n\) are called “argument” variables. They will include individual variables of r-type \(\iota\), but also variables of higher types. The variable \(\phi\) can occur as a predicate in \(\phi(x)\) and as an argument in \(\Psi (\phi)\), and cannot be of type \(\iota\) to occur in a wff.)

If x is a variable of r-type \(\tau\) and A is a wff then \(\lambda x A\) is a term of r-type \((\tau)/n\) (where n is one more than the highest order of any bound variable in A and at least as high as the order of any free variable in A).

If x is an individual variable of type \(\tau\) and A is a wff in which x occurs free then \(\forall x A\) and \(\exists x A\) are wffs.

If A and B are wffs, then so are \(\lnot A\), \(A \lamp B\), \(A \lor B\), \(A \supset B\), and \(A \equiv B\).

The conventional precedence ordering of connectives will allow for fewer punctuation signs to indicate scope of connectives, thus \(A \lor B \supset C\) is read as \(( A \lor B) \supset C\).

The comprehension principle for a system of higher-order logic, or set theory, states which formulas express a property or set. Within a type theory this allows for what looks like an “unrestricted” comprehension principle, in that for every well formed expression A with a free variable, x, there is a property which is satisfied by precisely the entities satisfying the formula. It is the restrictions of types that block the paradoxes, as the problematic formulas “is not a member of itself” and “does not apply to itself” are ruled out by the system of types. The comprehension principle then is characterized by an infinite set of sentences of the form of:

Comprehension: \[\exists \phi \forall x_{\tau} [ \phi (x) \equiv A], \quad (\phi \textrm{ not free in } A)\] where \(\phi\) is a functional variable of r-type \((\tau)/n\) and x is a variable of r-type \(\tau\), and the bound variables of A are all of order less than the order of \(\phi\) and the free variables of A are all of order not greater than the order of \(\phi\).
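The bookkeeping of r-types and orders described above can be made concrete with a small data structure. This is a minimal sketch, not Church’s own formalism: the class names `Individual` and `FunctionType` and the helpers `order` and `is_predicative` are hypothetical, and treating an empty argument list as contributing order 0 is an assumption made only so that the function is total.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Individual:
    """The r-type iota of individuals."""


@dataclass(frozen=True)
class FunctionType:
    """The r-type (t1, ..., tm)/n of an m-ary propositional function of level n."""
    args: Tuple["RType", ...]
    level: int

    def __post_init__(self):
        # Levels start at 1, as in the text's (t1, ..., tm)/n with n >= 1.
        if self.level < 1:
            raise ValueError("level must be at least 1")


RType = Union[Individual, FunctionType]


def order(t: RType) -> int:
    """Order 0 for individuals; otherwise n + N, N the greatest argument order."""
    if isinstance(t, Individual):
        return 0
    # Assumption: with no arguments, N is taken to be 0.
    return t.level + max((order(a) for a in t.args), default=0)


def is_predicative(t: RType) -> bool:
    """A function type is predicative when its level is 1."""
    return isinstance(t, FunctionType) and t.level == 1


IOTA = Individual()
first_level = FunctionType((IOTA,), 1)       # (iota)/1, order 1
second_level = FunctionType((IOTA,), 2)      # (iota)/2, order 2
of_functions = FunctionType((first_level,), 1)  # ((iota)/1)/1, order 2
```

Note that `second_level` and `of_functions` have the same order but different r-types: the first takes individuals as arguments at a raised level, while the second is a level 1 function of functions. This is the sense in which the ramified hierarchy subdivides what the simple theory of types treats as a single type.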

As presented here Church’s seemingly straightforward comprehension principle, with its restrictions on the types of variables, is for Quine a glaring manifestation of the confusion of use and mention of language that he sees infecting PM:

…there is a characteristic give and take between sign and object: the propositional function gets its order from the abstractive expression, and the order of the variable is the order of the values. Exposition is eased by allowing the word ‘order’ a double sense, attributing orders at once to the notations and, in parallel, to their objects. (Quine 1963: 245)

The offense comes from attributing orders (r-types) to propositional functions on the basis of the variables with which they are defined, but also to the functions themselves, as simply values of bound higher-order variables. In response, the defender of type theory must say that any semantic interpretation of the notion of propositional function will have to attribute to functions these distinctions that are marked in linguistic expressions of some of them, and in particular, in the variables involved in their definition.

What follows in PM up to ∗12 is a presentation of quantificational logic in the ramified theory of types. The complications are due to the decision of the authors (surely on Russell’s insistence) to add a new section ∗9 which allows the earlier theory of propositional logic to be incorporated directly into a quantificational logic as is done in contemporary logic. This shows the extent to which the earlier theory is indeed a theory of propositions, not an account of a fragment of quantificational logic allowing open sentences containing free variables.

Quantificational Logic in PM

Section ∗10 formulates quantificational logic as it is currently formulated, namely the axioms and theorems of propositional logic are assumed to hold for all formulas, and not just the elementary propositions of ∗1–5. It appears that Russell became concerned about this assumption, and so a new section ∗9 was introduced to derive the principles of quantification theory from elementary propositions alone. While of interest to scholars of PM, the upshot is the same for later uses of quantificational logic in PM.

Again, the reader interested in what distinguishes the logicist project in PM can skip this section, although passing attention may be paid to the system of higher-order logic that is used, as based here on the ramified theory of types.

The extension to functions of more than one variable is obvious, and below, some applications will employ this extension.

The existential quantifier and the other familiar connectives \(\supset\), \(\lamp\) and \(\equiv\) are defined as for propositional logic. (In what follows A is now an arbitrary (possibly quantificational) formula):

Axioms of ∗10 : All instances of propositional theorems where wffs are uniformly substituted for propositional variables.

\[ \tag*{∗10·01} \exists x A \eqdf \lnot \forall x \lnot A \]

The system of PM uses a rule of Universal Generalization and an Axiom which amounts to a rule of Instantiation.

\[ \tag*{∗10·1} \vdash \forall x_{\tau} A \supset A' \]

where \(A'\) is like A except for having a term y of type \(\tau\) substituted for \(x_{\tau}\) in A.

(Note: The notion of suitable “substitution” is much more complicated for logic of higher types than it is for first order logic. In part this is because of the application to an argument of lambda expressions for a propositional function, e.g., \([\lambda x \phi (x)] (\nu)\) where \(\nu\) may be a complex term involving variables and quantifiers in other lambda expressions.)

\[ \tag*{∗10·11} \textrm{If } \vdash A \textrm{ then } \vdash \forall x_{\tau} A' \]

where \(A'\) is like A except for having the variable \(x_{\tau}\) substituted for a free variable y of type \(\tau\) in A.

Other quantifier principles, which govern the move of a quantifier from the inside of a formula to governing the entire formula, so called “quantifier containment principles” are also derived as theorems in ∗10. Some that are often used in later numbers are:

\[ \begin{align} \tag*{∗10·12} \forall x_{\tau} ( A \lor \phi (x ) ) & \supset (A \lor \forall x_{\tau} \phi (x ))\\ \tag*{∗10·21} \forall x_{\tau} [ A \supset \phi (x ) ] & \equiv [A \supset \forall x_{\tau} \phi (x)]\\ \end{align}\]
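The content of these principles can be illustrated by brute force over small finite domains. The sketch below is only an illustration, not a proof, and rests on stated assumptions: quantifiers range over a finite `range(size)`, the formula A is treated as a fixed truth value (that is, x is assumed not to occur free in A), and the helper names `forall`, `exists` and `check_confinement` are hypothetical. The existential quantifier is defined from the universal one as in ∗10·01.

```python
from itertools import product


def implies(a, b):
    # Material implication, unpacked as not-A or B.
    return (not a) or b


def forall(domain, pred):
    return all(pred(x) for x in domain)


def exists(domain, pred):
    # Following *10.01: exists x A  =df  not forall x not A.
    return not forall(domain, lambda x: not pred(x))


def check_confinement(size):
    """Check *10.12 and *10.21 for every predicate on a domain of given size."""
    domain = range(size)
    # Each tuple of truth values is the table of one predicate phi.
    for table in product([False, True], repeat=size):
        phi = lambda x, t=table: t[x]
        # A has no free x, so it is simply true or false.
        for A in (False, True):
            # *10.12: forall x (A or phi(x))  implies  A or forall x phi(x)
            lhs = forall(domain, lambda x: A or phi(x))
            rhs = A or forall(domain, phi)
            if not implies(lhs, rhs):
                return False
            # *10.21: forall x (A > phi(x))  iff  A > forall x phi(x)
            lhs = forall(domain, lambda x: implies(A, phi(x)))
            rhs = implies(A, forall(domain, phi))
            if lhs != rhs:
                return False
    return True


confinement_holds = all(check_confinement(n) for n in range(1, 5))
```

A finite check of this kind cannot replace the derivations in ∗10, but it makes plain what the theorems assert: a quantifier may be moved across a subformula in which its variable does not occur.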

The introduction to ∗10 in PM begins with:

The chief purpose of the propositions of this number [∗10] is to extend to formal implications (i.e. to propositions of the form \(\forall x (\phi x \supset \psi x)\)) as many as possible of the propositions proved previously for material implications, i.e. for propositions of the form \(p \supset q\). (notation updated)

In other words, this section introduces the logic of quantification, in a way that is familiar to contemporary logic. The propositional logic of the preceding sections is interpreted as true only of elementary, first order propositions, and so extended to higher-order logic by showing how sentences can be presented in “prenex form”, that is, with quantifiers in initial position preceding a quantifier-free matrix. These theorems are familiar now as “quantifier confinement” theorems, of the form of:

\[ \tag*{∗10·23} \forall x_{\tau} [ \phi (x ) \supset A ] \equiv [ \exists x_{\tau} \phi ( x ) \supset A ] \]

The Axiom of Reducibility

Given that the system of PM contains a ramified theory of types, however, the move to discussion of classes for the remainder of the work after ∗20 requires a further axiom, the axiom of reducibility, in order to allow a simple theory of types of classes. Consider the fundamental notion from the theory of real numbers of the least upper bound (l.u.b.) of a bounded class of real numbers. Consider the class of all real numbers whose square is less than or equal to 2, i.e., \(\{ x \mid x^2 \leq 2\}\). A class of reals S has an upper bound if and only if \(\exists r \forall s ( s \in S \supset s \leq r)\). If a bounded class S of real numbers has members of some r-type \(\tau\), then the least upper bound must belong to an r-type \(\tau / 1\) because of the quantifier in the definition ranging over the elements s of S. We say that the definition of S is “impredicative” because it involves quantification over a totality to which it is intended to belong. The theory of real numbers, however, requires that sometimes the least upper bound of a class is a member of that class, in this case, the least upper bound of S, namely \(\sqrt{2}\), is an element of S.

The resolution of this in the system of PM is to adopt an axiom which guarantees that any class defined in terms of another class will be of the same type. Thus impredicative definitions of classes are allowed, and do not introduce a class of a higher type. This is accomplished by adopting the Axiom of Reducibility, in ∗12, which guarantees that for any function \(\phi\), there will be a co-extensive predicative function. More precisely, the Axiom of Reducibility asserts that for any function of any number of arguments of an arbitrary level, there is an equivalent function of level 1, i.e., one true of the same entities:

Axiom of Reducibility \[ \tag*{Pp ∗12·1} \forall \psi \exists \phi \forall x_{\tau} [\psi (x) \equiv \phi \bang(x) ] \] where \(\phi \bang\) is a predicative function.

The exclamation mark “\(\bang\)” is used in PM to indicate predicative functions. In Church’s system of r-types this is expressed by saying that the variable x is of r-type \(\tau\) and \(\phi\) is of r-type \((\tau )/1\) and \(\psi\) is of r-type \((\tau) /n\). In other words, \(\phi\) is of the lowest order compatible with its arguments. This notion of predicative functions is taken from the Introduction. In ∗12 Whitehead and Russell propose a narrower conception of predicative function, by which \(\phi\) must be a matrix, or function in the definition of which no quantifiers at all appear. See the accompanying entry on the notation in Principia Mathematica.

It has seemed to some, beginning with Chwistek (1912) and continuing through Copi (1950), that the Axiom of Reducibility is technically faulty, leading to an inconsistency, or at least redundancy in the system of PM. Ramsey (1931) early on argued that the supposed contradiction in fact demonstrated that certain predicative functions are indefinable. Church (1976) confirms this assessment, and uses the presentation of r-types we describe here to show rigorously the limitations on what functions are definable in the system of PM.

The interaction of this Axiom with the theory of classes in PM will be explained below in connection with ∗20 on classes.

Identity in PM

Contemporary logic follows Frege in treating identity, represented by \(=\), as a logical notion. In PM the notion of identity is defined following Leibniz as indiscernibility, namely indiscernible objects are identical. That is, \(\forall \phi ( \phi x \equiv \phi y) \supset x = y\). But since the axiom of reducibility guarantees that if there is any type of function on which x and y differ, they will differ on some predicative function, PM uses the following definition of identity:

\[ \tag*{∗13·01} x_{\tau} = y_{\tau} \eqdf \forall \phi [ \phi \bang (x) \supset \phi \bang(y) ], \]

for \(\phi \bang\) a predicative function.

In contemporary systems of logic an axiom or rule of inference allows that if \(x = y\), then for any predicate \(\phi\), \(\phi x \equiv \phi y\). In other words, identicals are indiscernible. The given definition of identity suffices only if it is not possible for entities x and y which share all predicative properties to be distinguished by some property of a higher order. The axiom of reducibility guarantees this: any difference between x and y in a property of some higher order would entail a difference in a predicative property, so entities that satisfy the definition of ‘\(x = y\)’ share their properties of every order.
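The shape of the definition in ∗13·01 can be illustrated on a finite domain. The following sketch is a toy model under strong assumptions: every function from the domain to truth values is allowed to count as a “predicative function”, which flattens the type hierarchy that the definition is designed to respect, and the names `all_predicates` and `pm_identical` are hypothetical. It shows why the one-way conditional \(\phi \bang (x) \supset \phi \bang (y)\) is enough: the range of \(\phi\) is closed under negation, so the converse follows.

```python
from itertools import product


def all_predicates(domain):
    """Yield every function from the domain to truth values, as a dict."""
    for values in product([False, True], repeat=len(domain)):
        yield dict(zip(domain, values))


def pm_identical(x, y, domain):
    # Following *13.01: for every phi, phi(x) implies phi(y).
    return all((not phi[x]) or phi[y] for phi in all_predicates(domain))


def indiscernible(x, y, domain):
    # The two-way version: x and y agree on every predicate.
    return all(phi[x] == phi[y] for phi in all_predicates(domain))


domain = ["a", "b", "c"]

# On this toy model the defined relation coincides with sameness of element.
agrees_with_equality = all(
    pm_identical(x, y, domain) == (x == y)
    for x in domain for y in domain
)

# The one-way definition already delivers full indiscernibility.
one_way_suffices = all(
    pm_identical(x, y, domain) == indiscernible(x, y, domain)
    for x in domain for y in domain
)
```

In PM itself the quantifier in the definition ranges only over predicative functions of the appropriate r-type, and it is the axiom of reducibility, rather than a survey of all functions as here, that secures agreement on functions of higher order.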

In Appendix B to the second edition of PM, which was written by Russell, there is a technical discussion of the consequences of abandoning the axiom of reducibility. A faulty proof is proposed to show that the principle of Induction can be derived without using the axiom of reducibility in a modified theory of types (see Linsky 2011). As Russell points out, however, it is not possible to develop the theory of real numbers, as based on “Dedekindian” classes of rational numbers, without assuming the axiom of reducibility. (The thesis that every class of reals with an upper bound has a real number as its least upper bound, discussed above, would not be provable.) As a result, Russell says “analysis would collapse”. In all of this discussion, however, Russell does not indicate what would replace the definition of Identity in ∗13, which so crucially depends on the axiom of reducibility.

Definite Descriptions

Russell presented his theory of definite descriptions in “On Denoting” (1905) and it has probably been the most widely discussed application of the logic of PM. The role of the theory of definite descriptions in PM, however, is exhausted by its use in ∗30 to define what are called “Descriptive functions”. In contemporary logic it is routine to show how the notion of a “functional relation” can be used to justify the introduction of function symbols into a language with only n-place predicates. The theory of definite descriptions is essential for this argument. After ∗30 there are only a handful of occurrences of description operators in PM. What is perhaps Russell’s most valuable contribution to philosophical logic and the philosophy of language is, here, only a device used for a technical, though programmatically important, purpose. The technical purpose, however, does indicate an important distinction between the logicism of Frege and Russell. Frege’s logic is based on the notion of concept, which is a case of a function from objects to truth values. Russell’s logic can be seen as further reducing the mathematical notion of function to his logical notion of propositional function. Some logicians firmly in the tradition of mathematical logic do not find this to be an advance, but it does indicate a significant difference between the approaches of Frege and Russell (see Linsky 2009).

Definite descriptions are expressions of the form “the \(\phi\)” which occur in the position of terms apparently as the arguments of functions. Russell’s example from “On Denoting” (1905) is the expression “The present King of France” which apparently occurs as an argument to the function “is bald” in the sentence “The present King of France is bald”. In general the expression “the \(\phi\) is \(\psi\)” is defined as equivalent to the expression “There is exactly one \(\phi\) and it is \(\psi\)”:

Contextual definition of Definite Descriptions \[ \tag*{∗14·01} \psi x_{\tau} \phi (x) \eqdf \exists x_{\tau} \forall y_{\tau}[ \phi ( y) \equiv y = x \lamp \psi (x)] \]

The use of the expression \(\eqdf\), which makes it appear that both flanking expressions are terms, disguises the fact that in this case of a “contextual definition” what occurs on each side are formulas, the right hand side replacing the left hand side, thus “eliminating” the definite description.

To distinguish the two readings of the expression “The present King of France is not bald”, according to the “scope” of the description (with respect to negation), PM uses a “scope indicator” \([x_{\tau} \phi ( x )]\) before the formula from which the description is to be eliminated by the definition above. Symbolizing “The present King of France” as \(x K(x)\) and “x is bald” as \(B(x)\), the two readings are symbolized as:

\[[x K(x)] \lnot B(x K(x)),\]

which, eliminating the description by definition, becomes:

\[ \exists x \forall y \{ [K(y) \equiv y = x ] \lamp \lnot B(x) \} \]

which is the reading on which there is exactly one present King of France and he is not bald, and:

\[\lnot [x K(x)]B(x K(x)),\]

which, eliminating the description by definition, becomes:

\[ \lnot ( \exists x \forall y \{ [K(y) \equiv y = x ] \lamp B(x) \} )\]

The latter is the reading on which it is not the case that there is one and only one present King of France and he is bald. That may be true if there is not exactly one present King of France, as is actually the case, as France has no King. In such a case the description is not “proper”, which is expressed with a special symbol in PM, \(E\bang\), defined as:

proper description \[ \tag*{∗14·02} E\bang (x_{\tau} \phi (x)) \eqdf \exists x_{\tau} \forall y_{\tau}[ \phi (y) \equiv y = x ] \]
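The truth conditions given by ∗14·01 and ∗14·02 can be checked mechanically over a finite domain. The following is a minimal sketch, not a rendering of PM's own apparatus: the three-element domain, the extensions chosen for `K` and `B`, and the helper names `is_unique_phi`, `proper` and `the` are all assumptions made for illustration.

```python
# Russellian truth conditions for "the phi is psi" over a finite domain.
# Domain, predicates and helper names are illustrative assumptions.

def is_unique_phi(domain, phi, x):
    """For all y in the domain: phi(y) if and only if y = x."""
    return all(phi(y) == (y == x) for y in domain)

def proper(domain, phi):
    """E!: there is exactly one phi (compare *14.02)."""
    return any(is_unique_phi(domain, phi, x) for x in domain)

def the(domain, phi, psi):
    """'The phi is psi': some x is the unique phi and is psi (compare *14.01)."""
    return any(is_unique_phi(domain, phi, x) and psi(x) for x in domain)

domain = ["a", "b", "c"]

def K(x):           # "x is a present King of France": true of nothing
    return False

def B(x):           # hypothetical extension for "x is bald"
    return x == "b"

def not_B(x):
    return not B(x)

# Description with wide scope over the negation: the King exists and is not bald.
wide_scope = the(domain, K, not_B)
# Negation with wide scope over the description: it is not the case that
# the King exists and is bald.
narrow_scope = not the(domain, K, B)

# A proper description, for comparison: exactly one thing satisfies K2.
def K2(x):
    return x == "a"

wide_proper = the(domain, K2, not_B)
narrow_proper = not the(domain, K2, B)
```

With the improper description the two readings come apart (`wide_scope` is false, `narrow_scope` is true), while with the proper description `K2` they agree, which is the pattern that ∗14·3 generalizes to truth-functional contexts.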

In theorem ∗14·3 we find one of the rare occurrences of bound variables ranging over propositions p and q of functions that are not predicative. (Suppose that p and q are of some r-type \(()/n\) and f is a function of those propositions; then f might have r-type \((()/n)/m\) for \(m, n > 1\).) Here we also see an occurrence of a formula \(x \phi (x)\) in subject position expressing a proposition as an argument of such a function. These expressions do not figure in theorems later in PM and only occasionally in the introductory material of some sections. Theorem ∗14·3 asserts that in truth-functional contexts the scope of a (proper) description does not affect the truth value of a proposition in which it occurs:

\[ \tag*{∗14·3} \begin{align} \{ [ \forall p \forall q ( p \equiv q ) \supset (\Phi(p) \equiv \Phi(q) )] \lamp E\bang[x \phi(x)] \} \supset \\ \{ \Phi [x \phi (x) ]\chi(x \phi (x) )\equiv [x \phi (x)] \Phi(\chi(x \phi (x ))) \} \end{align} \]

This theorem is another indication of the way in which the philosophical basis of PM, with its intensional propositional functions, is left behind as the mathematical content of PM is introduced with the definition of classes in the next sections.

The “No-Classes” Theory of Classes

The theory of sets (classes) in PM is based on a number of contextual definitions, similar in some ways to the theory of descriptions. In what follows we will occasionally use the expression “class” for the PM notion, to remind the reader of the differences between this and an axiomatic theory of sets, such as ZF. The term is not meant to indicate that these are “proper classes” in the sense used in ZF or VGB class theory, where it indicates an expression that does not define a set, such as \(\{ x \mid x = x \}\), which is true of the universe \(\rV\) and so too “large” to be a set.

The basic definition eliminates terms for classes from contexts in which they occur, just as the theory of definite descriptions eliminates descriptions occurring in the positions of terms:

Contextual definition of classes \[ \phi \{x \mid \psi (x) \} \eqdf \exists \chi \left [ \begin{split} \forall x [ \chi \bang (x) \equiv \psi (x) ] \\ {} \lamp \phi (\chi \bang) \end{split} \right] \tag*{∗20·01} \] for \(\chi \bang\) a predicative function

In other words, an expression seeming to attribute the property \(\phi\) to a class \(\{x \mid \psi (x) \}\) is true if and only if there is some predicative property \(\chi\), which is co-extensive with \(\psi\), which really has the property \(\phi\).
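The effect of this definition can be illustrated with a toy model. The sketch below is only an assumption-laden picture, not PM's formal system: a propositional function is modelled as a record with a label, a finite extension, and a flag saying whether it is predicative, and the names `PropFn`, `class_statement`, `has_two_members` and `is_labelled_psi` are hypothetical.

```python
from dataclasses import dataclass

# Toy model: a propositional function is a label plus a finite extension
# plus a flag recording whether it counts as predicative. All names and
# the sample data are illustrative assumptions.

@dataclass(frozen=True)
class PropFn:
    label: str
    extension: frozenset
    predicative: bool

def class_statement(phi, psi, functions):
    """phi{x | psi(x)}: some predicative chi co-extensive with psi has phi."""
    return any(
        chi.predicative and chi.extension == psi.extension and phi(chi)
        for chi in functions
    )

psi = PropFn("psi", frozenset({1, 2}), predicative=False)
chi = PropFn("chi", frozenset({1, 2}), predicative=True)
other = PropFn("other", frozenset({3}), predicative=True)
functions = [psi, chi, other]

def has_two_members(f):      # an extensional property of functions
    return len(f.extension) == 2

def is_labelled_psi(f):      # an intensional property of functions
    return f.label == "psi"

extensional_case = class_statement(has_two_members, psi, functions)
intensional_case = class_statement(is_labelled_psi, psi, functions)
```

In this model `extensional_case` holds because the predicative `chi` shares the extension of `psi`, whereas `intensional_case` fails even though `psi` itself has the property, since no predicative co-extensive function does. Dropping `chi` from `functions` makes every class statement about `psi` false, which is the situation the Axiom of Reducibility is meant to exclude.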

The notion of membership (\(\in\)) which is the one non-logical relation symbol of ZF, is defined in the PM system:

Definition of \(\in\) \[ \tag*{∗20·02} x \in \phi \eqdf \phi \bang (x) \] for \(\phi\) a predicative function.

The principal role of this “no-classes” theory of classes, as it is called, is to show how the theory of types resolves the paradoxes that had afflicted the naive theory of classes in The Principles of Mathematics and were seen by Russell to afflict Frege’s theory. After these foundational sections, all the individual variables that appear in PM should be seen as ranging over classes (and, as will be explained below, the relation symbols are to be interpreted as ranging over relations in extension). The paradoxes appear in different forms, as seen in the Introduction to PM, but the resolution of the paradox of “the class of all classes which do not belong to themselves”, which appears in Russell’s initial letter to Frege, will be used as our example. This class, which leads directly to a contradiction, would appear in contemporary notation as \(\{ x \mid x \notin x \}\). The paradox arises when one asks whether that class is a member of itself or not. The expression that it is a member of itself, \(\{ x \mid x \notin x \} \in \{ x \mid x \notin x \}\), will have two class expressions to be eliminated by the first definition, and then several uses of the relation symbol \(\in\) which will also be eliminated. In the end there will be an expression \(\lnot (\phi_{\tau} \in \psi_{\tau})\), which is not legitimate, since this is not well-formed for any \({\tau}\). A function must be of a higher order than its arguments.

The effect of these two definitions is to demonstrate that classes fall into a simple theory of types, and while subject to these type restrictions, all of the inferences involving class expressions observe classical quantification theory as stated in ∗10 above. The definitions of existential and universal quantification are simple. Note that Russell uses Greek letters (\(\alpha\), \(\beta\),…) to range over classes:

Definition of quantification over “all classes” \[ \tag*{∗20·07} \forall \alpha \chi (\alpha) \eqdf \forall \phi \chi ( \{ x \mid \phi \bang(x) \})\] for \(\phi \bang\) a predicative function.

Definition of quantification over “some classes” \[ \tag*{∗20·071} \exists \alpha \chi ( \alpha ) \eqdf \exists \phi \chi \{ x \mid \phi \bang(x) \}\] for \(\phi \bang\) a predicative function.

The definition of \(\in\) is extended to classes without change:

Definition of membership of a class in a function \[ \tag*{∗20·07} \alpha \in \psi \eqdf \psi \bang ( \alpha) \] for \(\psi \bang\) a predicative function.

The remainder of ∗20 consists of theorems proving that the theorems of quantificational logic developed in ∗10 apply as well to expressions about classes, with the “Greek” variables \(\alpha, \beta, \ldots\) in the place of individual variables \(x, y, \ldots\). Because formulas with Greek variables look and behave the same as those with individual variables with respect to quantificational logic, it is possible to overlook the interaction of the theory of classes with the theory of types. As Gödel points out in the passage quoted above (Gödel 1944 [1951: 126]), the “contextual definitions” of class variables \(\alpha\), \(\beta\), etc., do not specify the elimination of class abstracts from all possible contexts, and in particular those that talk about classes. Linsky (2004) argues that PM has no notation for classes of propositional functions to distinguish them from classes of classes, although one could be added. This is another indication of the turn in PM after the initial sections (up to ∗21) to an extensional system of classes and relations.

In effect the class variables can be seen as propositional function variables, restricted to r-types in which only predicative functions appear, in arguments as well, leading to what might be seen as “hereditarily predicative functions”. In other words, the class variables can be replaced with propositional function variables in which the r-type of the function, and of all the arguments are of the form \((\beta_1, \beta_2, \ldots, \beta_m )/1\) and the same applies to \(\beta_1, \beta_2, \ldots, \beta_m\) as well. This means that variables and terms for classes will obey the simple theory of types. These can be contrasted with r-types by presenting an alternative system of simple types or “s-types”.

Church’s (1974) “Simple” Theory of Types

\(\iota\) is the s-type for an individual.

Where \(\tau_1, \ldots, \tau_m\) are any s-types, then \((\tau_1, \ldots, \tau_m)\) is the s-type of an m-ary propositional function which has arguments of s-types \(\tau_1, \ldots, \tau_m\), respectively.

The order of an entity in the system of s-types is defined as follows:

the order of an individual (of s-type \(\iota\)) is 0;

the order of a function of s-type \((\tau_1, \ldots, \tau_m)\) is \(n+1\), where n is the greatest of the orders of the arguments \(\tau_1, \ldots, \tau_m\).

Church’s notion of “order” is not quite the one that is familiar from talk of “first order logic” and “second order logic”. First order logic will have bound variables of order 0, and second order logic is a logic which also quantifies over variables of order 1; thus the familiar notion of “order” is one more than the highest order of any of the bindable variables in the s-type system.
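The recursive definition of order for simple types lends itself to a short computation. The following sketch assumes an encoding that is not Church's own: the s-type of individuals is the string `"i"`, and the s-type of an m-ary function is a Python tuple of its argument types. The names `IOTA`, `order` and `can_apply` are hypothetical.

```python
# Simple types ("s-types") encoded as nested tuples. The encoding and the
# function names are illustrative assumptions.

IOTA = "i"   # the s-type of individuals

def order(s_type):
    """Order of an s-type: 0 for individuals, else 1 + the greatest argument order."""
    if s_type == IOTA:
        return 0
    if not isinstance(s_type, tuple) or not s_type:
        raise ValueError("an s-type is IOTA or a non-empty tuple of s-types")
    return 1 + max(order(arg) for arg in s_type)

def can_apply(fn_type, arg_types):
    """A function of s-type (t1, ..., tm) takes exactly arguments of types t1, ..., tm."""
    return fn_type != IOTA and fn_type == tuple(arg_types)

class_of_individuals = (IOTA,)                  # order 1
class_of_classes = (class_of_individuals,)      # order 2
mixed_relation = (IOTA, class_of_individuals)   # order 2

samples = [IOTA, class_of_individuals, class_of_classes, mixed_relation]
# No s-type can be applied to an argument of its own type, so "x in x"
# never receives a type in this system.
self_application_possible = any(can_apply(t, [t]) for t in samples)
```

Since a tuple can never equal the one-element tuple containing itself, `self_application_possible` is false for any finite type expression, which is the simple-type counterpart of the remark above that a function must be of higher order than its arguments.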

It should be noted that every s-type is also an r-type, namely one that is hereditarily predicative. Thus it might seem that the expressions of the theory of classes are all simply a special case of formulas of the full system of the ramified theory of types. This will be true of the assignment of types to variables, but it must be remembered that the entire formula \(\phi \{x \mid \psi (x) \}\) about a class is by definition

\[\exists \chi [ \forall x [ \chi (x) \equiv \psi (x)] \lamp \phi (\chi)].\]

All we have discussed so far is the relative types of \(\phi\) and \(\chi\). The Axiom of Reducibility guarantees that there is a predicative \(\chi\) co-extensive with any \(\psi\) in the defining condition of a class. To justify use of the class term \(\{x \mid \psi (x) \}\) one must then just show that there is some function that has the higher-order property \(\phi\). This is the step comparable to the proof that a definite description is proper, i.e., true of exactly one thing, that justifies using that description as a singular term.

Comparison of the Classes of PM with Axiomatic Set Theory

It is widely thought that the system of PM offers a very different approach to the solution of the paradoxes than that of axiomatic set theory as formulated in the Zermelo-Fraenkel system ZF. While the theory of types is thought of as a desperate attempt to save the logicist program by artificially introducing types in order to resolve the paradoxes, axiomatic set theory seems to simply postulate sets as entities and adopts axioms in a first order language with “\(\in\)” for membership as its one non-logical symbol. This view has been forcefully expressed by Quine:

Whatever the inconveniences of type theory, contradictions such as [the Russell paradox] show clearly enough that the previous naive logic needs reforming.…There have been other proposals to the same end—one of them coeval with the theory of types. [Quine cites Zermelo 1908.] But a striking circumstance is that none of these proposals, type theory included, has any intuitive foundation. None has the backing of common sense. Common sense is bankrupt, for it wound up in contradiction. (Quine 1951: 153)

However, the opposing view, that type theory has intuitive support and that type theory and axiomatic set theory are based on the same intuitions, dates back to Gödel in 1933, who refers to set theory as the “theory of aggregates”:

At least hitherto only one solution which meets these two requirements [of avoiding the paradoxes while retaining mathematics and the theory of aggregates] has been found.…This solution consists in the theory of [simple] types.…It may seem as if another solution were afforded by the system of axioms for the theory of aggregates, as presented by Zermelo, Fraenkel and von Neumann; but it turns out that this system of axioms is nothing else but a natural generalization of the theory of types, or rather, it is what it becomes of the theory of types if certain superfluous restrictions are removed. (Gödel 1933 [1995: 45–46])

The two “restrictions” that Gödel intends are the restriction that types are not cumulative and that the levels of types are limited to the natural numbers 0, 1,… n,…. Gödel suggests that one adopt a cumulative system of types in which a given type includes functions of all lower types (or orders), and the types extend beyond \(\omega\), \(\omega\) + 1, …\(\omega^{\omega}\), …, through all the ordinals. Such a “natural generalization” of the theory of types, he asserts, amounts to the same as Zermelo-Fraenkel set theory (ZF). Gödel’s claim is spelled out by George Boolos (1971) as the “iterative conception” of sets, which can be expressed formally. If one thinks of sets as built up in stages, with each stage adding all sets of members of the last stage, and the process extending endlessly, then one can show that the axioms of ZF set theory are indeed provable from the axioms of the theory of the “iterative conception” of sets. In turn the “iterative conception” relies on a strong intuition, contrary to what Quine says. It is the same intuition that underlies the hierarchy of types.
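The picture of sets built up in stages can be made concrete for the first few finite stages. The sketch below is an illustration under simplifying assumptions: it starts from no individuals at all, takes each stage to be the power set of the previous one, and stops after finitely many steps, so it shows nothing about the transfinite stages. The names `powerset` and `stages` are hypothetical.

```python
from itertools import combinations

# First finite stages of a cumulative hierarchy of pure sets, using
# frozensets so that sets can be members of sets. Names are illustrative.

def powerset(s):
    """All subsets of s, each as a frozenset."""
    items = list(s)
    return frozenset(
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    )

def stages(n):
    """Return [V_0, ..., V_n], with V_0 empty and V_(k+1) the power set of V_k."""
    current = frozenset()
    result = [current]
    for _ in range(n):
        current = powerset(current)
        result.append(current)
    return result

hierarchy = stages(4)
sizes = [len(stage) for stage in hierarchy]
# Cumulativity: every stage is included in the next one.
cumulative = all(a <= b for a, b in zip(hierarchy, hierarchy[1:]))
```

Each stage here contains everything from the earlier stages, which is the “cumulative” feature that Gödel contrasts with the strictly separated levels of the theory of types.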

Following Boolos’ presentation of the “iterative conception of set” it seems that axiomatic set theory and PM do not differ widely, and express similar intuitive notions of set that provide the same solution to the paradoxes.

Strictly as presented in PM, however, the no-classes theory differs significantly from ZF. The sentences of the PM theory are expressed in the theory of types, as opposed to the first order theory of ZF. ZF and PM cannot simply be compared in terms of their theorems. Not only are there different axioms in the two theories, but the very languages in which they are expressed differ in logical power. If we follow Gödel and Boolos, however, the two are seen to rest on the same intuitive basis, and to differ only in certain “superfluous restrictions” on the theory of PM.

Relations in PM

∗21 extends the notion of class, which is the extension of a one place propositional function, to the comparable notion of a “Relation” for functions of two arguments, with the analogous contextual definition.

Contextual definition of a relation in extension \[ \phi \{x ; y \mid \psi (x,y) \} \eqdf \exists \chi \left[ \begin{split} \forall x \forall y ( \chi \bang (x,y) \equiv \psi (x,y) ) \\ {} \lamp \phi ( \chi \bang ) \end{split} \right ] \tag*{∗21·01} \] for \(\chi \bang\) a predicative function

(Note: The use of this unusual notation \(\phi \{x; y \mid \psi (x,y) \}\) in this one definition is meant to avoid the implication that a relation is interpreted as a set of ordered pairs, that would be represented by the contemporary notation \(\phi \{\langle x,y \rangle \mid \psi (x,y) \}\). The PM notation for propositional functions, as in \(\phi \hat{x}\) uses a caret over the variable where we would write \(\lambda x \psi(x)\). The PM notation for a class is \(\hat{x} \psi (x)\). A two-place propositional function is identified with variables also with carets: \(\phi (\hat{x} \hat{y})\) and the corresponding relation \(\hat{x} \hat{y} \phi (x,y)\). This notation does not identify relations as classes of ordered pairs, and that is how our blend of PM and contemporary notation in \(\phi \{x ; y \mid \psi (x,y) \}\) is to be taken.)

The introduction of Greek letters for classes in ∗20 and the use of “Roman letters” R, S, … in ∗21 for relations mark a change in the notation used in PM. After ∗21 the letters \(\phi, \psi, \ldots\) rarely appear. As Quine remarks in his study of the logic of Whitehead and Russell, it would seem that after a certain point the body of PM makes use of extensional higher-order logic in a simple theory of types:

In any case there are no specific attributes [propositional functions] that can be proved in Principia to be true of just the same things and yet to differ from one another. The theory of attributes receives no application, therefore, for which the theory of classes would not have served. Once classes have been introduced, attributes are scarcely mentioned again in the course of the three volumes. (Quine 1951: 148)

Quine here hints at the view of PM that is widely shared among mathematical logicians, who see the ramified theory of types, with its accompanying Axiom of Reducibility, as a digression taking logic into a realm of obscure intensional notions, when instead logic, even if expressed in a theory of types, is extensional and is comparable to axiomatic set theory presented with a simple hierarchy of sets of individuals, sets of sets of individuals, and so on.

It is certainly true that the remainder of PM is devoted to the theory of individuals, classes, and relations (in extension) between those entities. Thus the ontology of these later portions is a hierarchy of predicative functions arranged in a simple theory of types. This has led one interpreter, Gregory Landini (1998), to argue that only predicative functions are values of bound variables in PM. What we have interpreted as variables ranging over possibly non-predicative propositional functions, \(\phi\), \(\psi\),… are for Landini only schematic letters, and are not bindable variables. The only bound variables in PM, he asserts, range over predicative functions. This is a strong version of a view that others such as Kanamori (2009) have expressed, going back to Ramsey (1931), namely that the introduction of the Axiom of Reducibility has the effect of undoing the ramification of the theory of types, at least for a theory of classes, and so a higher-order logic used for the foundations of mathematics ought to have only a simple type structure.

Our interpretation of this change in attention to classes and relations, indicated by the shift in notation, is that it shows the extent to which the solution to the paradoxes, which required a ramified theory of (possibly intensional) propositional functions, may have superseded a logic based on an unproblematic notion of class and mathematical functions and relations between them, that appeared in the body of The Principles of Mathematics before Russell’s attention was drawn to the paradoxes. In the summary of the later sections of PM that follows below, it will appear that in fact the symbolic development follows very closely that of PoM from ten years earlier. While we do not know much about the order in which sections of PM were composed, it will appear from this change of attention from propositional functions to classes and relations that the later parts are in fact an earlier stratum in the conceptual development of the project that started out as a symbolic “Volume II” to follow PoM.

To remind the reader of the change from talking of propositional functions to relations in extension, two further notational alterations are introduced. Greek letters such as \(\alpha\), \(\beta\), etc., will still be used for classes, but now x and y will be used as variables ranging over classes as well. The individual variables which are ambiguous with respect to type, “typically ambiguous”, will now also range over classes. A function \(\phi\) of two variables x and y is indicated with the arguments in parentheses after the function variable: \(\phi(x, y)\). A two place relation R holding between x and y is written \(x \relR y\), with the R in “infix” position. The obvious limitation of this notation is that it is not readily extended to three place relations, adding a third variable, say z. We will follow the practice in PM and write \(x \relR y\) for binary relations. PM only requires binary relations for most of the three volumes, although the projected volume IV on geometry would need a notation for “x is between y and z”, as can be seen from Henry Sheffer’s unpublished notes from Russell’s lectures on geometry from 1910 at Cambridge. There he uses the notation \(y\rels{B}(x,y)\) which blends the two styles.

The Algebra of Classes

The notions of the subset relation and the intersection and union of sets are defined in PM exactly as they are now (albeit with different terminology). The complement of a set and the universal class \(\rV\) are not allowed in set theory, and are rejected as “proper classes”. In PM, as each is only a set of entities of a given type \(\tau\), it forms a set of the next higher type, \((\tau)/1\). The complement of a set of a given type is the set of all entities (of that type) that are not in the set. Each empty set will be the complement of the universal set (of a given type \(\tau\)) and so there will be an empty set for each type \(\tau\).

\[ \begin{align} \alpha \subseteq \beta & \eqdf \forall x (x \in \alpha \supset x \in \beta) \tag*{∗22·01}\\ \alpha \cap \beta & \eqdf \{x \mid ( x \in \alpha \lamp x \in \beta ) \} \tag*{∗22·02}\\ \alpha \cup \beta & \eqdf \{ x \mid (x \in \alpha \lor x \in \beta ) \} \tag*{∗22·03}\\ \end{align} \]

The type subscript \(\tau\) is added below as a reminder that the notions of universal set \(\rV\) and complement are each with regard to a given type (and so an empty set \(\emptyset\) will recur in each type.)

\[ \begin{align} - \alpha &\eqdf \{ x_{\tau} \mid \lnot (x \in \alpha )\}\tag*{∗22·04}\\ {\alpha - \beta} & \eqdf {\alpha \cap {- \beta}}\tag*{∗22·05}\\ \end{align} \]

The Universal Class and the Empty Class

\[\tag*{∗24·01} \rV_{\tau + 1} \eqdf \{ x_{\tau} \mid (x = x) \} \]

The subscript on ‘\(\rV\)’ indicates that the universe of classes of a given (simple) type \(\tau\) will be a member of the next type. There is no class of all classes of whatever type. This is in common with axiomatic set theory which holds that there is no set of all sets.

\[\tag*{∗24·02} \emptyset_{\tau} \eqdf - \rV_{\tau} \]

Mathematical functions in PM

The logic of PM is based on propositions, propositional functions and relations in extension, unlike Frege’s which deals with objects, in particular, truth values, and functions, with the special case of concepts, which are functions from objects to truth values. PM reduces mathematical functions to “functional relations” in a way that is familiar from elementary courses. If there is a binary relation which has a unique second argument for each first argument, i.e.,

\[\forall x \exists y [x \relR y \lamp \forall z (x \relR z \supset z = y)]\]

then one can introduce a new function symbol \(f_R\), such that

\[\forall x \forall y (x\relR y \equiv f_R(x) = y).\]

Similarly, if for an \(n+1\) place relation R, for each \(x_1\), …, \(x_n\) there is a unique y such that \(R(x_1, \ldots, x_n, y)\), then one can introduce an n-place function g mapping \(x_1, \ldots, x_n\) to y. In PM the expressions for mathematical functions are definite descriptions, referring to the last argument of a relation as the “value” of the function described by that relation. We will use the expression \(f_R\) to refer to the functional term referring to the function derived from a relation R. PM uses the explicit definite description “the R of y” where we would use the functional expression \(f_R\). The definition of a monadic functional term then is:

\[ \tag*{∗30·01} f_Ry \eqdf (x)(x \relR y) \]

with the general form for an n-place functional term g derived from an \(n+1\) place relation S (following Russell’s notation in lectures):

\[g_S(x_1, \ldots, x_n) \eqdf (y )(x_1 S x_2, \ldots, x_n, y)\]

(The diligent reader will find that this presentation does not follow PM exactly. The example “the father of” based on a relation R expressing “x is the father of y” would make “the R’ of x” actually refer to the unique x which is the father of y, and so what has been explained above is appropriate to the converse of that relation, \(\relbR\). The practice of reading the argument of a relational function as the x and the value as the y is so well established that we have taken a liberty with the actual definitions in PM.)

Recall that from this point on in PM, the relations are to be considered as “relations in extension” and so it is easy to see how one can treat the relations as ordered \(n+1\)-tuples of which the last member is unique given the first n arguments. In particular, a monadic function f can be seen in the familiar way as a set of ordered pairs (of \(\langle x, f_R(x) \rangle\)) for each argument x in the domain of the function.
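The passage from a functional relation to a function symbol can be sketched for finite relations in extension. This is a minimal illustration, assuming relations are represented as Python sets of ordered pairs; the names `is_functional` and `function_from`, and the sample relation, are hypothetical.

```python
# A relation in extension as a set of (argument, value) pairs, and the
# function f_R it determines when it is functional. Names are illustrative.

def is_functional(relation, domain):
    """Every x in the domain is related to exactly one y."""
    return all(
        len({y for (a, y) in relation if a == x}) == 1
        for x in domain
    )

def function_from(relation, domain):
    """Return f_R with f_R(x) = y exactly when x R y, for x in the domain."""
    if not is_functional(relation, domain):
        raise ValueError("relation is not functional on the given domain")
    table = {x: y for (x, y) in relation if x in domain}
    return lambda x: table[x]

domain = {0, 1, 2}
successor_mod_3 = {(0, 1), (1, 2), (2, 0)}
f = function_from(successor_mod_3, domain)

# A relation with two values for the same argument determines no function.
not_functional = {(0, 1), (0, 2)}
```

Here `f(x) == y` holds exactly when the pair `(x, y)` belongs to `successor_mod_3`, which is the finite counterpart of the biconditional governing \(f_R\) above.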

Given the treatment of “relations” as “relations in extension” it is no accident that the development of the logic of relations in ∗30–∗38 looks familiar to contemporary logicians, with even some of the notation from PM surviving into contemporary usage. A series of notions are defined in a way quite familiar to the modern treatment of relations as sets of n-tuples:

The Converse of a Relation

\[ \tag*{∗31·02} {\relbR} = \{\lambda x \lambda y (y \relR x)\} \]

or, in terms of pairs:

\[{\relbR} = \{ \langle x, y \rangle \mid (y \relR x) \}\]

Domains, Ranges and Fields of Relations

The notions of the domain and range of a relation are also given a contemporary definition (and so also the notions of the domain and range of a function).

\[ \begin{align} \tag*{∗33·11} \Domain (R) &\eqdf \{x \mid \exists y ( x \relR y ) \}\\ \tag*{∗33·111} \Range (R) &\eqdf \{y \mid \exists x (x \relR y) \}\\ \end{align} \]

Note that a relation can have its domain in one type and its range in another. This adds complications in the theory of cardinal numbers when a relation of similarity (equinumerosity) holds between classes of different types. (See the discussion of ∗100 below.)

The Product of Two Relations

The composition of relations R and S is called their relative product and uses a different symbol \(R\mid S\) where we write \(R \circ S\):

\[ \tag*{∗34·01} R \circ S \eqdf \lambda x \lambda z \{ \exists y ( x \relR y \lamp y \relS z ) \} \]

Restricted Relations

In the case of the restriction of a relation R to a particular class \(\beta\), the terminology from PM is now retained with the same definition:

\[ \tag*{∗35·02} R \upharpoonright \beta \eqdf \lambda x \lambda y (x \relR y \lamp y \in \beta) \]
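Read as operations on sets of ordered pairs, the definitions of converse, domain, range, relative product and restriction can be written out directly. The sketch below assumes finite relations represented as Python sets of pairs; the function names and the sample relation are hypothetical, and nothing here reflects the type distinctions that PM imposes.

```python
# Finite relations in extension as sets of (x, y) pairs, meaning x R y.
# Function names and sample data are illustrative assumptions.

def converse(R):
    """Pairs (x, y) such that y R x (compare *31.02)."""
    return {(y, x) for (x, y) in R}

def domain_of(R):
    """All x such that x R y for some y (compare *33.11)."""
    return {x for (x, _) in R}

def range_of(R):
    """All y such that x R y for some x (compare *33.111)."""
    return {y for (_, y) in R}

def relative_product(R, S):
    """Pairs (x, z) such that x R y and y S z for some y (compare *34.01)."""
    return {(x, z) for (x, y) in R for (w, z) in S if y == w}

def restrict(R, beta):
    """Pairs (x, y) of R whose second member is in beta (compare *35.02)."""
    return {(x, y) for (x, y) in R if y in beta}

# Hypothetical "parent of" relation on three individuals.
parent = {("a", "b"), ("b", "c")}
grandparent = relative_product(parent, parent)
child = converse(parent)
```

On this reading `grandparent` is the relative product of `parent` with itself, and taking the converse twice returns the original relation.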

In his survey of PM, Quine (1951: 155) complains that the last 100 pages of Part I are occupied with proving theorems relating redundant definitions of the same notions. Thus PM defines the notion of domain and range and then introduces notions that again define the same classes, which are proved to be equivalent. PM defines the notation ‘\(R\pmdq\beta\)’, to be read as “the terms which have the relation R to members of \(\beta\)”, and uses the example:

If \(\beta\) is the class of great men, and R is the relation of wife to husband, \(R\pmdq\beta\) will mean “wives of great men”. (PM, 278)

In contemporary logic with the notation of set theory used above, there is no need for a special symbol for this notion, as it is written as:

\[ \tag*{∗37·01} R\pmsq \pmsq \beta \eqdf \{ x \mid \exists y (y \in \beta \lamp x \relR y) \} \]

Products and Sums of Classes of Classes

\[ \tag*{∗40·01} {} \cap \alpha \eqdf \{ x \mid \forall \beta (\beta \in \alpha \supset x \in \beta) \} \]

This is the intersection of \(\alpha\).

\[ \tag*{∗40·02} {} \cup \alpha \eqdf \{ x \mid \exists \beta (\beta \in \alpha \lamp x \in \beta) \} \]

This is the union of \(\alpha\).
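The class of terms bearing R to members of \(\beta\), and the intersection and union of a class of classes, can likewise be computed for finite examples. In the sketch below the names `image`, `big_intersection` and `big_union` are hypothetical, and the explicit `universe` argument is an assumption standing in for the universal class of the relevant type.

```python
# Finite illustrations of *37.01, *40.01 and *40.02. Relations are sets of
# (x, y) pairs meaning x R y; classes of classes are collections of
# frozensets. All names and data are illustrative assumptions.

def image(R, beta):
    """The terms x that have the relation R to some member y of beta."""
    return {x for (x, y) in R if y in beta}

def big_intersection(alpha, universe):
    """The x (of the given universe) that belong to every member of alpha."""
    return {x for x in universe if all(x in beta for beta in alpha)}

def big_union(alpha):
    """The x that belong to at least one member of alpha."""
    return {x for beta in alpha for x in beta}

# Hypothetical "wife of" relation and class of "great men".
wife_of = {("w1", "m1"), ("w2", "m2")}
great_men = {"m1"}
wives_of_great_men = image(wife_of, great_men)

universe = {1, 2, 3, 4}
alpha = [frozenset({1, 2}), frozenset({2, 3})]
```

Because the quantifier in ∗40·01 is vacuously satisfied when \(\alpha\) is empty, `big_intersection([], universe)` returns the whole universe, which is why the universe has to be supplied and why, in PM, the result is relative to a type.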

4.2 Part II: Prolegomena to Cardinal Arithmetic

The Cardinal Number 1

\[ \tag*{∗52·01} 1 \eqdf \{ \alpha \mid \exists x ( \alpha = \{x \} ) \} \]

So the cardinal number 1 is the class of all singletons. There will be a different number 1 for each type of x. Frege, by contrast, defines the natural number 1 as the extension of a certain concept, namely being identical with the number 0, which itself is the extension of the (empty) concept of not being self identical. In axiomatic set theory the natural numbers are particular finite ordinals, namely the series in which 0 is \(\emptyset\), 1 is \(0 \cup \{ 0 \}\), 2 is \(1 \cup \{1 \}\), and so on. These are known as the von Neumann ordinals.

Pairs

\[ \tag*{∗54·02} 2 \eqdf \{ \alpha \mid \exists x \exists y (x \neq y \lamp \alpha = \{y \} \cup \{ x \} ) \} \]

Similarly, the number 2 is the class of all pairs, rather than a particular pair. In the type theory of PM there will be distinct couples for the types of y and x. When they are of the same type the couple is called “homogeneous”. Even with homogeneous pairs there will be distinct classes of pairs for each type, and thus a different number 2 for each type. The same notion applies to relations.
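The contrast between the PM numbers, which are classes of classes relative to a type, and the set-theoretic numerals, which are particular sets, can be seen in a small finite model. The sketch below assumes a three-element class of individuals; the names `number_one`, `number_two` and `von_neumann` are hypothetical, and the von Neumann successor is taken to be the usual \(n \cup \{n\}\).

```python
# PM-style cardinals 1 and 2 over a finite "type", next to the von Neumann
# numerals. All names and the sample universes are illustrative assumptions.

def number_one(universe):
    """The class of all singletons drawn from the universe (compare *52.01)."""
    return {frozenset({x}) for x in universe}

def number_two(universe):
    """The class of all unordered pairs of distinct members (compare *54.02)."""
    return {frozenset({x, y}) for x in universe for y in universe if x != y}

def von_neumann(n):
    """The n-th von Neumann numeral: start from the empty set, apply k -> k U {k}."""
    k = frozenset()
    for _ in range(n):
        k = k | frozenset({k})
    return k

individuals = {"a", "b", "c"}
classes = {frozenset({"a"}), frozenset({"a", "b"})}   # some classes of individuals

one_for_individuals = number_one(individuals)
one_for_classes = number_one(classes)      # a different "1", one type higher
two_for_individuals = number_two(individuals)
```

The two versions of 1 have no members in common, since one contains singletons of individuals and the other singletons of classes, whereas `von_neumann(1)` is a single fixed set.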

Ordered Pairs

The notion of an ordered pair, called an “ordinal couple” is defined as:

\[ \tag*{∗55·01} \langle x, y \rangle \eqdf \textrm{ the extension of } \lambda x \lambda y (x \in \{x \} \lamp y \in \{y\}) \]

The idea is that the order of the relation \(\lambda x \lambda y (x \in \{x\} \lamp y \in \{y\})\) determines the first and second element of the ordered pair. It is a relation in extension, which is the analogue of a property in extension or class. A relation in extension has a distinction between the first and second elements due to the order of the defining relation. The closest in contemporary language would be:

\[\phi \langle x, y \rangle \eqdf \exists \psi \forall u \forall v ( \psi (u, v) \equiv \lambda x \lambda y [ x \in \{x\} \lamp y \in \{y\} ] (u, v) \lamp \phi (\psi) ) \]

Given the definition of extensions of relations this is the version of the no-classes theory for relations. After attending classes of Russell the year before, and having several discussions, Norbert Wiener (1914) proposed the following definition (in modern notation):

\[\langle x, y \rangle \eqdf \{\{\{ x\}, \emptyset \}, \{\{ y\}\}\} \textrm{ where } \emptyset \textrm{ is the empty set.}\]

Wiener’s accomplishment was to capture the ordering of the pair which in PM is captured by the ordering of the arguments of relations with the unordered notion of set membership.
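That Wiener’s construction really does encode order using only membership can be checked by recovering both components from the resulting set. The following sketch uses Python frozensets for sets; the names `wiener_pair` and `unpack` are hypothetical, and the check is carried out only for a few small values.

```python
# Wiener's 1914 ordered pair <x, y> = {{{x}, 0}, {{y}}}, where 0 is the empty
# set, built from frozensets. Function names are illustrative assumptions.

EMPTY = frozenset()

def wiener_pair(x, y):
    """Build {{{x}, EMPTY}, {{y}}}."""
    first_part = frozenset({frozenset({x}), EMPTY})    # {{x}, 0}: two members
    second_part = frozenset({frozenset({y})})          # {{y}}: one member
    return frozenset({first_part, second_part})

def unpack(pair):
    """Recover (x, y) from a Wiener pair using membership and size alone."""
    first_part = next(m for m in pair if len(m) == 2)
    second_part = next(m for m in pair if len(m) == 1)
    (x_singleton,) = (s for s in first_part if s)      # the non-empty member {x}
    (x,) = x_singleton
    (y_singleton,) = second_part
    (y,) = y_singleton
    return x, y

p = wiener_pair(1, 2)
q = wiener_pair(2, 1)
```

The first component is marked by sharing a set with the empty set, so `p` and `q` are distinct sets and `unpack` returns the components in their original order.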

The end of PM to ∗56

The paperback abridged edition of PM to ∗56 only goes this far, so the remaining definitions have only been available to those with access to the full three volumes of PM.

Relative Types

This section presents a discussion of relations between individuals of distinct types, introducing a notation for types, \(t\pmsq x\), for the type to which x belongs. This section is little used in Volume I. The special consequences of this notion for the relative types of cardinal numbers are the topic of the Preface to Volume II, which was added after the first volume was already in print. The delay due to working out these details partially explains the gap between the publication of Volume I in 1910 and that of Volumes II and III in 1912 and 1913. Section ∗65 (On the Typical Definition of Ambiguous Symbols) is a discussion of typical ambiguity, the ambiguity of variables with respect to type.

\[ \tag*{∗70·01} f: \alpha \rightarrow \beta \eqdf \]

The functions f from \(\alpha\) onto \(\beta\), that is, those f for which \(\Domain (f) = \alpha\) and \(\Range (f) = \beta\).

Similarity of Classes

\[ \tag*{∗73·01} \alpha \approx \beta \eqdf (\exists f) f : \alpha \stackrel{1-1}{\longrightarrow} \beta. \]

There is a one-one function mapping \(\alpha\) onto \(\beta\) (similarity of \(\alpha\) and \(\beta\)). Contemporary discussions say that \(\alpha\) and \(\beta\) are equinumerous. Difficulties arise with respect to the definition of cardinal numbers when the relation of similarity they involve is one that has a domain and range in different types. See ∗100 below.
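For finite classes, similarity can be checked by brute force. The sketch below enumerates candidate one-one functions as Python dictionaries; the names `one_one_maps` and `similar` are assumptions for illustration, and the search ignores the questions of type raised above.

```python
from itertools import permutations

def one_one_maps(alpha, beta):
    """Yield every one-one function from alpha onto beta, as a dict."""
    alpha, beta = list(set(alpha)), list(set(beta))
    if len(alpha) != len(beta):
        return  # no one-one onto map exists between finite sets of different sizes
    for image in permutations(beta):
        yield dict(zip(alpha, image))

def similar(alpha, beta):
    """True if at least one one-one function maps alpha onto beta."""
    return next(one_one_maps(alpha, beta), None) is not None

print(similar({1, 2, 3}, {"a", "b", "c"}))
print(similar({1, 2}, {"a"}))
```

The existential quantifier in ∗73·01 corresponds to the search for a single witness: `similar` stops as soon as one such function is found.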

The main theorem in this chapter is a proof of the Cantor-Bernstein theorem, that if a set \(\alpha\) is similar to a subset \(\gamma\) of another set \(\beta\) and \(\beta\) is similar to a subset \(\delta\) of \(\alpha\) then \(\alpha\) and \(\beta\) are themselves similar:

\[ \forall \alpha \forall \beta \forall \gamma \forall \delta \left[\left ( \begin{split} \alpha \approx \gamma & {}\lamp \beta \approx \delta \\ &{} \lamp \gamma \subseteq \beta \\ &{} \lamp \delta \subseteq \alpha \\ \end{split} \right) \supset \alpha \approx \beta \right] \tag*{∗73·88} \]

The proof here explicitly follows the proof by Ernst Zermelo from 1908. Whitehead and Russell call this the “Schröder-Bernstein” theorem.

The Axiom of Choice (Multiplicative Axiom)

The Multiplicative Axiom, or “Axiom” of Choice, is not an axiom of PM (what is termed there a “primitive proposition”), but is instead a defined expression that is added as a hypothesis to the theorems in whose proofs it is used. This reflects the emerging awareness at the time of the role of the Axiom of Choice in various proofs, in particular Zermelo’s proof that every class can be well-ordered.

\[{}\ \begin{aligned}&\textrm{Multiplicative}\\&\textrm{Axiom}\end{aligned} \eqdf \forall \alpha \left\{ \begin{split} &\forall \beta \forall z \left [ \left( \begin{split} \beta & \in \alpha \lamp {}\\ z & \in \alpha \lamp {}\\ \beta & \neq z \end{split} \right) \supset (\beta \cap z = \emptyset) \right] \supset \\ & \exists \beta \forall z \left[ \begin{split} & z \in \alpha \supset z \cap {} \\ &\beta \in \{ \sigma \mid \exists \rho (\sigma = \{\rho \} )\} \end{split} \right] \end{split} \right\} \tag*{∗88·03} \]

If \(\alpha\) is a class of mutually exclusive classes, then there is a (“choice”) set \(\beta\) whose intersection with each member of \(\alpha\) is a singleton, so that \(\beta\) picks out exactly one element of each member of \(\alpha\).
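The condition on the choice set can be checked directly for a finite family. The following sketch assumes, for illustration only, that every member of the family is non-empty and that its elements can be compared with `min`; the function names are hypothetical. For finite families a selection can be constructed outright, so no axiom is needed; the hypothesis does real work only for infinite families.

```python
def is_mutually_exclusive(alpha):
    """True if distinct members of alpha never share an element."""
    members = list(alpha)
    return all(
        members[i].isdisjoint(members[j])
        for i in range(len(members))
        for j in range(i + 1, len(members))
    )

def is_selection(beta, alpha):
    """True if beta meets every member of alpha in exactly one element."""
    return all(len(beta & z) == 1 for z in alpha)

def choose(alpha):
    """Pick the least element of each member (assumes non-empty, comparable members)."""
    return frozenset(min(z) for z in alpha)

alpha = {frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})}
beta = choose(alpha)
print(is_mutually_exclusive(alpha), is_selection(beta, alpha))
```

Here `is_mutually_exclusive` plays the role of the antecedent of the definition and `is_selection` that of the consequent.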

\(\Rast\) The Ancestral Relation

\[ \quad\Rast \eqdf \left \{ \begin{split} & \langle x, y \rangle \mid (\exists u x\relR u \lor \exists u uRx ) \lamp {} \\ & \forall \alpha \left [ \left[ \begin{split} & x \in \alpha \lamp {}\\ &\forall z \forall w (z \in \alpha \lam