Sparked by Arnold Neumaier’s work on “a computer-aided system for real mathematics,” we had a very stimulating discussion, mostly about various different foundations for mathematics. But that thread started getting rather long, and John pointed out that people without some experience in formal logic might be feeling a bit lost. In this post I’ll pontificate a bit about logic, type theory, and foundations, which have also come up recently in another thread. Hopefully this too-brief summary of background will make the discussion in the next post about the meaning of structuralism more readable. (Some of this post is intended as a response to comments on the previous discussions, but I probably won’t carefully seek them out and make links.)

At the most basic level, what we do when we do mathematics is manipulate symbols according to specified rules. Just as in chess the rules state that a knight moves like so and not like so, in mathematics the rules state that a quantifier can be eliminated like so and not like so. (The opinion that this is all there is to mathematics is called formalism, but I’m not espousing that. You are free to believe that these symbols have whatever meaning you want, but even the most Platonist mathematician does mathematics by manipulating symbols on paper.)

Now, the actual rules of the game of mathematics are extremely complicated. But the idea of foundations is to derive these complicated rules from a much simpler list of fundamental rules. I think many people would agree that it’s reasonable to take these fundamental rules to be some sort of calculus of terms. That is, the game is played with sequences of symbols called terms, and the rules specify what terms are valid and what assertions (called judgments) can be made about them. Here are some sample terms:

$x^2 - y^2$

$\int_0^\infty e^{-x^2}\,dx$

$\mathcal{P}(\mathcal{P}(\mathbb{R}))$

$\forall n\ge 3.\ \forall x.\ \forall y.\ \forall z.\ x^n + y^n \neq z^n$

Now many people (myself included) believe that there is more than one fundamentally different type of term. For instance, $\int_0^\infty e^{-x^2}\,dx$ is a real number, $\mathcal{P}(\mathcal{P}(\mathbb{R}))$ is a set, and $\forall n\ge 3.\ \forall x.\ \forall y.\ \forall z.\ x^n + y^n \neq z^n$ is a claim or a theorem, and it's illogical to confuse them (and makes it harder for a computer to guess what you meant). Thus, the rules should define one or more types, and one of the judgments should be of the form "$t$ is a well-formed term of type $A$," written as $t:A$. But if you don't believe in types, you can just use a version of these rules where there is only one type, call it (say) $Thing$, and read $t:Thing$ as merely "$t$ is a well-formed term."

In this way we can build many different "type theories". Most familiar ones include a special type called $Prop$ of "propositions", and a special type of judgment which asserts that a given proposition (i.e. a term of type $Prop$) is "true." We generally also include "logical inference" rules for constructing complex propositions (e.g. if $\varphi:Prop$ then $(\forall x:A.\,\varphi):Prop$) and deducing the truth of propositions (e.g. if $\varphi$ and $\psi$ then also $\varphi\wedge\psi$). We may also include "axioms", i.e. propositions $\varphi$ such that the judgment "$\varphi$ is true" can be made unconditionally. For example, in Peano arithmetic, there are two types $N$ and $Prop$, a constant term $0:N$, a rule that if $n:N$ then $s(n):N$, a rule that if $n:N$ and $m:N$ then $(n=m):Prop$, the usual rules of logical inference, and various axioms.
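As a concrete (and entirely hypothetical) illustration, the typing rules of Peano arithmetic just described can be sketched as a tiny term checker. The encoding of terms as tuples and the names `Zero`, `Succ`, and `Eq` are my own choices, not standard syntax:

```python
# A minimal sketch of Peano arithmetic's typing judgments.
# Terms are nested tuples; infer() returns the type of a
# well-formed term ("N" or "Prop") or raises on an ill-formed one.

Zero = ("0",)                       # the constant 0 : N

def Succ(n):                        # if n : N then s(n) : N
    return ("s", n)

def Eq(n, m):                       # if n : N and m : N then (n = m) : Prop
    return ("=", n, m)

def infer(term):
    tag = term[0]
    if tag == "0":
        return "N"
    if tag == "s":
        if infer(term[1]) == "N":
            return "N"
        raise TypeError("s applied to a term not of type N")
    if tag == "=":
        if infer(term[1]) == "N" and infer(term[2]) == "N":
            return "Prop"
        raise TypeError("= applied to terms not of type N")
    raise TypeError("unknown term constructor")

print(infer(Succ(Succ(Zero))))      # N
print(infer(Eq(Zero, Succ(Zero))))  # Prop
```

The point of the sketch is only that each rule of the theory becomes one clause of the checker: the judgment $t:A$ is exactly what `infer` computes.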

Now if we want a foundation for all of mathematics, of course we need a more powerful theory than these. But ZFC can also be stated in this style: it has two types $Set$ and $Prop$, rules saying that if $x:Set$ and $y:Set$ then $(x=y):Prop$ and $(x\in y):Prop$, the usual rules of logical inference, and a collection of axioms. The same goes for ETCS, which it is convenient to write with three types $Set$, $Function$, and $Prop$. Especially when we intend a theory like ZFC or ETCS as a foundation for all of mathematics, it is convenient to call the type-theoretic language in which these theories are written the "meta-language" or "meta-theory," while ZFC/ETCS is the "object language" or "object theory."

Things get a bit confusing because there are various ways to augment type theories. It's common to include type constructors, which allow us to form new types such as $A\times B$ out of types $A$ and $B$. Then we need an additional judgment $T:Type$, so that such a type constructor is a rule saying that (for example) if $A:Type$ and $B:Type$ then $(A\times B):Type$. We can also have type constructors which take terms as input, in addition to types; these are called dependent types. When we include exponential or dependent-product types, the complexity and power of the type-theoretic meta-language begins to approach that of the object languages ZFC and ETCS, enabling it to serve as a foundation for mathematics in its own right. That is, instead of defining a group to be a set equipped with (among other things) a function $G\times G\to G$, we could interpret a group as a type $G$ equipped with (among other things) a term $m(x,y):G$ with free variables $x:G$ and $y:G$. Or, what amounts to the same thing, we could change our terminology so that what we have been calling "types" are instead called "sets." In fact, as Toby has pointed out, it is natural to say that a type theory (especially one with many type constructors) is itself the object-theory in a meta-meta-theory having meta-types (or "kinds") such as $Type$, $Term$, and $Judgment$.
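To make the product-type constructor concrete, here is a hypothetical sketch in which types are just strings, typed terms are (value, type) pairs, and the names `product` and `pair` are mine:

```python
# A sketch of the product type constructor A × B, with its
# pairing rule on terms.  Types are strings; a typed term is a
# (value, type) pair.  All names here are illustrative only.

def product(A, B):
    """Type rule: if A : Type and B : Type then (A × B) : Type."""
    return f"({A} × {B})"

def pair(a, b):
    """Term rule: if a : A and b : B then <a, b> : A × B."""
    (av, A), (bv, B) = a, b
    return ((av, bv), product(A, B))

x = (3, "N")                 # a term 3 : N
p = ("p", "Prop")            # a term p : Prop
print(pair(x, p))            # ((3, 'p'), '(N × Prop)')
```

Each type constructor comes with exactly this package: a rule for forming the new type, plus rules for forming (and using) its terms.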

The point is that words like “type” and “set” and “class” are really quite fungible. This sort of level-switch is especially important when we want to study the mathematics of type theory, i.e. the mathematical theory of manipulating symbols according to the rules of type theory, analogous to the mathematical theory of moving pieces around on a chessboard according to the usual rules. When we study type theory in this way, we are doing mathematics, just like when we’re doing group theory or ring theory or whatever. It’s just that our objects of study are called “types”, “terms”, and so on. However, what we do in this mathematical theory can, like any other area of mathematics, be formalized in any particular chosen foundation, be it ZFC or ETCS or a type theory at a higher level. Now the type theory is itself the “object-theory” and ZFC is the “meta-theory”!

That was syntax, the rules of the game; semantics is about its "meaning." Intuitively, we generally think of a type $A$ as denoting some "collection" of "things", and a term $t:A$ as indicating a "particular one" of those things. In order for this to make sense, the type theory has to exist in some metatheory (which might or might not be formalized) having a notion of "set" to specify the relevant "collections of things". In particular, there must be a set of types, and for each type there is a set of terms which can be judged to be of that type. The judgment rules for propositions then become the study of formal logic; we say that a proposition is "provable" or is a "theorem" if it can be judged to be true.

Now, a model of this theory assigns a set $[A]$ (in the meta-theoretic sense) to every type $A$ and a function of appropriate arity to every term, in such a way that the rules and axioms are satisfied. Thus, for instance, a model of Peano arithmetic consists of a set $[N]$, an element $[0]\in [N]$, a function $[s]\colon [N]\to [N]$, and so on. Likewise, a model of the type theory of ZFC (here the levels get confusing) consists of a set $[Set]$, a function $[{\in}]\colon [Set]\times [Set]\to [Prop]$, and so on.
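For instance, the standard model of Peano arithmetic can be sketched in a few lines: $[N]$ is the meta-level natural numbers, $[0]$ is $0$, and $[s]$ is the successor function. The tuple encoding of terms below is my own, not standard syntax:

```python
# A sketch of the standard model of Peano arithmetic: each type
# gets a set and each term constructor gets a function.  Here
# [N] is the (meta-level) natural numbers, [0] is 0, [s] is
# successor, and [=] lands in [Prop] = {True, False}.

interp_zero = 0
interp_succ = lambda n: n + 1
interp_eq = lambda n, m: n == m

def evaluate(term):
    """Map a closed syntactic term to its denotation in the model."""
    tag = term[0]
    if tag == "0":
        return interp_zero
    if tag == "s":
        return interp_succ(evaluate(term[1]))
    if tag == "=":
        return interp_eq(evaluate(term[1]), evaluate(term[2]))
    raise ValueError("unknown term constructor")

two = ("s", ("s", ("0",)))           # the term s(s(0))
print(evaluate(two))                 # 2
print(evaluate(("=", two, two)))     # True
```

The brackets $[\,\cdot\,]$ in the text are exactly what `evaluate` computes: a translation from syntax into the meta-theory's sets and functions.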

One can then prove, under certain hypotheses, various things about the relationship between syntax and semantics, such as:

The Soundness Theorem: if $\varphi$ is a proposition which is provable from the axioms of a theory, then the corresponding statement $[\varphi]$ in any model is actually true (in the sense of the metatheory). Equivalently, if a theory has at least one model, then it doesn't prove a contradiction.

The Completeness Theorem: if $[\varphi]$ is true in every model of a theory, then $\varphi$ is provable in that theory. Equivalently, if a theory doesn't prove a contradiction, then it has at least one model.

The (first) Incompleteness Theorem: if a theory doesn't prove a contradiction, then there exist statements $\varphi$ such that neither $\varphi$ nor $\neg\varphi$ is provable in the theory.

Corollary to the completeness and incompleteness theorems: if a theory doesn't prove a contradiction, then it has more than one model.
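In the usual notation, writing $T \vdash \varphi$ for "$T$ proves $\varphi$" and $T \models \varphi$ for "$[\varphi]$ is true in every model of $T$", these read as follows (suppressing, as the informal statements above do, the side conditions on $T$, such as being recursively axiomatized and sufficiently strong for incompleteness):

```latex
% Soundness: provable implies true in every model.
T \vdash \varphi \;\Longrightarrow\; T \models \varphi

% Completeness (first-order theories): true in every model implies provable.
T \models \varphi \;\Longrightarrow\; T \vdash \varphi

% First Incompleteness (consistent, suitably strong T):
\exists\,\varphi \colon\quad T \nvdash \varphi
  \ \text{ and }\ T \nvdash \neg\varphi
```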

Those "certain hypotheses" are where we get into the difference between first-order and higher-order. We say that a type theory is higher-order if it involves type constructors such as function-types $B^A$ (intended to represent the "type of all functions $A\to B$") or power-types $P A$ (intended to represent the "type of all subtypes of $A$"). Otherwise it is first-order. (We have to deal with $Prop$ specially in first-order logic. If we actually have a type $Prop$, then the theory should be higher-order, since $Prop \cong P 1$; thus in first-order logic we take $Prop$ to be a "kind" on the same level as $Type$, which doesn't participate in type operations.) We say "second-order" if we never iterate the power-type operation.
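The isomorphism $Prop \cong P 1$ can be checked concretely in miniature: a one-element set has exactly two subsets, $\emptyset$ and the whole set, matching the two truth values. A throwaway sketch (the helper names are mine):

```python
# A sketch of Prop ≅ P(1): the subsets of a one-element set {*}
# are exactly ∅ and {*}, corresponding to False and True.

from itertools import chain, combinations

def powerset(s):
    """All subsets of s, as frozensets."""
    items = list(s)
    return {frozenset(c)
            for c in chain.from_iterable(
                combinations(items, r) for r in range(len(items) + 1))}

one = {"*"}                          # a one-element set, 1
P1 = powerset(one)
print(len(P1))                       # 2: ∅ and {*}

# The isomorphism sends a subset S ⊆ 1 to the proposition "* ∈ S".
to_prop = {S: ("*" in S) for S in P1}
print(sorted(to_prop.values()))      # [False, True]
```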

The Soundness Theorem is true for all theories, but the Completeness Theorem is true only for first-order theories. I believe that the Incompleteness Theorem as I have stated it is true for higher-order theories (if I’m wrong, someone please correct me), but the corollary fails since the completeness theorem does. In particular, a higher-order theory can sometimes be categorical in the logician’s sense: having exactly one model (at least, up to isomorphism). The second-order version of Peano Arithmetic has this property.

Now at the level we’re talking about, it seems that there is no fundamental difference between first-order and higher-order theories; they each have advantages and disadvantages. However, when we move up to the metalevel and talk about the term calculus itself, we always get a first-order theory. This is what I was trying to get at when I said elsewhere that higher-order logic has “no foundational significance”. The point is that what we do when we do mathematics is manipulate symbols on paper, and that is a first-order notion.

In particular, higher-order logic doesn’t allow you to escape any of the philosophical consequences of the Incompleteness Theorem. You are free to believe in a Platonic collection of “actual” or “standard” natural numbers. (I don’t really, myself, but I can’t argue you out of such an essentially metaphysical belief.) Now by the corollary to the Incompleteness Theorem, first-order Peano Arithmetic can’t capture those natural numbers uniquely; there will always be alternate models containing “nonstandard” natural numbers. By contrast, in second-order logic, the second-order versions of the Peano axioms can be shown to have a unique model. However, this second-order metatheory involves basic notions like “power-types” or “power-sets,” so really what you’ve done with this proof is to explain the notion of “natural number” in terms of the notion of “set,” which is (I think most people would agree) far more ontologically slippery! And indeed, when your second-order metatheory is interpreted in some meta-meta-theory, it will have lots of different models, each of which has its own “unique” natural numbers. You are, of course, also free to believe in a Platonic notion of “sets,” but first-order axioms such as ZF can’t characterize those either. There may be a “second-order” version of ZF which uniquely characterizes a model, but you’ve just explained the notion of “set” in terms of the notion of “class,” which is not much of an improvement. The initial ZF-algebra in a category of classes is a similar idea.

To put it another way, suppose that you have some Platonic universe of numbers, sets, and so on in mind, but I’m an alien from another dimension who has a different such Platonic universe in mind. If I persist in interpreting all of your words like “number,” “set,” and so on as referring to things in my Platonic universe, there’s no way you can convince me that I’m wrong, no matter whether the logic in which we speak is first-order or higher-order.

Now there's another way to build a "canonical" model of a theory, which is how one usually proves the Completeness Theorem: we make a "tautological" model out of the theory itself. That is, for each type $A$ we simply take the set $[A]$ to be the set of terms of type $A$ with no free variables (or "ground terms"). Without modification, this naive idea fails for two reasons.

First of all, there might not be enough ground terms. Some of the axioms of the theory might assert that there exists something with some property, without there being a corresponding term constructor actually producing something with that property. This is obviously the case for the usual version of ZFC, which has no term constructors at all (hence no ground terms at all!) but lots of axioms that assert the existence of things. This problem is easily remedied, however, by introducing new constant terms or term constructors into the language.

The second problem is that we may not know how to define all the necessary relations on the ground terms in order to have a model. Suppose, for instance, we have a couple of ground terms $t_1$ and $t_2$ in some augmented version of ZFC; how can we tell whether $t_1\in t_2$ should hold in our tautological model? Certainly if the axioms of the theory imply $t_1\in t_2$, then it should hold, and if they imply $t_1\notin t_2$, then it shouldn't; but they might not imply either one. The usual way to remedy this is to enumerate all such statements and one by one decide arbitrarily whether to make them true or false in the model we're constructing.

This works, but the model we get (though small, even countable, and concrete) isn't really canonical; we had to make a bunch of arbitrary choices. In the case of Peano Arithmetic, we can avoid introducing new constant terms and obtain a model which is "canonical" and in fact the "smallest" in some sense: it consists of the terms $s(s(\dots(s(0))\dots))$, which can of course be identified with "the" natural numbers in the meta-theory. (Note that talking about "the set of terms" always depends on the meta-theory having something that it calls the natural numbers, so that terms can be defined inductively.) But I don't think this is true for many other theories. Suppose, for instance, that we augment ZF with term constructors for all of its existence axioms. Let $\varphi$ be a sentence independent of ZF; then our term-constructor for the axiom of separation gives us a term $\{\emptyset \mid \varphi\}$. Does the relation $\emptyset \in \{\emptyset \mid \varphi\}$ hold in the term model? We have to make an arbitrary choice.
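The canonical term model of Peano Arithmetic can be sketched directly: the ground terms $s(s(\dots(s(0))\dots))$ correspond bijectively to the meta-theoretic natural numbers. A minimal illustration (function names are mine):

```python
# A sketch of the canonical term model of Peano arithmetic: the
# ground terms s(s(...s(0)...)) are in bijection with the
# meta-level natural numbers.

def numeral(k):
    """The ground term with k applications of s to 0."""
    term = "0"
    for _ in range(k):
        term = f"s({term})"
    return term

def decode(term):
    """Recover the meta-level natural number from a numeral."""
    k = 0
    while term != "0":
        assert term.startswith("s(") and term.endswith(")")
        term = term[2:-1]
        k += 1
    return k

print(numeral(3))                    # s(s(s(0)))
print(decode(numeral(3)))            # 3
```

Note how `decode` is exactly the inductive definition alluded to in the parenthetical above: it presupposes that the meta-theory can count.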

There's a slicker categorial approach which does produce a really canonical model, but only with an expanded notion of "model": instead of each $[A]$ being a set, we take it to be an object of some fixed category $\mathcal{S}$ with enough structure. We can then build a much more "tautological" model because we have the freedom to build the category $\mathcal{S}$ along with the model. In the resulting model, the true statements are precisely the statements provable in the theory, and it's even initial among all models of the theory in the appropriate sort of category.

In conclusion: