The discussion on Tom’s recent post about ETCS, and the subsequent followup blog post of Francois, have convinced me that it’s time to write a new introductory blog post about type theory. So if you’ve been listening to people talk about type theory all this time without ever really understanding it—or if you’re a newcomer and this is the first you’ve heard of type theory—this post is especially for you.

I used to be a big proponent of structural set theory, but although I still like it, over the past few years I’ve been converted to type theory instead. There are lots of reasons for this, the most important of which I won’t say much about until the end. Instead I want to argue for type theory mainly from a purely philosophical point of view, following on from some of the discussion on Tom’s and Francois’ posts.

I do want to emphasize again some things that sometimes get lost sight of (and which I sometimes lose sight of myself in the heat of a discussion). All foundational systems are based on valid philosophical points of view, and all have their uses. The real reason for preferring one foundation over another should be practical. Type theory, of course, has lots of practical advantages, including a computational interpretation, direct categorical semantics, and especially the potential for a homotopical version; but other foundational choices have different practical advantages. Aside from practical concerns, though, philosophy is interesting and fun to talk about, and can be helpful in getting used to a new way of thinking when learning a new foundational system—as long as we all try to keep an open mind, and focus on understanding the philosophy behind things that are unfamiliar to us, rather than insisting that they must be wrong because they’re different from what we’re used to.

My goal for this post is to start from material set theories (like ZFC) and structural set theories (like ETCS), and show how type theory can arise naturally out of an attempt to take the best of both worlds. At the end, I’ll argue that to really resolve all the problems, we need univalent type theory.

The basic objects of ZFC and ETCS are both called “sets”, but they behave so differently that it can be confusing to use the same name for both. For clarity, in this post I’ll call them material-sets (for ZFC) and structural-sets (for ETCS). There are a lot of differences between the two, but personally, I think one very important one is that although both kinds of sets have elements, their manner of “having elements” is different.

If $X$ is a material-set, then for any other thing $A$, we can ask whether $A\in X$. The statement $A\in X$ is a proposition in the logician’s sense: something which can be either true or false, which can be assumed as a hypothesis and proven as a conclusion, negated, combined with other propositions using conjunction, disjunction, and implication, quantified over, and so on. (In ZFC, the only things that exist are material-sets, so the $A$ in the proposition “$A\in X$” must itself be a material-set. However, some material-set theories include “atoms”, which are not sets but which can be elements of material-sets.) (Also, most material-set theories are extensional, in that a material-set $X$ is uniquely determined by knowing for which $A$ we have $A\in X$. This is a natural assumption if $\in$ is the only available notion.)

By contrast, if $X$ is a structural-set, then there are some things which by their very nature are elements of $X$. (In ETCS, those things are the functions with domain $1$ and codomain $X$.) If a thing is not given to you as an element of $X$, then it isn’t an element of $X$, and no thing can be given to you as an element of more than one structural-set. The statement $A\in X$ is not something that you would ever think about proving about two pre-existing things $A$ and $X$, or assuming as a hypothesis except at the same time as you introduce $A$ as a new variable (as a way of saying “this is the sort of thing that $A$ is”). That is, to be maximally faithful to its usage in mathematics, $A\in X$ should not be considered a proposition.

To illustrate the difference, consider a statement like “for all $x\in \mathbb{R}$, $x^2\ge 0$”. If $\mathbb{R}$ is a material-set, then this statement may be (and generally is) interpreted as shorthand for “for all things $x$, if $x\in \mathbb{R}$, then $x^2\ge 0$”. But if $\mathbb{R}$ is a structural-set, then the quantifier “for all $x\in \mathbb{R}$” is an atomic operation of logic: since “$x\in \mathbb{R}$” is not a proposition, it cannot be the hypothesis of an if-then statement.
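As a concrete illustration, here is a sketch in Lean 4 syntax (Lean implements a dependent type theory of the kind discussed later in this post), using $\mathbb{N}$ rather than $\mathbb{R}$ to keep the example self-contained:

```lean
-- The bound `n : Nat` is part of the quantifier itself: `n` is introduced
-- as a natural number, and "n ∈ Nat" is not a proposition that could serve
-- as the hypothesis of an if-then statement.
example : ∀ n : Nat, 0 ≤ n * n :=
  fun n => Nat.zero_le (n * n)
```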

Personally, I think this aspect of structural-set theory matches mathematical practice more closely. When I say “for all $x\in \mathbb{R}$, $x^2\ge 0$”, I regard it as a property of every real number that its square is nonnegative. I don’t regard it as a property of every thing in mathematics that if it happens to be a real number, then its square is nonnegative. Under the material-set-theory reading, one particular instance of “for all $x\in \mathbb{R}$, $x^2\ge 0$” is the statement “if the monster simple group happens to be a real number, then its square is nonnegative”. I wouldn’t expect most mathematicians to naturally regard that statement as part of the content of “for all $x\in \mathbb{R}$, $x^2\ge 0$”.

On the other hand, sometimes in mathematics it does make sense to regard “$A\in X$” as a proposition. For instance, if $L$ is the set of complex numbers with real part $\frac{1}{2}$, then a lot of people would really like to prove that “for all $z\in \mathbb{C}$, if $\zeta(z)=0$ and $z$ is not a negative even integer, then $z\in L$”. Note the difference between the two uses of “$\in$” in this statement. The quantifier “for all $z\in \mathbb{C}$” is naturally treated as atomic, with $z$ being given as a complex number, rather than as an arbitrary thing with the additional hypothesis that it happens to be a complex number. On the other hand, the statement “$z\in L$” is a proposition, which might be true or false for any particular $z$, and is susceptible of proof or disproof (as $z\in \mathbb{C}$ is not). Thus, in this statement we see $\mathbb{C}$ behaving like a structural-set, while $L$ behaves like a material-set.

The crucial observation here is that $L$ is a subset of $\mathbb{C}$. Thus, since $z$ is given as an element of the structural-set $\mathbb{C}$, we can ask whether it happens to belong to this subset $L$. This situation is generic, and it gives us a reasonable way to combine structural-sets and material-sets in a single theory. Namely,

Material-sets are subsets of structural-sets.

Thus, an expression like $A\in X$ has two different meanings depending on what $X$ is. If $X$ is a structural-set, then “$A\in X$” is not a proposition; instead it is a way of describing how $A$ is given. On the other hand, if $A$ is given as an element of some structural-set $Z$, of which $X$ is a subset, then “$A\in X$” is a proposition. We might refer to these two meanings of $\in$ as the “quantifier-bounding membership” and the “propositional membership”. In my experience, this way of thinking captures the vast majority of usages of sets in mathematics.
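A minimal sketch of this distinction in Lean 4 syntax (the helper names `Subset`, `mem`, and `Evens` are mine, not standard library ones): a material-style set is a predicate carved out of a structural-set, and propositional membership is application of that predicate.

```lean
-- A "subset" of a type A is a predicate on A...
def Subset (A : Type) : Type := A → Prop

-- ...and propositional membership is application of the predicate.
def mem {A : Type} (a : A) (X : Subset A) : Prop := X a

-- Evens is a material-style set carved out of the structural-set Nat.
def Evens : Subset Nat := fun n => ∃ k, n = 2 * k

-- "4 ∈ Evens" is a proposition, susceptible of proof:
example : mem 4 Evens := ⟨2, rfl⟩
```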

From this perspective, we can see that some of the awkwardnesses of ZFC and ETCS arise from forcing one kind of set into the mold of the other. In ZFC, there are no structural-sets, forcing us to interpret the structural membership expression in terms of the material one, as above. On the other hand, in ETCS there are only structural-sets, forcing us to interpret the material membership proposition in terms of these. (Specifically, we define a “subset” of a structural-set $Z$ to be a structural-set $X$ equipped with an injective function $i:X\to Z$. Then if $a:1\to Z$ is an element of $Z$, we define “$a\in X$” to mean “there exists an element $a':1\to X$ such that $i\circ a' = a$”.)

Rather than try to resolve these inelegancies directly, however, I want to argue that they are merely symptoms of a deeper problem afflicting both material and structural set theories: an impoverishment of mathematical objects. In ZFC, everything is a material-set—even the elements of other material-sets. In ETCS, everything is either a structural-set or a function between such (with elements of structural-sets being particular functions). Other set theories have different “basic objects” — for instance, we can modify ZFC to contain “atoms” as well as material-sets, while in SEAR the basic objects are structural-sets, elements of structural-sets, and relations between them. But in all cases, the list is very short, forcing us to “encode” other mathematical objects in terms of these. Sometimes the encoding is more natural materially, sometimes it is more natural structurally, and sometimes it is unnatural in either one.

On one hand, I think power sets seem more natural materially. In ZFC, the elements of the power set $\mathcal{P}(\mathbb{N})$ are themselves subsets of $\mathbb{N}$. As such, we can ask, for any $X\in \mathcal{P}(\mathbb{N})$ and $n\in \mathbb{N}$, whether $n\in X$. (Note that the first two $\in$’s are quantifier-bounding, while the third is a proposition.) By contrast, in ETCS, the elements of $\mathcal{P}(\mathbb{N})$ are not themselves, intrinsically, subsets of $\mathbb{N}$. Instead they are “codes” for subsets of $\mathbb{N}$, and there is a relation between $\mathbb{N}$ and $\mathcal{P}(\mathbb{N})$ called “membership” which has the expected properties.

On the other hand, I think the natural numbers seem more natural structurally. In ETCS, the natural numbers object is a structural-set $\mathbb{N}$ equipped with an element $0$ and a function $s:\mathbb{N}\to \mathbb{N}$ satisfying the laws of induction and recursion. An element of $\mathbb{N}$ is “a natural number” and has no structure other than that. By contrast, in ZFC, we generally define the natural numbers to be some particular material-set, such as $\{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}, \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\}, \dots\}$ or $\{\emptyset, \{\emptyset\}, \{\{\emptyset\}\}, \{\{\{\emptyset\}\}\}, \dots\}$. In this case we can construct all the usual operations, but we also have other unwanted results such as $0\in 1$ that we have to consciously “abstract away from”.

Finally, on the gripping hand, I don’t think that function sets are particularly natural in either case. In ZFC, we have to define ordered pairs in some particular way, such as $(a,b) = \{\{a\},\{a,b\}\}$, and define a function to be a certain kind of set of ordered pairs. To obtain the set $B^A$ of functions between two specified sets $A$ and $B$, we usually also want to annotate these functions with $A$ and $B$. Thus we have a set $B^A$, and we can define a notion of “application” of such a function to an element of $A$, but we also have other unwanted properties. By contrast, in ETCS, the set $B^A$ is characterized by a universal property. Its elements are not, literally, functions from $A$ to $B$ (which are basic things in the theory of ETCS), but only “codes” for them, which get identified with functions via an “evaluation” function $ev\colon B^A \times A\to B$.

We can try to apply various band-aid fixes for these problems. For instance, we might augment ZFC with atoms, and define the natural numbers to be a set of atoms. We might similarly augment it with a basic “ordered pair” operation, or with a basic notion of “function” in addition to “set”. But it would be better to have a principled solution which solves all such problems at once. Thus, we ought to seek a foundational system which

1. contains structural-sets and their elements as basic things, but also subsets of these which behave like material-sets;
2. contains both the structural quantifier-bounding membership and the material membership proposition; and
3. allows elements of structural-sets to have some internal structure, but doesn’t force them to always have a particular kind of internal structure. The elements of a structural-set should have different structure depending on what that structural-set is: they may be material-sets, or functions, or ordered pairs, or natural numbers.

In fact, such a foundational system already exists: it is called Martin-Löf dependent type theory (MLTT). The language we use in MLTT is a bit different from that of set theory, but at the level of informal mathematics, the differences really amount to nothing more than changes in terminology and notation. Instead of “structural-sets” we speak of types. Sometimes we call their elements terms. And the quantifier-bounding membership is written as $a:Z$ rather than $a\in Z$ (here $Z$, of course, is a type—that is, a structural-set—while $a$ is being given as an element of it).

In MLTT, we have many different ways to construct (or “form”) types, and each type-forming operation comes with rules specifying what the elements of that type are and how they behave. For instance, the rule for forming cartesian product types essentially says that the elements of $A\times B$ are, by definition, pairs $(a,b)$ where $a:A$ and $b:B$. The pair-forming operation is primitive and not defined in terms of any other structure. This is in contrast to ZFC, where $(a,b)$ must be defined in terms of set-membership, and ETCS, where $(a,b)$ is only characterized as such relative to a choice of a cartesian product diagram for $A$ and $B$.
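For instance, in Lean 4 syntax (a sketch, not part of any formal development in this post), the pairing constructor and the projections are primitive to the product type:

```lean
-- Pairing is the introduction rule for the product type; the projections
-- are its elimination rules. No encoding via sets is involved.
def p : Nat × Bool := (3, true)

example : p.1 = 3 := rfl    -- computation rule for the first projection
example : p.2 = true := rfl -- computation rule for the second projection
```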

Similarly, the rule for forming function types essentially says that the elements of $B^A$ are, by definition, rules associating to every element of $A$ an element of $B$. Formally, by this we mean an expression denoting an element of $B$ which contains a specified free variable of type $A$, such as “$x^2+1$” where $x:\mathbb{R}$. The resulting element of $B^A$ is often denoted by a $\lambda$ which binds the appropriate variable, such as “$\lambda x.\, x^2+1$” for the element of $\mathbb{R}^{\mathbb{R}}$ defined by the expression “$x^2+1$”. There is a basic operation of application which associates to every $f:B^A$ and every $a:A$ an element $f(a):B$, with the expected properties, e.g. $(\lambda x.\, x^2+1)(3) = 10$. This is in contrast to ZFC, where functions must be defined as sets of ordered pairs, and ETCS, where elements of $B^A$ are “codes” for functions rather than literally functions themselves.
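The same example in Lean 4 syntax (sketched over $\mathbb{N}$ rather than $\mathbb{R}$, to keep it self-contained):

```lean
-- λ-abstraction is the introduction rule for function types...
def f : Nat → Nat := fun x => x ^ 2 + 1

-- ...and application is the elimination rule; they interact by the
-- expected computation rule: (λx. x² + 1)(3) = 10 holds by definition.
example : f 3 = 10 := rfl
```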

There are several other type-forming operations, but of particular note is the type former for subsets. (Type theorists, bear with me for a moment; this paragraph is a tad unusual.) Analogously to function-types, for any type $A$ we may have a type $\mathcal{P}(A)$ whose elements are expressions of the form $\{x:A \mid \phi(x)\}$, where $\phi(x)$ is a property involving a free variable $x$ of type $A$. And we have a relation $\in$ between elements of $A$ and subsets, with the expected properties, e.g. $a\in \{x:A \mid \phi(x)\} \;\Leftrightarrow\; \phi(a)$. This relation $\in$ is a basic aspect of the type $\mathcal{P}(A)$ and not defined in terms of anything else, just as function application is a basic aspect of the type $B^A$. Thus, the elements of $\mathcal{P}(A)$ are literally the subsets of $A$, and these material-sets coexist in just the right way with the structural-sets (i.e. types) of the ambient theory.

In fact, usually in type theory $\mathcal{P}(A)$ is defined to be the function type $\Omega^A$, where $\Omega$ is the “type of propositions” or “truth values”, with $\{x:A \mid \phi(x)\}$ defined to mean $\lambda x.\, \phi(x)$. Then the membership relation $\in$ reduces to function application, yielding exactly the correct rules. This amounts to defining subsets to be their characteristic functions, which may seem very natural to a category theorist. But if you don’t like it, you can ignore it: there’s no problem with a type theory that defines $\mathcal{P}(A)$ directly as I described above.
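This definition is easy to sketch in Lean 4 syntax, where the type of propositions is called `Prop` (the helper names `Pow` and `setOf'` are mine):

```lean
-- 𝒫(A) defined as the function type Ω^A, with Ω = Prop.
def Pow (A : Type) : Type := A → Prop

-- { x : A | φ x } is just the λ-abstraction of the property φ.
def setOf' {A : Type} (φ : A → Prop) : Pow A := φ

-- Membership reduces to function application, so the expected rule
-- a ∈ { x : A | φ x } ↔ φ a holds by definition.
example {A : Type} (φ : A → Prop) (a : A) : setOf' φ a ↔ φ a := Iff.rfl
```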

One last type-forming operation of note is a universe type, whose elements are other types. By combining this with dependent sums, we can build types whose elements are structured types. For instance, there is a type whose elements are pairs $(A,s)$ where $A$ is a type and $s:A\to A$ is an endofunction. The ability to build sets of this sort in ZFC is, I think, what Francois calls “using sets as data structures”. In ETCS the only way to talk about a structured set is to do the “pairing” at a meta-level, regarding a quantifier such as “for all groups” as abbreviating “for all structural-sets $G$ and functions $m:G\times G\to G$ such that the axioms of a group hold”. In type theory, we can avoid this infelicity, but we don’t have to use sets as data structures either: we can use data structures as data structures.
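In Lean 4 syntax, a dependent sum over a universe gives exactly such a type of structured types (the names `TypeWithEndo` and `natWithSucc` are mine):

```lean
-- A type whose elements are pairs (A, s) of a type A together with an
-- endofunction s : A → A. It lives in the next universe, Type 1, because
-- its elements contain types drawn from the universe Type.
def TypeWithEndo : Type 1 := (A : Type) × (A → A)

-- One element: the natural numbers paired with the successor function.
def natWithSucc : TypeWithEndo := ⟨Nat, Nat.succ⟩
```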

At this point, however, you may be feeling that type theory sounds very complicated. Lots of different ways to form types, each with their own rules for forming elements? Where is the simplicity and intuitiveness that we expect of a foundational theory?

Well, first of all, there are not that many type-forming rules. In fact, if we allow “rule schemas” (in the same way that ZFC has “axiom schemas” such as separation and replacement), then there are only three: dependent products, inductive types, and universes. (One might add coinductive types as a fourth.) Certainly that compares reasonably well with the half-dozen or dozen axioms (or axiom schemas) of ZFC and ETCS.

Secondly, every type-forming rule conforms to a well-understood pattern: there is a part for formation (making a new type), a part for introduction (how to construct elements of that type), a part for elimination (how to extract information from an arbitrary element of that type), and a part for computation (how introduction and elimination interact). This uniformity makes the meta-theoretic analysis of type theory particularly simple, allowing proofs of “consistency by evaluation”. It makes type theory into essentially a programming language, at the same time as a foundational system, allowing the easy verification and construction of proofs by computers. And it governs a principled approach to the construction of new models, and matches very closely the category-theoretic notions of left and right universal properties. (Some, but not all, of these advantages are shared by structural-set-theoretic axioms that describe universal properties in the category of sets.)

Even with all of this said, however, you may still feel that a conceptual simplicity is lacking. In ZFC, we have only the concepts of set and of membership. In ETCS, we have only the concepts of set and function, along with identities and composition. The axioms, you may say, only stipulate how these basic concepts behave.

In fact, one may argue that a foundational system should be about encoding more complicated concepts into simpler ones, even if the encoding turns out to be somewhat unnatural. For instance, the fewer basic concepts we have to deal with, the easier it is to prove consistency results and construct new models from old ones.

However, I claim that the conceptual simplicity of set theories vis-a-vis type theories is an illusion, because both material-set theories and structural-set theories are built on top of the edifice of first-order logic. To introduce them formally, therefore, we must first describe the rules for forming first-order formulas, with connectives such as conjunction, disjunction, and implication, and quantifiers such as “for all” and “there exists”, and also the rules governing inference that tell us how to construct proofs of formulas involving connectives and quantifiers. In other words, ZFC and ETCS are not just theories of sets; they are theories of sets and propositions. The interplay between sets and propositions is most obvious in the axiom schemas such as replacement, which introduce an arbitrary first-order formula. It is also evident when constructing a new model of set theory, in which case one must first specify how all logical formulas are to be interpreted.

By contrast, type theory is not built on top of first-order logic; it does not require the imposing superstructure of connectives, quantifiers, and inference to be built up before we start to state axioms. Of course, type theory has first-order logic, which is a necessity for doing mathematics. But first-order logic in type theory is just a special case of the type-forming rules. A proposition is merely a certain type; to prove it is to exhibit an element of that type. When applied to types that are propositions, the type-forming operations of cartesian product, disjoint union, and function types reduce to the logical connectives of conjunction, disjunction, and implication; the quantifiers arise similarly from dependent sums and products. Thus, type theory is not an alternative to set theory built on the same “sub-foundations”; instead it has re-excavated those sub-foundations and incorporated them into the foundational theory itself. So not only is it more faithful to mathematical practice than either kind of set theory, it is literally simpler as well.
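For instance, in Lean 4 syntax, where the propositions-as-types correspondence is built in:

```lean
-- Conjunction, disjunction, and implication are the propositional
-- counterparts of product, sum, and function types; a proof is an element.
example (p q : Prop) (hp : p) (hq : q) : p ∧ q := ⟨hp, hq⟩  -- pairing
example (p q : Prop) (hp : p) : p ∨ q := Or.inl hp          -- injection
example (p : Prop) : p → p := fun h => h                    -- λ-abstraction

-- The universal quantifier is a special case of the dependent product.
example : ∀ n : Nat, n = n := fun _ => rfl
```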

The problem of encoding, however, has not gone away entirely. In type theory, we can see it as another aspect of modularity in software engineering. For instance, the usual definition of the natural numbers in type theory is as an inductive type generated by zero and successor. However, when we regard type theory as a programming language, this definition gives rise to a “unary” representation, which is computationally inefficient. Instead, we could define a “binary” representation, involving as basic the operations of “doubling” and “doubling and adding one”. Mathematically, these two definitions should be “equivalent”, but they are not literally identical, in the same way that $\{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}, \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\}, \dots\}$ and $\{\emptyset, \{\emptyset\}, \{\{\emptyset\}\}, \{\{\{\emptyset\}\}\}, \dots\}$ are not.
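A sketch of the two representations as Lean 4 inductive types (the names `UNat` and `BNat` are mine; Lean’s own `Nat` is the unary-style definition, though its runtime representation is more efficient under the hood):

```lean
-- Unary: the usual inductive definition, generated by zero and successor.
inductive UNat where
  | zero : UNat
  | succ : UNat → UNat

-- Binary: generated by zero, "doubling", and "doubling and adding one"
-- (essentially a list of bits), which is far more efficient to compute with.
inductive BNat where
  | zero : BNat
  | bit0 : BNat → BNat  -- n ↦ 2n
  | bit1 : BNat → BNat  -- n ↦ 2n + 1
```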

The solution to this problem is the univalence axiom, which can be concisely described as saying that “isomorphic structures are equal”. More specifically, it says that if $A$ and $B$ are types, then proving that $A=B$ is the same as specifying an isomorphism between $A$ and $B$. This implies that, analogously, structured types, such as natural numbers objects, groups, rings, fields, etc., are equal if they are isomorphic by an isomorphism preserving the structure. In particular, the unary and binary definitions of the natural numbers can be shown to be isomorphic (as sets, or as monoids, or semirings, or whatever structure we are interested in), and therefore equal. Thus, anything we say about one of them is automatically equally true of the other. More generally, the “type of natural numbers objects” can be shown to contain exactly one element, so that we can speak freely of “the natural numbers” without any abuse of language—the natural numbers are literally unique.
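Schematically, univalence says that the canonical map sending an equality to the equivalence it induces is itself an equivalence (writing $A \simeq B$ for the type of equivalences from $A$ to $B$, which for the h-sets discussed below are just isomorphisms):

$$(A = B) \;\simeq\; (A \simeq B)$$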

In set theoretic foundations, of course, we could already prove that any two natural numbers objects were isomorphic. In material-set theory, all we can conclude from that is that any “isomorphism-invariant property” of one natural numbers object carries over to another. Then we have to do work (or, as is usually the case, leave work to the reader) proving that any particular property we care about is isomorphism-invariant. The situation in structural-set theory is somewhat better: if a structural-set theory is presented carefully enough, then we can only say isomorphism-invariant things. Both of these approaches, however, eventually lead to dependent type theory anyway: it is the way to characterize the isomorphism-invariant properties, and the careful way to present structural set theories. Moreover, in univalent type theory we can prove the statement “all properties are invariant under isomorphism” directly, rather than only observing it as a meta-property of the theory.

Univalence is made possible precisely by the incorporation of first-order logic into type theory at a basic level. Since the proposition $A=B$ is, like any proposition, a type, which is proven by giving an element of it, its elements can be things with appropriate structure. When $A$ and $B$ are types, therefore (that is, elements of a universe type), we can require that the elements of $A=B$ are isomorphisms from $A$ to $B$. This is unworkable in classical first-order logic, where propositions and proofs exist at a different level than sets, and in particular proofs cannot contain information, such as which isomorphism between $A$ and $B$ was used to prove $A=B$. And retaining this information is essential, because to say that a property or structure is “isomorphism-invariant” involves transporting it along a particular isomorphism.

Univalence does require some getting used to, since as you can see, “equality” in univalent type theory behaves a bit differently from what you are probably used to. Fortunately, for most kinds of concrete mathematics, this funny behavior of equality doesn’t arise. The best way to think about this is that “types” are a more general kind of thing than structural-sets. Some types do behave just like structural-sets; in univalent type theory we call them “h-sets”. However, other types, such as the type of h-sets, or the type of algebraic structures of some sort, have “higher h-level”, and that’s when equality starts to behave more unusually. All we’ve done is to expand the universe of mathematics to contain new things; no familiar object behaves any differently than it used to.