Axiomatic set theories such as that of Zermelo and Fraenkel, in their attempt to provide a comprehensive foundation for mathematics, involve several intricate tricks to avoid becoming inconsistent. A suitably naïve set theory is already inconsistent due to the infamous paradoxical set of Russell [1938].

Here we have used set comprehension to define as the set of all sets that do not contain themselves. This leads to the question, does contain ? If is an element of , then it is not, as only contains sets that do not contain themselves. If is not an element of , then it is, as does not contain itself – We have a paradox!

To address this, different foundations take different approaches. Most axiomatic set theories eliminate or restrict the rule of comprehension, that is, they don’t allow sets to be constructed from arbitrary predicates. Instead, set comprehension can only be used to describe subsets of already constructed sets. This prevents comprehension from being used above, but it also prevents a lot of other useful constructions, like products or unions! Thus a handful of other axioms to construct sets are added, such as pairing, union, powerset and so on (all nicely explained in Halmos [1960]).

Another axiom, that of regularity, says that there is no infinite sequence such that, for any , . This implies that no set can contain itself, and allows us to build the universe of set theory by stages, called ranks. At rank zero, no sets exist; at rank one, there is just the empty set; at rank two, there is also the set containing the empty set; and at each following rank, the added sets all contain the sets that are defined at earlier ranks, as shown in the following figure:

The entire universe of set theory can be thought of as the union of the universe at each rank, , a presentation originally due to Zermelo [1930], but commonly attributed to John von Neumann.

Moving to Type Theory

This stratification bears remarkable similarity to Russell’s theory of types (see Russell [1938]), his own solution to the the paradoxical set , and the distant ancestor of modern type theory.

Indeed, in the intuitionistic type theory of Martin-Löf [1984], the approximate foundation of the Agda proof assistant, we have a heirarchy of types that very much resembles that of von Neumann or Zermelo :

The rule of cumulativity, which is not present in Agda , but exists in some type theories and languages such as Idris, makes this resemblance even stronger:

This rule implies that like the von Neumann rank , a type is inhabited by every type where .

The differences between the two theories start to emerge when one examines why this stratification exists in type theory. In axiomatic set theory, eliminating the axiom of regularity and thus the stratification it implies makes it rather difficult to do induction, but it does not make the theory inconsistent – there have been several non-well-founded set theories proposed, such the hyperset theory of Aczel [1988], which do exactly this.

Removing unrestricted set comprehension is enough to avoid Russell’s paradox, as it allows us to distinguish between formulae (or predicates) and sets. Unlike informal set theory, we cannot construct a set for any given formula. For example, is a valid formula, but is not a set.

Type theories are not set theories – they do not have a separate logical formula language, like that of Frege, to serve as a basis for the theory. So, one cannot achieve consistency in type theory by restricting how a set may be constructed from a logical formula. Instead, type theory places restrictions on the kinds of formulae that can be expressed. Rather that rule out paradoxical sets representing self-referential propositions, type theory rules out the propositions themselves. In such a theory, it is not even well-formed to ask if a set contains itself .

This restriction is a consequence of the hierarchy mentioned earlier – remove this from type theory, by saying instead that , and the result is more or less equivalent to Falso . We can show that type theory is inconsistent with this change using Girard’s paradox, which is a generalised encoding of Russell’s paradox for pure type systems. The contradiction derived from this paradox is rather involved, so much so that Martin-Löf himself didn’t realise that it applied to the first version of his type theory. Hurkens [1995] provided a simplification, which is encoded in Agda here.

With inductive types, however, we can use Russell’s paradox directly, by formalising a naïve notion of sets as comprehensions, and using this to derive a contradiction.

Russell’s Paradox in Agda

For these (interactive) Agda snippets, I have enabled --type-in-type , which removes the predicative heirarchy from the type theory, instead stating that .

This defines a set (written set X f ) as a comprehension over an carrier type X and a function f , where the element for a given index value x : X is given by f x . This definition is already using the fact that – normally, a type (of type ) would not be permitted as a parameter to set , which constructs a type of the same size .

The empty set, having no elements, uses the empty type as its carrier :

The set containing the empty set, having one element, uses the unit type as its carrier:

The next rank, , has two elements, and thus can use Bool as its carrier:

More sets could be defined using similar techniques, so I will forgo any further definitions.

We can also define the membership operators for our SET type:

A value of type a ∈ set X f can be thought of as a proof that there exists a value x : X for which the element function f gives a .

Using these operators, we can define Russell’s paradoxical set as follows:

This is a set which, for its carrier type, uses pairs containing a set s and a proof that s does not contain itself. The element function just discards the proof, leaving us with the SET of all SET s that do not contain themselves.

Indeed, we can prove that every set which is in Δ does not contain itself:

Δ

But we know that every set which does not contain itself is in Δ :

And from this we can derive a contradiction:

Musings and Speculation

I find it very curious that two very different approaches to formalising mathematics end up with much the same stratified character, and for different reasons. Perhaps this Russell-style heirarchy is, kind of like the Church-Turing thesis, a fundamental characteristic of any sufficiently expressive foundation. Something discovered rather than invented. In the words of Scott [1974]:

The truth is that there is only one satisfactory way of avoiding the paradoxes: namely, the use of some form of the theory of types. That was at the basis of both Russell’s and Zermelo’s intuitions. Indeed the best way to regard Zermelo’s theory is as a simplification and extension of Russell’s. (We mean Russell’s simple theory of types, of course.) The simplification was to make the types cumulative. Thus mixing of types is easier and annoying repetitions are avoided. Once the later types are allowed to accumulate the earlier ones, we can then easily imagine extending the types into the transfinite – just how far we want to go must necessarily be left open. Now Russell made his types explicit in his notation and Zermelo left them implicit. [emphasis in original]

Acknowledgements

The Agda development in this post is taken from one of Thorsten Altenkirch’s lectures, the code of which is available here. The original proof is, as far as I can tell, due to Chad E Brown, who formulated the same thing in Coq.

References