First published Wed Jan 4, 2006; substantive revision Tue May 22, 2018

The observant reader will have spotted the appearance above of the expression weakly stratified. What is this? The instance of the comprehension that says that \(x \cup \{y\}\) always exists is stratified, and is therefore an axiom. So \(x \cup \{y\}\) always exists. So \(x \cup \{x\}\) always exists, by substitution. However the instance of the comprehension scheme alleging its existence is not stratified. The term ‘\(x \cup \{x\}\)’ is said to be weakly stratified by which it is meant that it can be stratified if we are allowed to give different types to distinct occurrences of free variables. Weak stratification is what is needed to admit ab initio those instances of the comprehension scheme that give us substitution instances of stratified instances, and not require a detour such as the detour here through the existence of \(x \cup \{y\}\).

Some illustrations may help: \(x \in x\) is not stratified. \(x \in y\) is. Thus not every substitution-instance of a stratified formula is stratified. \(y = \wp(x)\) is stratified (the fancy P means power set), with the variable \(y\) being given a type one higher than the type given to \(x\). To check this we have to write out ‘\(y = \wp(x)\)’ in primitive notation. (In primitive notation, this formula becomes: \(\forall z(z \in y \leftrightarrow \forall w(w \in z \rightarrow w \in x))\). One can assign 0 to \(w, 1\) to \(z\), and 2 to \(x\), to get a stratification of this formula.) In general it is always necessary to write a formula out in primitive notation – at least until one gets the hang of it.

where ‘\(x\)’ is not free in \(\Phi\) and \(\Phi\) is (weakly) stratified. This last condition requires that there should be a function \(\sigma\) (a “stratification”) from the bound variables in \(\Phi\) to an initial segment of the natural numbers such that if ‘\(u \in v\)’ is a subformula of \(\Phi\) then \(\sigma(`v\rsquo) = \sigma(`u\rsquo) + 1\) and if \(`u = v\rsquo\) is a subformula of \(\Phi\) then \(\sigma(`v\rsquo) = \sigma(`u\rsquo)\). The origins of this constraint will be explained below.

1. The Roots of Set Theory

There are various forces that gave rise to set theory, and various forces that sustain it and drive it on. The following two conceptions of set overlap but do not entirely coincide.

The notion of arbitrary set grew in the late 19th century from developments in analysis, somewhat in the way that the concept of arbitrary real-valued function had grown earlier in the century. To this day there are people who feel that the centre ground of set theory lies in Analysis. A second idea is that of sets as one extensional (in the sense of the intension/extension distinction) limb (the others being lists and multisets) of a body whose intensional limbs are function-in-intension, properties, algorithm. (Sets as properties-in-extension is not a completely straightforward story: the point has been made that much of the heartsearching occasioned by the emergence of the Axiom of Choice as a principle of set existence arose precisely because sometimes the sets it postulated seemed not to be the extension of any property. But this is to make the same point in a back-handed way: it shows that people were expecting sets to be extensions of known intensions.)

If we think of sets as extensions, what are the corresponding intensions? The answer must surely be: one-place predicates. (Solace for the distractible: “So what are the extensions corresponding to many-place predicates?” The answer is not to worry; whatever you think they ought to be, you can fake them with ordered tuples and sets).

From our point of view, this is an interesting analysis, because on this account of set, we clearly ought to believe in an axiom of complementation for sets. If you think that a negation of a proposition is a proposition (so that in particular a negation of an atomic proposition is a proposition) then you also believe that a negation (or complement or whatever you want to call it) of a predicate is another predicate. Notice that this doesn’t tell you that your complementation has to be classical, but it does tell you that a complement of a set should be a set. This will turn out to be one of the axioms of NF.

Sets are just extensional entities with particular multiplicity properties (that mark them off from multisets) and certain functionalities (that mark them off from functions-in-extension, graphs etc). Among the variety of objects in the extensional canon, sets are distinguished by their almost complete absence of structure. This points to one of the most perplexing but prominent rôles of set theory in modern mathematics. Almost any (indeed many people believe literally any) fact in mathematics can be construed as a fact about sets.

The idea that sets are extensions manifests itself in the idea (to which we will return) that set theory is the theory of one binary extensional relation (parallel to graph theory as the theory of a single symmetric irreflexive relation). Useful though this idea will turn out to be, it does not suggest any particular axioms for Set theory.

Axioms of set theory have arisen in part out of our understanding of the paradoxes. The most striking feature of the paradoxes is something vulgarly called “self-reference”, and ideas about how to deal with it have been a useful constraint on our invention of axioms for set theory. The search for a spell to ward off the evils attendant upon self-reference has fed into the axiomatisation of set theory in two ways, corresponding to the two ways in which self-reference has been (mis)-understood. Is it circularity in the specification of sets that is to be eschewed? Or do we locate the problem in circularity in the membership relation between sets? At all events, we have to be careful to keep to a minimum the number of sinews we cut, for there are important theorems (Gödel’s Incompleteness theorem among others) whose proofs rely on the same “self-reference” that we find in the paradoxes but which we are loath to abandon.

2. Wellfoundedness and ZF

The axioms of ZF are motivated by the second point of view – in fact by a very specific school of it: the view that the key is not mere noncircularity of \(\in\) but wellfoundedness. A set \(x\) is wellfounded if there are no infinite sequences \(\ldots x_n \in x_{n-1} \in \ldots x_3 \in x_2 \in x_1\). The axioms of ZF can usefully be thought of as arising from an attempt to axiomatise the theory of wellfounded sets. This project has been so successful that nowadays many people believe that wellfounded sets are the only kind of set there is – if it isn’t wellfounded it isn’t a set but a hyperset perhaps. This idea is a modern rewriting of history: Set theory did not start with the idea that all sets should be wellfounded (had it done so, the axiom scheme of naïve comprehension would never have been entertained!) The idea that we should admit a candidate set as a genuine set only if promises to be wellfounded is an overreaction to the uncontroversial observation that the suspect collections that feature in the paradoxes (the universe, the Russell class) are all flagrantly illfounded, or at least would be if they existed. It is easy to jump from this insight to the conclusion that wellfoundedness of \(\in\) is the only thing that matters. The result of this jump is ZF set theory.

But historically-wrong and philosophically-hasty does not imply unproductive. There is no doubt that restricting attention to wellfounded sets (in the sense that it is them alone that one tries to axiomatise) concentrated the minds of 20th century mathematicians to great effect. ZF does seem to avoid the paradoxes, and the conception of set it formalises is one that apparently allows one to interpret the whole of mathematics. But how can an historically wrong and philosophically hasty story have such a happy ending? The answer is that, although it may be a mistake to think that all sets are wellfounded, it is not a mistake to think that the concept of wellfounded set is worth axiomatising.

3. Type Theory and the Axiomatisation of NF

The conception of set underlying NF arises from a related – indeed complementary – oversimplification. ZF arises from the belief that if we enforce a kind of wellfoundedness in the constitution of set then all will be well. NF arises ultimately from the idea that all will be well if we enforce a kind of wellfoundedness (or at least noncircularity) in the description of sets. This concern can take two forms: One might feel uneasy about conjuring up a set by means of a spell that invokes other sets that are in some sense “not yet” available (sets of which the candidate will turn out to be a member, for example). This is a mathematically fruitful concern (keyword “predicativity”) and it will reappear later in the study of NF (and in ZF too, of course) but it is not this first form that gives rise to the axiomatisation of NF. In the second view, the key to neutralising the dangers of self-reference lies in segregating sets into pairwise disjoint levels called types. (That this concern is orthogonal to the first kind of concern can be shown by consideration of the axiom of sumset. This is a legitimate axiom from the second point of view, even though the spell that conjures \(\bigcup X\) does it by invoking the set \(X\), which is of higher type.)

This view of set theory gave rise to the Theory of Types. In the Theory of Types (as simplified successively by Ramsey, Carnap, Gödel and Quine) the bottom type is a collection of atoms. Atoms are non-sets (dormice, teapots, anything you please) with no internal structure visible to the set theory. Thereafter type \(n + 1\) is simply the power set of type \(n\), and there is a type for each natural number. The process cannot be extended to give transfinite types, since the types (being formally disjoint) are not cumulative so there is no sensible way to define a \(\omega\)th type. NF arises from a consideration of the syntax used to describe structures of this kind, and the syntax requires more careful handling. A first-order language for describing structures of this sort has infinitely many suites of variables, one suite for each type, and a membership symbol (‘\(\in\)’ as usual) which can be placed only between variables of consecutive types: ‘\(x_n \in y_{n+1}\)’ is wellformed but ‘\(x_n \in y_n\)’ is not. The equality sign can be placed only between variables of the same type.

This type theory has the following axioms. Extensionality at each type:

\[ \forall x_n \forall y_n [x_n =y_n \leftrightarrow \forall z_{n+1}(x_n \in z_{n+1} \leftrightarrow y_n \in z_{n+1})] \]

and an axiom scheme of comprehension at each type. Notice that the formula displayed above is not, strictly speaking, an axiom of extensionality, since the variables do not have concrete type subscripts, so it is not officially a formula at all. It’s a kind of axiom scheme. Annoyingly, this subtlety matters. The fact that variables have type subscripts precludes the more usual exploitations of subscripting for purposes such as indicating position in a sequence. This makes the language of type theory infeasible for most mathematical purposes, so one looks for excuses to omit the type subscripts. Why might this be a safe manœuvre? Let us define \(\Phi^+\) to be the result of raising all the type subscripts on all variables in \(\Phi\). It’s safe because the theory has the same axioms at each type, so it has the same theorems at each type, so whenever one has a proof of a formula \(\Phi\) one also has a proof of \(\Phi^+\). One omits the type subscripts because even though, once has omitted them, there are infinitely many ways of restoring them, these infinitely many differently restored versions do not appear to differ significantly. We shall take up later the theme of the significance of this difference, but for the moment we note only that it appears – as it appeared to Russell and his successors up to and including Quine – that the differences could be safely overlooked. This strategy of overlooking is called typical ambiguity.

This version of Type theory, as simplified by Ramsey, Gödel, Carnap and Quine. is the point of departure for Quine [1937a] (which is still a good place to read the background). NF is obtained from this type theory by taking its axioms and stripping the type subscripts off all the variables. This is subtly different from Type Theory used with a policy of typical ambiguity. Under that scheme a formula like

\[\tag{A} \exists x\forall y(y \in x) \]

isn’t really a formula at all, but a formula scheme, or shorthand for any of the genuine formulæ to be obtained from it by adding subscripts to ‘\(x\)’ and ‘\(y\)’. Which subscript? In every setting it will either (i) become clear which subscript, or (ii) become clear that it doesn’t matter. At all events, (A) is not an axiom of Type theory: the theory does not have a universal set, what it has is a universal set at each type.

However it is not just the universal set that appears at each type. We also get the naturals, the reals and the rationals etc. etc. at each type, and this is clearly not what we want. (Holmes refers to this as the Hall of Mirrors effect.) Quine’s move in [1937a] is the natural one of taking the pseudoformulæ like (A) and thinking of them as genuine formulæ in a new, one-sorted, language. The new theory does not have a universal set at each type, for it has no types: it has a single universal set. And a single implementation of the reals, and so on. Formulæ of the one-sorted language obtained in this way from formulæ of type theory are said to be stratified. The axioms of NF are now extensionality plus those instances of naïve comprehension that happen to be stratified.

Interestingly, natural and well-motivated though this progression sounds, it does not provide a consistency proof for the system it ends with. We will discuss this later.

The hard part is to fully understand stratification. There is an easy rule of thumb with formulæ that are in primitive notation, for one can just ask oneself whether the formula could become a wff of type theory by adding type indices. It’s harder when one has formulæ no longer in primitive notation, and the reader encounters these difficulties very early on, since the ordered pair is not a set-theoretic notion. How does one determine whether or not a formula is stratified when it contains subformulæ like \(f(x) = y\)? The technical/notational difficulty here lands on top of – as so often – a conceptual difficulty. The answer is that of course one has to fix an implementation of ordered pair and stick to it. Does that mean that – for formulæ involving ordered pairs – whether or not the given formula is stratified depends on how one implements ordered pairs? The answer is ‘yes’ but the situation is not as grave as this suggests. There are various standard definitions of ordered pair, and they are all legitimate in NF, and all satisfactory in the sense that they are “level” or homogeneous. All of them make the formula “\(\langle x, y\rangle = z\)” stratified and give the variables \(x\) and \(y\) the same type; \(z\) takes a higher type in most cases (never lower). How much higher depends on the version of ordered pair being used, but there are very few formulæ that come out stratified on one version of ordered pair but unstratified on another, and they are all pathological in ways reminiscent of the paradoxes.

The best way to illustrate this is by considering ordinals \((=\) isomorphism classes of wellorderings) in NF. For any ordinal \(\alpha\) the order type of the set (and it is a set) of the ordinals below \(\alpha\) is wellordered. In ZF one can prove that the wellordering of the ordinals below \(\alpha\) is of length \(\alpha\). In NF one cannot prove this equation for arbitrary \(\alpha\) since the formula in the set abstract whose extension is the graph of the isomorphism is not stratified for any implementation of ordered pair. Now any wellordering \(R\) of a set \(A\) to length \(\alpha\) gives rise to a wellordering of \(\{\{\{a\}\} : a \in A\}\), and if instead one tries to prove (in NF) that the ordinals below \(\alpha\) are isomorphic to the wellordering of length \(\alpha\) decorated with curly brackets, one finds that the very assertion that there is an isomorphism between these two wellorderings comes out stratified or unstratified depending on one’s choice of implementation of ordered pair! This is because, in some sense, the applications of the pairing function are two deep in wellordering of the ordinals below \(\alpha\), but only one deep in the wellordering of the set of double singletons. If we use Quine ordered pairs, the assertion is stratified – and provable. If one uses Wiener-Kuratowski ordered pairs then the assertion is unstratified and refutable. However if one uses Wiener-Kuratowki ordered pairs there is instead the assertion that the ordinals below \(\alpha\) are isomorphic to the obvious wellordering of \(\{\{\{\{\{a\}\}\}\} : a \in A\}\), which comes out stratified (and provable). In general for each implementation of ordered pair there is a depth of nesting of curly brackets which will make a version of this equality come out stratified and true. This does not work with deviant implementations of ordered pair under which “\(\langle x, y\rangle = z\)” is unstratified or even with those which are stratified but give the variables \(x\) and \(y\) different types. Use of such implementations of ordered pairs results in certain sets not being the same size as themselves!

Perhaps a concrete example would help. Let us try to prove Cantor’s theorem. The key step in showing there is no surjection \(f : X \to \wp(X)\) by reductio ad absurdum is the construction of the diagonal set \(\{x \in X : x

ot\in f(x)\}\). The proof relies on this object being a set, which it will be a set if “\(x \in X \amp x

ot\in f(x) \amp f : X \rightarrow \wp(X)\)” is stratified. This in turn depends on “\(\exists y[y \in \wp(X) \amp \langle y, x\rangle \in f \amp f : X \rightarrow \wp(X)]\)” being stratified. And it isn’t stratified, because “\(\langle y, x\rangle \in f\)” compels ‘\(x\)’ and ‘\(y\)’ to be given the same type, while “\(f : X \rightarrow \wp(X)\)” will compel ‘\(y\)’ to be given a type one higher than ‘\(x\)’. This is because we have subformulæ ‘\(x \in X\)’ and ‘\(y \subseteq X\)’. Notice that we can draw this melancholy conclusion without knowing whether the type of ‘\(f\)’ is one higher than that type of its argument, or two, or three …. We cannot prove Cantor’s theorem.

However if we try instead to prove that \(\{\{x\} : x \in X\}\) is not the same size as \(\wp(X)\) we find that the diagonal set is defined by a stratified condition and exists, so the proof succeeds. This tells us that we cannot prove that \(\lvert X\rvert = \lvert \{\{x\} : x \in X\}\rvert\) for arbitrary \(X\): graphs of restrictions of the singleton function tend not to exist. If they did, we would be able to prove Cantor’s theorem in full generality. This gives rise to an endomorphism \(T\) on cardinals, where \(T\lvert X\rvert =_{df} \lvert \{\{x\} : x \in X\}\rvert\). \(T\) misbehaves in connection with the sets that in NF studies we call big (as opposed to large, as in large cardinals (in ZF)). These are the collections like the universal set, and the set of all cardinals and the set of all ordinals: collections denoted by expressions which in ZF-like theories will pick out proper classes. If \(\lvert X\rvert = \lvert \{\{x\} : x \in X\}\rvert\) we say that \(X\) is cantorian. If the singleton function restricted to \(X\) exists, we say that \(X\) is strongly cantorian. Sets whose sizes are concrete natural numbers are strongly cantorian. IN (the set of Frege natural numbers) is cantorian, but the assertion that it is strongly cantorian implies the consistency of NF.

It can be plausibly argued that the bizarre behaviour of the \(T\) function (and it is at its most bizarre in connection with the big sets) is just the paradoxes bubbling up again. Assertions to the effect that \(T\) is well-behaved have strong consistency strength (Orey [1955], [1956]) and play in NF studies the kind of rôle that large cardinal axioms play in ZF-like systems. Indeed there are some exact equivalents. The \(T\)-assertions may look odd, but their oddness is only skin-deep. They are actually expressing the same mathematics that happens in ZF, but through an unfamiliar formalisation.

4. History

In the early years of NF studies (1937–1952) it was not known whether or not NF proved the axiom of infinity. Specker showed that it did, though he never published his proof: the (quite different) proof in Specker [1953] is of his stronger result that NF proves that the universe cannot be wellordered (and is therefore infinite) and the first published proof is in Henson [1973a])

So NF refutes AC. However – so far – it seems that (apart from some weird weak formulations that make no sense outside an NF context and which all arise from refinements of Specker’s proof) it is only full AC that fails, so that some strong consequences of AC remain on the cards for the moment. In particular, no-one has refuted the prime ideal theorem or countable choice or DC. Further, it is striking that all known failures of AC involve big sets. An axiom saying that all wellfounded sets are wellordered might – for all we know – be consistent with NF. In particular there is no reason to suppose that NF refutes the weak forms of AC needed in the study of the reals and complexes (“Analysis”). Indeed as far as we know we can safely add to NF some axioms to say that the wellfounded sets form a model of ZFC. According to this view, NF does not so much contradict ZFC as cover extra topics, such as the universal set – phenomena about which ZF has nothing to say. This underpins the view held by both Church and Quine, albeit in different ways, that a synthesis of ZFC and NF could be obtained along these lines.

Quite early on Quine added classes to NF to obtain a theory of sets-with-classes known as ML. In the ZF context, adding classes is a natural thing to do, for it enables one to reduce the infinite replacement scheme to a single set-existence axiom: “the image of a set in a class is a set” if augmented with suitable class-existence axioms. However, none of the axioms of NF refer to classes in the way the replacement scheme of ZF does, so there is nothing for the class existence axioms to do. For this reason ML is nowadays regarded as a pointless syntactic complication of NF with no new mathematics, and it is not the subject of any research. At best the classes can be a harmless halo of epiphenomena (Wang [1950]). However, at worst – if the set existence axioms are allowed to have bound class variables (as in the 1940 first edition of Quine [1951a]) – then the class of total orders that are genuinely (“externally”) wellordered (no subclasses without least members) becomes a set and we have the Burali-Forti paradox (Rosser [1942]). The rumours still occasionally surfacing among mathematicians of the inconsistency of NF probably have their origin in a misreporting of this result, but the result itself concerns ML only so it has no bearing on the consistency question for NF. If NF is inconsistent it is for totally different reasons. There is recent work suggesting that NF is consistent but it has not yet been confirmed. A paper containing an alleged proof of Con(NF) is working its way through the refereeing process as this entry is being revised in 2018.

In [1944] Hailperin gave the first of a number of finite axiomatisations of NF now known. Many of them exploit the function \(x \mapsto \{y : x \in y\}\) which is injective and total and is an \(\in\)-isomorphism. This function was known to Whitehead, who suggested to Quine that \(\{y : x \in y\}\) should be called the “essence” of \(x\) (a terminology clearly suggested by a view of sets as properties-in-extension).

Another early result was the observation of Rosser [1952] that NF admitted a type-level ordered pair iff the axiom of infinity was a theorem of NF. (see Forster [1994])

The second major development was Scott’s [1962] application of Rieger-Bernays independence methods to NF. If \(\langle X, E\rangle\) is a binary structure with \(E\) an extensional relation on \(X\) (so that \(\langle X, E\rangle\) is a model of a set theory of some kind) and \(\pi\) is a permutation of \(X\), then \(\{\langle x, \pi(y)\rangle : \langle x, y\rangle \in E\}\) is also an extensional relation on \(X\) and gives us a new model of some kind of set theory. If certain constraints are respected, this method outputs models of ZF when given models of ZF (and was first invented in order to prove the independence of the axiom of foundation from the other axioms of ZF) but it works even better with NF, since the construction preserves stratified formulæ.

The third major development came when Specker (in [1958] and [1962]) placed the narrative in Quine [1937a] recounted above on a firm mathematical footing. (For an illustrated translation, with commentary, of Specker [1958], see Forster [2000], in the Other Internet Resources section below.) We’ve seen that \(\Phi^+\) is the result of raising all the type subscripts on all variables in \(\Phi\) by one. Then \(\Phi \leftrightarrow \Phi^+\) is an axiom of typical ambiguity. Specker established that the consistency of NF is equivalent to the consistency of TST + the scheme of all axioms of typical ambiguity. This reduction of the consistency question of NF to the consistency of TST + the axiom scheme of typical ambiguity clarifies matters but doesn’t actually solve anything, since the axioms of typical ambiguity turn out to be quite strong – as are their analogues in other languages (such as geometry) that admit endomorphisms. Notice that the mere fact that if \(\phi\) is provable then so is \(\phi^+\) is not enough to ensure that \(\phi \rightarrow \phi^+\) is provable, so the axioms of typical ambiguity are prima facie nontrivial and we should not be surprised that they turn out to be strong.

Kaye [1991] generalised Specker’s method to show how any typed theory of sets, garnished with the axioms of typical ambiguity, turns out to be equiconsistent with a one-sorted theory. Various weak fragments of NF can be shown consistent by this kind of analysis. Oswald ([1976], [1981] and [1982]) introduced a notation according to which \(NF_n\) is the system axiomatised by the \(n\)-stratified axioms of NF, and \(N_n F\) is extensionality plus the existence of \(\{x : \phi(x)\}\) where \(\phi\) is \(n\)-stratified (parameters are allowed). Oswald showed that

\[ N_1 F \subseteq NF_2 \subseteq N_2 F \subseteq NF_3 \subseteq N_3 F = \mathrm{NF} \]

and that the inclusions are all proper. The consistency of \(NF_3\) is a result of Grishin [1969].

Crabbé [1982a] proved the consistency of two predicative fragments of NF: NFP and NFI. Both are extensionality plus existence of \(\{y : \phi(y, x_1 ,\ldots ,x_n)\}\) where \(\phi\) is stratified and there are extra restrictions on the variables in \(\phi\). All variables (bound or free) are restricted to type no higher than that of the set being defined; bound variable are not to be of any type exceeding the type of \(y\) (NFP) or (type of \(y) + 1\) (NFI). NFP + Axiom of sumset = NF. They are distinguished from the other subsystems of NF known to be consistent by the circumstance that they allow a proof of the axiom of infinity which makes essential use of Specker’s proof of the axiom of infinity. In NFP we argue: either the axiom of sumset holds, in which case we have NF and can run Specker’s proof of the Axiom of infinity, or (ii) it doesn’t, in which case – since finite sets have sumsets – not every set is finite.

The most dramatic result obtained by a type-theory-plus-ambiguity-axiom analysis came when Ronald Jensen [1969] noticed that if we weaken TST to a theory called TSTU by allowing the existence of urelemente (distinct empty sets) then it is easy to prove the consistency of the axioms of typical ambiguity. The result is a model of NFU, which is NF with extensionality weakened to allow urelemente. Jensen referred to NFU as “slight(?) modification of Quine’s NF”, but – as he showed – the modification has huge effects: Specker’s proof of the axiom of infinity no longer works, and AC is not refutable either.

Jensen’s consistency proof for TSTU (and thereby NFU) combines Ramsey theory with a technique that had been developed originally by Abraham Robinson [1939] to obtain a proof of the independence of extensionality from the other axioms of ZF.

Boffa [1988] noticed another consistency proof of NFU. If \(\langle V, \in \rangle\) is a model of \(Z\) (Zermelo set theory) with an external automorphism \(\sigma\) and an ordinal \(\kappa\) such that \(\kappa \rangle \sigma(\kappa)\), then \(V_{\kappa}\) is the domain of a model of NFU whose membership relation is

\[ \{\langle x, y\rangle : \sigma(x) \in y \amp \varrho(y) \le \sigma(\kappa) + 1\}. \]

\((\varrho\) is set-theoretic rank.) Remarkably, as Holmes showed, one can even recover from this model of NFU the nonstandard model of ZF that gave rise to it. He does this by exploiting the idea of set pictures. We saw earlier the idea that set theory is the theory of one extensional binary relation (with equality of course: extensionality cannot be defined in a language without equality). Thus one could say that a set (or, better, a set picture) is a isomorphism class of binary structures consisting of a domain, an extensional binary relation, and a designated element of the carrier set. Relational structures of this kind look like transitive closures of sets and the designated element of the domain indicates the set whose transitive closure is being depicted. (Consequently there is an accessibility condition for the designated element: if one thinks of the extensional relation as a graph, there must be a directed path from any element to the designated element).

Although it was Aczel’s [1988] book that made the idea of Accessible Pointed Graph common currency, the idea goes back at least to Hinnion’s Ph.D. thesis [1974] and in a less developed form to Specker’s (unpublished) Habilitationsschrift and quite possibly earlier. Aczel used this technique to provide models of Forti-Honsell’s antifoundation axiom [1983]; Hinnion’s idea was to use isomorphism classes of wellfounded pointed extensional relational types to interpret theories of wellfounded sets in NF. With wellfounded extensional relations one does not need to make the pointing explicit, since the wellfoundedness ensures that at most one element of the carrier set can satisfy the accessibility condition. (This is not so in general: the two sets \(x = \{y\}\); \(y = \{x, \varnothing \}\) have set pictures that have the same binary relation but different pointing.) In NFU each set picture is a set (a big set, as it happens) and Holmes showed that in a model arising à la Boffa the set pictures reproduce the sets of the original nonstandard model. This striking phenomenon – which has no parallel in the study of NF – justifies Holmes’ dictum that “NFU can be understood as the theory of certain nonstandard models of Zermelo-style set theory with automorphisms.”

There is a widespread misapprehension that Cantor’s theorem shows that there cannot be a consistent set theory with a universal set. That error can be refuted by Jensen’s construction of his models for NFU, but by far the most appealing demonstration is a later construction of a model of a set theory with a universal set by Church and Oswald independently and roughly simultaneously in the early 1970s. It has the advantage over NFU of respecting extensionality: it is a genuine model of a theory of pure sets. The construction is a refinement of Rieger-Bernays methods. It goes as follows: start with a model of something like ZF, and a bijection \(k\) between \(V\) and \(V \times \{0, 1\}\). We now define a new membership relation \(E\) so that \(xEy\) iff either the second component of \(k(y)\) is 1 and \(x \in\) first component, or the second component of \(k(y)\) is 0 and \(x

ot\in\) first component. \(k^{-1}(\varnothing)\) is now the universal set. In fact \(k^{-1}(\langle x,0\rangle)\) and \(k^{-1}(\langle x,1\rangle)\) are set-complements in this new structure.

Perhaps surprisingly, given that constructions like this provide complements for all sets, they give a central rôle to the concept of wellfounded set in that the wellfounded sets of the new model can be made to be an isomorphic copy of the wellfounded sets in the structure one starts with. Further, these models satisfy a scheme of replacement for wellfounded sets: anything the same size as a wellfounded set is a set. (Replacement for, say, wellordered sets doesn’t come free in the same way)

5. Other Current Research Areas

ZF arose as an attempt to axiomatise the theory of wellfounded sets, and this principle is kept constantly in mind by the ZF-istes as they struggle to find new and more informative axioms. In contrast, the syntactic version of the noncircularity insight (which is the one that underlies NF) has been a much less fertile source of axioms for its offspring. Most new axioms for NF are either versions of AC or postulates of smooth behaviour by the \(T\) functions. It may be more natural to compare/contrast NF with Zermelo set theory rather than with ZF(C); Zermelo was a first attempt at formalising a theory of sets-of-ordinary-mathematics; it took a few decades for it to become clear that the pack of axioms needed the axiom scheme of replacement, and it could be argued that NF should be analogously augmented with axioms such as the axiom of counting (which says that the singleton function restricted to the natural numbers is a set), or with the axiom of choice for wellfounded sets.

NF and Proof Theory

The proof theory of stratified formulæ is intriguing but frustratingly little work has been done on it. The sentence

\[ \forall x[\forall y(y \in x \rightarrow \forall z(z \in y) \rightarrow \bot) \rightarrow \forall z(z \in x) \rightarrow \bot] \]

is an example of a stratified theorem of first-order logic that has stratified proofs but no stratified cut-free proofs. Crabbé [1991], [1994] has given two proofs of the fact that SF (NF with extensionality dropped altogether) admits cut-elimination. The connections with type theory make constructive versions of NF an obvious target for investigation, but nothing publishable has emerged so far. The only known proof (Specker’s) of the axiom of infinity in NF has too little constructive content to allow a demonstration that INF (NF with a constructive instead of classical ambient logic) admits an implementation of Heyting arithmetic, but opinion is divided on whether or not this tells us that INF is significantly weaker than NF.

6. Implementation of Mathematical Concepts in NF

To the extent that mathematical concepts can be implemented in the theory of wellfounded sets (aka ZF) they can also be interpreted in NF – since NF has a theory of wellfounded sets. However this is not the way in which we routinely interpret mathematics in NF since the interpreting formulæ are unstratified and unwieldy: extra axioms are needed. For example, the Von Neumann implementation of natural numbers tries to tell us that the collection of natural numbers is not the extension of a stratified formula, so it cannot be expected to be a set. By the same token, ordinary functions from the naturals to the naturals can fail to have extensions that are sets. Instead we trade on the fact that NF (and NFU too) admits a natural implementation of various kinds of mathematical entities as global isomorphism classes – which will of course be big sets (qv). The natural number \(n\) is implemented as the set of all \(n\)-membered sets: this is the Frege implementation used also in TST. Rosser [1953a] uses NF instead of ZF as a foundation for presenting ordinary mathematics (and to this day remains the only book to do so, though Holmes [1998] uses NFU for this purpose).

As we have seen, implementing mathematical objects (ZF-style) as wellfounded sets and then reasoning about those wellfounded sets) is not a very satisfactory way to proceed, since NF does not have sufficient strength for us to manipulate the implemented objects with the freedom we want. The situation with the endogenous interpretation (mathematical objects as global equivalence classes) is potentially more satisfactory. In this scenario cardinals are implemented as equivalence classes under equipollence and are (big) sets. The collection of all cardinals is a set. Ordinals can be handled similarly: ordinals are (big) sets and the collection of all ordinals is a (big) set; the collection of all groups is a set, the collection of all rings is a set; …. Being big, these sets cannot be manipulated in the freewheeling way that small sets can be. (The axiom of scheme of (full, unstratified) separation fails for big sets: if it didn’t then the Russell class would be \(\{x \in V: x

ot\in x\})\) and their manipulation is complicated by the appearance on the scene of various analogues of the \(T\) function (that we saw in connection with the cardinals) that are needed to work round the stratification constraints. However, it is most striking that this doesn’t turn out to be much of a problem. On the whole, ordinary mathematics seems to be what the Computer Scientists call strongly typed, and to a large extent the stratification constraints of NF coincide with the endogenous strong typing of the mathematical objects being implemented. Sadly there is no space here for a proper discussion of this deep and interesting question.

However, the expression “Ordinary mathematics” is disputed turf. On another front there are logicians that are fond of making the point that some of the further reaches of large cardinal theory, routinely disregarded – and even disparaged – by many mathematicians nevertheless have implications for a kind of low-level mathematics whose core significance is not in dispute. Inevitably the question of the adequacy of NF for “Ordinary mathematics” will overlap with this debate. However there is a further complication to be considered. All these big collections that implement the collection of all cardinals (or of all ordinals etc.) are nevertheless sets, so there are questions arising from their sethood that we can ask about them, and some of these questions have no parallel in ZF. What is the cofinality of the wellordering of the ordinals? Can we map onto \(V\) the set of those sets that do not map onto \(V\)? No doubt there will be people who say that these questions are not proper mathematical questions at all, but are mere pathological artefacts of a misconceived system of axioms. However, there is the intriguing possibility that answers to these questions could have implications for the study of the kind of small sets that appear in “ordinary mathematics”. No such implications have been found so far, but then no-one has looked very hard. We know of subclasses of the naturals definable by formulæ that have big sets as parameters, so we know where to start looking. It is tempting to speculate what the consequences might be for NF studies were any to be discovered.

7. Coda

The current status of NF studies is odd. In the 1970s Boffa and Coret proved some interesting results about stratified formulæ in theories of wellfounded sets, and work continues. However there is little traffic in the other direction and it seems unlikely that NF will attract the attention of a critical mass of scholars until someone succeeds in proving it consistent relative to ZF or one of its kin, (and even this is uncertain, for NFU is known to be consistent and still finds few friends) and this consistency question seems to be very hard. There are older unresolved issues in Set Theory – CH for one – but the status of NF does seem to be the oldest outstanding consistency question. The fact that NF refutes the axiom of choice sharpens the consistency problem greatly, and ought to sharpen public interest, but the widely-held view that sets are, of their essence, wellfounded represents a huge rhetorical obstacle to NF attracting the attention that its foundational interest merits. That idea is now (post Aczel [1988]) less suffocating than it was, but is still casts a shadow. The shadow it casts falls over NFU as well, which is unfortunate, since NFU is known to be consistent, and has a wealth of interesting model theory. It also distracts us from the most basic feature of NF: as a theory of a universal set, it does at least represent an earnest endeavour to take the paradoxical classes seriously, and not just erase them. “Cowards, you notice” says Higgins in Pygmalion “are always shrieking to have troublesome people killed”.