When I first encountered the ZFC set theory axioms, the notion that they in particular should be the foundation of mathematics struck me as not making a great deal of sense. What was this Axiom of Foundation:

∀x (∃y (y ∈ x)) → (∃z (z ∈ x) ∧ (∀w ~((w ∈ z) ∧ (w ∈ x)))

and why would anyone care? What do we learn about practical mathematics from this logical sentence? What makes ZFC the mathematics we care about?

Further study in mathematics tells us: not all that much. ZFC isn’t special. Set theorists study all sorts of axiomatic systems, some strong and others weaker; ZFC happens to be a point of comfort, a level of mathematical proving strength which comfortably gives us the mathematics we want (and, arguably, strange mathematical results that a few might not want). It is not so strong as to be sufficient to prove all that is true, because no logical system is. It also cannot, without appealing to a stronger system, prove that it is without contradiction. and it cannot, without appealing to a stronger logical system, prove that it contains no contradictions.

So, why ZFC? The axioms suffice to construct a mathematics that does not seem to diverge in a meaningful way from the “real world mathematics” (to be discussed later) that power the sciences. A point in space, it seems, requires three real numbers to define it. We do not actually know what is the fine-grained nature of physical space, but it feels like a continuum.

The truth is that ZFC isn’t “the foundation of mathematics” but the foundation of a mathematics, one that (a) arguably suffices for real-world purposes, (b) allows us to reason about infinite objects far beyond what seems to exist in the real world, while (c) being small enough for its axioms(*) to fit on a blackboard.

(*) Except, of course, for the need to introduce two axiom schemas, but that shall be covered.

What is ZFC?

Zermelo-Fraenkel set theory with Choice is a first-order logic with equality. Its syntax includes:

The connectives of propositional logic: ~ ∧ ∨ → ⇔ .

. An equality symbol: = .

. Grouping symbols ( and ) are used to specify order of operations. Spaces are used for clarity but have no meaning.

and are used to specify order of operations. Spaces are used for clarity but have no meaning. As many variable symbols (lower-case letters x , y , z , and so on) as are needed.

, , , and so on) as are needed. The universal and existential quantifiers: ∃ and ∀ .

All of which are called logical symbols, and a domain-specific non-logical symbol ∈ .

The logical symbols’ meanings and corresponding axioms are not included among ZFC’s axioms. All propositional tautologies are considered theorems of ZFC, but we shall not discuss them further. The same holds for first-order sentences that can be proven without using any set theory, like:

(x = y) → (y = x)

which is invariably presented as an axiom attendant to equality

∀x (x = z) → ∃x (x = z)

which follows from the quantifiers being defined in the standard way (if something is true of all x , it must be true of at least one x by definition.

When we discuss ZFC, we’ll present the axioms that define for us:

what things are and are not sets.

how they relate to each other, using the added non-logical symbol, ∈ .

When we speak of formal logic, we do not need to interpret ∈ . Formal logical systems do not describe what is “true” or “false” in any real world, but what is provable (a theorem) from the axioms. Which statements are theorems and which are not is derivable syntactically (or “typographically”) from the symbols themselves, without caring about their meaning. A machine could do the job.

All we know is that:

Some sentences, like ∃x∀y ~(y∈x) (there is an empty set) are theorems of ZFC.

(there is an empty set) are theorems of ZFC. Some valid sentences, like ∃x (x∈x) , are non-theorems of ZFC. They may be negations of known theorems, or they may be unprovable.

, are non-theorems of ZFC. They may be negations of known theorems, or they may be unprovable. Some strings of logical symbols, like ∃∈∈x→~ , are not valid sentences and shall not be considered further.

Our lack of commitment to interpretations of these symbols– using only that logical symbols have the familiar axioms, with ∈ to be introduced and applied according to ZFC’s axioms to-be-given– is necessary because, from first principles, we don’t know anything yet about the collection of values over which variables– the x ‘s quantified over by existential ( ∃x ) and universal ( ∀x ) quantifiers, is. We haven’t built a model that tells us that.

In set theory, these variables always range over sets. But what is a set? We haven’t determined that yet.

Equality has a special meaning: if x = y , they are the same object and perfectly interchangeable with one another. That is, they are elements of the same things and have the same elements or, more formally:

(x = y) → ∀z (x ∈ z) ⇔ (y ∈ z) , and (x = y) → ∀z (z ∈ x) ⇔ (z ∈ y)

Now that we’ve done this work, let’s introduce ZFC’s axioms.

Axiom 0: Sets Exist

In fact, we’ll make a stronger claim: the empty set exists.

∃x∀y ~(y ∈ x)

Literally, “there exists a set that all sets are not in”; more legibly, there exists a set with nothing in it.

We could derive the existence of this set from the Axioms of Infinity (which declares that an infinite set exists) and Comprehension (using a logical formula that is always false). We technically do not need it. So, why include it? From a programmer’s perspective, it helps. What’s the first thing one wants to know about a data structure K ? It’s, How does one create it? We now have the empty function (or method) of the Set class. Furthermore, since sets are immutable values, we can treat empty as a value.

Axiom 1: Extensionality, which lets us prove things equal.

We have an empty set. How many do we have?

It would be annoying if we had to keep track of a myriad of empty sets, all of which had differing “personalities” irrelevant to the theory. Computers can handle this extraneous detail; they must keep track of equivalent data structures if those are mutable. In set theory, though, we deal with objects that never change.

Two sets are equal if they contain the same elements. This is not innate to logical systems; we must make it an axiom of our set theory.

∀x∀y (∀z (z ∈ x) ⇔ (z ∈ y)) → (x = y)

This, in essence, gives us an equals function– or, at least, the notion required to build one. We also know everything about a set if we know its elements.

So, there is only one empty set, and we know all we need to know– there is nothing in it.

By Extensionality, there is no meaningful distinction between {x} and {x, x} or between {x, y} and {y, x}. We’ll discuss how to handle that with our next axiom.

Axiom 2: Pairing, which lets us build sets.

The Axiom of Pairing gives us the unordered world’s equivalent of a cons cell; given two sets, we can always create the set containing both.

∀x∀y ∃z ∀w (w = x) ∨ (w = y) ⇔ (w ∈ z)

This will have one or two elements; one, in the special case where x = y .

We shall later see that Pairing is redundant. It could be proven using Infinity, Comprehension, and Replacement, to be discussed below; but that would be monstrously inelegant. From a computational perspective, we would not construct an ordered pair by constructing an infinite data structure, then choosing two elements.

Pairing upholds the notion of sets as containing more than one thing. It also indicates that there is only one set universe; there are no x and y that live in disjoint universes, because ZFC will always contain {x, y}.

From this, we can build the notion of an ordered pair: we interpret {{x}, {x, y}} as <x, y>.

So far, though, we can only build sets with zero, one, or two elements. We’re quite limited in what we can do.

Axiom 3: Union, which flattens set structures and yields bigger sets.

The Axiom of Union allows us to collect all the sets that live in a given set into one.

∀x ∃u ∀y (y ∈ u) ⇔ ∃t (y ∈ t) ∧ (t ∈ x)

So, from {{x, y}, {z, w}} we can make {x, y, z, w}. We can think of this as a flatten function that removes (or interprets) one level of set structure, or as set theory’s natural notion of what functional programming calls a reduce .

The binary union operator (∪) is a special case of this axiom: x ∪ y is the set that must exist by applying Union to {x, y} , which must exist by Pairing.

This gives us the ability to generate the fundamental objects of arithmetic, the natural numbers.

0 := {} x + 1 := x ∪ {x}

It may not be meaningful to say, in an absolute sense, that the natural number 3 is the set constructed in this way; but we can interpret that set as 3. (Building the rest of arithmetic, such as addition and multiplication, can be done in the first-order logic of ZFC; but it is tedious and will not be done here.) Note that this 3 is defined as the set {0, 1, 2}; this practice continues with infinite ordinals: for example, the smallest infinite ordinal ω is identified with the set of all natural numbers.

Axiom (Schema) 4: Comprehension, the filter of sets.

We’ve developed tools for making sets larger. Comprehension gives us a tool for making them smaller. A definable subset of a set ought also to be a set.

For any well-formed logical formula with one free variable, F , we have:

∀x ∃y ∀z (z ∈ y) ⇔ (z ∈ x) ∧ F[z]

In other words, given a set x and a definable logical property F , the y containing all elements z for which F[z] is a set. This corresponds to filter in functional programming, with the F taking the role of the function argument.

This is not, technically speaking, an axiom. It’s an axiom schema. It represents a countably infinite set of axioms, one for each of the F that can be defined. In this way, ZFC doesn’t have have nine (or so) axioms but infinitely many. That turns out to be no problem in practice. As long as a computer can check within finite time whether a sentence is or is not an axiom, we’re okay.

We do not need Filter (or Replacement, to be discussed) in the hereditarily finite world– the world where all sets (including the sets within sets) are finite. Nor do we need the Axiom of the Power Set. We will need these axioms if we want to wrangle which infinite sets, which of course we do.

Axiom (Schema) 5: Replacement or map .

One of the most subtle axioms of set theory is that of Replacement. It seems less powerful than it is. In fact, the largest infinities known to ZFC can only be constructed with Replacement. This is counterintuitive. In the finite world, the Power Set Axiom generates “much larger” sets (from size n to size 2n) but Replacement does not; used alone, it creates a set no larger than an existing set.

The Axiom Schema of Replacement gives us, for every logical formula F with two free variables, if F[a, b] is true for exactly one b per a , this sentence.

∀x ∃y (∀a ∀b (a ∈ x) → F[a, b] → (b ∈ y)) ∧ (∀c (c ∈ y) → ∃d ((d ∈ x) ∧ F[d, c]))

The F described behaves likes like a function the elements of x to those of y . It says that if {a, b, c, …} is a set, then so is {f(a), f(b), f(c), …}.

You can get a simpler formulation if you demand that F be one-to-one; then you have:

∀x ∃y (∀a ∀b F[a, b] → ((a ∈ x) ⇔ (b ∈ y))

In ZFC, both presentations– the simpler one mandating a one-to-one F , or the more complex one without that restriction– generate the same sets, and result in the same theory.

In computational terms, this is akin to that crown jewel of functional programming, map .

Axiom 6: Power Set

The axiom of the Power Set is stated thus:

∀x ∃p (∀y (∀z (z ∈ y) → (z ∈ x)) ⇔ (y ∈ p))

In other words, for every x , there is a larger set p that contains all of x ‘s subsets y . In the finite world, this is strictly larger: we jump from n elements to 2n elements. That turns out to be true in the infinite world, where size is a more complicated matter, as well. There is always a bigger set. Since there is no largest set, there is no “set of all sets”. We’ll get more specific on this when we discuss foundation.

Axiom 7: Infinity, where the fun begins.

We have the tools to build up (and tear down) all the natural numbers, and all finite sets of natural numbers, all finite sets of finite sets of natural numbers, and so on.

We don’t yet have infinite sets. None of our axioms provides that one must exist. By the tools we have, we can’t get to the infinite world from what we have.

We have infinitely many sets already. We have the natural numbers. That’s an outside-the-system judgment about what exists in our world thus far. We don’t yet have that {0, 1, 2, …} is itself a set. Since the purpose of set theory is to wrangle with notions of infinity, and since we need an axiomatic statement that one exists, we add:

∃n ∃z (z ∈ n) ∧ (∀x (x ∈ n) → ∃s ((s ∈ n) ∧ (x ∈ s) ∧ (∀y (y ∈ x) → (y ∈ s))

What does this say? It says that there exists some set n that is not empty (that is, it contains at least one element z ) and that, for every x in it, contains the s(x) defined as x ∪ {x} . This will be a larger set than x , so long as we never have x ∈ x , which we’ll establish to be the case when with Foundation, next.

We could, alternatively, make it an axiom that N , the set of natural numbers exists. This is achieved by setting z to the empty set. In that case, we don’t need to use ∀x ~(x ∈ x) , which we haven’t yet established. In ZFC, it is irrelevant whether we use the weaker “an infinite set” exists or construct specifically the set of natural numbers; with Replacement, they are equivalent.

We’ve now reached a space beyond what we can compute. We can conceive of the infinite and we can write programs that never terminate (using a while loop) but we cannot store a completed infinity in a computer. As infinities get larger, and sets more complicated, the divergence between what “exists mathematically” and what we can realize, computationally, grows.

An aside on Replacement, massive infinities, and “ordinary mathematics”.

We define two sets x and y to have the same size (or cardinality) if we can define a function from x to y that is invertible (one-to-one) covers all of y (onto). It turns out that for every set x, the power set P(x) is larger. So from the smallest infinity of the natural numbers, N, we get the larger infinities P(N), P(P(N)), P(P(P(N))), and so on.

If we exclude the Axiom of Replacement, we still have a mathematical universe that contains:

the natural numbers,

negative, rational, real, and complex numbers, which can be constructed using set-theoretic tools,

sets of (and sets of sets of, and sets of sets of sets of) mathematical entities above,

algebraic entities like groups, rings, fields, and vector spaces,

functions on the mathematical entities above (e.g., functions C 8 → C ),

→ ), and higher-order functions that can accept as arguments or return the entities above.

So, even bizarre mathematical objects like {7, 3.14159…, 3+4i, {(λx → x + 3), -17, {}}} exist in our universe. We have ordinary mathematics– almost all of mathematics excluding set theory.

What don’t we have, without Replacement? We don’t have this thing as a set:

{N, P(N), P(P(N)), P(P(P(N))), … }

or its union; both of which, most of us feel, deserves to be a set. That massive infinite set, bigger than anything we encounter in ordinary mathematics, may be not be something we need in daily math… but it seems that it should deserve to exist in the mathematical universe.

Axiom 8: Foundation, the limiter.

So far, we’ve discussed axioms that create sets. We have the tools to create the sets we want. How do we avoid calling things sets that we don’t want?

First-order logic can’t limit size. We can’t say that our mathematical universe is “only yay big”. For example, ZFC without Replacement could not generate the large set described above, but it also cannot prove that entity not to be a set. So, there will always be more things in the heaven and earth of our first-order logic than are dreamt-of in our philosophy; but, we can limit undesirable behaviors.

In programming terminology, we can impose a contract that sets must follow.

The Axiom of Foundation is the only one to place limits on set-ness, the only one to prevents us from making sets.

I put it toward the end because its formulation is the most opaque. It is:

∀x (∃y (y ∈ x)) → (∃z (z ∈ x) ∧ (∀w ~((w ∈ z) ∧ (w ∈ x)))

To which an appropriate response might be, What the hell does that mean?

The answer is: For every set x that is not empty (that is, has a y in it), there is a z in it that is disjoint from x .

We know what it means; the question, then is, Why the hell do we care?

The most important consequence of this might be that no set contains itself. Therefore, we cannot define the set x = {x} . In computer science, self-containing data structures are admissible. (To paraphrase Trump, when you’re a * , pointers let you do it.) Mathematicians, however, don’t want their foundations be self-referential. (There are mathematical structures, like graphs, that allow such recursion and self-reference, but sets themselves should not be.) Why? Russell’s Paradox. If sets can contain themselves, and all collections are sets, then define a “Russell set” as any set that doesn’t contain itself. Is the set of all Russell sets a Russell set? If the set of all Russell sets is a Russell set, then it contains itself, so it’s not a Russell set; if it’s not a Russell set, then it is. Mathematics becomes inconsistent and the world blows up. Clearly, mathematics can’t afford to confer set-ship on just anything.

The more general result of foundation: there is no infinite descending chain of sets a b c ... It is a consequence of this that there are no self-containing sets or set-membership cycles; e.g., there are no x , y , z for which x ∈ y , y ∈ z , and z ∈ x ).

An alternate way to think of this is that one can always “get to the bottom of” a set, even if it’s infinite. Or: imagine a single-player game like Nim, but instead of stones on a table (from which a player takes some, each turn) there is a set X “on the table”. Each turn, the player selects an element a of X and “places” that a on the table, unless a is the empty set, in which case the game terminates.

For example, letting N be the set of natural numbers, we could have the following state evolution:

Table: {3, 17, {4, 29, {6}}, P(N)}

... player chooses P(N)

... Table : P(N) = {{}, {0}, {1}, {0, 1}, ... }

... player chooses {2, 3, 5, 7, 11, ...}

... Table : {2, 3, 5, 7, 11, ...}

... player chooses 173

... Table: 173 = {0, 1, 2, ..., 172}

... no more than 173 steps later...

... Table : 0

Foundation means that this game always terminates in finite time. It can take an arbitrarily long time– the player could have chosen 82,589,933, or the first prime number larger than TREE(4,567), instead of 173– but there are no infinite paths, assuming that what’s on the table is a well-founded set.

Axiom 9: Choice, the controversial one.

I’ve saved the “controversial” Axiom of Choice, the namesake for the “C” in ZFC, for last.

Is it controversial? Well, let’s understand what a mathematical controversy is. Mathematicians do not, by and large, get into arguments about whether Choice is true or false; it is neither. There are valid mathematical systems with it, and others without it. It is subjective, which systems one prefers to study. But no mathematician will say that the Axiom of Choice is “wrong”.

Although I have worked entirely in the bare language of first-order logic plus equality and the “non-logical” ∈ , I will, to make notation easier, include the ordered-pair notation <a, b> = {{a}, {a, b}} . Extending the alphabet of a logical language can be done in a principled, conservative way (that is, one that does not produce new theorems) but I will skip over the details. One who wants them can read Kenneth Kunen’s 2011 book, Set Theory.

The Axiom of Choice is:

∀x ∃f ∀a (a ∈ x) → (∃c c ∈ a) → ∃p (p ∈ f) ∧ (p = <a, b>) ∧ (b ∈ a)

This means that for any set x we have a choice function f – a set of ordered pairs– that for all non-empty a in x contains an <a, b> for which b ∈ a .

In other words, from any set of sets, we can derive a choice function that maps each of its nonempty sets to one of its elements (that is, it “picks” one).

Choice, of all the mathematical axioms, is most flagrant in divergence from the real world. Infinity can be problematic, but physics forces us to grapple with its possibility. We do not know the fine-grained nature of physical space, but it seems to exhibit rotational symmetry (that is, there are no privileged x-, y-, and z-axes, but the basis vectors we choose) which suggests strongly that it is R3. Infinite sets, we need for geometry to work.

Choice, however, is counter-physical. If we agree that choosing an element from a set is computational, and therefore requires energy (or increases the entropy of the universe, or costs in some currency) than to realize a choice function has an infinite cost. In other words, we are calling things “sets” that have no presence in the physical world.

To wit, in a world with Choice, events can be described (in theory) that have no probability. That is not: They have zero probability; rather, it creates a contradiction to assign them probability zero or to assign any positive probability to them. More troubling, a sphere can be decomposed into five pieces and rearranged into two spheres of the same size. While this absurdity cannot be realized with a real-world object, it does establish the behavior of mathematics, with Choice, to be counter-physical.

So, why have it? Well, there are mathematical objects that do not exist in a world without Choice. A vector space might not have a basis, in ZF + ~C. Mathematics isn’t supposed to be about what we can make in the real world, but what exists theoretically.

One source of trouble with Choice, I think, is that as beginning or amateur mathematicians, it’s tempting to think of an agent actively choosing. There’s a quip about the Axiom of Choice: you don’t need it to choose one shoe each from an infinite set of pairs of shoes, but you do need it to choose one sock each from an infinite set of pairs of socks. (The shoes are distinguishable; the socks are not. Analogously, Comprehension and Replacement derive from existing logical formulae; Choice is fully independent of how the choosing is done, and may be arbitrary.) The problem is that Choice is not about “you can choose…” because mathematics exists in an ethereal world where there is no “you” or “I”, nor an energetic process of “choosing”. The real debate is not about “you can…” or “you cannot…” but about what we choose to call a set. If we want to include among sets arbitrary selections, we need Choice.

Mathematical existence, itself, is a slippery and tricky notion. For example, we’ve covered many of the sets that exist and we can build. Now, this is a set:

{n : 2^2^n is prime}

However, which set? Is it finite or infinite? We don’t know. Most likely, it’s the very simple set, {0, 1, 2, 3, 4}. At present, though, we don’t know what it is.

Computational Analogue

To wrap this up. let’s think of Set as a type (or a class) in a programming language. We’ll pretend we have a computer with infinite space, so infinity is no issue. The axioms of set theory support the following interface:

empty :: Set

-- () A set with no elements.

elementOf :: Set -> Set -> Bool

-- (x, y) Returns True if x ∈ y, False if not.

equal :: Set -> Set -> Bool

-- (x, y) Two sets are equal if they contain exactly the same elements.

pair :: Set -> Set -> Set

-- (x, y) Returns {x, y}.

union :: Set -> Set

-- (x) Returns {a : a ∈ t ∈ x for some t}. Or: flattens a set of sets by one level.

filter :: Set -> WFP -> Set

-- Comprehension

-- (x, F) F is a well-formed predicate with one free variable; returns {a : a ∈ x where F[a]}.

map :: Set -> WFF -> Set

-- Replacement

-- (x, F) F is a well-formed binary predicate for which F(x, y) holds for exactly one y per x, corresponding to a function f; returns {f(a) : a ∈ x}.

powerSet :: Set -> Set

-- (x) Returns {y : y is a subset of x}.

infinity :: Set

-- () A set of infinite cardinality.

validate :: Set -> ()

-- Foundation

-- (x) Asserts that x has an element disjoint from itself. If all sets created pass validation, then no infinite descending chains (or cycles) of ∈ exist; every set is well-founded.

choice :: X@Set -> (A@X -> Either () A)

-- (x) Returns a function on x that maps every non-empty a ∈ x to [Right b] for some b ∈ a, and the empty set to [Left ()].

Conclusion

Set theory has a reputation for being painfully abstract and opaque. Much work is required to get from the expression of, say, Foundation, to what it means and why we care about it.

We learn later that the axiom exists because set-theorists did not want to contend with the notion of unwrapping sets ad infinitum; they did not want a theory that wasn’t well-founded or whose subtle imprecisions could destroy all of mathematics (as Russell’s Paradox did naive set theory). What makes for compact formalism is:

∀x (∃y (y ∈ x)) → (∃z (z ∈ x) ∧ (∀w ~((w ∈ z) ∧ (w ∈ x)))

but without additional context, the formula is calamitously uninteresting. The human element (why we care) and the story behind axiomatic set theory are (as they should be, in formalism) absent.

I bring this up because, for me, Foundation was the hardest of the axioms to understand its motivation. (Replacement was difficult to motivate, until I realized it was just map or, in the Python world, a list comprehension.) Foundation’s often presented early in set theory, and not without good reasons. It is the only axiom that mandates certain possibilities not be sets, ensuring we don’t have to worry about Russell’s Paradox.

An interesting consequence of this is that we can now discuss what the variables in ZFC, a first-order logic, range over. When we say ∀x something, in ZFC, what is the space we’re quantifying over? All sets. But there is no “set of all sets” in ZFC. We need a bigger logical system to think about what ZFC actually means. This does not invalidate mathematics; arguably, this sort of distinction is what saves it.

Set theory’s implications are complicated and abstruse, but most of the axioms on which it’s built have computational analogues that a programmer uses every day. Comprehension is like filter , Replacement like map .

What about Choice? Well, Choice gives us that weird, hackish hand-written function that may be a giant switch / case block. On many infinite sets, we cannot hope to code it by hand or realize it in physical space. However, it exists in set theory, so long as we want set theory to contain everything that mathematics can imagine.

Set theory, in all its glory, isn’t the easiest subject to comprehend. The simplicity of it, as a first-order logic with only one added symbol ( ∈ ), renders it difficult to reconcile it with the complexities of full mathematics, much in the way that quantum physics is not hard to comprehend but difficult to apply to real systems. The more perspectives we can take to this, the easier it might be to learn. I hope that by focusing on computation and construction, as imperfect as those lenses are, I’ve added one more perspective and that it helps.