This time I’d like to think about three different approaches to ‘defining equality’, or more generally, introducing equality in formal systems of mathematics.

These will be taken from old-fashioned logic — before computer science, category theory or homotopy theory started exerting their influence. Eventually I want to compare these to more modern treatments.

If you know other interesting ‘old-fashioned’ approaches to equality, please tell me!

The equals sign is surprisingly new. It was never used by the ancient Babylonians, Egyptians or Greeks. It seems to originate in 1557, in Robert Recorde’s book The Whetstone of Witte. If so, we actually know what the first equation looked like:

As you can see, the equals sign was much longer back then! He used parallel lines “because no two things can be more equal.”

Formalizing the concept of equality has raised many questions. Bertrand Russell published The Principles of Mathematics [R] in 1903. Not to be confused with the Principia Mathematica, this is where he introduced Russell’s paradox. In it, he wrote:

identity, an objector may urge, cannot be anything at all: two terms plainly are not identical, and one term cannot be, for what is it identical with?

In his Tractatus, Wittgenstein [W] voiced a similar concern:

Roughly speaking: to say of two things that they are identical is nonsense, and to say of one thing that it is identical with itself is to say nothing.

These may seem like silly objections, since equations obviously do something useful. The question is: precisely what?

Instead of tackling that head-on, I’ll start by recalling three related approaches to equality in the pre-categorical mathematical literature.

The indiscernibility of identicals

The principle of indiscernibility of identicals says that equal things have the same properties. We can formulate it as an axiom in second-order logic, where we’re allowed to quantify over predicates $P$:

$$\forall x \forall y [x = y \;\implies\; \forall P \, [P(x) \;\iff\; P(y)]]$$

We can also formulate it as an axiom schema in first-order logic, where it’s sometimes called substitution for formulas. One common way of writing it is:

For any variables $x, y$ and any formula $\phi$, if $\phi'$ is obtained by replacing any number of free occurrences of $x$ in $\phi$ with $y$, such that these remain free occurrences of $y$, then $x = y \;\implies\; [\phi \;\implies\; \phi']$.

I think we can replace this with the prettier

$$x = y \;\implies\; [\phi \;\iff\; \phi']$$

without changing the strength of the schema. Right?
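Here is a hedged sketch, in Lean 4, of why the two versions should have the same strength, at least in the second-order form. The names `E` (a relation standing in for equality), `subst` (the one-directional principle) and `subst_iff_of_subst_imp` are hypothetical, chosen only for illustration. The trick is to apply the one-directional principle to the predicate “$P(z) \implies P(x)$”, which holds trivially at $z = x$. The first-order analogue would be to apply the schema to the formula $\phi \implies \phi$, replacing occurrences of $x$ only in the antecedent.

```lean
-- `E` is a hypothetical relation standing in for "=", so that Lean's
-- built-in equality does not do the work for us.
-- `subst` is the one-directional indiscernibility of identicals.
theorem subst_iff_of_subst_imp {α : Type} (E : α → α → Prop)
    (subst : ∀ (P : α → Prop) (x y : α), E x y → P x → P y)
    (P : α → Prop) {x y : α} (h : E x y) : P x ↔ P y :=
  ⟨ -- forward direction: the principle itself
    subst P x y h,
    -- backward direction: use the predicate `fun z => P z → P x`,
    -- which holds at `x` by the identity function
    subst (fun z => P z → P x) x y h (fun hp => hp) ⟩
```

Note that this sketch uses neither reflexivity nor classical logic.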

We cannot derive reflexivity, symmetry and transitivity of equality from the indiscernibility of identicals. So, this principle does not capture all our usual ideas about equality. However, as shown last time, we can derive symmetry and transitivity from this principle together with reflexivity. This uses an interesting form of argument where we take “being equal to $z$” as one of the predicates (or formulas) to which we apply the principle. There’s something curiously self-referential about this. It’s not illegitimate, but it’s curious.
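The derivation of symmetry and transitivity can be sketched in Lean 4 as follows. This is only a sketch under stated assumptions: `E` is a hypothetical relation standing in for equality, and `refl` and `subst` are hypothetical names for reflexivity and the second-order indiscernibility of identicals, passed in as hypotheses.

```lean
-- Symmetry: apply indiscernibility to the predicate "being E-related to x".
theorem E_symm {α : Type} (E : α → α → Prop)
    (refl : ∀ x, E x x)
    (subst : ∀ (P : α → Prop) (x y : α), E x y → (P x ↔ P y))
    {x y : α} (h : E x y) : E y x :=
  -- `subst` gives `E x x ↔ E y x`; reflexivity supplies the left side
  (subst (fun z => E z x) x y h).mp (refl x)

-- Transitivity: apply indiscernibility to the predicate "x is E-related to ·".
theorem E_trans {α : Type} (E : α → α → Prop)
    (subst : ∀ (P : α → Prop) (x y : α), E x y → (P x ↔ P y))
    {x y z : α} (hxy : E x y) (hyz : E y z) : E x z :=
  -- `subst` gives `E x y ↔ E x z`; the hypothesis `hxy` supplies the left side
  (subst (fun w => E x w) y z hyz).mp hxy
```

In this sketch reflexivity is needed only for symmetry; transitivity follows from indiscernibility alone.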

The identity of indiscernibles

Leibniz [L] is often credited with formulating a converse principle, the identity of indiscernibles. This says that things with all the same properties are equal. Again we can write it as a second-order axiom:

$$\forall x \forall y [\forall P [P(x) \;\iff\; P(y)] \;\implies\; x = y]$$

or a first-order axiom schema.

We can go further if we take the indiscernibility of identicals and identity of indiscernibles together as a package:

$$\forall x \forall y [\forall P [P(x) \;\iff\; P(y)] \;\iff\; x = y]$$

This is often called the Leibniz law. It says an entity is determined by the collection of predicates that hold of that entity. Entities don’t have mysterious ‘essences’ that determine their individuality: they are completely known by their properties, so if two entities have all the same properties they must be the same.

This principle does imply reflexivity, symmetry and transitivity of equality. They follow from the corresponding properties of $\iff$ in a satisfying way. Of course, if we were wondering why equality has these three properties, we are now led to wonder the same thing about the biconditional $\iff$. But this counts as progress: it’s a step toward ‘logicizing’ mathematics, or at least connecting $=$ firmly to $\iff$.
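To illustrate how the three properties are inherited from the biconditional, here is a minimal Lean 4 sketch. It assumes we simply define a relation by the right-hand side of the Leibniz law; the name `LeibnizEq` is hypothetical. `Iff.rfl`, `Iff.symm` and `Iff.trans` are the reflexivity, symmetry and transitivity of $\iff$ from Lean’s core library.

```lean
-- Hypothetical name: `LeibnizEq x y` says x and y satisfy the same predicates.
def LeibnizEq {α : Type} (x y : α) : Prop :=
  ∀ P : α → Prop, P x ↔ P y

-- Reflexivity comes from reflexivity of ↔.
theorem LeibnizEq.refl {α : Type} (x : α) : LeibnizEq x x :=
  fun _ => Iff.rfl

-- Symmetry comes from symmetry of ↔.
theorem LeibnizEq.symm {α : Type} {x y : α}
    (h : LeibnizEq x y) : LeibnizEq y x :=
  fun P => (h P).symm

-- Transitivity comes from transitivity of ↔.
theorem LeibnizEq.trans {α : Type} {x y z : α}
    (hxy : LeibnizEq x y) (hyz : LeibnizEq y z) : LeibnizEq x z :=
  fun P => (hxy P).trans (hyz P)
```

Each proof applies the corresponding fact about $\iff$ one predicate at a time, with no self-referential choice of predicate needed.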

Apparently Russell and Whitehead used a second-order version of the Leibniz law to define equality in the Principia Mathematica [RW], while Kalish and Montague [KL] present it as a first-order schema. I don’t know the whole history of such attempts.

When you actually look to see where Leibniz formulated this principle, it’s a bit surprising. He formulated it in the contrapositive form, he described it as a ‘paradox’, and most surprisingly, it’s embedded as a brief remark in a passage that would be hair-curling for many contemporary rationalists. It’s in his Discourse on Metaphysics, a treatise written in 1686:

Thus Alexander the Great’s kinghood is an abstraction from the subject, and so is not determinate enough to pick out an individual, and doesn’t involve the other qualities of Alexander or everything that the notion of that prince includes; whereas God, who sees the individual notion or ‘thisness’ of Alexander, sees in it at the same time the basis and the reason for all the predicates that can truly be said to belong to him, such as for example that he would conquer Darius and Porus, even to the extent of knowing a priori (and not by experience) whether he died a natural death or by poison — which we can know only from history. Furthermore, if we bear in mind the interconnectedness of things, we can say that Alexander’s soul contains for all time traces of everything that did and signs of everything that will happen to him — and even marks of everything that happens in the universe, although it is only God who can recognise them all. Several considerable paradoxes follow from this, amongst others that it is never true that two substances are entirely alike, differing only in being two rather than one. It also follows that a substance cannot begin except by creation, nor come to an end except by annihilation; and because one substance can’t be destroyed by being split up, or brought into existence by the assembling of parts, in the natural course of events the number of substances remains the same, although substances are often transformed. Moreover, each substance is like a whole world, and like a mirror of God, or indeed of the whole universe, which each substance expresses in its own fashion — rather as the same town looks different according to the position from which it is viewed. In a way, then, the universe is multiplied as many times as there are substances, and in the same way the glory of God is magnified by so many quite different representations of his work.

(Emphasis mine — you have to look closely to find the principle of identity of indiscernibles, because it goes by so quickly!)

There have been a number of objections to the Leibniz law over the years. I want to mention one that might best be handled using some category theory. In 1952, Max Black [B] claimed that in a symmetrical universe with empty space containing only two symmetrical spheres of the same size, the two spheres are two distinct objects even though they have all their properties in common.

As Black admits, this problem only shows up in a ‘relational’ theory of geometry, where we can’t say that the spheres have different positions (e.g., one centered at the point $(x,y,z)$, the other centered at $(-x,-y,-z)$) but can only speak of their position relative to one another. This sort of theory is certainly possible, and it seems to be important in physics. But I believe it can be adequately formulated only with the help of some category theory. In the situation described by Black, I think we should say the spheres are not equal but isomorphic.

As widely noted, general relativity also pushes for a relational approach to geometry. Gauge theory, also, raises the issue of whether indistinguishable physical situations should be treated as equal or merely isomorphic. I believe the mathematics points us strongly in the latter direction.

A related issue shows up in quantum mechanics, where electrons are considered indistinguishable (in a certain sense), yet there can be a number of electrons in a box — not just one.

But I will discuss such issues later.

Extensionality

In traditional set theory we try to use sets as a substitute for predicates, saying $x \in S$ as a substitute for $P(x)$. This lets us keep our logic first-order and quantify over sets (often in a universe where everything is a set) as a substitute for quantifying over predicates. Of course there’s a glitch: Russell’s paradox shows we get in trouble if we try to treat every predicate as defining a set! Nonetheless it is a powerful strategy.

If we apply this strategy to reformulate the Leibniz law in a universe where everything is a set, we obtain:

$$\forall S \forall T [S = T \;\iff\; \forall R [S \in R \;\iff\; T \in R]]$$

While this is true in Zermelo-Fraenkel set theory, it is not taken as an axiom. Instead, people turn the idea around and use the axiom of extensionality:

$$\forall S \forall T [S = T \;\iff\; \forall R [R \in S \;\iff\; R \in T]]$$

Instead of saying two sets are equal if they’re in all the same sets, this says two sets are equal if all the same sets are in them. This leads to a view where the ‘contents’ of an entity are its defining feature, rather than the predicates that hold of it.

We could, in fact, send this idea back to second-order logic and say that predicates are equal if and only if they hold for the same entities:

$$\forall P \forall Q [\forall x [P(x) \;\iff\; Q(x)] \;\iff\; P = Q]$$

as a kind of ‘dual’ of the Leibniz law:

$$\forall x \forall y [\forall P [P(x) \;\iff\; P(y)] \;\iff\; x = y]$$

I don’t know if this has been remarked on in the foundational literature, but it’s a close relative of a phenomenon that occurs in other forms of duality. For example, continuous real-valued functions $F, G$ on a topological space obey

$$\forall F \forall G [\forall x [F(x) \;=\; G(x)] \;\iff\; F = G]$$

but if the space is nice enough, continuous functions ‘separate points’, which means we also have

$$\forall x \forall y [\forall F [F(x) \;=\; F(y)] \;\iff\; x = y]$$
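As a side illustration of the extensionality principle for predicates stated earlier, here is a hedged Lean 4 sketch. It is not a proof in the second-order logic discussed above: it relies on Lean’s own `funext` (function extensionality) and `propext` (propositional extensionality), and the theorem name `pred_ext` is hypothetical.

```lean
-- Predicates are equal if and only if they hold of the same entities.
theorem pred_ext {α : Type} (P Q : α → Prop) :
    (∀ x, P x ↔ Q x) ↔ P = Q :=
  ⟨ -- pointwise equivalence gives equality, via `propext` and `funext`
    fun h => funext (fun x => propext (h x)),
    -- equal predicates hold of the same entities
    fun h x => by subst h; exact Iff.rfl ⟩
```

In this setting the right-to-left direction is just substitution of equals, while the left-to-right direction is where the extensionality principles of the ambient system are used.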

Notes