
Intuitionistic mathematics for physics

Andrej Bauer


At MSFP 2008 in Iceland I chatted with Dan Piponi about physics and intuitionistic mathematics, and he encouraged me to write down some of the ideas. I have little, if anything, original to say, so this seems like an excellent opportunity for a blog post. So let me explain why I think intuitionistic mathematics is good for physics.



Intuitionistic mathematics, whose main proponent was L.E.J. Brouwer, is largely misunderstood by mathematicians. Consequently, physicists have strange ideas about it, too. For example, David Deutsch somehow managed to write in his otherwise excellent popular science book “The Fabric of Reality” that intuitionists deny existence of infinitely many natural numbers (those would be the ultrafinitists, if there are any). He also produced rather silly arguments against intuitionistic mathematics, which I explained to myself by believing that he never had a chance to learn that intuitionistic mathematics supports his point of view.

While Brouwer’s and other preintuitionists’ reasons for intuitionistic mathematics were philosophical in nature, there is today a vibrant community of mathematicians, logicians, computer scientists, and even the odd physicist, who work with intuitionistic mathematics not because of their philosophical conviction but because it is simply the right kind of math for what they are doing.

Intuitionistic understanding of truth

A common obstacle in understanding intuitionistic logic is the opinion that the difference between classical and intuitionistic logic arises because classicists and intuitionists just happen to disagree about what is true. A typical example of this is the principle known as Proof by Contradiction:

For every proposition $\phi$, if $\phi$ is not false then $\phi$ is true.

With a formula we write this as

$\forall \phi \in \mathsf{Prop}, \lnot \lnot \phi \Rightarrow \phi$.

Classical mathematicians accept it as true. Intuitionists do not accept it, but neither do they claim it is false. In fact, they claim that the principle has no counterexamples, that is

$\lnot \exists \phi \in \mathsf{Prop}, \lnot (\lnot \lnot \phi \Rightarrow \phi)$.

This becomes very confusing for classical mathematicians who think that the two displayed formulae are equivalent, because they believe in Proof by Contradiction. It is like believing that the Earth is flat while trying to make sense of Kepler’s Laws of planetary motion.
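The claim that Proof by Contradiction has no counterexamples can be checked mechanically. What follows is a minimal sketch in Lean 4, assuming a plain installation with no imports; the theorem name `no_counterexample` is made up for this illustration. The proof term uses no classical axioms, so it is an intuitionistic proof of the second displayed formula.

```lean
-- Lean 4, no imports; the proof uses no classical axioms.
theorem no_counterexample : ¬ ∃ φ : Prop, ¬ (¬¬φ → φ) :=
  fun ⟨_, h⟩ =>
    -- h : ¬ (¬¬φ → φ). We refute h by building the implication it denies.
    h (fun nnφ =>
      -- nnφ : ¬¬φ. Any proof hφ of φ would already give ¬¬φ → φ,
      -- contradicting h, so ¬φ holds, which in turn contradicts nnφ.
      False.elim (nnφ (fun hφ => h (fun _ => hφ))))
```

The first displayed formula, by contrast, cannot be proved this way without invoking a classical axiom.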

The difference between intuitionistic and classical logic is in the criteria for truth, i.e., what evidence must be provided before a statement is accepted as true. Speaking vaguely, intuitionistic logic demands positive evidence, while classical logic is happy with lack of negative evidence. The intuitionist view is closer to the criterion of truth in science, where we normally confirm a statement with an experiment (positive evidence), but this analogy should not be taken too far.

What counts as “evidence” is open to interpretation. Before I describe the three most common ones below, let me just explain the difference between $\phi$ (“$\phi$ is true”) and $\lnot \lnot \phi$ (“$\phi$ is not false”). Intuitionistically:

$\phi$ holds if there is positive evidence supporting it,

$\lnot \phi$ holds if it is contradictory to assume $\phi$, that is to say, evidence of $\phi$ would entail a contradiction.

$\lnot \lnot \phi$ holds if it is contradictory to assume that it is contradictory to assume $\phi$.

That is a bit complicated. In essence, it says that $\lnot \lnot \phi$ is accepted when there is no evidence against it. In other words, $\lnot \lnot \phi$ means something like “$\phi$ cannot be falsified” or “$\phi$ is potentially true”. For example, if someone says

“There is a particle which does not interact with anything in the universe.”

that would be a statement which is not accepted as true, for how would you ever present positive evidence? But it is accepted as potentially true, for how would you ever falsify it?

A statement which is logically equivalent to one of the form $\lnot \lnot \phi$ is called doubly negated. For the purposes of this post I shall call a statement $\phi$ potentially true if its double negation $\lnot \lnot \phi$ is true. It seems nontrivial to come up with useful statements in physics which are only potentially true (but see the discussion about infinitesimals below). Perhaps Karl Popper would have something to say about that.

Let me now describe the three most common interpretations of “evidence” in intuitionistic logic.

Computational interpretation

This is the interpretation of intuitionistic logic commonly presented in computer science. We view all sets as represented by suitable data structures—a reasonable point of view for a computer scientist. Then a statement is taken to be true if there exists a program (computational evidence) witnessing its truth. To demonstrate the idea, consider the statement

$\forall x \in A, \exists y \in B, \phi(x, y)$.

This is taken to be true if there exists a program which accepts $x$ and outputs $y$ together with computational evidence that $\phi(x,y)$ holds. Another example: the statement

$\forall x \in A, \phi(x) \lor \psi(x)$

is true if there exists a program which takes $x$ as input and outputs either $0$ and evidence of $\phi(x)$, or $1$ and evidence of $\psi(x)$. In other words, the program is a decision procedure which tells us which of the two disjuncts holds, and why. Under this interpretation the Law of Excluded Middle fails because there are unsolvable decision problems, such as the Halting problem.
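To make the disjunction example concrete, here is a minimal Python sketch, assuming we take $A$ to be the natural numbers, $\phi(x)$ to be “$x = 2k$ for some $k$” and $\psi(x)$ to be “$x = 2k + 1$ for some $k$”. The names `parity_witness` and `check_evidence` are invented for this illustration. The program outputs a tag saying which disjunct holds, together with the number $k$ as evidence.

```python
from typing import Tuple


def parity_witness(x: int) -> Tuple[int, int]:
    """Witness for: every natural x is of the form 2k or of the form 2k + 1.

    Returns (0, k) with x == 2*k, or (1, k) with x == 2*k + 1.
    """
    if x < 0:
        raise ValueError("x must be a natural number")
    k, r = divmod(x, 2)
    return (r, k)


def check_evidence(x: int, tag: int, k: int) -> bool:
    """Independently verify the evidence produced by parity_witness."""
    return x == 2 * k if tag == 0 else x == 2 * k + 1


tag, k = parity_witness(7)
print(tag, k, check_evidence(7, tag, k))  # prints: 1 3 True
```

The point of the separate checker is that the evidence can be verified without trusting the program that produced it.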

The computationally minded readers might entertain themselves by figuring out a computational explanation of potentially true statements (Hint: first interpret Peirce’s Law in terms of continuations). I have not done it myself.

Topological interpretation

We may replace the phrases “data structure” and “program” in the computational interpretation by “topological space” and “continuous function”, respectively. Thus a statement is true if it is witnessed by a continuous function which transforms input (hypotheses) to output (conclusions).

The basis for this explanation may be found in physics if we think about what it means for a function to be continuous in terms of communication or information processing. Suppose an observer wants to communicate a real-valued quantity $x$ to another observer. They can do it in many ways: by making sounds, by sending electromagnetic signals, by sending particles from one place to another, by manufacturing and sending a stick of length $x$ by mail, etc. However, as long as they use up only a finite amount of resources (time, space, energy) they will be able to communicate only a finite amount of information about $x$. Similarly, in any physical process (computer, brain, abacus) which transforms an input value $x$ to an output value $f(x)$ the rate of information flow is finite. Consequently, in finite time the process will obtain only a finite amount of information about $x$, on the basis of which it will output a finite amount of information about $f(x)$. This is just the definition of continuity of $f$ phrased in terms of information flow rather than $\epsilon$ and $\delta$. Notice that we are not assuming that $f$ is computable because we do not want to make the rather sweeping assumption that all physical processes are computable.

The conclusion is that “all functions are continuous”, including those that witness truth of statements.
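The information-flow reading of continuity can be sketched in code. The following is only an illustration under an assumption the argument above does not make, namely that the inputs are computable: a real number $x$ is represented by a function which, given $n$, returns a rational within $2^{-n}$ of $x$. The names `Real`, `sqrt2` and `add` are hypothetical. The sketch shows that a finite amount of information about the output is obtained from a finite amount of information about the inputs.

```python
from fractions import Fraction
from math import isqrt
from typing import Callable

# A real number x is represented by a function that, given n, returns a
# rational q with |x - q| <= 2**-n.
Real = Callable[[int], Fraction]


def sqrt2(n: int) -> Fraction:
    # floor(sqrt(2) * 2**n) / 2**n lies within 2**-n of sqrt(2).
    return Fraction(isqrt(2 * 4**n), 2**n)


def add(x: Real, y: Real) -> Real:
    # To know x + y within 2**-n it suffices to know x and y within 2**-(n+1).
    return lambda n: x(n + 1) + y(n + 1)


total = add(sqrt2, sqrt2)
print(float(total(10)))  # an approximation of 2*sqrt(2), accurate to 2**-10
```

Each request for $n$ bits of the sum triggers a request for only $n + 1$ bits of each summand, which is continuity of addition expressed as a bound on information flow.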

You might be thinking that an analog-to-digital converter is a counterexample to the above argument. It is a device which takes as input an electric signal and outputs either 0 or 1, depending on whether the voltage of the signal is below or above a given threshold. Indeed, this would be a discontinuous function, if only such converters worked exactly. But they do not, they always have a tolerance level, and the manufacturer makes no guarantees about it working correctly very close to the threshold value.
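A converter with a tolerance can be sketched as follows, again assuming (for illustration only) that a real number is given by rational approximations accurate to $2^{-n}$. The function `comparator` and the two sample inputs are hypothetical names. Far from the threshold the answer is determined by the input; within the tolerance band the answer depends on which approximation happened to be supplied, so near the threshold the device does not compute a function of the real number at all.

```python
from fractions import Fraction
from typing import Callable

Real = Callable[[int], Fraction]  # x(n) is a rational within 2**-n of x


def comparator(x: Real, threshold: Fraction, n: int) -> int:
    """Threshold test using one query at precision 2**-n.

    Output 0 guarantees x < threshold + 2**-n.
    Output 1 guarantees x >= threshold - 2**-n.
    Inside that band either answer may appear.
    """
    q = x(n)
    return 0 if q < threshold else 1


t = Fraction(1, 2)

# Two valid representations of the same real number, namely t itself.
def from_below(n: int) -> Fraction:
    return t - Fraction(1, 2 ** (n + 1))

def exact(n: int) -> Fraction:
    return t

print(comparator(from_below, t, 8), comparator(exact, t, 8))  # prints: 0 1
```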

A useful exercise is to think about the difference between “all functions are continuous”, “potentially all functions are continuous”, and “all functions are potentially continuous”. Which one does the above argument about finite rate of information processing support?

Local truth

This explanation of intuitionistic logic is a bit more subtle, but also much more powerful and versatile. It is known by categorical logicians as the Kripke-Joyal or sheaf semantics, while most logicians are familiar at least with the older Kripke semantics.

Imagine a planet and a meteorologist at each point of the surface, measuring the local temperature $T$. We assume that $T$ varies continuously with position. A statement such as $T > 273$ is true at some points of the planet and false at others. We say that it is locally true at $x$ if there exists a small neighborhood around $x$ where it is true. In other words, a statement is locally, or stably true at a given point if it remains true when we perturb the point a little.

On this planet a statement is globally true if it is locally true everywhere, and it is globally false if its negation is locally true everywhere. There are also many intermediate levels of truth. The truth value (a measure of truth) of a statement is the set of those points at which the statement is locally true. Such a set is always open.

The explanation so far is a bit wrong. For a statement to be locally true at $x$, not only must it be true in a neighborhood of $x$, but it must also be true everywhere in the neighborhood “for the same reason”. For example, the statement

$T > 273$ or $T \leq 273$

is true at $x$ if there exists a neighborhood $U$ of $x$ such that $T > 273$ everywhere on $U$, or $T \leq 273$ everywhere on $U$. The reason, namely which of the two possibilities holds, must be the same everywhere on $U$.

The truth value of $T = 273$ is the interior of the set of those points at which $T$ equals 273, while the truth value of $T \neq 273$ is the exterior of the set of those points at which $T$ equals 273. Thus the truth value of the disjunction

$T = 273$ or $T \neq 273$

need not be the entire planet—it will miss isolated points at which $T$ is 273. The Law of Excluded Middle is not valid.
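The failure of Excluded Middle can be reproduced on a toy example. The sketch below assumes a three-point “planet” with a hand-picked topology, in which `edge` is the single point where $T$ is 273 and every neighborhood of it spills into the cold or warm region. This is only a model of how interiors behave: it does not model a continuously varying temperature, the numbers are mere labels, and all names are invented. Following the description above, the truth value of a statement is computed as the interior of the set of points where it holds.

```python
POINTS = frozenset({"cold", "edge", "warm"})

# Open sets of the toy topology. The point "edge" has no small neighborhood
# of its own: the only open set containing it is the whole planet.
OPENS = [
    frozenset(),
    frozenset({"cold"}),
    frozenset({"warm"}),
    frozenset({"cold", "warm"}),
    POINTS,
]

TEMPERATURE = {"cold": 260, "edge": 273, "warm": 290}


def interior(subset: frozenset) -> frozenset:
    """Largest open set contained in subset (union of the opens inside it)."""
    result = frozenset()
    for u in OPENS:
        if u <= subset:
            result |= u
    return result


def truth_value(predicate) -> frozenset:
    """Points at which the predicate on T holds throughout a neighborhood."""
    holds = frozenset(p for p in POINTS if predicate(TEMPERATURE[p]))
    return interior(holds)


eq = truth_value(lambda t: t == 273)
neq = truth_value(lambda t: t != 273)
print(sorted(eq), sorted(neq), sorted(eq | neq))
# prints: [] ['cold', 'warm'] ['cold', 'warm']
```

The union of the two truth values misses `edge`, so the disjunction is not globally true on this planet.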

By changing the underlying space and topology, we can express various notions of truth. We can, for example, incorporate passage of time, or a universe branching into possible worlds. In the most general case the underlying space need not even be a space, but a category with a so-called Grothendieck topology which determines what “locally” means.

Apart from being a wonderful mathematical tool, it should be possible to use sheaf semantics to clarify concepts in physics. I would expect the notions of “truth stable under small perturbation” and “truth local to an observer” to appeal to physicists. Fancy kinds of sheaf semantics have been proposed to explain features of quantum mechanics, see for example this paper by Bas Spitters and his coworkers.

Smooth infinitesimal analysis

Philosophical explanations and entertaining stories about intuitionistic mathematics are one thing, but getting actual benefits out of it is another. For physicists this means that they will want to calculate things with it. The good news is that they are already doing it, they just don’t know it!

There is something odd about how physicists are taught mathematics, at least in my department. Physics majors learn the differential and integral calculus in the style of Cauchy and Weierstrass, with $\epsilon$–$\delta$ definitions of continuity and differentiability. They are told by math professors that it is a sin to differentiate a non-differentiable function. They might even be told that the original differential and integral calculus, as invented by Leibniz and Newton, was flawed because it used the unclear concept of infinitesimals, which were supposed to be infinitely small yet positive quantities.

Then these same students go to a physics class in which a physics professor never performs $\epsilon$–$\delta$ calculations, freely differentiates everything in sight, and tops it off by using the outlawed infinitesimals to calculate lots of cool things. What are the students supposed to think? Clearly, the “correct” mathematics is useless to them. It’s a waste of time. Why aren’t they taught mathematics that gives a foundation to what the physics professors are actually doing? Is there such math?

Yes there is. It’s the mathematics of infinitesimal calculus, brought forward to the 20th century by Anders Kock and Bill Lawvere under the name Synthetic Differential Geometry (SDG), or Smooth Infinitesimal Analysis. (I am too young to know exactly who invented what, but I’ve heard people say that Eduardo Dubuc also played a part. I would be happy to correct bibliographical omissions on my part.) By the way, I am not talking about Robinson’s non-standard analysis, which uses classical logic.

This is not the place to properly introduce synthetic differential geometry. I will limit myself to a few basic ideas and results. For a first reading I highly recommend John Bell’s booklet A Primer of Infinitesimal Analysis. If you refuse to read physical books, you may try his shorter An Invitation to Smooth Infinitesimal Analysis online. For further reading Anders Kock’s Synthetic Differential Geometry is an obvious choice (available online!), and there is also Moerdijk and Reyes’s Models for Smooth Infinitesimal Analysis, which shows in detail how to construct models of SDG using sheaves of germs of smooth functions.

To get a feeling for what is going on, and why intuitionistic logic is needed, let us review the usual proof that infinitesimals do not exist. This requires a bit of logical nitpicking, so bear with me. Both intuitionistic and classical mathematics agree that there is no real number $x$ which is neither negative, nor zero, nor positive:

$\lnot \exists x \in \mathbb{R}, \lnot (x < 0) \land \lnot (x = 0) \land \lnot (x > 0)$.

(There is some disagreement as to whether every number is either negative, zero, or positive, but that is beside the point right now.) A nilpotent infinitesimal of second degree, or just infinitesimal for short, is a real number $dx$ whose square is zero. Any such $dx$ is neither negative nor positive, because both $dx > 0$ and $dx < 0$ imply $dx^2 > 0$, which contradicts $dx^2 = 0$. If $dx$ were also non-zero, we would have a number which is neither negative, zero, nor positive. Thus we proved that an infinitesimal cannot be non-zero:

$dx^2 = 0 \Rightarrow \lnot \lnot (dx = 0)$.

A classical mathematician will now conclude that $dx = 0$ by applying Proof by Contradiction. Intuitionistically we have only shown that infinitesimals are potentially equal to zero.

But are there any infinitesimals which are actually different from zero? It can be shown from the main axiom of SDG (see below) that non-zero infinitesimals potentially exist. It is a confusing world: on one hand all infinitesimals are potentially zero, but on the other non-zero ones potentially exist. Like all good things in life, intuitionistic mathematics is an acquired taste (and addictive).

Can a physicist make sense of all this? We may think of infinitesimals as quantities so small that they cannot be experimentally distinguished from zero (they are potentially zero), but neither can they be shown to all equal zero (potentially there are some non-zero ones). By the way, we are not talking about lengths below the Planck length, as there are clearly real numbers smaller than $1.6 \times 10^{-35}$ (the Planck length in meters) whose square is positive.

The actual axiom which gets the infinitesimal calculus going does not explicitly state anything about non-zero infinitesimals. Instead, it expresses the principle of micro-affinity (sometimes called micro-linearity) that physicists use in their calculations.

Principle of micro-affinity: An infinitesimal change in the independent variable $x$ causes an affine (linear) change in the dependent variable $y = f(x)$.

More precisely, if $f : R \to R$ is any function, $x \in R$ and $dx$ is an infinitesimal, then there exists a unique number $f'(x)$, called the derivative of $f$ at $x$, such that $f(x + dx) = f(x) + f'(x) dx$. This principle has many consequences, such as potential existence of non-zero infinitesimals described above. For actual calculations the most important consequence is

Law of cancellation: If $a$ and $b$ are real numbers such that $a \cdot dx = b \cdot dx$ for all infinitesimals $dx$ then $a = b$.

What this says is that we may cancel infinitesimals when they are arbitrary. This is important because infinitesimals do not have inverses (they are potentially zero). Nevertheless, we may cancel them in an equation, as long as they are arbitrary.
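For readers who want to see where the law comes from, here is a short sketch of how it follows from the uniqueness part of micro-affinity (the auxiliary function $g$ is introduced only for this argument). Suppose $a \cdot dx = b \cdot dx$ for all infinitesimals $dx$, and consider $g(x) = a \cdot x$. For every infinitesimal $dx$ we have both

$g(0 + dx) = g(0) + a \cdot dx$

and, by the assumption,

$g(0 + dx) = g(0) + b \cdot dx$.

So $a$ and $b$ both satisfy the equation that defines $g'(0)$. Since the principle says there is a unique such number, $a = g'(0) = b$.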

Let me show how this works in practice by calculating the derivative of $f(x) = x^2$. For arbitrary infinitesimal $dx$ we have

$f'(x) \cdot dx = f(x + dx) - f(x) = (x + dx)^2 - x^2 = x^2 + 2 x \cdot dx + dx^2 - x^2 = 2 x \cdot dx$

where we used the fact that $dx^2 = 0$. Because $dx$ is arbitrary, we may cancel it on both sides and get $f'(x) = 2 x$. I emphasize that this is a mathematically precise and logically correct calculation. It is in fact very close to the usual treatment which goes like this:

$f'(x) = (f(x+dx) - f(x))/dx = (x^2 + 2 x \cdot dx + dx^2 - x^2)/dx = 2 x + dx = 2 x$

There are two incorrect steps here: we divided by an infinitesimal $dx$ without knowing that it is different from zero (it isn’t!), and we pretended that $2 x + dx$ is equal to $2 x$ because “$dx$ is very small”. By the same reasoning we should have concluded that $f(x+dx) - f(x) = f(x) - f(x) = 0$, but we did not. Why?

The principle of micro-affinity allows us to easily derive the usual rules for computing derivatives, the potential existence of non-zero infinitesimals, prove the fundamental theorem of calculus in two lines, derive the wave equation like physicists do it, etc. And it is all correct, exact math. No approximations, no guilty feeling about throwing away “negligible terms” here but not there, and other hocus-pocus that physicists have to resort to because nobody told them about this stuff.
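The algebra of these calculations can be mimicked on a computer with dual numbers, that is, expressions $a + b \cdot dx$ multiplied under the rule $dx^2 = 0$. This is only a sketch of the bookkeeping: dual numbers form an ordinary classical ring with one formal symbol $dx$, so they are a stand-in for the computation and not a model of SDG or of “arbitrary” infinitesimals. The class `Dual` and the function `derivative` are names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A formal expression a + b*dx, multiplied with the rule dx*dx = 0."""
    a: float
    b: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.a - other.a, self.b - other.b)

    def __mul__(self, other):
        other = _lift(other)
        # (a + b dx)(c + e dx) = ac + (ae + bc) dx, because dx^2 = 0.
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    __rmul__ = __mul__


def _lift(value):
    """Treat an ordinary number c as c + 0*dx."""
    return value if isinstance(value, Dual) else Dual(value)


def derivative(f, x):
    """Read off f'(x) as the coefficient of dx in f(x + dx)."""
    return f(Dual(x, 1.0)).b


print(derivative(lambda t: t * t, 3.0))  # prints: 6.0
```

Reading off the coefficient of $dx$ plays the role of cancelling $dx$ on both sides of $f'(x) \cdot dx = f(x + dx) - f(x)$.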

Just for fun, let me compute more derivatives. The general strategy in computing $f'(x)$ is to consider an arbitrary infinitesimal $dx$ and express $f'(x) \cdot dx = f(x + dx) - f(x)$ as a quantity multiplied by $dx$. Then we cancel $dx$ on both sides and get $f'(x)$. Throughout we use the fact that $dx^2 = 0$. Here we go: