From Scholarpedia

Bell's theorem asserts that if certain predictions of quantum theory are correct then our world is non-local. "Non-local" here means that there exist interactions between events that are too far apart in space and too close together in time for the events to be connected even by signals moving at the speed of light. This theorem was proved in 1964 by John Stewart Bell and has been in recent decades the subject of extensive analysis, discussion, and development by both physicists and philosophers of science. The relevant predictions of quantum theory were first convincingly confirmed by the experiment of Aspect et al. in 1982; they have been even more convincingly reconfirmed many times since. In light of Bell's theorem, the experiments thus establish that our world is non-local. This conclusion is very surprising, since non-locality is normally taken to be prohibited by the theory of relativity.

Historical background

John Bell's interest in non-locality was triggered by his analysis of the problem of hidden variables in quantum theory and in particular by his learning about the de Broglie–Bohm1 "pilot-wave" theory (aka "Bohmian mechanics"2). Bell wrote that David "Bohm's 1952 papers on quantum mechanics were for me a revelation. The elimination of indeterminism was very striking. But more important, it seemed to me, was the elimination of any need for a vague division of the world into 'system' on the one hand, and 'apparatus' or 'observer' on the other."3

In particular, learning about Bohm's "hidden variables"4 theory helped Bell recognize the invalidity of the various "no hidden variables" theorems (by John von Neumann and others) which had been taken almost universally by physicists as conclusively establishing something like Niels Bohr's Copenhagen interpretation of quantum theory. Bohm's pilot-wave theory was a clean counterexample, i.e., a proof-by-example that the theorems somehow didn't rule out what they had been taken to rule out.

This led Bell to carefully scrutinize those theorems. The result of this work was his paper "On the problem of hidden variables in quantum mechanics"5. This paper was written prior to the 1964 paper6 in which Bell's theorem was first presented, but (due to an editorial accident) remained unpublished until 1966. The 1966 paper shows that the "no hidden variables" theorems of von Neumann and others all made unwarranted — and in some cases unacknowledged — assumptions. (All these theorems involved an assumption7 which today is usually called non-contextuality.) In examining how Bohm's theory managed to violate these assumptions, Bell noticed that it did have one "curious feature": the theory was manifestly non-local. As Bell explained, "in this theory an explicit causal mechanism exists whereby the disposition of one piece of apparatus affects the results obtained with a distant piece."8 This naturally raised the question of whether the non-locality was eliminable, or somehow essential:

... to the present writer's knowledge, there is no proof that any hidden variable account of quantum mechanics must have this extraordinary character. It would therefore be interesting, perhaps, to pursue some further 'impossibility proofs,' replacing the arbitrary axioms objected to above by some condition of locality, or of separability of distant systems.9

Because of the editorial accident mentioned above, Bell had answered his own question before the paper in which it appeared was even published. The answer is contained in what we will here call "Bell's inequality theorem", which states precisely that "any hidden variable account of quantum mechanics must have this extraordinary character", i.e., must violate a locality constraint that is motivated by relativity.

But the more general result we here call "Bell's theorem" is much more than this: combined with the Einstein–Podolsky–Rosen (EPR) argument "from locality to deterministic hidden variables"10, the inequality theorem establishes a contradiction between locality as such (and not merely some special class of local theories) and the (now experimentally confirmed) predictions of quantum theory.

The EPR argument for pre-existing values

It is a general principle of orthodox formulations of quantum theory that measurements of physical quantities do not simply reveal pre-existing or pre-determined values, the way they do in classical theories. Instead, the particular outcome of the measurement somehow "emerges" from the dynamical interaction of the system being measured with the measuring device, so that even someone who was omniscient about the states of the system and device prior to the interaction couldn't have predicted in advance which outcome would be realized.

In a celebrated 1935 paper11, however, Albert Einstein, Boris Podolsky, and Nathan Rosen pointed out that, in situations involving specially-prepared pairs of particles, this orthodox principle conflicted with locality. Unfortunately, the role of locality in the discussion is often misunderstood — or missed entirely. One thus often hears that the EPR paper is essentially just an expression of (in particular) Einstein's philosophical discontent with quantum theory. This is quite wrong: what the paper actually contains is an argument showing that, if non-local influences are forbidden, and if certain quantum theoretical predictions are correct, then the measurements (whose outcomes are correlated) must be revealing pre-existing values. It is on this basis — in particular, on the assumption of locality — that EPR claimed to have established the "incompleteness" of orthodox quantum theory (which denies the existence of any such pre-existing values).

In the 1935 EPR paper, the argument was formulated in terms of position and momentum (which are observables having continuous spectra). The argument was later reformulated (by Bohm12) in terms of spin. This "EPRB" version is conceptually simpler and also more closely related to the recent experiments designed to test Bell's inequality.

The EPRB argument is as follows: assume that one has prepared a pair of spin-1/2 particles in the entangled spin singlet state

\(\frac1{\sqrt2}\,\big(\left\vert\uparrow\right\rangle\otimes\left\vert\downarrow\right\rangle-\left\vert\downarrow\right\rangle\otimes\left\vert\uparrow\right\rangle\big),\)

with \(\left\vert\uparrow\right\rangle\ ,\) \(\left\vert\downarrow\right\rangle\) an orthonormal basis of the spin state space. A measurement of the spin of one of the particles along a given axis yields either the result "up" (i.e., "spin up") or the result "down" (i.e., "spin down"). Moreover, if one measures the spin of both particles along some given axis (say, the \(z\)-axis), then quantum theory predicts that the results obtained will be perfectly anti-correlated, i.e., they will be opposite ("up" for one particle and "down" for the other). If such measurements are carried out simultaneously on two spatially-separated particles (technically, if the measurements are performed at space-like separation) then locality requires that any disturbance triggered by the measurement on one side cannot influence the result of the measurement on the other side. But without any such interaction, the only way to ensure the perfect anti-correlation between the results on the two sides is to have each particle carry a pre-existing determinate value (appropriately anti-correlated with the value carried by the other particle) for spin along the \(z\)-axis. Any element of locally-confined indeterminism would at least sometimes spoil the predicted perfect anti-correlation between the outcomes.

Now, obviously there is nothing special here about the \(z\)-axis, so what was just established for the \(z\)-axis applies to any axis. Thus it applies to all axes at once13. That is, assuming (a) locality and (b) that the perfect anti-correlations predicted by quantum theory actually obtain, it follows that each particle must carry a pre-existing value for spin along all possible axes, with the values for the two particles in a given pair — which, of course, needn't be the same from one particle pair to another — perfectly anti-correlated, axis by axis. (A mathematical formulation of this argument is presented at the end of Section 5.)

Bell's inequality theorem

Pre-existing values are thus the only local way to account for perfect anti-correlations in the outcomes of spin measurements along identical axes. But a simple argument shows that pre-existing values are incompatible with the predictions of quantum theory (for a pair of particles prepared in the singlet state) when we allow also for the possibility of spin measurements along different axes.

According to quantum theory, when spin measurements along different axes are performed on the pair of particles in the singlet state, the probability that the two results will be opposite (one "up" and one "down") is equal to \((1+\cos\,\theta)/2\ ,\) where \(\theta\in[0,\pi]\) is the angle between the chosen (oriented) axes. It follows from the simple mathematical result below, Bell's inequality theorem, that this is not compatible with the pre-existing values we have been discussing.
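As a numerical check of this prediction, the following sketch computes the probability of opposite outcomes directly from the singlet state. It assumes, purely for illustration, that both measurement axes lie in the x-z plane (which keeps every amplitude real) and that each axis is given by its angle from the z-axis. The function names are hypothetical and not part of the original argument.

```python
import math

def spin_states(angle):
    """Return (up, down) spinors, in the z basis, for the axis that makes
    `angle` with the z-axis inside the x-z plane."""
    half = angle / 2.0
    up = (math.cos(half), math.sin(half))
    down = (-math.sin(half), math.cos(half))
    return up, down

def singlet_amplitude(s1, s2):
    """Amplitude <s1 (x) s2 | singlet>, where
    singlet = (|up,down> - |down,up>) / sqrt(2) in the z basis.
    All components are real here, so no complex conjugation is needed."""
    return (s1[0] * s2[1] - s1[1] * s2[0]) / math.sqrt(2.0)

def prob_opposite(angle1, angle2):
    """Probability that the two outcomes differ (one 'up', one 'down')."""
    up1, down1 = spin_states(angle1)
    up2, down2 = spin_states(angle2)
    return (singlet_amplitude(up1, down2) ** 2
            + singlet_amplitude(down1, up2) ** 2)

if __name__ == "__main__":
    for theta in (0.0, math.pi / 3, 2 * math.pi / 3, math.pi):
        # Second and third columns should agree: computed vs. (1 + cos theta)/2
        print(theta, prob_opposite(0.0, theta), (1 + math.cos(theta)) / 2)
```

Only the relative angle between the two axes enters the result, so rotating both axes together leaves the probability unchanged.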

To see this, suppose that the spin measurements for both particles do simply reveal pre-existing values. Denote by \(Z^i_\alpha\ ,\) \(i=1,2\ ,\) the pre-determined outcome of the spin measurement for particle number \(i\) along axis \(\alpha\ .\) These values will evidently vary from one run of the experiment (i.e., one particle pair) to the next, and can thus be treated mathematically as random variables (each one assuming only two possible values, say 1 for "up" and -1 for "down").

Now consider three particular axes \(\mathbf a\ ,\) \(\mathbf b\ ,\) and \(\mathbf c\) that lie in a single plane and are such that the angle between any two of them is equal to \(2\pi/3\ .\) Then, since \(\big(1+\cos(2\pi/3)\big)/2=1/4\ ,\) agreement with quantum theory will require that \(P(Z^1_\alpha\ne Z^2_\beta)=1/4\) if \(\alpha\ne\beta\) are among \(\mathbf a\ ,\) \(\mathbf b\ ,\) \(\mathbf c\) (where \(P\) stands for probability). Agreement with quantum theory also requires opposite outcomes for identical measurement axes, i.e., \(Z^1_\alpha=-Z^2_\alpha\ ,\) for all \(\alpha\ .\) But it turns out that it is impossible to satisfy both requirements:

Bell's inequality theorem. Consider random variables \(Z^i_\alpha\ ,\) \(i=1,2\ ,\) \(\alpha=\mathbf a, \mathbf b, \mathbf c\ ,\) taking only the values \(\pm1\ .\) If these random variables are perfectly anti-correlated, i.e., if \(Z^1_\alpha=-Z^2_\alpha\ ,\) for all \(\alpha\ ,\) then: \[(1)\quad P(Z^1_{\mathbf a}\ne Z^2_{\mathbf b})+P(Z^1_{\mathbf b}\ne Z^2_{\mathbf c})+P(Z^1_{\mathbf c}\ne Z^2_{\mathbf a})\ge1.\]

Proof. Since (at any given point of the sample space) the three \(\pm1\)-valued random variables \(Z^1_\alpha\) can't all disagree, the union of the events \(\{Z^1_{\mathbf a}=Z^1_{\mathbf b}\}\ ,\) \(\{Z^1_{\mathbf b}=Z^1_{\mathbf c}\}\ ,\) \(\{Z^1_{\mathbf c}=Z^1_{\mathbf a}\}\) is equal to the entire sample space. Therefore the sum of their probabilities must be greater than or equal to 1:

\[P(Z^1_{\mathbf a}=Z^1_{\mathbf b})+P(Z^1_{\mathbf b}=Z^1_{\mathbf c})+P(Z^1_{\mathbf c}=Z^1_{\mathbf a})\ge1.\]

But since \(Z^1_\beta = -Z^2_\beta\ ,\) we have that \(P(Z^1_\alpha=Z^1_\beta)=P(Z^1_\alpha\ne Z^2_\beta)\ .\) The thesis immediately follows.

Each of the three terms on the left hand side of (1) must equal \(1/4\) in order to reproduce the quantum predictions. But, since \(1/4+1/4+1/4=3/4<1\ ,\) the full set of quantum predictions cannot be matched. This establishes the incompatibility between the quantum predictions and the existence of pre-existing values.
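The counting step behind inequality (1) can be checked by brute force. The sketch below enumerates all eight assignments of \(\pm1\) to the pre-existing values of particle 1 along the three axes, fixes particle 2 by perfect anti-correlation, and counts how many of the three disagreement events in (1) occur. Since the left hand side of (1) is the expected value of this count, a minimum of 1 over all assignments implies the inequality for any probability distribution. The names used are hypothetical.

```python
from itertools import product

AXES = ("a", "b", "c")
PAIRS = (("a", "b"), ("b", "c"), ("c", "a"))  # the three terms of inequality (1)

def disagreement_count(z1):
    """Number of pairs (alpha, beta) in PAIRS with Z1_alpha != Z2_beta,
    where Z2_beta = -Z1_beta by perfect anti-correlation."""
    z2 = {axis: -value for axis, value in z1.items()}
    return sum(1 for alpha, beta in PAIRS if z1[alpha] != z2[beta])

def min_disagreements():
    """Minimum of the count over all 8 deterministic assignments."""
    return min(
        disagreement_count(dict(zip(AXES, values)))
        for values in product((+1, -1), repeat=3)
    )

# Quantum prediction for the left hand side of (1) with axes 2*pi/3 apart:
# three terms, each equal to (1 + cos(2*pi/3)) / 2 = 1/4.
QUANTUM_SUM = 3 * 0.25

if __name__ == "__main__":
    print("minimum over assignments:", min_disagreements())
    print("quantum prediction for the sum:", QUANTUM_SUM)
```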

We note that Bell's original paper6 considered for this purpose, instead of the disagreement probability \(P(Z^1_\alpha\ne Z^2_\beta)\ ,\) the correlation \(C(\alpha,\beta)\ ,\) defined as the expected value of the product \(Z^1_\alpha Z^2_\beta\ :\)

\[C(\alpha,\beta)=E(Z^1_\alpha Z^2_\beta)=P(Z^1_\alpha Z^2_\beta=1)\,-\,P(Z^1_\alpha Z^2_\beta=-1)=P(Z^1_\alpha=Z^2_\beta)\,-\,P(Z^1_\alpha\ne Z^2_\beta)=1\,-\,2P(Z^1_\alpha\ne Z^2_\beta).\]

Bell's original inequality (under the same assumptions as for Bell's inequality theorem above) is:

\[\vert C(\mathbf a,\mathbf b)-C(\mathbf a,\mathbf c)\vert\le 1+C(\mathbf b,\mathbf c).\]

Let us see how this inequality is related to inequality (1). Rewriting inequality (1) in terms of the correlations \(C(\alpha,\beta)\ ,\) we obtain:

\[\quad C(\mathbf a,\mathbf b)+C(\mathbf b,\mathbf c)+C(\mathbf c,\mathbf a)\le1.\]

Since (because of the perfect anti-correlations) \(C(\alpha,\beta)=C(\beta,\alpha)\ ,\) this yields that \[(2)\quad C(\mathbf a,\mathbf b)+C(\mathbf a,\mathbf c)+C(\mathbf b,\mathbf c)\le1.\]

Bell's original inequality is equivalent to the conjunction of two inequalities without absolute value: one of them is obtained from (2) by changing the signs of \(C(\mathbf a,\mathbf c)\) and \(C(\mathbf b,\mathbf c)\ .\) (This inequality follows, as (2) does, from Bell's inequality theorem above if we replace \(Z^i_{\mathbf c}\) with \(-Z^i_{\mathbf c}\ .\)) The other inequality is obtained from (2) by changing the signs of \(C(\mathbf a,\mathbf b)\) and \(C(\mathbf b,\mathbf c)\ .\) (This inequality follows from Bell's inequality theorem above by replacing \(Z^i_{\mathbf b}\) with \(-Z^i_{\mathbf b}\ .\))
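Bell's original inequality can be checked numerically in the same spirit. The sketch below first verifies it for every deterministic assignment of anti-correlated pre-existing values (for mixtures it then follows because the absolute value of an average is at most the average of the absolute values). It then evaluates the quantum correlations \(C=1-2P(Z^1_\alpha\ne Z^2_\beta)=-\cos\theta\) obtained from the disagreement probability quoted earlier. The choice of coplanar axes at 0, 60 and 120 degrees is an assumption made for this illustration, as are the function names.

```python
import math
from itertools import product

def quantum_correlation(angle1, angle2):
    """Singlet-state correlation for coplanar axes: C = -cos(angle between them)."""
    return -math.cos(angle1 - angle2)

def bell_gap(c_ab, c_ac, c_bc):
    """Left side minus right side of |C(a,b) - C(a,c)| <= 1 + C(b,c).
    A positive value signals a violation."""
    return abs(c_ab - c_ac) - (1 + c_bc)

def max_gap_deterministic():
    """Largest gap over all assignments of +/-1 to particle 1 along a, b, c,
    with particle 2 carrying the opposite values."""
    gaps = []
    for za, zb, zc in product((+1, -1), repeat=3):
        # C(alpha, beta) = Z1_alpha * Z2_beta = -Z1_alpha * Z1_beta
        gaps.append(bell_gap(-za * zb, -za * zc, -zb * zc))
    return max(gaps)

def quantum_gap(a=0.0, b=math.pi / 3, c=2 * math.pi / 3):
    """Gap for the quantum correlations at the illustrative angles."""
    return bell_gap(quantum_correlation(a, b),
                    quantum_correlation(a, c),
                    quantum_correlation(b, c))

if __name__ == "__main__":
    print("largest deterministic gap:", max_gap_deterministic())
    print("quantum gap:", quantum_gap())
```

With these angles the quantum correlations give a left side of 1 against a right side of 1/2, so the inequality fails, whereas no assignment of pre-existing values exceeds a gap of zero.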

Bell's theorem

Bell's theorem states that the predictions of quantum theory (for measurements of spin on particles prepared in the singlet state) cannot be accounted for by any local theory. The proof of Bell's theorem is obtained by combining the EPR argument (from locality and certain quantum predictions to pre-existing values) and Bell's inequality theorem (from pre-existing values to an inequality incompatible with other quantum predictions).

Here is how Bell himself recapitulated the two-part argument:

Let us summarize once again the logic that leads to the impasse. The EPRB correlations are such that the result of the experiment on one side immediately foretells that on the other, whenever the analyzers happen to be parallel. If we do not accept the intervention on one side as a causal influence on the other, we seem obliged to admit that the results on both sides are determined in advance anyway, independently of the intervention on the other side, by signals from the source and by the local magnet setting. But this has implications for non-parallel settings which conflict with those of quantum mechanics. So we cannot dismiss intervention on one side as a causal influence on the other.14

Already at the time Bell wrote this, there was a tendency for critics to miss the crucial role of the EPR argument here. The conclusion is not just that some special class of local theories (namely, those which explain the measurement outcomes in terms of pre-existing values) are incompatible with the predictions of quantum theory (which is what follows from Bell's inequality theorem alone), but that local theories as such (whether deterministic or not, whether positing hidden variables or not, etc.) are incompatible with the predictions of quantum theory. This confusion has persisted in more recent decades, so perhaps it is worth emphasizing the point by (again) quoting from Bell's pointed footnote from the same 1980 paper quoted just above: "My own first paper on this subject ... starts with a summary of the EPR argument from locality to deterministic hidden variables. But the commentators have almost universally reported that it begins with deterministic hidden variables."10

The CHSH–Bell inequality: Bell's theorem without perfect correlations

Perhaps motivated by this widespread and persistent misunderstanding concerning his 1964 paper6, Bell wrote many subsequent papers in which he explained and elaborated upon his very interesting result from a variety of angles. From 1975 onward15 Bell sometimes presented his result using a new strategy that relies neither on perfect (anti-)correlations nor on the EPR argument. The new strategy has some advantages: perfect correlations cannot be demonstrated empirically, and one could also imagine the possibility that quantum theory might be replaced with a new theory that predicts some small deviation from the perfect correlations. So it is desirable to have a version of Bell's theorem that "depends continuously" on the correlations. The new strategy also sheds some light on the meaning of locality.

The idea is to write down a mathematically precise formulation of a consequence of locality in the context of an experiment in which measurements are performed on two systems which have previously interacted — say, systems that have been produced by a common source — but which are now spatially separated. (The EPR scenario considered above is of course an example of such an experiment.) Which of the several possible measurements are actually performed on each system will be determined by (control) parameters — \(\alpha_1\) and \(\alpha_2\) — which should be thought of as being randomly and freely chosen by the experimenters, just before the measurements. The measurements (and the choices of the control parameters) are assumed to be space-like separated. Once \(\alpha_1\) and \(\alpha_2\) are chosen, the experiment is performed, yielding (say, real-valued) outcomes \(A_1\) and \(A_2\) for the measurements on the two systems. While the values of \(A_1\) and \(A_2\) may vary from one run of the experiment to another even for the same choice of parameters, we assume that, for a fixed preparation procedure on the two systems, these outcomes exhibit statistical regularities. More precisely, we assume these are governed by probability distributions \(P_{\alpha_1,\alpha_2}(A_1,A_2)\) depending of course on the experiments performed, and in particular on \(\alpha_1\) and \(\alpha_2\ .\)

Notice that no assumption of pre-determined outcomes is being invoked here: part (or all) of the randomness of \(A_1\ ,\) \(A_2\) can arise during the process of measurement. By contrast, recall that in the above proof of Bell's inequality theorem using the random variables \(Z^i_\alpha\ ,\) the randomness was entirely located at the source, or at least occurred prior to the measurements. Moreover, in that context it was meaningful to talk about the joint probability distribution of \((Z^i_\alpha,Z^i_\beta)\) with \(\alpha\ne\beta\) (i.e., the joint probability distribution for outcomes of different measurements on the same system), while here a joint probability distribution of that type is not meaningful.

Let us now see how a mathematically precise necessary condition for locality can be formulated. First of all, one should realize that locality does not imply the independence \(P_{\alpha_1,\alpha_2}(A_1,A_2)=P_{\alpha_1,\alpha_2}(A_1)P_{\alpha_1,\alpha_2}(A_2)\) of the outcomes \(A_1\ ,\) \(A_2\ .\) Indeed, it is perfectly natural to expect that the previous interaction between the systems 1 and 2 could produce dependence relations between the outcomes. However, if locality is assumed, then it must be the case that any additional randomness that might affect system 1 after it separates from system 2 must be independent of any additional randomness that might affect system 2 after it separates from system 1. More precisely, locality requires that some set of data \(\lambda\) — made available to both systems, say, by a common source16 — must fully account for the dependence between \(A_1\) and \(A_2\ ;\) in other words, the randomness that generates \(A_1\) out of the parameter \(\alpha_1\) and the data codified by \(\lambda\) must be independent of the randomness that generates \(A_2\) out of the parameter \(\alpha_2\) and \(\lambda\ .\) Since \(\lambda\) can vary from one run of the experiment to the other, it should be modeled as a random variable.

Let us re-state these ideas mathematically: \(\lambda\) is a random variable conditioning upon which yields a decomposition

\[(3)\quad P_{\alpha_1,\alpha_2}(A_1,A_2)=\int_\Lambda P_{\alpha_1,\alpha_2}(A_1,A_2|\lambda)\,\mathrm dP(\lambda),\]

into conditional probabilities obeying a factorizability condition of the form:

\[(4)\quad P_{\alpha_1,\alpha_2}(A_1,A_2|\lambda)=P_{\alpha_1}(A_1|\lambda)P_{\alpha_2}(A_2|\lambda).\]

The probability distribution \(P\) of \(\lambda\) should not be allowed to depend on \((\alpha_1,\alpha_2)\ ;\) this is the mathematical meaning of the assumption, noted above, that the control parameters \(\alpha_1\ ,\) \(\alpha_2\) are "randomly and freely chosen by the experimenters". One might imagine here that the experimenter on each side makes a free-will choice (just before the measurement) about how to set his apparatus, that is independent of the data codified by \(\lambda\) (which existed before the choices were made). One needn't worry, however, about whether experimenters have "genuine free will" or about what that exactly means. In a real experiment, the parameters \(\alpha_1\) and \(\alpha_2\) would typically be chosen by some random or pseudo-random number generator (say, a computer) that is independent of any other physical processes that might be relevant for the outcomes, and hence independent of \(\lambda\) — unless, that is, there exists some incredible conspiracy of nature (the kind of conspiracy that would make any kind of scientific inquiry impossible). We will thus call the assumption that the probability distribution of \(\lambda\) is independent of \((\alpha_1,\alpha_2)\) the "no conspiracy" condition.

Note that the "no conspiracy" condition doesn't follow from locality: even if we assume that the choices of \(\alpha_1\) and \(\alpha_2\) are made at space-like separation from the physical processes creating the value of \(\lambda\ ,\) it is still possible in principle that the supposedly random process determining \(\alpha_1\) and \(\alpha_2\) is in fact dependent, via some local influences from the more distant past, on whatever is going on in the process that creates \(\lambda\ .\) The "no conspiracy" assumption, then, is strictly speaking just that — an additional assumption (beyond locality) on which the derivation of Bell-type inequalities rests. That said, we stress that this assumption is necessarily always made whenever one does any empirical science; in practice, one assesses the applicability of the assumption to a given experiment by examining the care with which the experimental design precludes any non-conspiratorial dependencies between the preparation of the systems and the settings of instruments17.

The precise mathematical setup for formulas (3) and (4) is the following: one considers a probability space \((\Lambda,P)\) and, with each \(\lambda\in\Lambda\) and each choice of the parameters \(\alpha_1\ ,\) \(\alpha_2\ ,\) one associates a probability measure \(P_{\alpha_1,\alpha_2}(\cdot|\lambda)\) on the set of possible values for the pair \((A_1,A_2)\ .\) Formula (4) says that, for each \(\lambda\in\Lambda\ ,\) the probability measure \(P_{\alpha_1,\alpha_2}(\cdot|\lambda)\) factorizes as the product of a probability measure \(P_{\alpha_1}(\cdot|\lambda)\) (the marginal of \(A_1\) given \(\lambda\)) that depends only on \(\alpha_1\) and a probability measure \(P_{\alpha_2}(\cdot|\lambda)\) (the marginal of \(A_2\) given \(\lambda\)) that depends only on \(\alpha_2\ .\) The probability distribution (3) of \((A_1,A_2)\) that is observed in the experiment (and for which quantum theory makes predictions) is obtained from \(P_{\alpha_1,\alpha_2}(\cdot|\lambda)\) by averaging (i.e., integrating) over \(\lambda\) with respect to the probability measure of the space \((\Lambda,P)\ .\) As in Section 3, we define the correlation \(C(\alpha_1,\alpha_2)\) as the expected value of the product \(A_1A_2\) for a given choice of \(\alpha_1\ ,\) \(\alpha_2\ :\)

\[C(\alpha_1,\alpha_2)=E_{\alpha_1,\alpha_2}(A_1A_2)=\int_\Lambda E_{\alpha_1,\alpha_2}(A_1A_2|\lambda)\,\mathrm dP(\lambda),\]

where \(E_{\alpha_1,\alpha_2}(A_1A_2|\lambda)\) is the expected value of the product \(A_1A_2\) with respect to the probability measure \(P_{\alpha_1,\alpha_2}(\cdot|\lambda)\ .\)

Now it is easy to prove the CHSH inequality18 (after John F. Clauser, Michael A. Horne, Abner Shimony, and Richard A. Holt). This inequality is also known in the literature as the CHSH–Bell inequality or simply "Bell's inequality". In this article we will call it the "CHSH–Bell inequality" in order to distinguish it from the inequalities of Section 3 which are used in the versions of Bell's theorem that require the assumption of certain perfect (anti-)correlations.

Theorem. Suppose that the possible values for \(A_1\) and \(A_2\) are \(\pm1\ .\) Under the mathematical setup described above, assuming the factorizability condition (4), the following inequality holds:

\[|C(\mathbf a,\mathbf b)-C(\mathbf a,\mathbf c)|+|C(\mathbf a',\mathbf b)+C(\mathbf a',\mathbf c)|\le2,\]

for any choice of parameters \(\mathbf a\ ,\) \(\mathbf b\ ,\) \(\mathbf c\ ,\) \(\mathbf a'\ .\)

Proof. It follows from (4) that \(E_{\alpha_1,\alpha_2}(A_1A_2|\lambda)=E_{\alpha_1}(A_1|\lambda)E_{\alpha_2}(A_2|\lambda)\ ,\) for all \(\lambda\ ,\) \(\alpha_1\ ,\) \(\alpha_2\ .\) Thus:

\[\begin{aligned}
&|C(\mathbf a,\mathbf b)-C(\mathbf a,\mathbf c)|+|C(\mathbf a',\mathbf b)+C(\mathbf a',\mathbf c)|\\
&\quad\le\int_\Lambda\Big[\big|E_{\mathbf a}(A_1|\lambda)\big|\,\big|E_{\mathbf b}(A_2|\lambda)-E_{\mathbf c}(A_2|\lambda)\big|\,+\,\big|E_{\mathbf a'}(A_1|\lambda)\big|\,\big|E_{\mathbf b}(A_2|\lambda)+E_{\mathbf c}(A_2|\lambda)\big|\Big]\,\mathrm dP(\lambda)\\
&\quad\le\int_\Lambda\Big[\big|E_{\mathbf b}(A_2|\lambda)-E_{\mathbf c}(A_2|\lambda)\big|\,+\,\big|E_{\mathbf b}(A_2|\lambda)+E_{\mathbf c}(A_2|\lambda)\big|\Big]\,\mathrm dP(\lambda),
\end{aligned}\]

where the second inequality follows from the observation that \(|E_\alpha(A_1|\lambda)|\le1\ .\) The conclusion now follows directly from the following elementary lemma:

Lemma. For real numbers \(x,y\in[-1,1]\ ,\) we have that \(|x-y|+|x+y|\le2\ .\)

Proof. Squaring \(|x-y|+|x+y|\) we obtain \(2x^2+2y^2+2|x^2-y^2|\ ,\) which is either equal to \(4x^2\) or to \(4y^2\ ;\) in either case, it is less than or equal to 4.

For the experiment considered in Section 2 (spin measurements on a pair of particles in the singlet state), quantum theory predicts \(C(\alpha,\beta)=-\alpha\cdot\beta\) (where the dot denotes the Euclidean inner product and the oriented axes \(\alpha\ ,\) \(\beta\) are identified with their corresponding unit vectors). For this experiment, the CHSH–Bell inequality is maximally violated by the quantum predictions if \(\mathbf b\) and \(\mathbf c\) are mutually orthogonal, \(\mathbf a'\) bisects \(\mathbf b\) and \(\mathbf c\ ,\) and \(\mathbf a\) bisects \(\mathbf b\) and the opposite axis \(-\mathbf c\ .\) In that case, the left hand side is equal to \(2\sqrt2\ .\) We remark also that the original Bell's inequality is obtained from the CHSH–Bell inequality by setting \(\mathbf a'=\mathbf b\) and using \(C(\mathbf b,\mathbf b)=-1\ .\)
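The maximal violation can be reproduced with a few lines of arithmetic. The sketch below assumes that all four axes lie in one plane and represents each by its angle, so that \(C(\alpha,\beta)=-\alpha\cdot\beta\) becomes minus the cosine of the angle difference. The specific angles are one concrete choice satisfying the stated geometric conditions, and the names are hypothetical.

```python
import math

def quantum_correlation(angle1, angle2):
    """C(alpha, beta) = -alpha . beta for coplanar unit vectors given by angles."""
    return -math.cos(angle1 - angle2)

def chsh(a, a_prime, b, c, corr=quantum_correlation):
    """Left hand side of the CHSH-Bell inequality."""
    return (abs(corr(a, b) - corr(a, c))
            + abs(corr(a_prime, b) + corr(a_prime, c)))

# One choice of axes meeting the conditions in the text (an assumption):
B_AXIS = 0.0                # b
C_AXIS = math.pi / 2        # c, orthogonal to b
A_PRIME_AXIS = math.pi / 4  # a' bisects b and c
A_AXIS = -math.pi / 4       # a bisects b and -c (the axis at -pi/2)

if __name__ == "__main__":
    value = chsh(A_AXIS, A_PRIME_AXIS, B_AXIS, C_AXIS)
    print(value, 2 * math.sqrt(2))  # the two numbers should agree
```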

We have thus established again the incompatibility between locality and certain predictions of quantum theory: we have proven that the CHSH–Bell inequality, which is violated by the quantum predictions, follows from the assumption of locality (and the "no conspiracy" condition).
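For contrast, one can sample models that satisfy the factorizability condition (4) and confirm that none of them exceeds the bound. The sketch below is a sanity check and not a proof. It assumes a finite set \(\Lambda\), and it uses the fact that for \(\pm1\)-valued outcomes the conditional expectations \(E_{\alpha}(A_i|\lambda)\) may be any numbers in \([-1,1]\). The sampling scheme and all names are hypothetical.

```python
import random

def random_local_model(n_lambda, rng):
    """Draw a random factorizable model on a finite Lambda.
    Returns (weights, e1, e2), where weights is the distribution of lambda,
    e1[s][k] = E_s(A_1 | lambda_k) for settings s in {a, a'} and
    e2[s][k] = E_s(A_2 | lambda_k) for settings s in {b, c}."""
    raw = [rng.random() for _ in range(n_lambda)]
    total = sum(raw)
    weights = [w / total for w in raw]
    e1 = {s: [rng.uniform(-1, 1) for _ in range(n_lambda)] for s in ("a", "a'")}
    e2 = {s: [rng.uniform(-1, 1) for _ in range(n_lambda)] for s in ("b", "c")}
    return weights, e1, e2

def correlation(model, s1, s2):
    """C(s1, s2) = sum_k P(lambda_k) E_s1(A_1|lambda_k) E_s2(A_2|lambda_k)."""
    weights, e1, e2 = model
    return sum(w * x * y for w, x, y in zip(weights, e1[s1], e2[s2]))

def chsh_value(model):
    """Left hand side of the CHSH-Bell inequality for the given model."""
    return (abs(correlation(model, "a", "b") - correlation(model, "a", "c"))
            + abs(correlation(model, "a'", "b") + correlation(model, "a'", "c")))

def max_chsh_over_samples(n_models=2000, n_lambda=5, seed=0):
    """Largest CHSH value found among randomly sampled local models."""
    rng = random.Random(seed)
    return max(chsh_value(random_local_model(n_lambda, rng))
               for _ in range(n_models))

if __name__ == "__main__":
    print("largest sampled value:", max_chsh_over_samples())
```

Every sampled value stays at or below 2, as the theorem requires, while the quantum prediction reaches \(2\sqrt2\).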

Let us now take advantage of the mathematical formulation of (a consequence of) locality presented above, namely the factorizability condition (4), in order to formulate mathematically the version of Bell's theorem presented in Section 4. Since Bell's inequality theorem has already been formulated mathematically, it remains for us to do so for the EPR argument as well. The mathematical statement (which we will prove in a moment) corresponding to the EPR argument is the following: assuming (4) and the perfect anti-correlations \(P_{\alpha,\alpha}(A_1\ne A_2)=1\ ,\) there exist random variables \(Z^i_\alpha\) on the probability space \((\Lambda,P)\) such that:

\[(5)\quad P_{\alpha_1,\alpha_2}\big(A_i=Z^i_{\alpha_i}(\lambda)|\lambda\big)\;\stackrel{(4)}=\;P_{\alpha_i}\big(A_i=Z^i_{\alpha_i}(\lambda)|\lambda\big)=1,\]

for \(i=1,2\) and all \(\lambda\ ,\) \(\alpha_1\ ,\) and \(\alpha_2\ .\)

Notice that (using integration over \(\lambda\)) equality (5) implies that, for all \(\alpha_1\ ,\) \(\alpha_2\ ,\) the probability distribution of the pair of random variables \((Z^1_{\alpha_1},Z^2_{\alpha_2})\) is equal to the (unconditional) probability distribution (3) of the pair of outcomes \((A_1,A_2)\) (the probability distribution observed in the experiment, for which quantum theory makes predictions). In particular, we have \(P_{\alpha_1,\alpha_2}(A_1\ne A_2)=P(Z^1_{\alpha_1}\ne Z^2_{\alpha_2})\ .\) The random variables \(Z^i_\alpha\) are precisely the ingredients necessary for the proof of Bell's inequality theorem and hence we obtain, as just announced, a mathematical formulation of the version of Bell's theorem presented in Section 4.

Here is the proof of the mathematical statement corresponding to the EPR argument: assume (4) and the perfect anti-correlations. It follows from \(P_{\alpha,\alpha}(A_1\ne A_2)=1\) that \(P_{\alpha,\alpha}(A_1\ne A_2|\lambda)=1\) holds for all19 \(\lambda\in\Lambda\ .\) When \(\alpha_1=\alpha_2=\alpha\ ,\) for each \(\lambda\in\Lambda\ ,\) the outcomes \(A_1\) and \(A_2\) given \(\lambda\) (whose joint probability distribution is \(P_{\alpha,\alpha}(\cdot|\lambda)\)) are at the same time independent (by (4)) and perfectly anti-correlated. An elementary lemma from probability theory shows that this can happen only if they are not really random, i.e., if they are constant. The constant may depend upon \(\alpha\) and \(\lambda\ ,\) and thus there are functions \(f_i\) such that \(P_{\alpha,\alpha}\big(A_i=f_i(\alpha,\lambda)|\lambda\big)=1\ .\) Define the random variables \(Z^i_\alpha\) by setting \(Z^i_\alpha(\lambda)=f_i(\alpha,\lambda)\ .\) In order to conclude the proof, observe that condition (4) implies:

\[P_{\alpha_1,\alpha_2}\big(A_i=Z^i_{\alpha_i}(\lambda)|\lambda\big)=P_{\alpha_i}\big(A_i=Z^i_{\alpha_i}(\lambda)|\lambda\big)=P_{\alpha_i,\alpha_i}\big(A_i=Z^i_{\alpha_i}(\lambda)|\lambda\big)=1.\]
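The elementary lemma used in this proof (independent and perfectly anti-correlated outcomes must be constant) can be spelled out for \(\pm1\)-valued outcomes. The derivation below is a sketch; the shorthand \(p\) and \(q\) is introduced here for illustration only.

```latex
% Fix \lambda and \alpha, and write (shorthand introduced here)
%   p = P_{\alpha}(A_1 = 1 \mid \lambda), \qquad q = P_{\alpha}(A_2 = 1 \mid \lambda).
\begin{align*}
P_{\alpha,\alpha}(A_1 \ne A_2 \mid \lambda)
  &= p\,(1-q) + (1-p)\,q
     && \text{by the factorizability condition (4)} \\
  &= 1 - \big[\, p\,q + (1-p)(1-q) \,\big].
\end{align*}
% Perfect anti-correlation makes the left hand side equal to 1, so
\[
  p\,q + (1-p)(1-q) = 0 .
\]
% Both terms are non-negative, hence p q = 0 and (1-p)(1-q) = 0.
% If p = 0, the second equation forces q = 1; if q = 0, it forces p = 1.
\[
  (p,q) \in \{(0,1),\,(1,0)\},
\]
% i.e. given \lambda each outcome takes a single value with probability 1.
```

In either case the outcome on each side is fixed by \(\alpha\) and \(\lambda\) alone, which is what defines the functions \(f_i\) above.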

Bell's definition of locality

As we have stressed above, the crucial assumption from which one can derive various empirically-testable Bell-type inequalities is locality. (Bell sometimes also used the term local causality instead of locality.) Bell explained the "principle of local causality" as follows:

The direct causes (and effects) of events are near by, and even the indirect causes (and effects) are no further away than permitted by the velocity of light.20

In relativistic terms, locality is the requirement that goings-on in one region of spacetime should not affect — should not influence — happenings in space-like separated regions.

Although we have not presented any kind of careful mathematical definition of locality, we were able to prove in the previous sections that certain quantum predictions are incompatible with locality. This was achieved by means of the formulation of a mathematically precise necessary condition for locality (in the context of a particular type of experiment): namely, the factorizability condition (4). It is possible, however, to formulate locality itself in a rigorous way, at least for a certain class of physical theories. Bell actually proposed two (subtly different) such formulations, one in his 1975 paper "The theory of local beables"15 and the other — which we will explain here — in his 1990 paper "La nouvelle cuisine"21.

"Beable" is Bell's term for those elements of a theory which are "to be taken seriously, as corresponding to something real"22. As an example, Bell cites the electric and magnetic fields of classical electromagnetism:

In Maxwell's electromagnetic theory, for example, the fields \(\mathbf E\) and \(\mathbf H\) are 'physical' (beables, we will say) but the potentials \(\mathbf A\) and \(\phi\) are 'non-physical'. Because of gauge invariance the same physical situation can be described by very different potentials.23

As Bell points out, it is therefore no violation of locality "that in Coulomb gauge the scalar potential propagates with infinite velocity. It is not really supposed to be there."24

The beables of a theory have values that (according to the theory) are supposed to exist independently of any observation or experiment. In this regard Bell contrasts the notion of beable with the notion of "observable" which features prominently in orthodox quantum theory:

The concept of 'observable' lends itself to very precise mathematics when identified with 'self-adjoint operator'. But physically, it is a rather woolly concept. It is not easy to identify precisely which physical processes are to be given the status of 'observations' and which are to be relegated to the limbo between one observation and another. So it could be hoped that some increase in precision might be possible by concentration on the beables, which can be described in 'classical terms', because they are there.25

This woolliness suggests that the notion of "observation" should not appear in the formulation of (candidate) fundamental physical theories. Indeed, every aspect of a physical process (including those processes we humans classify as "observations") should be completely reducible to the actions and interactions of some physically real objects — some beables. In an "observation", both the "observed system" and the relevant experimental apparatus, for example, must be made of beables, and anything like a measurement outcome which (say) emerges anew from the system-apparatus interaction must be contained in the final disposition of those beables.

Locality is the idea that physical influences cannot propagate faster than light. It thus presupposes a clear identification, for a given candidate theory, of which elements are supposed to correspond to something that is physically real. Here is how Bell makes this point: "No one is obliged to consider the question 'What cannot go faster than light?'. But if you decide to do so, then the above remarks suggest the following: you must identify in your theory 'local beables'"26. (We will discuss this again later.)

Local beables are those elements of a theory which should correspond to elements of physical reality living within spacetime. Those should include the representation of the ordinary objects of our experience, such as tables, chairs and experimental equipment. As Bell puts this:

These are the mathematical counterparts in the theory to real events at definite places and times in the real world (as distinct from the many purely mathematical constructions that occur in the working out of physical theories, as distinct from things which may be real but not localized, ...).27

All the beables familiar from at least so-called "classical theories" are of this type — for example, the already mentioned fields in classical electromagnetism, or the positions of particles in classical mechanics. The possibility of non-local beables — corresponding to elements of physical reality which are not in spacetime — arises especially with respect to the several candidate versions of quantum theory, all of which involve a wave function (or quantum state) which, as a function on an abstract configuration space, will be a non-local beable if it is granted beable status at all. In the words of Bell:

... the wavefunction as a whole lives in a much bigger space, of \(3N\)-dimensions. It makes no sense to ask for the amplitude or phase or whatever of the wavefunction at a point in ordinary space. It has neither amplitude nor phase nor anything else until a multitude of points in ordinary three-space are specified.28

Thus, one can meaningfully talk about "the local beables living within a region \(R\) of spacetime" or, more simply, "the local beables in region \(R\)"29. Those represent, according to the theory, what is supposed to be really happening in \(R\ .\) On the other hand, there is no such thing as a non-local beable "living inside" a given region of spacetime.

Not surprisingly, it is less straightforward to assess the locality of theories positing non-local beables. Let us then turn to Bell's formulation of locality (which applies straightforwardly to theories of exclusively local beables) and then return to the question of non-local beables and how Bell's formulation can be extended to apply, for example, to theories positing quantum wave functions as non-local beables.

Figure 1: Spacetime diagram for Bell's definition of locality. "Full specification of what happens in 3 makes events in 2 irrelevant for predictions about 1 in a locally causal theory." 30

The thought motivating Bell's formulation is that a complete specification of the physical state of (i.e., the beables in) a spacetime region which closes off the past light cone of some event should include everything needed to make predictions about that event. More precisely, such a specification should render further information — about goings-on at space-like separation from the event in question — irrelevant and/or redundant for making predictions about that event. Referring to the spacetime diagram reproduced at right, Bell formulated this as follows:

A theory will be said to be locally causal if the probabilities attached to values of local beables in a space-time region 1 are unaltered by specification of values of local beables in a space-like separated region 2, when what happens in the backward light cone of 1 is already sufficiently specified, for example by a full specification of local beables in a space-time region 3.31

More precisely, the following equality of conditional probabilities must hold in a local theory: \[P(x_1|x_2,X_3)=P(x_1|X_3),\] where \(x_1\) (resp., \(x_2\)) denotes the value of a local beable in region 1 (resp., in region 2) and \(X_3\) denotes a full specification of the local beables in region 3.

As Bell goes on to explain, it is crucial that region 3 shields32 region 1 from the overlapping past light cones of 1 and 2, and also that the specification of events in region 3 be complete; otherwise information about events in region 2 could well alter the probabilities assigned to events in 1 without this implying any violation of locality. For example, in a local non-deterministic theory, an event might occur subsequent to region 3 which was not predictable on the basis of even a complete specification of the local beables in region 3; such an event could then influence events in its own future light cone, giving rise to correlations — not predictable on the basis of information about region 3 — between space-like separated events. Such a mechanism could make information about events at space-like separation from 1 highly relevant for making predictions about 1, even when those predictions are conditionalized on complete information about region 3. The requirement that region 3 shields 1 from the overlapping past light cones of 1 and 2, however, precludes this possibility: the only way for information about such a region 2 to be relevant for predictions about 1 (once complete information about 3 has been taken into account) is if something somewhere is influencing events outside its future light cone, i.e., violating locality. It is likewise clear that information about goings-on in region 2 may very well usefully supplement predictions about events in 1 made on the basis of an incomplete specification of the values of the local beables in region 3, without any violation of locality being implied.
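The role of completeness in the condition \(P(x_1|x_2,X_3)=P(x_1|X_3)\) can be illustrated with a small numerical sketch. The model below is a hypothetical toy example, not taken from Bell or from this article: a fair bit `c` stands in for the complete specification \(X_3\) of region 3, and the values `x1`, `x2` in regions 1 and 2 each copy `c` with probability 3/4, independently of one another. All names and numbers are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model (an assumption, not from the article): a fair bit c
# in region 3, and outcomes x1, x2 that each copy c with probability 3/4,
# using independent local noise in regions 1 and 2.
P_C = {0: Fraction(1, 2), 1: Fraction(1, 2)}
COPY = Fraction(3, 4)

def local_prob(x, c):
    """P(x | c) for one wing: x equals c with probability COPY."""
    return COPY if x == c else 1 - COPY

# Joint distribution P(x1, x2, c) built from purely local ingredients.
joint = {
    (x1, x2, c): P_C[c] * local_prob(x1, c) * local_prob(x2, c)
    for x1, x2, c in product((0, 1), repeat=3)
}

NAMES = ("x1", "x2", "c")

def prob(**fixed):
    """Total probability of all (x1, x2, c) matching the fixed values."""
    return sum(
        p for key, p in joint.items()
        if all(key[NAMES.index(name)] == value for name, value in fixed.items())
    )

def cond(target, given):
    """Conditional probability P(target | given); both are name -> value dicts."""
    return prob(**target, **given) / prob(**given)

# With the complete specification c, x2 is irrelevant for predicting x1.
screened = all(
    cond({"x1": x1}, {"x2": x2, "c": c}) == cond({"x1": x1}, {"c": c})
    for x1, x2, c in product((0, 1), repeat=3)
)

# With c left unspecified, x2 does carry information about x1.
p_x1_given_x2 = cond({"x1": 1}, {"x2": 1})
p_x1 = prob(x1=1)
print(screened, p_x1_given_x2, p_x1)
```

In this toy model the equality holds for every choice of values once `c` is given, while dropping `c` changes the probability of `x1 = 1` from 1/2 to 5/8 upon learning `x2 = 1`. The correlation is produced entirely by the common cause, so it signals an incomplete specification of region 3 rather than a violation of locality.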

It is important to appreciate that Bell's proposed definition of locality applies primarily to candidate theories. There is then no particular mystery (at least for clearly-formulated theories) about, for example, which elements have beable status, or what a complete specification of local beables in some spacetime region might involve.

As suggested earlier in this section, Bell's definition of locality does not apply to arbitrary theories; also, it is not clear how one should rigorously define locality for arbitrary theories. Nevertheless, Bell's formulation can be extended in order to provide necessary conditions for the locality of the theories to which it does not apply as a definition (and, of course, such necessary conditions can be used to establish non-locality).

To begin with, Bell's definition of locality does not apply to theories positing non-local beables. Namely, one should certainly expect that not only the local beables in region 3, but also the non-local beables, should be relevant for making predictions about region 1. And of course one cannot talk about "the non-local beables in region 3" since non-local beables do not live inside regions of spacetime. However, for the only seriously-suggested example of a non-local beable — the wave function or quantum state — one can talk about its value on a Cauchy surface and it is natural (for the purpose of assessing the locality of the theory) to take "the complete description of the physical state of region 3" to mean the values of all local beables in region 3 and of the wave function in a given family of Cauchy surfaces that cover region 3.

Problems with Bell's definition also arise for non-Markovian theories, i.e., theories in which influences might "jump" over space-like surfaces. In that case, the region 3 displayed in the figure might not work properly as a shield and local33 non-Markovian theories could be incorrectly diagnosed by Bell's definition of locality as being non-local. For the non-Markovian case, Bell's definition should then be modified so that the equality \(P(x_1|x_2,X_3)=P(x_1|X_3)\) is required to hold only when region 3 is "sufficiently thick", in some precise sense that would have to be specified, depending on how non-Markovian the theory is. In the worst case scenario, the equality \(P(x_1|x_2,X_3)=P(x_1|X_3)\) would be required to hold only when region 3 includes the entire interior of the past light cone of region 1 from some point down. We observe, however, that this modified form of Bell's definition might incorrectly diagnose some non-local theories as being local34 so that it works only as a necessary condition for locality.

Figure 2: Spacetime diagram for EPR–Bell type experiment. Region 3 closes off the past light cones of both regions 1 and 2 and shields each of those regions off from their overlapping past light cones. Hence, according to Bell's definition of locality, a complete specification of local beables in 3 should render information about goings-on in 2 (resp., 1) irrelevant for predictions about 1 (resp., 2).

Let us now apply Bell's proposed definition of locality to the kind of experiment considered in the previous sections. (For the sake of simplicity, in what follows we will consider only theories for which Bell's definition applies directly, though it should be obvious how to adapt the exposition to more general theories for which — as discussed above — only a necessary condition for locality is available.) Recall that in Section 5 we took as a consequence of locality the factorizability condition (4); this condition involves a random variable \(\lambda\) that, by a "no conspiracy" assumption, is independent of \((\alpha_1,\alpha_2)\ .\)

Consider the spacetime diagram at right. Regions 1 and 2 contain the experiments performed on the two systems and the star in the intersection of the interiors of their past light cones indicates the source. (The "particle worldlines" in the diagram are merely an illustration and play no role in the argument.) Thus, the parameter \(\alpha_1\) and the outcome \(A_1\) are (functions of) local beables in region 1 and, similarly, \(\alpha_2\) and \(A_2\) are (functions of) local beables in region 2. Note that the indicated region 3 shields off both regions 1 and 2 from their overlapping past light cones, so Bell's locality condition requires that facts about region 1 (in particular, \(\alpha_1\) and \(A_1\)) be irrelevant for predictions about region 2, once a complete specification of the local beables in region 3 is given (and vice versa, exchanging the roles of 1 and 2).

Denoting a complete specification of the local beables in region 3 by \(X\ ,\) we start with the identity35: \[P_{\alpha_1,\alpha_2}(A_1,A_2|X)=P_{\alpha_1,\alpha_2}(A_1|A_2,X)P_{\alpha_1,\alpha_2}(A_2|X)\] and then we use locality to obtain \(P_{\alpha_1,\alpha_2}(A_1|A_2,X)=P_{\alpha_1}(A_1|X)\) and \(P_{\alpha_1,\alpha_2}(A_2|X)=P_{\alpha_2}(A_2|X)\ .\) It follows that: \[P_{\alpha_1,\alpha_2}(A_1,A_2|X)=P_{\alpha_1}(A_1|X)P_{\alpha_2}(A_2|X).\] The equality above looks like the factorizability condition (4), but there is a difference: the variable \(X\) includes much more data than the \(\lambda\) that we considered in Section 5. While it is reasonable to assume (as a "no conspiracy" condition) that \(\lambda\) is independent of \((\alpha_1,\alpha_2)\ ,\) it is not reasonable to assume that \(X\) is independent of \((\alpha_1,\alpha_2)\ .\) Namely, since \(X\) is the complete specification of the local beables in region 3, it is not only possible but likely that \(X\) will fail to be independent of \((\alpha_1,\alpha_2)\ .\)

Of course, assuming Bell's definition of locality alone, we cannot prove the existence of a subset \(\lambda\) of the data codified by \(X\) that is independent of \((\alpha_1,\alpha_2)\) and for which (4) holds. Namely, the existence of this \(\lambda\) is not a consequence of locality alone, as it depends also on the assumption of a "no conspiracy" condition. Unlike locality, the "no conspiracy" condition involves anthropocentric elements, such as the distinction between the parameters \(\alpha_1\ ,\) \(\alpha_2\) (instrument settings, controllable by human experimenters) and the various other beables that are relevant for the experiment. For this reason, it does not seem possible to write down a clean mathematical definition of "non-conspiratorial" theory (as Bell did for local theory) in terms of conditional probabilities for values of beables posited by the theory36. (As usual, anthropocentric conditions are vague.) In particular, it is not possible to give a mathematical proof that for a "non-conspiratorial" local theory, there exists a \(\lambda\) independent of \((\alpha_1,\alpha_2)\) for which condition (4) holds. (Obviously, a mathematical proof cannot relate a mathematically formulated condition to a condition that is not formulated mathematically37.)

Nevertheless, we can argue (without any pretension to mathematical formalization) that for a "non-conspiratorial" local theory, a subset \(\lambda\) of the data codified by \(X\) satisfying these properties does exist. We do that by analyzing the meaning of various subsets of the local beables living in region 3. To begin with, notice that (it is likely that) the vast majority of those beables are irrelevant for the experiment and can be ignored. Let us then focus on the beables that are relevant for the experiment. Some of these beables (call them \(\mathfrak a_1\)) will determine or influence the setting \(\alpha_1\ .\) Similarly, some of these beables (call them \(\mathfrak a_2\)) will determine or influence the setting \(\alpha_2\ .\) One can think about \(\mathfrak a_i\) as the beables describing a computer getting ready to choose the parameter \(\alpha_i\ .\) (In a deterministic theory, the parameter \(\alpha_i\) should be a function of \(\mathfrak a_i\ ,\) but for a stochastic theory there could be additional randomness in the process that generates \(\alpha_i\) from \(\mathfrak a_i\ .\)) We take \(\lambda\) to denote the remaining local beables in region 3 that are relevant for the experiment.

For a "non-conspiratorial" theory, one must be able to define the sets of local beables \(\mathfrak a_1\ ,\) \(\mathfrak a_2\ ,\) and \(\lambda\) in such a way that \(\lambda\) is independent of \((\alpha_1,\alpha_2)\ .\) Let us now argue that, if the theory is local, condition (4) must hold for this \(\lambda\ .\) Since, among the local beables in region 3, only \(\lambda\) and \(\mathfrak a_1\) are relevant for the outcome \(A_1\) and since \(\mathfrak a_1\) is relevant to \(A_1\) only through \(\alpha_1\ ,\) the same thoughts motivating Bell's definition of locality lead to the conclusion that, upon conditioning on \(\lambda\) and \(\alpha_1\ ,\) the outcome \(A_1\) should be independent of \((A_2,\alpha_2)\ ,\) i.e., \(P_{\alpha_1,\alpha_2}(A_1|A_2,\lambda)=P_{\alpha_1}(A_1|\lambda)\ .\) For similar reasons, we have \(P_{\alpha_1,\alpha_2}(A_2|\lambda)=P_{\alpha_2}(A_2|\lambda)\) and hence \(P_{\alpha_1,\alpha_2}(A_1,A_2|\lambda)=P_{\alpha_1,\alpha_2}(A_1|A_2,\lambda)P_{\alpha_1,\alpha_2}(A_2|\lambda)=P_{\alpha_1}(A_1|\lambda)P_{\alpha_2}(A_2|\lambda)\ ,\) i.e., (4) holds.
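To see what condition (4) buys, the following sketch compares what factorizable models can achieve with the quantum prediction for a CHSH-type combination of correlations. Several things here are assumptions made for illustration: the particular sign convention used for the CHSH combination, the restriction to two settings per side, the measurement angles, and the use of the standard singlet correlation \(-\cos\theta\) for spin measurements along directions separated by an angle \(\theta\). The sketch enumerates only deterministic assignments; it relies on the standard fact that, with \(\lambda\) independent of the settings, any factorizable model is a mixture of such assignments and so cannot exceed their maximum.

```python
import math
from itertools import product

def chsh(E):
    """One common form of the CHSH combination (sign convention assumed)."""
    return E[(0, 0)] + E[(0, 1)] + E[(1, 0)] - E[(1, 1)]

# Deterministic local assignments: lambda fixes the +1/-1 outcome on each
# side for each of that side's two settings, independently of the other side.
best_local = 0
for a0, a1, b0, b1 in product((-1, 1), repeat=4):
    A, B = (a0, a1), (b0, b1)
    E = {(i, j): A[i] * B[j] for i in (0, 1) for j in (0, 1)}
    best_local = max(best_local, abs(chsh(E)))

# Quantum singlet prediction E = -cos(angle difference), for an assumed
# choice of measurement angles (in radians) on the two sides.
alpha = (0.0, math.pi / 2)
beta = (math.pi / 4, -math.pi / 4)
E_q = {(i, j): -math.cos(alpha[i] - beta[j]) for i in (0, 1) for j in (0, 1)}
quantum = abs(chsh(E_q))

print(best_local, quantum)
```

The deterministic maximum is 2, whereas the quantum value for these angles is \(2\sqrt2\), which is the kind of gap that the experiments discussed in the next section are designed to probe.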

Experiments

Bell's theorem brings out the existence of a contradiction between the empirical predictions of quantum theory and the assumption of locality. Since locality has been widely taken to be an implication of relativity theory, one thus has some grounds for wondering if the relevant predictions of quantum theory are correct. This question can only be addressed through experiment.

The first really convincing experimental tests of the relevant quantum predictions were produced in 1981–1982 by Aspect et al.38. These experiments involved measuring the polarizations of pairs of photons emitted (in a state of total angular momentum zero analogous to the singlet state mentioned previously) during the decay from an excited state of calcium. Correlations between the outcomes of the two polarization measurements were monitored as the axes along which the polarizations were being measured were changed. Results consistent with the quantum predictions were observed and a Bell-type inequality was violated with high statistical significance. A subsequent experiment39 demonstrated that the quantum predictions continued to hold even when the apparatus settings (i.e., the axes along which the incoming photons' polarizations were measured) were not fixed until the last possible moment, after the photons had already been emitted by the source. (Physically rotating a piece of measurement apparatus is a practical impossibility on the ten-nanosecond timescale involved in a photon's traversal of the several meters between the calcium source and a detector. Aspect et al. instead used an ingenious device that shunted each incoming photon, effectively randomly for the purpose at hand, to one of two polarization measurement devices of fixed orientation.)

The innovation of Aspect et al. represented an important first step toward closing the so-called locality loophole40. Recall that the locality assumption used in, for example, the derivation of the CHSH–Bell inequality, requires that the (conditional) probability distribution for possible outcomes of one of the measurements be independent of the choice of apparatus setting for the other measurement. But this is a consequence of the relativistic notion of locality only if each apparatus setting is made too late for it to affect (via influences propagating at the speed of light) the distant measurement. Fixing the final apparatus settings only after the photons (moving at the speed of light) have been emitted ensured this. However, the 1982 experiment of Aspect et al. involved, on each side of the apparatus, a periodic switching between the two possible settings (albeit with incommensurate frequencies on the two sides); one could thus conceivably still worry that the photon source and/or the nearby measurement were somehow "anticipating" the final distant apparatus setting — thus violating the formal locality assumption but without violating relativity's supposed prohibition on superluminal influences.

The locality loophole was closed much more convincingly in a later experiment in Innsbruck by Weihs et al.41 in 1998. The basic experimental procedure was analogous to the one of Aspect et al., but the Innsbruck group used entangled pairs of photons created in parametric down-conversion (instead of the decay of calcium atoms, as in Aspect et al.) and high-speed electro-optic modulators to switch between two polarization measurement settings on each side. Importantly, the modulators could be controlled on a nanosecond timescale, allowing the choice between the two possible apparatus settings on each side to be made (by independent, spatially-separated quantum random number generators) only well after the window for possible light-speed influence on the distant measurement had passed. Leaving aside the possibility of a cosmic conspiracy, this setup thus guarantees that the formal locality assumption can be violated only if some data from the measurement on one side is being somehow broadcast, faster than light, to the photon and/or measuring device on the opposite side and influencing the results there. In light of Bell's theorem the experiment thus quite conclusively establishes the relativistic non-locality of the actual world.

Other experiments (Tittel et al.42) have shown that the quantum predictions remain accurate even when the particles are allowed to separate by several kilometers before their polarizations are measured. Also, in experiments designed to close the so-called detection loophole43 (Rowe et al.44 and Matsukevich et al.45), Bell-type inequalities were violated even when a much higher fraction of all emitted pairs was successfully detected.

Another interesting recent experiment (Salart et al.46) relates experimental violations of a Bell-type inequality to the motion of the earth in order to put lower limits on the speed (relative to some hypothetical preferred frame) of any involved superluminal influences.

Bell's theorem and non-contextual hidden variables

The most naive reading of standard presentations of quantum theory might lead one to the following view: the quantum observables, normally mathematically represented by self-adjoint operators on a Hilbert space, are the dynamical variables of the theory and represent elements of physical reality (beables). According to this view, when one talks about "measuring the observable \(A\)" one simply means that \(A\) has a value which is unknown to the experimenter and that the "measurement" makes the experimenter aware of this value (just as, say, measuring the cholesterol in my blood informs me of the pre-existing amount of cholesterol in my blood). A theory that assigns well-defined values to all quantum observables at all times (and for which "measurement" of an observable simply reveals that pre-existing assigned value) is usually known as a non-contextual hidden variables theory. There are several theorems implying that non-contextual hidden variables theories are incompatible with certain quantum predictions. (The various forms of) Bell's inequality can also be used to establish this incompatibility.

Let us explain the appropriate mathematical formulation of non-contextual hidden variables theories. Given a complex Hilbert space \(\mathcal H\ ,\) whose rays correspond to (pure) states of a quantum system, then a non-contextual hidden variables theory associates with each quantum state \(\psi\) and each self-adjoint operator \(A\) on \(\mathcal H\) a random variable \(Z_A\) on some probability space \((\Lambda,P)\) (that might depend on \(\psi\)). The value of \(Z_A\) at a point \(\lambda\) of \(\Lambda\) represents the value of the observable \(A\) for a system that, according to the theory, is described both by the quantum state \(\psi\) and by the extra variable \(\lambda\ .\) (Successive preparations of the quantum state \(\psi\) might generate different values of \(\lambda\ .\) The probability measure \(P\) on \(\Lambda\) describes their statistics.)

The compatibility condition between the non-contextual hidden variables theory and the empirical predictions of the given quantum theory is the following: if \(A_1, \ldots, A_n\) are mutually commuting self-adjoint operators on \(\mathcal H\ ,\) then the spectral measure47 on \(\mathbb R^n\) defined from the operators \((A_1,\ldots,A_n)\) and the state \(\psi\) should coincide with the distribution of the random vector \((Z_{A_1},\ldots,Z_{A_n})\ .\) Notice that both inequality (1) and Bell's original inequality follow from the assumption of the existence of a non-contextual hidden variables theory (that covers the relevant experiments); also the CHSH–Bell inequality can be derived from this assumption. The violation of such inequalities by the quantum predictions therefore shows that non-contextual hidden variables theories are incompatible with the quantum predictions for a state space \(\mathcal H\) having at least four dimensions. (Four is, of course, the number of dimensions of the Hilbert space associated to the spin degrees of freedom of two spin-1/2 particles.) Some authors call this result "Bell's theorem" and this has given rise to a few misunderstandings.

The term "non-contextual" is motivated by the following: consider observables \(A\ ,\) \(B\) and \(C\) with \([A,B]=0\ ,\) \([A,C]=0\) but \([B,C]\ne0\ ,\) so that while \(A\) and \(B\) are jointly measurable and \(A\) and \(C\) are jointly measurable, \(B\) and \(C\) are not. (Here \([A,B]=AB-BA\) denotes the commutator of \(A\) and \(B\ .\)) Then one can perform an experiment which counts as a "measurement" of both \(A\) and \(B\) and one can also perform an experiment which counts as a "measurement" of both \(A\) and \(C\ ,\) but these experiments must be different. If one assumes that such experiments reveal pre-determined values (i.e., if one assumes that nothing truly random in the outcome is being generated by the interaction of the apparatus with the system) then, since the two experiments under consideration are different, there is no justification for assuming that the pre-determined outcomes for the "measurement" of the observable \(A\) must be the same for both experiments. More precisely: within a theory that describes the quantum system using, besides the quantum state \(\psi\ ,\) an extra variable \(\lambda\) that determines the outcomes of experiments, one could have different functions of \(\lambda\) (for a given \(\psi\)) associated to different strategies for "measuring" an observable \(A\ .\) A non-contextual hidden variables theory is therefore one that ignores the possibility that the value assigned to \(A\) might depend on the experimental context.

In simple terms, the assumption of "non-contextuality" is the assumption that the outcome of an experiment for "measuring" an observable \(A\) does not depend on the experiment — just on the given observable. But what distinct experiments that count as "measurements" of a given observable \(A\) must have in common is only the probability distribution on the set of all possible outcomes, for every possible preparation procedure for the system on which the "measurement" is going to be performed. In other words: two different experimental arrangements \(\mathcal E\ ,\) \(\mathcal E'\) designed for "measuring" the observable \(A\) should (within a theory in which the outcomes are pre-determined) be associated to (possibly) different random variables \(Z_A^{\mathcal E}\ ,\) \(Z_A^{\mathcal E'}\) on the probability space \((\Lambda,P)\) in which \(\lambda\) takes values. Of course, agreement with the quantum predictions requires that these different random variables have the same probability distribution (for every \(\psi\)).

Since everyone knows that different random variables can have the same probability distribution, it is somewhat surprising that so many are surprised by the incompatibility between "non-contextuality" and the quantum predictions. A possible explanation for this surprise might be the fact that many quantum observables usually carry nicknames (such as "momentum" and "energy") which are motivated by their association with certain quantities that are physically real according to some classical theory from which the given quantum theory was obtained by "quantization". Of course, words such as "momentum" and "energy" are quite powerful and suggest that one is talking about some physically real quantity. However, the statement that every quantum observable corresponds to a physically real quantity that is revealed by a measurement of that observable is logically incompatible with quantum theory.

Another way to prove that non-contextual hidden variables theories are not compatible with the quantum predictions is to prove the impossibility of a value map, i.e., a map \(v\) associating with each self-adjoint operator \(A\) on \(\mathcal H\) an element \(v(A)\) of the spectrum of \(A\) in such a way that \(v(A+B)=v(A)+v(B)\) and \(v(AB)=v(A)v(B)\ ,\) whenever \(A\) and \(B\) are commuting self-adjoint operators48. It is easy to see that the existence of a non-contextual hidden variables theory compatible with the quantum predictions implies the existence of a value map; one must simply fix (a quantum state and) an element \(\lambda\) of the probability space \((\Lambda,P)\) where the random variables \(Z_A\) are defined and set \(v(A)=Z_A(\lambda)\)49. Thus, the impossibility of a value map implies the incompatibility of non-contextual hidden variables theories with the quantum predictions.

The impossibility of a value map when \(\mathrm{dim}(\mathcal H)\ge3\) follows from Gleason's theorem50 (after Andrew M. Gleason) and also from the Kochen–Specker theorem51 (after Simon B. Kochen and Ernst P. Specker). Another proof of the impossibility of a value map when \(\mathrm{dim}(\mathcal H)\ge3\) was given by Bell himself5, after Gleason and before Kochen–Specker. (See also Section IV of Mermin52 and references therein for other proofs.) When \(\mathrm{dim}(\mathcal H)=2\ ,\) the corresponding operator algebra is somewhat trivial and it turns out that a non-contextual hidden variables theory compatible with the quantum predictions is possible; a concrete example was constructed by Bell5. When \(\mathrm{dim}(\mathcal H)\ge4\ ,\) a much simpler proof of the impossibility of a value map can be obtained from Mermin's theorem53 (after David Mermin).

Bell's theorem without inequalities

There are approaches to establishing the incompatibility between locality and the quantum predictions that do not use probabilistic inequalities, but instead rely only on perfect correlations. In this section, we sketch three such approaches. The first is based on a generalization of the EPR argument given by Schrödinger; it has appeared in the general form presented here in Hemmick54, but particular cases of it have appeared before55. The second approach is based on a GHZ state56 (after Daniel M. Greenberger, Michael A. Horne, and Anton Zeilinger) and the third approach is based on Hardy states57 (after Lucien Hardy).

We start by presenting Hemmick's approach. It depends on the notion of maximally entangled state. Given finite-dimensional Hilbert spaces \(\mathcal H_1\ ,\) \(\mathcal H_2\) having the same dimension \(n\) and orthonormal bases \((e_1,\ldots,e_n)\ ,\) \((e'_1,\ldots,e'_n)\) of \(\mathcal H_1\) and \(\mathcal H_2\ ,\) respectively, one defines the maximally entangled state \(\psi\) associated to these bases by58:

\(\psi=\frac1{\sqrt n}\sum_{i=1}^n e_i\otimes e'_i\in\mathcal H_1\otimes\mathcal H_2.\)

If a composite system is in a maximally entangled state then to each observable \(A\) on \(\mathcal H_1\) there can be associated another observable \(\overline A\) on \(\mathcal H_2\) in such a way that a measurement of \(A\) on the system corresponding to \(\mathcal H_1\) and a measurement of \(\overline A\) on the system corresponding to \(\mathcal H_2\) must always give the same outcome59. We have thus a situation analogous to the one considered in the EPR argument, namely, perfect correlations between outcomes of measurements of \(A\) on the first system and outcomes of measurements of \(\overline A\) on the second system. Assuming locality and that the measurements are performed at space-like separation, we conclude that a measurement of \(A\) on the first system must actually be revealing a pre-existing value \(v(A)\ ,\) which must depend only on \(A\) and not on the experimental arrangement used to measure \(A\ .\) This map \(v\) is then a value map and any proof of the impossibility of a value map for the Hilbert space \(\mathcal H_1\) leads then to a proof of non-locality. As discussed above, such proofs can be given for \(\mathrm{dim}(\mathcal H_1)\ge3\ .\)
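The perfect correlation used in this argument can be checked numerically in a small case. The sketch below assumes that \(\overline A\) is represented, in the basis \((e'_1,\ldots,e'_n)\ ,\) by the complex conjugate of the matrix of \(A\) in the basis \((e_1,\ldots,e_n)\ ;\) this is the usual choice, but it is stated here as an assumption. The matrix `A` and the dimension `n = 3` are arbitrary illustrative choices. The check is that \((A\otimes1)\psi=(1\otimes\overline A)\psi\ ;\) since the squared norm of \((A\otimes1-1\otimes\overline A)\psi\) is the expected squared difference of the two outcomes in a joint measurement, its vanishing means that the outcomes always agree.

```python
import math

def matmul(X, Y):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def conjugate(X):
    return [[complex(z).conjugate() for z in row] for row in X]

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

n = 3
# Hypothetical Hermitian observable A on H_1 (entries chosen arbitrarily).
A = [[1, 2 - 1j, 0.5j],
     [2 + 1j, -1, 3],
     [-0.5j, 3, 0.25]]
is_hermitian = max_abs_diff(A, transpose(conjugate(A))) == 0

# Coefficient matrix of psi, i.e. psi = sum_ij Psi[i][j] e_i (x) e'_j.
Psi = [[1 / math.sqrt(n) if i == j else 0 for j in range(n)] for i in range(n)]

# In this representation, (X (x) Y) psi has coefficient matrix X Psi Y^T.
A_bar = conjugate(A)                       # assumed form of the partner observable
lhs = matmul(A, Psi)                       # (A (x) 1) psi
rhs = matmul(Psi, transpose(A_bar))        # (1 (x) A_bar) psi
rhs_naive = matmul(Psi, transpose(A))      # (1 (x) A) psi, without conjugation

mismatch = max_abs_diff(lhs, rhs)
mismatch_naive = max_abs_diff(lhs, rhs_naive)
print(is_hermitian, mismatch, mismatch_naive)
```

With the conjugated partner the two vectors coincide, while using `A` itself on the second system leaves a clearly nonzero difference for this non-symmetric example, which is why the partner observable generally differs from \(A\ .\)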

Let us now turn to the second approach, based on a GHZ state for three spin-1/2 particles. What we present here is a modification of a proof of the impossibility of a value map for eight-dimensional Hilbert spaces given in Mermin60. We consider a setup with space-like separated measurements of spin components being performed on three spin-1/2 particles. For the \(i\)-th particle, \(i=1,2,3\ ,\) the experimenter can choose between measuring spin either along the \(x\)-axis (the observable \(\sigma^i_x\)) or along the \(y\)-axis (the observable \(\sigma^i_y\)). As usual, the possible outcomes (for each particle) are either 1 or -1.

Consider the following four \(\pm1\)-valued observables:

\(U_1=\sigma^1_x\sigma^2_x\sigma^3_x,\quad U_2=\sigma^1_y\sigma^2_y\sigma^3_x,\quad U_3=\sigma^1_y\sigma^2_x\sigma^3_y,\quad U_4=\sigma^1_x\sigma^2_y\sigma^3_y.\)

A straightforward computation shows that these four observables are mutually commuting and that their product \(U_1U_2U_3U_4\) equals minus the identity61. Therefore, there exists a state \(\psi\) which is an eigenstate for all of them and, moreover, the corresponding eigenvalues \(u_1\ ,\) \(u_2\ ,\) \(u_3\ ,\) and \(u_4\) satisfy \(u_1u_2u_3u_4=-1\ .\) Assume that the state prepared by the source is this common eigenstate \(\psi\ .\)
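The "straightforward computation" can also be delegated to a machine. The sketch below builds the four observables as \(8\times8\) matrices from the standard Pauli matrices and checks both claims; the helper names (`kron`, `matmul`, `triple`, `close`) are illustrative and use only the Python standard library.

```python
from itertools import combinations

# Standard Pauli matrices sigma_x and sigma_y
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def matmul(a, b):
    """Ordinary matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def triple(p, q, r):
    """Operator p on particle 1, q on particle 2, r on particle 3."""
    return kron(kron(p, q), r)

def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices of equal shape."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(len(a)) for j in range(len(a[0])))

# U_1, U_2, U_3, U_4 as defined in the text
U = [triple(SX, SX, SX), triple(SY, SY, SX), triple(SY, SX, SY), triple(SX, SY, SY)]

all_commute = all(close(matmul(U[i], U[j]), matmul(U[j], U[i]))
                  for i, j in combinations(range(4), 2))

product = matmul(matmul(matmul(U[0], U[1]), U[2]), U[3])
minus_identity = [[-1 if i == j else 0 for j in range(8)] for i in range(8)]
product_is_minus_identity = close(product, minus_identity)

print(all_commute, product_is_minus_identity)
```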

Since \(\psi\) is an eigenstate of \(U_1\) with eigenvalue \(u_1\ ,\) if the measured observables on the three particles are chosen to be \(\sigma^1_x\ ,\) \(\sigma^2_x\ ,\) and \(\sigma^3_x\) then the product of the three outcomes obtained must be equal to \(u_1\ .\) We now use locality and a three-sided analogue of the EPR argument62 to infer that the measurement of the observables \(\sigma^i_x\) must be revealing pre-existing values \(v(\sigma^i_x)\) satisfying \(v(\sigma^1_x)v(\sigma^2_x)v(\sigma^3_x)=u_1\ .\) Analogous arguments based on the fact that \(\psi\) is an eigenstate of the other observables \(U_2\ ,\) \(U_3\ ,\) and \(U_4\) can be used to show that the measurement of any one of the six observables \(\sigma^i_\alpha\) must be revealing a pre-existing value \(v(\sigma^i_\alpha)\ ,\) with the suitable three factor products of these pre-existing values being equal to \(u_2\ ,\) \(u_3\ ,\) and \(u_4\ .\) It follows that the product \(u_1u_2u_3u_4\) — being the square of the product of the six values \(v(\sigma^i_\alpha)\) — is equal to 1. This contradicts the fact that \(u_1u_2u_3u_4=-1\ .\)
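The final counting step can be illustrated by brute force. The sketch below (variable names are illustrative) enumerates all \(2^6\) assignments of values \(\pm1\) to the six observables \(\sigma^i_\alpha\) and records the product of the four three-factor products corresponding to \(U_1,\ldots,U_4\).

```python
from itertools import product

# Axis measured on particles 1, 2, 3 for U_1, U_2, U_3, U_4 respectively
CONTEXTS = ["xxx", "yyx", "yxy", "xyy"]

# The six local observables sigma^i_alpha, labelled by (particle, axis)
KEYS = [(i, axis) for i in (1, 2, 3) for axis in "xy"]

def context_products(v):
    """Three-factor products of candidate pre-existing values, one per U_k."""
    return [v[(1, c[0])] * v[(2, c[1])] * v[(3, c[2])] for c in CONTEXTS]

overall = set()
for values in product((1, -1), repeat=len(KEYS)):
    v = dict(zip(KEYS, values))
    u1, u2, u3, u4 = context_products(v)
    overall.add(u1 * u2 * u3 * u4)

print(overall)
```

Every assignment gives an overall product of \(+1\), because each of the six values enters exactly twice. No assignment can therefore reproduce \(u_1u_2u_3u_4=-1\).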

Finally, let us turn to the third approach, based on Hardy states. Hardy states constitute a large class of entangled states for two spin-1/2 particles: namely, every entangled state that is not maximally entangled is a Hardy state. We follow the notation of Goldstein63, where the reader can find the detailed description of the relevant states and observables. The experimental setup consists of two spin-1/2 particles. For the \(i\)-th particle, \(i=1,2\ ,\) the experimenter can choose between measuring either the observable \(U_i\) or the observable \(W_i\ .\) The possible outcomes (for each particle) are taken to be either 0 or 1. As usual, measurements are performed at space-like separation. In what follows, for simplicity, we use the same notation for a quantum observable and for the outcome of its measurement (which, of course, is not assumed a priori to be pre-determined). For a given Hardy state, the observables \(U_i\) and \(W_i\) can be constructed so that the following four facts hold: (i) \(U_1U_2=0\ ;\) (ii) if \(U_1=0\) then \(W_2=1\ ;\) (iii) if \(U_2=0\) then \(W_1=1\ ;\) (iv) with positive probability, \(W_1=W_2=0\ .\)

It is easy to obtain a contradiction between locality and these four facts. Namely, assume locality64. By (i), in a given run of the experiment, either \(U_1\) or \(U_2\) must carry a pre-existing value of 0 (but it does not follow — as it does in the EPR argument — that both of them carry pre-existing values). In a given run of the experiment in which \(U_1\) carries the pre-existing value 0, it follows from (ii) that \(W_2\) must carry the pre-existing value 1. Similarly, in a given run of the experiment in which \(U_2\) carries the pre-existing value 0, it follows from (iii) that \(W_1\) must carry the pre-existing value 1. Hence, in each run of the experiment, either \(W_1\) or \(W_2\) carries the pre-existing value 1 and this contradicts (iv).
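A simplified version of this reasoning can be checked by enumeration. The sketch below makes a stronger assumption than the argument above needs: it supposes that all four quantities carry definite values in \(\{0,1\}\) in every run. Under that assumption it lists the value assignments compatible with (i), (ii), and (iii) and looks for one with \(W_1=W_2=0\). The variable names are illustrative.

```python
from itertools import product

# Candidate assignments (U1, U2, W1, W2) with values in {0, 1} that satisfy
# (i) U1*U2 = 0, (ii) U1 = 0 implies W2 = 1, (iii) U2 = 0 implies W1 = 1.
allowed = [
    (u1, u2, w1, w2)
    for u1, u2, w1, w2 in product((0, 1), repeat=4)
    if u1 * u2 == 0
    and (u1 != 0 or w2 == 1)
    and (u2 != 0 or w1 == 1)
]

# Assignments that would realise fact (iv): W1 = W2 = 0
realising_iv = [t for t in allowed if t[2] == 0 and t[3] == 0]

print(allowed)
print(realising_iv)
```

No assignment compatible with (i) to (iii) has \(W_1=W_2=0\), so such values could never produce the outcomes described in (iv).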

Controversy and common misunderstandings

There are many misunderstandings and controversies surrounding Bell's theorem. To begin with, we should note that while "Bell's theorem" as we have presented it here conforms with Bell's own understanding of his theorem, many other authors have presented as "Bell's theorem" very different arguments with very different conclusions — and many of those authors are often not even aware that what they are presenting differs so radically from Bell's own views. In this section we will try to shed some light on this messy state of affairs.

Missing the role of the EPR argument entirely

Section II of Bell's original paper6 containing the celebrated theorem starts (after a short introduction, contained in the first section) with a one-paragraph recapitulation of the EPR argument (reformulated in terms of spin), i.e., it starts with the assumption of locality and it deduces from this assumption the existence of a "more complete specification of the state" (the kind of more complete specification of the state that Einstein thought would suffice to restore locality to quantum theory). Bell then claims that this more complete specification (the pre-existing values for the outcomes of spin measurements) leads to an incompatibility with the quantum predictions. The mathematical details of the proof of this incompatibility (i.e., the derivation of Bell's inequality) appear later, in Section IV.

It seems likely that many readers didn't pay sufficient attention to the first paragraph of Section II (the beginning of Bell's argument, i.e., the EPR argument) and jumped too quickly to the mathematical considerations of Section IV (the proof of the inequality). Indeed, Bell himself comments in a footnote of a later paper that "the commentators have almost universally reported that it [his original paper] begins with deterministic hidden variables"65. One should also take into account the fact that, by the time Bell's theorem came along, the EPR argument was about 30 years old and it had been forgotten by many (or considered to have been somehow refuted by Bohr66). Whatever the historical explanation for the misunderstanding might be, it turns out that the general understanding within the physics community regarding Bell's theorem was that it established the impossibility of "hidden variables" (or, for those a little better informed, of "local hidden variables") and the role of the EPR argument (i.e., the fact that the non-locality problem arises anyway if we regard quantum theory as complete) was missed entirely. Moreover, many authors took Bell's theorem to be a proof that, with regard to the EPR argument, Einstein was wrong and Bohr was right. While it is indeed true that Bell's theorem shows that Einstein was wrong, in that the assumption of the EPR argument (locality) turned out to be incorrect, it is not at all true that Bell's theorem shows that the EPR argument itself is not valid. In fact the EPR argument is correct and plays a crucial role in establishing that its main assumption is wrong. (That is, of course, a standard situation whenever a reductio ad absurdum is performed.)

Of course, not all commentators on Bell's theorem who disagree (knowingly or not) with Bell's conclusion have missed the EPR argument entirely. There are controversies and misunderstandings surrounding the EPR argument itself (or some poorly formulated version of it) and we shall discuss those in Subsection 10.3. But it should be recalled that not all presentations of Bell's theorem even require the EPR argument: for instance, the CHSH–Bell inequality can be proven directly from locality, as we have shown in Section 5. Of course, this alternative presentation of Bell's theorem generates controversies and misunderstandings of its own. One could, for instance, disagree with the claim that the mathematical formulation (4) of (a consequence of) the locality condition is adequate. (We will discuss some of those controversies and misunderstandings regarding the locality condition in Subsection 10.5.) In fact, though, since (as we have shown) the EPR argument becomes a simple mathematical theorem after (4) is accepted as a consequence of locality, it would be incoherent to accept that (4) is a consequence of locality but reject the EPR argument.

Bell's theorem proves the impossibility of "local realism"

One currently popular account of Bell's theorem has it showing that "local realism" is incompatible with the quantum predictions, so that one has to choose between abandoning locality or abandoning realism. Those who talk about "local realism" rarely explain what they mean by "realism". (Is "realism" related to "hidden variables" of some sort? What exactly is meant by "hidden variables"? Is "realism" related to determinism?) And when they do, it often becomes clear that the "realism" under consideration isn't among the actual assumptions of Bell's theorem, so that abandoning that kind of realism isn't a viable strategy for saving locality67 68. In what follows we discuss a type of "realism" that is actually relevant for Bell's theorem, but, as we will see, abandoning that kind of realism won't turn out to be a viable strategy for saving locality either.

Before we go any further, it should be pointed out that the advent of quantum theory has made many physicists quite suspicious of any analysis of what might be happening in nature when "no one is watching". The double-slit experiment, the so-called "delayed-choice" experiments, Bohr's principle of complementarity (and the EPR–Bell argument itself) are sometimes seen as evidence that certain aspects of the microscopic world transcend human understanding or, alternatively, that any discussion concerning elements of physical reality is meaningless or beyond the scope of science. (The use of the words "quantum mechanical system", Bell once noted, can have "an unfortunate effect on the discussion"69.) One should then allegedly settle for doing computations with operators and predicting the statistics of experimental outcomes. But, as discussed in Section 6, the very concept of locality involved in Bell's theorem cannot even be formulated without reference to elements of physical reality, i.e., to beables (and local beables)! Unfortunately, orthodox formulations of quantum theory are notoriously vague about which (if any) variables are to be taken seriously, as beables70. This unfortunate situation muddles discussions regarding the locality of orthodox quantum theory.

The fact that "locality" cannot be seriously discussed without reference to local beables can be illustrated, for instance, by the following simple example: if a married man dies then his wife instantly becomes a widow. Of course, no one takes that to be an instance of non-locality. On the other hand, if the death of the husband were to cause, say, an instantaneous increase in the body temperature of his wife then this would indeed be considered a violation of locality. The difference between the two cases is that, while the state of being a widow isn't associated with any element of physical reality localized around the wife, her body temperature is: it is a function of the local beables in the region of spacetime containing the wife71! Bell makes a similar point in his paper "La nouvelle cuisine":

When the Queen dies in London (may it long be delayed) the Prince of Wales, lecturing on modern architecture in Australia, becomes instantaneously King. (Greenwich Mean Time rules here.)72

Bell goes on to present an example directly related to physics, namely, the example of the infinite velocity of propagation of the scalar potential in Coulomb gauge that we mentioned above. Bell then concludes (before he begins discussing the concept of local beable):

Conventions can propagate as fast as may be convenient. But then we must distinguish in our theory between what is convention and what is not.73

While the EPR argument and Bell's theorem make no assumptions about what the elements of physical reality might be like, they cannot avoid talking about them. If one's criteria for accepting a sentence as being meaningful lead to the conclusion that any sentences that talk about "elements of physical reality" are meaningless74 then, according to such criteria, the relevant notion of locality for Bell's theorem (and thus Bell's theorem itself) becomes meaningless. Those who hold that position will avoid concluding that the quantum predictions imply non-locality, but they will also avoid the conclusion that the quantum predictions are compatible with locality! So refusing to talk about elements of reality is not a strategy by which one can defend the locality of quantum theory.

Hence, one possible notion of "realism" that is actually relevant for Bell's theorem is the willingness to accept statements about elements of physical reality as in principle meaningful. This "realism" isn't, however, an independent assumption that has to be taken together with "locality" for proving Bell's theorem; it is rather a precondition for the very meaningfulness of "locality". Thus, abandoning that sort of realism does not allow one to save locality; it merely prevents one from discussing it. (Another type of "realism" that is relevant for Bell's theorem will be discussed in Subsection 10.7.)

Some controversy regarding the EPR argument

Analyses of the EPR argument normally focus on the presentation that appears in the original 1935 paper11 by Einstein, Podolsky, and Rosen. The EPR paper was written to present an argument establishing the incompleteness of quantum theory, i.e., establishing that there are some elements of physical reality that are omitted by the standard quantum description (in the sense that they are not determined by the quantum state75).

With this goal in mind, the paper is careful about presenting a sufficient criterion for something to be an element of physical reality. The criterion presented is this: "If, without in any way disturbing a system, we can predict with certainty (i.e., with probability equal to unity) the value of a physical quantity, then there exists an element of physical reality corresponding to this physical quantity"76. This criterion simply reflects the fact that if the outcome of some experiment isn't pre-determined by some element of physical reality (i.e., if it is not a function of something that was an element of physical reality before the experiment) then its outcome involves some randomness and hence cannot be predicted with certainty. Some commentators, however, have taken Einstein's criterion to be an assumption of some sort or have objected to Einstein's use of the notion of "element of physical reality". (As discussed above, the use of such a notion could indeed conflict with someone's philosophical position regarding what sentences are to be considered meaningful.)

A somewhat different kind of criticism against the EPR argument involves the claim that it (allegedly) depends on some suspicious reasoning involving counterfactuals77. Here is an unfortunate formulation of the EPR argument that raises this kind of concern (under the setup with a pair of particles in the singlet state considered earlier): if the experimenter on one side chooses to measure spin along the \(z\)-axis then this experimenter can predict with certainty the outcome of the same measurement on the other side and therefore conclude that the outcome of this measurement corresponds to an element of physical reality there. The experimenter could, instead, choose to measure spin along the \(x\)-axis and, along the same lines, then conclude that the outcome of the same measurement on the other side corresponds to an element of physical reality. But the experimenter can only measure either the spin along the \(z\)-axis or the spin along the \(x\)-axis and thus (so the alleged rebuttal of the EPR argument goes) can't conclude that both the measurement outcomes (along the \(z\)-axis and along the \(x\)-axis) correspond to elements of physical reality on the other side, but rather only that one or the other (whichever one is in fact measured) does.

Considering this alleged rebuttal of the EPR argument, two observations are in order. First, for Einstein's original goal of establishing the incompleteness of quantum theory (assuming locality, of course), a simpler "single axis" version of the EPR argument is sufficient. The thesis of this "single axis" version of the EPR argument is merely that, when both experimenters choose to measure spin along the \(z\)-axis, then the outcomes of the measurements of spin along the \(z\)-axis are pre-determined. This "single axis" version of the EPR argument is (trivially) immune to the alleged rebuttal just discussed.

The second observation is that also the more general "several axes" version of the EPR argument — establishing the existence of pre-determined outcomes for measurements of spin along several axes at once — can be formulated without any counterfactuals and is therefore also immune to the alleged rebuttal discussed above. (Of course, it is this "several axes" version of the EPR argument which is needed for Bell's theorem.)

Here is the formulation of the "several axes" version of the EPR argument that does not involve counterfactuals: in order to explain (without violation of locality) the fact that the outcomes will be perfectly anti-correlated if the experimenters both measure spin along the \(z\)-axis, one has to assume that these outcomes are pre-determined. The same goes for measurements of spin along the \(x\)-axis. Even though, in each run of the experiment, either the \(z\)-axis or the \(x\)-axis is chosen along which to perform the measurements, the elements of physical reality that exist before the measurements cannot depend on choices that will be made later by the experimenters! This, indeed, doesn't follow from the assumption of locality itself but it does follow from the so-called "no conspiracy" assumption which states, roughly speaking, that the pair of particles prepared by the source does not "know" in advance what experiments are going to be performed on them later78.

Classical versus quantum probability (and logic)

Some authors regard the experiments yielding a violation of Bell-type inequalities as proving that classical probability theory is wrong and that it should be replaced by quantum probability theory. The term "quantum probability" is sometimes used simply to refer to the probabilities predicted by quantum theory for outcomes of experiments; those are, of course, distinct from the probabilities predicted by, say, classical mechanics. The term is, however, more often used to refer to the theory of quantum probability spaces79. A quantum probability space can be defined as a pair \((\mathcal H,\psi)\) where \(\mathcal H\) is a complex Hilbert space and \(\psi\) is a unit vector in \(\mathcal H\)80. One then uses the term (quantum) event to refer to a closed subspace \(\mathcal S\) of \(\mathcal H\ ;\) to each such subspace one can assign a probability which is the number \(\langle \psi,P_{\mathcal S}\,\psi\rangle\in[0,1]\ ,\) where \(P_{\mathcal S}\) denotes the orthogonal projection onto \(\mathcal S\ .\) Both the set of events of a classical probability space (i.e., the \(\sigma\)-algebra of measurable subsets of the sample space) and the set of (quantum) events of a quantum probability space carry the mathematical structure of a lattice, i.e., both are partially ordered sets (in both cases the partial order is inclusion) and any pair of elements admits a least upper bound (the "or" operation) and a greatest lower bound (the "and" operation). In both cases, the greatest lower bound is the intersection while, for the classical case, the least upper bound is the union and for the quantum case it is the closure of the sum81.

Some formulas involving probabilities and the lattice operations of events that are true in the classical case are not true in the quantum case. This fact should not, however, be blamed on "quantum queerness" but on the fact that when one uses the words "and" and "or" to refer to the lattice operations of a quantum probability space one is using these words with non-standard meanings! Of course, one can always change the truth value of a sentence by changing the meaning of its words and this is not evidence that physical systems are strange and counterintuitive. The motivation for calling a closed subspace \(\mathcal S\) of a Hilbert space a (quantum) event is that a "quantum measurement" of the observable \(P_{\mathcal S}\) is a \(\{0,1\}\)-valued experiment; it yields the result 1 with probability \(\langle \psi,P_{\mathcal S}\,\psi\rangle\in[0,1]\ .\) However, given two arbitrary closed subspaces \(\mathcal S_1\ ,\) \(\mathcal S_2\) of \(\mathcal H\ ,\) the \(\{0,1\}\)-valued experiments associated with \(\mathcal S_1\) and \(\mathcal S_2\) are in general mutually incompatible. Therefore a statement of the form "both the measurement of \(P_{\mathcal S_1}\) and the measurement of \(P_{\mathcal S_2}\) yield the value 1" does not correspond to any experiment and in particular is not in any way related to the experiment that is associated with the subspace \(\mathcal S_1\cap\mathcal S_2\) (except for the case in which \(P_{\mathcal S_1}\) and \(P_{\mathcal S_2}\) commute, of course).
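A small example may make this concrete. The sketch below is an illustration under stated assumptions: it works in a two-dimensional space, takes \(\mathcal S_1\) and \(\mathcal S_2\) to be two distinct lines, and takes \(\psi\) orthogonal to \(\mathcal S_1\). It compares the probability assigned to the lattice "or" with the classical inclusion-exclusion formula. The function and variable names are hypothetical.

```python
def prob_line(direction, psi):
    """<psi, P psi>, where P projects orthogonally onto the line spanned by `direction`."""
    norm_sq = sum(abs(c) ** 2 for c in direction)
    overlap = sum(complex(d).conjugate() * p for d, p in zip(direction, psi))
    return abs(overlap) ** 2 / norm_sq

psi = (0.0, 1.0)   # unit vector
d1 = (1.0, 0.0)    # S1 = span(d1)
d2 = (1.0, 1.0)    # S2 = span(d2)

p1 = prob_line(d1, psi)
p2 = prob_line(d2, psi)

# Two distinct lines in a two-dimensional space: their intersection is {0}
# and their sum is the whole space (checked via a non-zero determinant).
det = d1[0] * d2[1] - d1[1] * d2[0]
assert det != 0
p_meet = 0.0                                # probability of the zero subspace
p_join = sum(abs(c) ** 2 for c in psi)      # probability of the whole space

print(p_join, p1 + p2 - p_meet)
```

Here the lattice "or" receives probability 1 while the classical formula \(P(\mathcal S_1)+P(\mathcal S_2)-P(\mathcal S_1\cap\mathcal S_2)\) would give 1/2. In line with the remark above, the mismatch reflects the non-standard meaning of "or": the two projections do not commute, so the corresponding experiments are incompatible.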

The alleged need to abandon classical probability theory is sometimes also argued for on the basis of an incorrect analysis of the double slit experiment. However, as long as the usual meanings of words are kept, there is no need to get rid of classical probability theory (or classical logic). One should not confuse the use of the adjective "classical" as in "classical mechanics" with the use of the adjective "classical" as in "classical probability theory" or "classical logic". While classical mechanics is a physical theory which has been shown to be not empirically viable, classical probability theory and classical logic are methods of reasoning and cannot be tested empirically: such reasoning tools are what we use in order to draw conclusions from experiments so that we can decide which physical theories are or are not compatible with the results of those experiments.

Quantum probability theory is sometimes also seen as a new type of probability theory that allows for the possibility of non-commuting random variables which cannot be identified with (classical) random variables on a common probability space82. Of course, there is nothing "non-classical" or particularly strange about the fact that random variables on a common probability space are not always the right way to model outcomes of experiments; in fact, there is no reason why one should expect that random variables on a common probability space could be used to model the outcomes of incompatible experiments (unless one works under the assumption that the outcomes of those experiments reveal functions of elements of reality that exist independently of whether or not the experiments are performed). We will return to this point later (in Subsection 10.6 and again in Subsection 10.8) when we discuss again misunderstandings related to the role of non-commutativity.

Controversies and misunderstandings regarding the locality condition

The concept of locality that is relevant for Bell's theorem is sometimes mistakenly conflated with other concepts that appear in physics that are named "locality" by some authors. For instance, when one studies (classical or quantum) field theories, one learns that the Lagrangian of the theory should not contain terms of the form \(\phi(x)\phi(y)\ ,\) for example, involving the values of the field \(\phi\) at two or more different points of spacetime; Lagrangians not containing such terms are often referred to as being local. When one studies quantum field theories, one learns that space-like separated observables should commute, a requirement normally referred to as the locality condition or the local commutativity condition. Local commutativity is used to show that superluminal signalling is not possible within quantum field theory, i.e., the correlations predicted by the theory for outcomes of measurements performed at space-like separation cannot be used for communication between the experimenters. (In the notation of Section 5, this means that the unconditional marginal distribution of the outcome \(A_1\) does not depend on the parameter \(\alpha_2\) and, similarly, the unconditional marginal of \(A_2\) does not depend on \(\alpha_1\ .\))

The locality condition for the Lagrangian, local commutativity and the impossibility of superluminal signalling are all, of course, conditions that are related to the concept of locality that is relevant for Bell's theorem. But they are not equivalent to it. In fact, the very pair correlations between observables at space-like separation on the basis of which Bell concluded that quantum mechanics is non-local are well-defined (in a frame independent way) in quantum field theory precisely because, as a consequence of local commutativity, the observables do commute.

The fact that non-locality does not imply the possibility of superluminal signalling might appear particularly surprising; this fact will seem less surprising, however, if one keeps in mind that the concept of superluminal signalling involves anthropocentric notions such as controllability and observability that play no role in the concept of locality. In simpler words, the possibility of superluminal signalling is not just non-locality, it is a form of controllable non-locality. (Notice that, for instance, while the parameters \(\alpha_i\) are controllable by the experimenters, the outcomes \(A_i\) are not.)

Other misunderstandings are reflected by certain types of objections toward the adoption of the factorizability condition (4) as a consequence of locality. For instance, one might think that the \(\lambda\) appearing in (4) is a "hidden variable" or something suspicious of that sort. Nevertheless, the \(\lambda\) could, for instance, be nothing but the quantum state (which is taken to be fixed from one run of the experiment to the other, so that the probability space \((\Lambda,P)\) in which \(\lambda\) takes values is trivial in that case). Of course, if \(\lambda\) is nothing but the quantum state then condition (4) is not satisfied by the quantum predictions, as in that case there is nothing to explain the correlation between the outcomes. (This is precisely the point raised by the EPR paper11.)

Some authors (notably, Jon Jarrett83) have claimed that the locality condition proposed by Bell is too strong, i.e., it is more than just "locality". In order to understand the objection, one should notice first that condition (4) is equivalent to the conjunction of the following two sub-conditions:

\[(\text{OI})\quad P_{\alpha_1,\alpha_2}(A_1,A_2|\lambda)=P_{\alpha_1,\alpha_2}(A_1|\lambda)P_{\alpha_1,\alpha_2}(A_2|\lambda);\]

\[(\text{PI})\quad P_{\alpha_1,\alpha_2}(A_1|\lambda)=P_{\alpha_1}(A_1|\lambda),\quad P_{\alpha_1,\alpha_2}(A_2|\lambda)=P_{\alpha_2}(A_2|\lambda).\]

Condition (OI) says that, given \(\lambda\) and the parameters \(\alpha_1\ ,\) \(\alpha_2\ ,\) then the outcomes \(A_1\ ,\) \(A_2\) are independent. Condition (PI) says that, given \(\lambda\ ,\) then the marginal of the outcome \(A_1\) does not depend on the parameter \(\alpha_2\) and, similarly, the marginal of the outcome \(A_2\) does not depend on the parameter \(\alpha_1\ .\)

Condition (OI) is known in the literature as outcome independence and condition (PI) as parameter independence. Since the conjunction of (OI) and (PI) implies the CHSH–Bell inequality, it follows that any theory that makes the same predictions as quantum theory (and thus predicts the violation of the CHSH–Bell inequality) must violate either84 condition (OI) or condition (PI) (or both). Some authors claim that only condition (PI), rather than (4), is a consequence of locality. Since condition (PI) doesn't have to be violated by a theory that matches the quantum predictions, these authors conclude that there is no incompatibility between the quantum predictions and locality. The misunderstanding is likely to have originated from a misinterpretation of the meaning of \(\lambda\)85. Namely, recall that in Section 6 we have defined \(\lambda\) to be the complete specification of the local beables (relevant for the experiment, but not for the process that chooses \(\alpha_1\) and \(\alpha_2\)) in a region of spacetime that shields the measurements from the intersection of the interior of their past lightcones86. It is under this particular definition for \(\lambda\) that (4) is a consequence of locality. If one takes something else as a definition of \(\lambda\) then, indeed, a violation of condition (OI) might not imply a violation of locality.
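The two sub-conditions can be tested against the quantum predictions in the case where \(\lambda\) is taken to be nothing but the quantum state. The sketch below assumes the singlet state and spin measurements along directions in a common plane, labelled by angles \(\alpha_1\ ,\) \(\alpha_2\ ,\) for which the standard quantum joint probability of outcomes \(A_1,A_2\in\{+1,-1\}\) is \(\tfrac14\big(1-A_1A_2\cos(\alpha_1-\alpha_2)\big)\ .\) The function names are illustrative.

```python
import math

OUTCOMES = (+1, -1)

def joint(a1, a2, alpha1, alpha2):
    """Singlet-state probability of outcomes (a1, a2) for coplanar directions alpha1, alpha2."""
    return (1 - a1 * a2 * math.cos(alpha1 - alpha2)) / 4

def marginal_1(a1, alpha1, alpha2):
    return sum(joint(a1, a2, alpha1, alpha2) for a2 in OUTCOMES)

def marginal_2(a2, alpha1, alpha2):
    return sum(joint(a1, a2, alpha1, alpha2) for a1 in OUTCOMES)

angles = [0.0, math.pi / 4, math.pi / 2, math.pi]
tol = 1e-12

# (PI): each marginal equals 1/2 whatever the distant parameter is
pi_holds = all(
    abs(marginal_1(a, x, y) - 0.5) < tol and abs(marginal_2(a, x, y) - 0.5) < tol
    for a in OUTCOMES for x in angles for y in angles
)

# (OI): the joint distribution equals the product of its marginals
oi_holds = all(
    abs(joint(a1, a2, x, y) - marginal_1(a1, x, y) * marginal_2(a2, x, y)) < tol
    for a1 in OUTCOMES for a2 in OUTCOMES for x in angles for y in angles
)

print(pi_holds, oi_holds)
```

With this choice of \(\lambda\) the marginals never depend on the distant parameter, so (PI) holds, while (OI) fails: for equal angles, for instance, the outcome pair \((+1,+1)\) has probability 0 rather than 1/4.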

A further note concerning the conditions (OI) and (PI): the distinction between \(A_i\) and \(\alpha_i\) (which allows one to separate (4) into (OI) and (PI)) is highly anthropocentric. Namely, the parameter \(\alpha_i\) is controllable by the human experimenter and the outcome \(A_i\) isn't. But such a distinction cannot play a role in the formulation of a fundamental concept such as locality. Nevertheless, there is a difference between violation of (OI) and violation of (PI) that is worth mentioning, since it might also have caused some authors to mistakenly regard only (PI) and not (OI) as a consequence of locality: if a theory satisfies (PI) and violates (OI), then it might be the case that the kind of non-local interaction between the two sides of the experiment can be thought of as a symmetrical interaction in which there is no objective fact about which side should be regarded as the "cause" and which side should be regarded as the "effect". In fact, because of this symmetry, one might argue that the cause/effect language should not be used in this case and that one should talk only about interactions. However, if (PI) is violated, the symmetry disappears, as it is reasonable to regard a parameter \(\alpha_i\) as one of the causes of the outcome on the other side of the experiment, but it is unreasonable to regard \(\alpha_i\) as a consequence of what happened on the other side of the experiment.

Locality versus non-contextual hidden variables

We have seen in Section 5 that the CHSH–Bell inequality can be proven from the assumption of locality. It is easy to see that the CHSH–Bell inequality can also be proven from the assumption of the existence of a non-contextual hidden variables theory (one that covers the relevant experiments, of course). Namely, the assumption of non-contextual hidden variables is mathematically formulated in terms of the existence of random variables \(Z^i_{\alpha_i}\) defined over a probability space \((\Lambda,P)\) such that the outcome \(A_i\) of the experiment with parameter choice \(\alpha_i\) is equal to the value of \(Z^i_{\alpha_i}\ .\) In other words, by conditioning on a given \(\lambda\in\Lambda\ ,\) we obtain a degenerate joint probability distribution for \((A_1,A_2)\) supported by the single outcome \(\big(Z^1_{\alpha_1}(\lambda),Z^2_{\alpha_2}(\lambda)\big)\ :\) \[P_{\alpha_1,\alpha_2}\big(A_1=Z^1_{\alpha_1}(\lambda),\ A_2=Z^2_{\alpha_2}(\lambda)|\lambda\big)=1.\] This degenerate joint probability distribution of \((A_1,A_2)\) given \(\lambda\) obviously satisfies the factorizability condition (4). The purely mathematical part of the argument showing that the CHSH–Bell inequality is a consequence of locality is nothing but a proof of the CHSH–Bell inequality from condition (4). Thus, this same mathematical reasoning (restricted to the particular deterministic case in which \(A_i\) is a function of \(\alpha_i\) and \(\lambda\)) is also a proof of the CHSH–Bell inequality from the assumption of non-contextual hidden variables.
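In the deterministic case the bound can be seen by direct enumeration. The sketch below assumes two parameter choices per side and lists, for a fixed \(\lambda\ ,\) all possible values \(\pm1\) of the four random variables \(Z^i_{\alpha_i}(\lambda)\ ;\) the labels `a`, `b` for the two settings on each side are illustrative. The sign pattern follows one common convention for the CHSH combination.

```python
from itertools import product

chsh_values = set()
for z1a, z1b, z2a, z2b in product((1, -1), repeat=4):
    # z1a, z1b: values on side 1 for its two settings; z2a, z2b: likewise on side 2
    s = z1a * z2a - z1a * z2b + z1b * z2a + z1b * z2b
    chsh_values.add(s)

print(sorted(chsh_values))
```

For every \(\lambda\) the combination equals \(+2\) or \(-2\ ,\) so its average over any probability distribution on \(\Lambda\) has absolute value at most 2.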

This fact has caused certain misunderstandings. To begin with, the fact that the CHSH–Bell inequality can be proven from an assumption distinct from locality has led some authors to believe that the violation of the CHSH–Bell inequality does not imply non-locality. Of course, the correct thing to say is that we have two implications:

(i) non-contextual hidden variables \(\Rightarrow\) CHSH–Bell inequality,

(ii) locality \(\Rightarrow\) CHSH–Bell inequality,

and the logical conclusion is that the violation of the CHSH–Bell inequality by the quantum predictions gives (by (i)) yet another proof of the incompatibility of non-contextual hidden variables with the quantum predictions and (by (ii)) a proof of the incompatibility between locality and the quantum predictions.

Unfortunately, misunderstandings go beyond that. The presentation that maximizes the misunderstanding requires a different notation from what we have been using, so let us make some adaptations. Assume that the experimenter on one side can choose between measuring either the \(\pm1\)-valued quantum observable \(X_1\) or the \(\pm1\)-valued quantum observable \(Y_1\ .\) (In our old notation, the choice between \(X_1\) and \(Y_1\) corresponds to two distinct values for the parameter \(\alpha_1\ .\)) Similarly, assume that on the other side the experimenter chooses between measuring the \(\pm1\)-valued quantum observable \(X_2\) or the \(\pm1\)-valued quantum observable \(Y_2\ .\) The quantum prediction for the left-hand side of the CHSH–Bell inequality (with the absolute values removed) is given by the expected value \(\langle S\rangle\) of the quantum observable87: \[S=X_1X_2-X_1Y_2+Y_1X_2+Y_1Y_2=X_1(X_2-Y_2)+Y_1(X_2+Y_2),\] so that the existence of a quantum state for which the CHSH–Bell inequality88 \(\vert\langle S\rangle\vert\le2\) is violated is equivalent to the condition that the operator norm \(\Vert S\Vert\) be greater than 2. Taking into account that the observables \(X_i\ ,\) \(Y_j\) are \(\pm1\)-valued (so that their squares are equal to the identity) and that the observables carrying the index 1 commute with the observables carrying the index 2, a straightforward computation shows that: \[S^2=4+(X_1Y_1-Y_1X_1)(X_2Y_2-Y_2X_2)=4+[X_1,Y_1][X_2,Y_2].\] Since \(S\) is self-adjoint, \(\Vert S^2\Vert=\Vert S\Vert^2\ .\) It follows that the existence of a quantum state for which the CHSH–Bell inequality \(\vert\langle S\rangle\vert\le2\) is violated is equivalent to the condition that \(\Vert S^2\Vert\) be greater than 4, and that condition is in turn equivalent to the requirement that the commutators \([X_1,Y_1]\ ,\) \([X_2,Y_2]\) both be non-vanishing89.
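The identity for \(S^2\) can be checked numerically on a concrete example. The sketch below is illustrative only: the particular observables (built from the Pauli matrices \(\sigma_z\) and \(\sigma_x\) on two spin-1/2 systems), the state, and all helper names are assumptions made here, not choices taken from the text. It verifies \(S^2=4+[X_1,Y_1][X_2,Y_2]\) for one non-commuting choice, computes \(\langle S\rangle\) in an entangled state, and confirms that \(S^2=4\) when \([X_2,Y_2]\) vanishes.

```python
import math

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b, sign=1):
    """Entrywise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    """Multiply every entry of the matrix a by the number c."""
    return [[c * x for x in row] for row in a]

def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def commutator(a, b):
    return add(matmul(a, b), matmul(b, a), sign=-1)

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
SZ = [[1, 0], [0, -1]]   # Pauli sigma_z
SX = [[0, 1], [1, 0]]    # Pauli sigma_x
r = 1 / math.sqrt(2)

# Illustrative +/-1-valued observables (an assumption of this sketch).
# Index-1 observables act on the first factor, index-2 on the second,
# so the two families commute with each other.
X1 = kron(SZ, I2)
Y1 = kron(SX, I2)
X2 = kron(I2, scale(r, add(SZ, SX)))
Y2 = kron(I2, scale(r, add(SX, SZ, sign=-1)))

def chsh_operator(x1, y1, x2, y2):
    """S = X1 (X2 - Y2) + Y1 (X2 + Y2)."""
    return add(matmul(x1, add(x2, y2, sign=-1)), matmul(y1, add(x2, y2)))

S = chsh_operator(X1, Y1, X2, Y2)
identity4 = kron(I2, I2)
lhs = matmul(S, S)
rhs = add(scale(4, identity4), matmul(commutator(X1, Y1), commutator(X2, Y2)))

# Expected value of S in the entangled state (|00> + |11>)/sqrt(2).
psi = [r, 0, 0, r]
S_psi = [sum(s * p for s, p in zip(row, psi)) for row in S]
chsh_expectation = sum(p * q for p, q in zip(psi, S_psi))

# Commuting case: taking Y2 = X2 makes [X2, Y2] vanish, so S^2 = 4.
S_comm = chsh_operator(X1, Y1, X2, X2)
comm_diff = max_abs_diff(matmul(S_comm, S_comm), scale(4, identity4))

print(max_abs_diff(lhs, rhs), chsh_expectation, comm_diff)
```

For this choice the expected value is \(2\sqrt2\approx2.83\), which exceeds 2, while in the commuting case \(S^2\) is exactly 4 times the identity, so that \(\Vert S\Vert=2\) and no state can violate the inequality. All matrices here are real, which is why plain floating-point arithmetic suffices.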

Now, let us assume that we have a non-contextual hidden variables theory and, with some (here, deliberate) abuse of notation, let us use the same symbol to denote a given quantum observable and to denote the corresponding random vari