Colin Beveridge tweeted about the following formula for \(\pi\) the other day:

\(\pi = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} - \frac{1}{5} + \frac{1}{6} + \ldots\)

With the exact rule for the signs being:

After the first two terms, the signs are determined as follows: if the denominator is a prime of the form 4m - 1, the sign is positive; if the denominator is a prime of the form 4m + 1, the sign is negative; for composite numbers, the sign is equal to the product of the signs of their prime factors.

This is due to Euler.

(If you just want to see a proof of this, skip to near the end of this post)

I opined that it wasn’t that surprising, because for any real number \(x\) you can choose a sequence of signs for this sum so that it converges to \(x\), as follows:

Inductively define \(\epsilon_n\) as follows: \(\epsilon_{n+1}\) is \(1\) if \(\sum\limits_{i \leq n} \frac{\epsilon_i}{i} < x\), else \(\epsilon_{n+1} = -1\).

After the first time the partial sums \(s_n = \sum\limits_{i \leq n} \frac{\epsilon_i}{i} \) cross \(x\), you must have \(|s_n - x| \leq \frac{1}{n}\), so the sum converges to \(x\).
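This greedy construction is easy to play with numerically; here’s a minimal Python sketch (the function name and the term count are arbitrary choices of mine):

```python
import math

def greedy_signs(x, n_terms):
    """Greedily choose eps_i in {+1, -1} so that sum(eps_i / i) tends to x.

    eps_{n+1} is +1 while the partial sum is below x, else -1, exactly
    as in the inductive definition above.
    """
    s = 0.0
    signs = []
    for i in range(1, n_terms + 1):
        eps = 1 if s < x else -1
        signs.append(eps)
        s += eps / i
    return s, signs

approx, signs = greedy_signs(math.pi, 100_000)
```

The first crossing happens at the 13th term (the plain harmonic sum first exceeds \(\pi\) there), and after that the partial sums stay within \(\frac{1}{n}\) of \(\pi\).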

There are many inequivalent sums that lead to \(x\) too. e.g. You can make the convergence arbitrarily slow if you like: If you have some sequence of non-negative numbers \(b_n \to 0\) then you can construct a sequence \(\epsilon_n\) as above where instead of changing the sign every time you cross \(x\), you change the sign to negative when you cross \(x + b_n\), then back to positive when you cross \(x - b_{n+1}\), etc.

There’s nothing special about \(\frac{1}{n}\) here either – all that matters is that it’s a non-negative sequence tending to zero whose sum does not converge.

But there is something special about \(\frac{1}{n}\) here that means we can probably expect a particularly rich set of “natural” examples from it due to a slightly stronger condition it satisfies: It’s in \(l^2\). That is, \(\sum\frac{1}{n^2}\) converges (specifically to \(\frac{\pi^2}{6}\), but that doesn’t matter for now).

Why does this matter?

Well it turns out to be interesting for the following reason: If \(a_n\) is a sequence in \(l^2\), and \(B_1, \ldots, B_n, \ldots\) are independent Bernoulli random variables, then \(\sum (-1)^{B_n} a_n\) converges to a finite value with probability 1. That is, if we just randomly pick each sign then the result will converge to something.
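We can sanity-check this for our coin-flipped harmonic signs by simulation (the truncation level and sample count are arbitrary choices of mine): the truncated sums should have mean near \(0\) and variance near \(\sum \frac{1}{n^2} = \frac{\pi^2}{6}\).

```python
import math
import random

random.seed(0)

def random_sign_sum(n_terms):
    """One sample of sum_{n <= n_terms} (+/-1)/n with independent fair signs."""
    return sum(random.choice((1, -1)) / n for n in range(1, n_terms + 1))

samples = [random_sign_sum(500) for _ in range(5000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
# mean should be near 0, var near sum 1/n^2 = pi^2/6 ~ 1.645
```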

This follows from the following stronger theorem: Let \(A_i\) be a sequence of independent random variables with \(E(A_i) = 0\) and \(\sum E(A_i^2) < \infty\). Then \(\sum A_i\) converges to a finite value with probability one.

First a thing to note: Because of Kolmogorov’s zero-one law, such a sequence will converge with probability \(0\) or \(1\) – no in between values are possible. I think this makes the result significantly less surprising (though we won’t actually use this fact in the proof).

The key idea of the proof is this: We’ll use the bound on the tail behaviour that convergence gives us to look at the \(\lim\inf\) and the \(\lim\sup\) (the limit inferior and limit superior) of the sums of the random variables and show that these are equal almost surely. This implies that the limit exists and is equal to their common value (which might be infinite).

And the way we’ll do that is we’ll consider the possibility that they are well separated: Let \(D_{a, b} = \{ \lim\inf S_n < a < b < \lim\sup S_n\}\), where \(S_n = \sum\limits_{i \leq n} A_i\). Then \(\{\lim\inf S_n < \lim\sup S_n\} = \bigcup\limits_{a, b \in \mathbb{Q}} D_{a, b}\), which is a countable union, so if each \(D_{a, b}\) has probability zero then so does the whole set, and the limit exists.

So let’s now establish an upper bound on the probability of lying in \(D_{a, b}\). Pick any \(N\). If \(D_{a, b}\) occurs then the sums lie below \(a\) and above \(b\) (not at the same time) infinitely often after \(N\) because of the definitions of limit inferior and superior. So we can find some \(N < m < n\) with \(S_m < a\) and \(S_n > b\).

But then \(\sum\limits_{i=m+1}^n A_i = S_n - S_m > b - a\), and hence \(\sum\limits_{i=m+1}^n |A_i| > b - a\).

By the Cauchy-Schwarz inequality, for any \(t_1, \ldots t_n \geq 0\) we have \(\sum t_i = (1, \ldots, 1) \cdot (t_1, \ldots, t_n) \leq ||(1, \ldots, 1)|| \, ||(t_1, \ldots, t_n)|| = \sqrt{n} (\sum t_i^2)^{\frac{1}{2}}\), so \(\sum t_i^2 \geq \frac{1}{n} (\sum t_i)^2\). Edit: This step is wrong: the factor is \(\frac{1}{n}\) rather than \(n\), so it does not give the bound used below. The result is still true because of deeper theory than I go into here (it comes from martingale theory) and this step may be recoverable. I’ll try to fix it later. Thanks to @sl2c for pointing this out.

In particular the argument wants \(\sum\limits_{i=m+1}^n |A_i|^2 \geq (b - a)^2\); Cauchy-Schwarz only gives \(\sum\limits_{i=m+1}^n |A_i|^2 \geq \frac{1}{n - m} \left(\sum\limits_{i=m+1}^n |A_i|\right)^2 \geq \frac{(b - a)^2}{n - m}\), which is weaker (this is the step flagged in the edit above). Taking the stronger bound on trust for now:

But \(\sum\limits_{i=m+1}^n |A_i|^2 \leq \sum\limits_{i=N}^\infty |A_i|^2\). So we must have \(\sum\limits_{i=N}^\infty |A_i|^2 \geq (b - a)^2\).

But for any non-negative random variable \(X\), \(P(X \geq x) \leq \frac{E(X)}{x}\) (Markov’s inequality). This means we must have \(P(D_{a, b}) \leq P\left(\sum\limits_{i=N}^\infty |A_i|^2 \geq (b - a)^2\right) \leq \frac{E\left(\sum\limits_{i=N}^\infty |A_i|^2\right)}{(b - a)^2} = \frac{\sum\limits_{i=N}^\infty E(|A_i|^2)}{(b - a)^2} \).

But \(N\) was arbitrary, and we know \(\sum\limits_{i=1}^\infty E(|A_i|^2) < \infty\), so as \(N \to \infty\) the right hand side tends to zero. Therefore \(P(D_{a, b}) = 0\) as desired and the result is almost proved.

Note that we’ve not used the independence of the \(A_i\) so far! It’s only needed to prove that the sum converges to a finite value. To see that we need some sort of condition for that, consider the following: Let \(X\) be a random variable that takes the values \(1, -1\) with equal probability, and let \(A_n = \frac{X}{n}\). Then \(E(A_n) = 0\) and \(\sum E(A_n^2) < \infty\), but \(\sum A_n\) takes the values \(\infty, -\infty\) with equal probability (and no other values).

But with independence this can’t happen, for the following reason: For any random variable we have \(E(|X|) \leq E(X^2)^{\frac{1}{2}}\) (because \(0 \leq \mathrm{Var}(|X|) = E(X^2) - E(|X|)^2\), though it also follows from other less elementary results). So we have \(E(|S_n|) \leq E(S_n^2)^{\frac{1}{2}}\). But \(E(S_n^2) = \sum\limits_{i, j \leq n} E(A_i A_j) = \sum\limits_{i \leq n} E(A_i^2)\), because \(A_i, A_j\) are independent for \(i \neq j\), so \(E(A_i A_j) = E(A_i) E(A_j) = 0\).

This means that \(E(|\sum A_i|) \leq \liminf E(|S_n|) \leq \sqrt{\sum E(A_i^2)} < \infty\) (using Fatou’s lemma for the first inequality). Thus, necessarily, \(P(|\sum A_i| = \infty) = 0\).

And now the result is proved.

The proper setting for the above result is really the theory of Martingales, and the proof I gave above is mostly a modified proof of Doob’s Second Martingale convergence theorem, where I’ve used a stronger hypothesis to bound the difference in the inferior and the superior limits.

In fact, for the special case of our coin tossing a stronger statement is true: Suppose \(a_n\) is a sequence of positive numbers such that \(\sum a_n = \infty\) and \(\sum a_n^2 < \infty\). Let \(A_n\) be a sequence of independent random variables with \(A_n = \pm a_n\), taking each value with equal probability. Then for any interval \((a, b)\) with \(a < b\), \(P(a < \sum A_n < b) > 0\).

To see this, write \((a, b) = (x - \epsilon, x + \epsilon)\) by letting \(x = \frac{a + b}{2}\) and \(\epsilon = \frac{b - a}{2}\).

Run the process above for constructing a sequence converging to \(x\) until we reach \(N\) such that our choices so far have led us to \(|\sum_{i \leq N} A_i - x| < \frac{\epsilon}{2}\) and \(\sum_{i > N} a_i^2 < \frac{\epsilon^2}{8}\). The initial sequence of choices that led us here happens with probability \(2^{-N}\), and the condition on the sum of the tail guarantees via Chebyshev’s inequality that \(P(|\sum_{i > N} A_i| \leq \frac{\epsilon}{2}) \geq \frac{1}{2}\), so we have

\(P(|\sum A_i - x| < \epsilon) \geq P\left(|\sum\limits_{i \leq N} A_i - x| < \frac{\epsilon}{2}\right) P\left(|\sum\limits_{i > N} A_i| \leq \frac{\epsilon}{2}\right) \geq 2^{-N-1} > 0\).

So every interval of values is not only possible but at least somewhat probable.

But this is a genuinely continuous distribution, in the sense that \(P(\sum A_i = u) = 0\) for any value \(u\). To see this, consider the values taken on only a subset of the indices. Let \(A(T) = \sum\limits_{i \in T} A_i\). Then \(A(T)\) and \(A(\mathbb{N} \setminus T)\) are independent, so if \(A(\mathbb{N})\) has atoms then both of \(A(T)\) and \(A(\mathbb{N} \setminus T)\) must have atoms too. But if we pick \(T\) to be some subsequence \(t_n\) with \(a_{t_{n+1}} < \frac{1}{2} a_{t_n}\) (possible since \(a_n \to 0\)) then all assignments of signs produce distinct values, so the probability of achieving any given value is zero. (Thanks to Robert Israel for this proof).

But still, \(\pi\) is a bit of an outlier: The variance of this distribution is of course \(\sum \frac{1}{n^2} = \frac{\pi^2}{6}\), so \(\pi\) is \(\sqrt{6} \approx 2.45\) standard deviations out. I don’t have an exact estimate of how probable that is for this distribution, but again by Chebyshev’s inequality we know that we can’t get a result at least this extreme more than a sixth of the time.
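This is easy to check by simulation too (the truncation level and sample count are arbitrary choices of mine): sums as extreme as \(\pi\) should show up in far fewer than a sixth of samples, consistent with the Chebyshev bound.

```python
import math
import random

random.seed(1)

def random_sign_sum(n_terms=500):
    """One sample of sum (+/-1)/n, truncated at n_terms terms."""
    return sum(random.choice((1, -1)) / n for n in range(1, n_terms + 1))

samples = [random_sign_sum() for _ in range(5000)]

# Chebyshev: P(|S| >= sqrt(6) * sigma) = P(|S| >= pi) <= 1/6.
extreme = sum(abs(s) >= math.pi for s in samples) / len(samples)
```

In practice the empirical fraction comes out well below \(\frac{1}{6}\): the tails are much thinner than Chebyshev alone guarantees.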

So why do we get this particular value here?

Well it turns out to be an unsurprising result for another reason, which is that it comes from a general technique for producing interesting infinite sums by starting from interesting products.

Suppose we have \(\sum f(n) \frac{1}{n}\) where \(f(n)\) is some completely multiplicative function, i.e. \(f(mn) = f(m) f(n)\). Then we can write this as \(\prod (1 - f(p_i) \frac{1}{p_i})^{-1}\), with \(p_i\) being the \(i\)-th prime. There are issues of convergence we should worry about here, but in the finest Eulerian style we won’t. Both sides converging is probably a sufficient condition. A lot of this can probably be fixed by introducing a \(t\) parameter, assuming \(|t| < 1\) and taking the limit \(t \to 1\) of \(\sum t^n f(n) \frac{1}{n}\) and \(\prod (1 - t f(p_i) \frac{1}{p_i})^{-1}\), then because mumble mumble analytic continuation the two limits are equal. I haven’t checked the details, so I won’t do this here and will just follow Euler’s convention of assuming infinity isn’t a problem.

To prove this, we’ll consider the following sequence of infinite sums: \(P_k = \sum\limits_{n \in R_k} f(n) \frac{1}{n} \), where \(R_k\) is the set of numbers not divisible by any of the first \(k\) primes (so \(R_0 = \mathbb{N}\) and \(P_0\) is our original sum).

If we split \(R_k\) up into the set of numbers which are a multiple of \(p_{k+1}\) and those that are not, we get the recurrence relationship \(P_k = \frac{f(p_{k+1})}{p_{k+1}} P_k + P_{k+1}\), by just taking out a single factor of \(\frac{f(p_{k+1})}{p_{k+1}} \) from each term of the sum over the values that are divisible by \(p_{k+1}\). So \(\left(1 - \frac{f(p_{k+1})}{p_{k+1}}\right) P_k = P_{k+1}\).

Iterating this, we get \(P_0 \prod\limits_{k=0}^\infty \left(1 - \frac{f(p_{k+1})}{p_{k+1}}\right) = \lim\limits_{k \to \infty} P_k = 1\) (the limit is \(1\) because the only number not divisible by any prime is \(1\), and \(f(1) = 1\)).

i.e. \(\sum f(n) \frac{1}{n} = P_0 = \prod\limits_{k=0}^\infty \left(1 - \frac{f(p_{k+1})}{p_{k+1}}\right)^{-1} \) as desired.
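As a numerical sanity check of the sum-to-product identity, take the completely multiplicative \(f(n) = \frac{1}{n}\), which gives the absolutely convergent case \(\sum \frac{1}{n^2} = \prod (1 - \frac{1}{p_i^2})^{-1} = \frac{\pi^2}{6}\). A rough Python sketch (the cutoffs are arbitrary choices of mine):

```python
import math

def primes_up_to(limit):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

# Both sides of sum 1/n^2 = prod (1 - 1/p^2)^(-1) should approach pi^2/6.
partial_sum = sum(1 / n ** 2 for n in range(1, 1_000_000))
product = 1.0
for p in primes_up_to(100_000):
    product /= 1 - 1 / p ** 2
```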

We can also go from products to sums: If we have \(\prod (1 - b_i )^{-1} \) then we can write it as \(\sum S_n \), where \(S_n\) is the sum of all products of sequences of \(n\) of the \(b_i\) (allowing repetitions), by expanding each factor as a geometric series \(1 + b_i + b_i^2 + \ldots\) and multiplying out.

If we then have \(b_i = f(p_i) \frac{1}{p_i}\), this becomes \(\prod (1 - f(p_i) \frac{1}{p_i})^{-1}\), the products all become unique (by uniqueness of prime factorisation), and we have \(\sum f(n) \frac{1}{n}\) again, where we extend \(f\) to all numbers by multiplying its values on their prime factors.

Using these tools, proving the sequence result for \(\pi\) becomes a matter of (somewhat tedious) computation.

We start with \(\frac{\pi}{4} = \arctan(1) = 1 - \frac{1}{3} + \frac{1}{5} - \ldots\) using the standard power series for \(\arctan\). This can be written in the above form with \(f(n) = 0\) if \(n\) is even, \(f(n) = -1\) if \(n \mod 4 = 3\) and \(f(n) = 1\) if \(n \mod 4 = 1\) (this \(f\) is completely multiplicative). So we have:

\(\frac{\pi}{4} = \prod (1 - f(p_i) \frac{1}{p_i})^{-1}\).

We can do the same with the standard sum \(\frac{\pi^2}{6} = \sum \frac{1}{n^2} = \prod (1 - \frac{1}{p_i^2})^{-1}\) (because \(n \to \frac{1}{n}\) is certainly completely multiplicative).

Now, divide the second product by the first and we get:

\(\frac{2}{3} \pi = \frac{1}{1 - \frac{1}{2^2}} \prod\limits_{p_i > 2} \left( \frac{1 - \frac{1}{p_i^2}}{1 - f(p_i) \frac{1}{p_i}} \right)^{-1}\)

The term outside the product comes from the fact that there’s no \(2\) term in our product for \(\frac{\pi}{4}\) and is equal to \(\frac{4}{3}\), so rearranging we get \(\frac{\pi}{2} = \prod\limits_{p_i > 2} \ldots\).

The term inside the product actually splits up fairly nicely: we can factor \(1 - \frac{1}{p_i^2} = (1 - \frac{1}{p_i})(1 + \frac{1}{p_i})\). The bottom is one of these two terms, so the fraction reduces to whichever of the two the bottom is not, i.e. to \(1 + f(p_i) \frac{1}{p_i}\).

So from this we conclude that

\(\frac{\pi}{2} = \prod\limits_{p_i > 2} (1 + f(p_i) \frac{1}{p_i})^{-1}\), or \(\pi = \frac{1}{1 - \frac{1}{2}} \prod\limits_{p_i > 2} (1 + f(p_i) \frac{1}{p_i})^{-1}\).

If we define \(g\) by \(g(2) = 1\), \(g(p) = -f(p)\) for \(p\) an odd prime, and extend to composite numbers multiplicatively, this then becomes \(\pi = \prod (1 - g(p_i) \frac{1}{p_i})^{-1} = \sum g(n) \frac{1}{n}\), which was our desired sum.
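To make the sign rule concrete, here’s a short Python sketch of \(g\) (the implementation is my own: compute \(g\) on each prime factor and multiply), which reproduces the signs at the start of the post:

```python
def g(n):
    """Euler's sign function: g(2) = 1, g(p) = +1 for primes p = 3 mod 4,
    g(p) = -1 for primes p = 1 mod 4, extended multiplicatively."""
    sign = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            n //= p
            if p % 4 == 1:
                sign = -sign
        p += 1
    if n > 1 and n % 4 == 1:  # a leftover prime factor of the form 4m + 1
        sign = -sign
    return sign

# Signs of the first ten terms of the series at the top of the post.
signs = [g(n) for n in range(1, 11)]
```

This gives \(+, +, +, +, -, +, +, +, +, -\) for \(n = 1, \ldots, 10\), matching the opening series.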

This proof more or less follows Euler’s, via the translation by Springer (which I borrowed off a friend and would honestly recommend you do the same rather than paying more than £100 for). The details are a bit different mostly because it was the only way I could follow them – in particular he tends to eschew the detailed algebra and just manipulate examples – and unlike him I feel guilty about the lack of precise analysis of convergence, but it’s certainly derived from his.

The product representation also allows us to strengthen our original result: Not only can any number be the limit of the sum by choosing primes appropriately, we can insist that the signs are multiplicative in the sense that the sign assigned to \(mn\) is the product of the signs assigned to \(m\) and \(n\).

The reason for this is that \(\sum \frac{1}{p_i}\) diverges. As a result, \(\prod\limits_{p_i > n} (1 + \frac{1}{p_i}) = \infty\) and \(\prod\limits_{p_i > n} (1 – \frac{1}{p_i}) = 0\). This means we can just repeat our original construction with the product instead of the sum: Assign positive signs as long as the partial product is less than the desired result, assign negative ones as long as it’s greater. The result is a product converging to the desired value, which we can then turn into a sum converging to the desired value with multiplicative signs.
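Here’s a Python sketch of that greedy product construction (the prime cutoff and the target value \(2.5\) are arbitrary choices of mine): assign \(g(p) = +1\) while the partial product is below the target, since the factor \((1 - \frac{1}{p})^{-1}\) is greater than \(1\), and \(g(p) = -1\) otherwise.

```python
def primes_up_to(limit):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def greedy_product(target, prime_limit):
    """Pick multiplicative signs g(p) so prod (1 - g(p)/p)^(-1) -> target."""
    prod = 1.0
    for p in primes_up_to(prime_limit):
        # (1 - 1/p)^(-1) > 1 pushes the product up, (1 + 1/p)^(-1) < 1 down.
        sign = 1 if prod < target else -1
        prod /= 1 - sign / p
    return prod

approx = greedy_product(2.5, 100_000)
```

Because \(\sum \frac{1}{p}\) diverges, the partial product crosses the target early and then oscillates around it, with the oscillations shrinking like \(\frac{1}{p}\).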

So that more or less completes the journey down this particular rabbit hole: You can get any number by manipulating the signs, you should expect most choices of numbers to be at least reasonably plausible, and we have some useful machinery for picking good choices of signs for leading to specific numbers.

Whether or not it’s still surprising is up to you.

(This post turned out to be a ridiculous amount of work to write. If you want more like it, or just want to say thanks, I do have a Patreon for this blog. Any donations you want to send my way would be warmly appreciated)