Is This a Proof?



Seeking the limits of mathematical proofs



Ivan Vinogradov was a famous mathematician who created a whole method, named after him, that allowed him to make major advances on many open number theoretic problems. These included his theorem on the Weak Goldbach Conjecture and many others.

Today Ken and I wish to talk about a new type of “mathematical proof,” illustrating it for the Weak Goldbach Conjecture.



It is strange, pushes the limits of what we usually call a proof, but still might be considered a proof. Our plan is to present it first abstractly, then give a concrete example of the new style “proof.” The central questions today are: Is this new type of proof really a proof, and is it new?

What Is A Proof?

We know what a mathematical proof is. It is a convincing argument that some mathematical statement is correct. The rigor of proofs has changed slowly over the years, yet today we would all agree what constitutes a proof and what does not. Of course some are short and straightforward, some are short and complex, some are long and complex, few are long and straightforward.

The acid test is: can a proof be checked? If it cannot then most would argue it is not a proof. A poorly written proof may fail this test, even if the writer of the proof is convinced of its correctness. Some proofs over the years have been rejected because they failed this checking test, and some that failed were later judged to be correct. Many “correct” proofs still have typos or minor errors that are easy for a checker to fix. A typo of for may be easy to spot, and thus a proof that is technically incorrect is really fine.

Computer Proofs

Let be some statement we wish to prove. A standard proof would be a written argument that convinces you and others that is true. If the proof uses computation then what usually happens is:

For some statement you prove that . Then you use your computer to check that is correct.

This of course then shows that is proved. Well, this is the issue with computer-based proofs. The fear is that the checking of may be faulty due to a program error or even a hardware error. This can be reduced by having people review the programs, or even better having them independently write their own programs to check . But there remains some feeling—for many—that such proofs could be wrong. We now have many proofs that no human can check, including proofs of the Four-Color Theorem (until recently) and this year’s advance on Paul Erdős’s Discrepancy problem (EDP). Many are uncomfortable with proofs of this kind, proofs that rely on massive computation. In any case we will consider human proofs and machine proofs as “standard” types of proofs. For thoughtful comments on machine proofs see this by Robert Pollack.

We think we have discovered a new type of proof—it fits into neither of these categories.

Monte Carlo Proofs

We will now describe our new type of proof. It is a type of computer-based proof, but one that ups the ante and is even more problematic. We still think it is quite interesting and should be studied.

So let look at a new way to prove . Let and be additional statements for . The keys are the following:

We can prove that for each that . We can also prove that is true for at least of the indices . The statement is easy to check for any index by computation.

Then we can proceed as follows. Pick a set of random indices. For each in this set, check that is true. We claim that if all check correctly, then the probability that is true is at least .

The point is that each time we select an index there is a probability at least that is true. Since we check that is true, follows. But we may have selected an index so that is not true. The chance that this happens times is clearly at most , and so we have the claimed probability.

An Example

We return to the Weak Goldbach Conjecture. Vinogradov, in the 1930’s, famously proved that every sufficiently large odd number is the sum of at most three primes. His “sufficiently large” was not effective at first: he proved there was a constant so that all odd numbers larger than it were the sums of three primes. In 1956 it was shown that was upper-bounded by —a pretty large number. This was reduced in 2002 by Liu Ming-Chit and Wang Tian-Ze to approximately —still pretty large. Computer searches still cannot even get near such a bound.

The long-term goal of course has been to reduce this constant and prove eventually the following theorem:

Theorem 1 Every odd integer greater than can be expressed as the sum of three primes.

Recently this has been achieved by Harald Helfgott, who technically proved that the theorem is true for all odd numbers greater than . Computer searches had already verified the theorem up to . But let us cycle back to the situation before last year where was still on the order of . Let be the statement that all odd numbers below are the sum of at most three odd primes. The set of prime numbers below has density order-of . The key concept is:

Definition 2 A subset of the even numbers below is complementary to if every odd number below can be written as where and .

By a probabilistic argument, most subsets of size are complementary, where is a small known constant. (Indeed some finer results are now known.) The size is not too large, and the numbers themselves have only order- size. The final ingredient comes from the known heuristic bounds on Goldbachs’ Conjecture itself. For most even numbers , one is highly likely to find primes and summing to within trials.

Executing the Proof

Given the above facts, the following procedure is feasible to execute. It is a computer procedure, but the work involved is orders of magnitude smaller than the EDP proof, and the predicates needing to be checked are simpler than for the Four-Color Theorem.

Generate -many sets of -many even numbers below .

-many sets of -many even numbers below . For each , for each , search for primes summing to .

, for each , search for primes summing to . If each -case succeeds, accept and output the entire table of as the proof.

We could weaken the last step to accept if for a two-thirds majority of , every -case succeeds, but let us grant that the heuristic bounds on Goldbach’s Conjecture are good enough that this stricter and simpler policy has a high chance of success.

So granting that it succeeds and we get the output table—note that it is easy to verify afterward that the and are prime and sum to —as the “proof,” what do we really have? Define the statements as follows:

The statement says: “the set is indeed complementary for up to .”

The statement says: “the set has all its even numbers the sum of two primes.”

We claim that this fits our framework; let’s check. At least half the time . We suppose that some is a complement—we only need one. Let that is odd. Then where is a prime and is in . But we checked that is true, so all the are sums of two primes. This means that is proved.

One aspect is that if our routine failed, then we would have a significant “probabilistic” result of a completely different kind. It would indicate that the heuristic bounds on Goldbach’s Conjecture are incorrect. But once the routine succeeds, all that is water under the bridge. We have already proved each , and our routine has bypassed the need to prove , which would ostensibly be computationally prohibitive.

Note also that we are of course not assuming Goldbach is true overall—we are only needing to verify it for the tiny sample of numbers that go into each . That is, call “good” if is true, and “hyper-good” if every obeys the heuristic bounds on Goldbach. We don’t actually need to argue that with probability at least a random is hyper-good—once is checked the “hyper” part is dispensed with. Hence we only need that a random is good with probability at least . Then the confidence in being a proof is . Is that enough?

Comparing Other Probabilistic Proofs

Consider the problem of proving that a given arithmetic formula computes the zero polynomial over some finite field . Whenever does not compute the zero polynomial, a random subset of known small size will, with probability at least , include an argument giving . Now choose -many such subsets , and run a routine exhibiting that for every and . Let be the resulting table of zero values. (Of course, we could have just picked one random of size , but we are comparing with the above.) Is this exactly the same kind of “proof”?

We argue no. The main reason is that the statement corresponding to , namely that is “good,” is counterfactual. It is saying that if were a non-zero polynomial, then would have had a counter-example. The only way to avoid the problem of “which ?” seems to be for to be a universal hitting set for all non-zero -variable polynomials over . The size bounds on universal hitting sets, especially in arithmetical not Boolean complexity, are murkier to argue.

The cases of randomized algorithms for deciding primality and other problems in or (etc.) look similar. For a language in the Arthur-Merlin class , there is a predicate decidable in time such that for all :

Here we would want a bunch of pairs to be the proof of the theorem “ .” By the perfect completeness theorem for , we can replace by such that the first line holds for all , which is analogous to the idea of our routine always succeeding and outputting a set of such pairs.

The problem again, however, is that the goodness of the pairs is still defined with respect to counterfactual cases. Whereas in our proof, goodness is a positive statement, namely that the randomly-chosen set really is complementary to the set of primes.

Open Problems

Is this a new type of proof? What should we call it? Are there novel applications of the method? Note that the size of the range is less important than having concrete density estimates on intervals, since the point is needing to check relatively few members.