Prime numbers play a central role in analytic number theory, and are well known to be very well distributed among the reduced residue classes $\pmod{q}$. Surprisingly, the same does not appear to be true for sequences of consecutive primes, with different patterns occurring with wildly different frequencies. We formulate a precise conjecture, based on the Hardy–Littlewood conjectures, which explains this phenomenon. In particular, we predict that all patterns do occur their fair share of the time in the limit, but that there are secondary terms only very slowly tending to zero that create the observed biases.

Although the sequence of primes is very well distributed in the reduced residue classes $\pmod{q}$, the distribution of pairs of consecutive primes among the permissible $\phi(q)^2$ pairs of reduced residue classes $\pmod{q}$ is surprisingly erratic. This paper proposes a conjectural explanation for this phenomenon, based on the Hardy–Littlewood conjectures. The conjectures are then compared with numerical data, and the observed fit is very good.

1. Introduction

The prime number theorem in arithmetic progressions shows that the sequence of primes is equidistributed among the reduced residue classes $\pmod{q}$. If the Generalized Riemann Hypothesis is true, then this holds in the more precise form $\pi(x; q, a) = \mathrm{li}(x)/\phi(q) + O(x^{1/2+\epsilon})$, where $\mathrm{li}(x) := \int_2^x \frac{dt}{\log t}$, and $\pi(x; q, a)$ denotes the number of primes up to $x$ lying in the reduced residue class $a \pmod{q}$. Nevertheless, it was noticed by Chebyshev that certain residue classes seem to be slightly preferred; for example, among the first million primes, that is, with $\pi(x_0) = 10^6$, we find that $\pi(x_0; 3, 1) = 499{,}829$ and $\pi(x_0; 3, 2) = 500{,}170$. Chebyshev’s bias is beautifully explained by the work of Rubinstein and Sarnak (1) (see ref. 2 for a survey of related work), who showed (in a certain sense and under some natural conjectures) that $\pi(x; 3, 2) > \pi(x; 3, 1)$ for $99.9\%$ of all positive $x$.
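The single-prime counts $\pi(x; q, a)$ discussed above are easy to tabulate directly. The following is a minimal sketch, not the authors' code: the helper names `primes_up_to` and `residue_counts` are illustrative choices, and the demonstration uses a small bound so that it runs instantly.

```python
import math
from collections import Counter


def primes_up_to(limit):
    """Return all primes <= limit using a sieve of Eratosthenes."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Cross out the multiples of p, starting at p * p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n, is_prime in enumerate(sieve) if is_prime]


def residue_counts(primes, q):
    """Count primes in each reduced residue class mod q.

    Primes dividing q (such as 3 when q = 3) are skipped.
    """
    return Counter(p % q for p in primes if math.gcd(p, q) == 1)


# Small demonstration: primes up to 100, split by residue class mod 3.
counts = residue_counts(primes_up_to(100), 3)
print(counts[1], counts[2])
```

To cover the first million primes, the bound can be raised to 15,485,863, which is the millionth prime (a known value that is not stated in the text). The resulting tallies can then be compared with the figures quoted above.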

What happens if we consider the patterns of residues $\pmod{q}$ among strings of consecutive primes? Let $p_n$ denote the sequence of primes in ascending order. Let $r \ge 1$ be an integer, and let $\mathbf{a} = (a_1, a_2, \ldots, a_r)$ denote an $r$-tuple of reduced residue classes $\pmod{q}$. Define $\pi(x; q, \mathbf{a}) := \#\{p_n \le x : p_{n+i-1} \equiv a_i \pmod{q} \text{ for each } 1 \le i \le r\}$, which counts the number of occurrences of the pattern $\mathbf{a} \pmod{q}$ among $r$ consecutive primes the least of which is below $x$. When $r \ge 2$, little is known about the distribution of such patterns among the primes. When $r = 2$ and $\phi(q) = 2$ (thus $q = 3$, 4, or 6), Knapowski and Turán (3) observed that all of the four possible patterns of length 2 appear infinitely many times. The main significant result in this direction is due to Shiu (4), who established that, for any $q \ge 3$, a reduced residue class $a \pmod{q}$, and any $r \ge 2$, the pattern $(a, a, \ldots, a)$ occurs infinitely often. Recent progress in sieve theory has led to a new proof of Shiu’s result (see ref. 5), and, moreover, Maynard (6) has shown that $\pi(x; q, (a, \ldots, a)) \gg \pi(x)$.
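The definition of $\pi(x; q, \mathbf{a})$ amounts to sliding a window of length $r$ along the list of primes and tallying the residue pattern seen in each window. The sketch below illustrates this under stated assumptions: `pattern_counts` is a hypothetical helper name, and the caller is assumed to supply a list of consecutive primes that already includes the $r - 1$ primes following the last starting prime $p_n \le x$.

```python
from collections import Counter
from math import gcd


def pattern_counts(primes, q, r):
    """Tally residue patterns (mod q) over windows of r consecutive primes.

    `primes` must be consecutive primes in ascending order. Windows that
    contain a prime dividing q are skipped, since only reduced residue
    classes are of interest.
    """
    residues = [p % q for p in primes]
    counts = Counter()
    for i in range(len(residues) - r + 1):
        window = tuple(residues[i : i + r])
        if all(gcd(a, q) == 1 for a in window):
            counts[window] += 1
    return counts


# Small demonstration with the first twelve primes and q = 3, r = 2.
first_primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
pairs = pattern_counts(first_primes, 3, 2)
for pattern in sorted(pairs):
    print(pattern, pairs[pattern])
```

A single pass that tallies all patterns at once is used here because the counts for different $\mathbf{a}$ are usually wanted together; counting one fixed pattern is the special case of reading off a single entry.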

Despite the lack of understanding of $\pi(x; q, \mathbf{a})$, any model based on the randomness of the primes would suggest strongly that every permissible pattern of $r$ consecutive primes appears roughly equally often; that is, if $\mathbf{a}$ is an $r$-tuple of reduced residue classes $\pmod{q}$, then $\pi(x; q, \mathbf{a}) \sim \pi(x)/\phi(q)^r$. However, a look at the data might shake that belief! For example, among the first million primes (for convenience, restricting to those greater than 3), we find $\pi(x_0; 3, (1,1)) = 215{,}873$, $\pi(x_0; 3, (1,2)) = 283{,}957$, $\pi(x_0; 3, (2,1)) = 283{,}957$, and $\pi(x_0; 3, (2,2)) = 216{,}213$. These numbers show substantial deviations from the expectation that all four quantities should be roughly $250{,}000$. Further, Chebyshev’s bias $\pmod{3}$ might have suggested a slight preference for the pattern $(2,2)$ over the other possibilities, and this is clearly not the case.
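To make the size of the deviation concrete, the four counts quoted above can be compared with the uniform prediction $\pi(x)/\phi(q)^r$. The snippet below only does arithmetic on the figures given in the text; the variable names are illustrative.

```python
# Pair counts (mod 3) among the first million primes, as quoted in the text.
observed = {
    (1, 1): 215_873,
    (1, 2): 283_957,
    (2, 1): 283_957,
    (2, 2): 216_213,
}

total = sum(observed.values())
# Uniform model: each of the phi(3)^2 = 4 patterns gets an equal share.
expected = total / len(observed)

for pattern, count in observed.items():
    # Ratio of the observed count to the uniform prediction.
    print(pattern, count, round(count / expected, 4))

# Fraction of pairs in which the residue class repeats.
diagonal_share = (observed[(1, 1)] + observed[(2, 2)]) / total
print("diagonal share:", diagonal_share)
```

Under the uniform model the diagonal patterns $(1,1)$ and $(2,2)$ would together account for half of all pairs; in these data they account for noticeably less.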

The discrepancy observed above persists for larger $x$, and also exists for other moduli $q$. For example, among the first hundred million primes modulo 10, there is substantial deviation from the prediction that each of the 16 pairs $(a, b)$ should have about 6.25 million occurrences. Specifically, with $\pi(x_0) = 10^8$, we find the following.

Apart from the fact that the entries vary dramatically (much more than in Chebyshev’s bias), the key feature to be observed in these data is that the diagonal classes $(a, a)$ occur significantly less often than the nondiagonal classes. Chebyshev’s bias $\pmod{10}$ states that the residue classes 3 and 7 $\pmod{10}$ very often contain slightly more primes than the residue classes 1 and 9 $\pmod{10}$, but curiously in our data the patterns $(3,3)$ and $(7,7)$ appear less frequently than $(1,1)$ and $(9,9)$; this suggests again that a different phenomenon is at play here.

The purpose of this paper is to develop a heuristic, based on the Hardy–Littlewood prime $k$-tuples conjecture, which explains the biases seen above. We are led to conjecture that although the primes counted by $\pi(x; q, \mathbf{a})$ do have density $1/\phi(q)^r$ in the limit, there are large secondary terms in the asymptotic formula which create biases toward and against certain patterns. The dominant factor in this bias is determined by the number of $i$ for which $a_{i+1} \equiv a_i \pmod{q}$, but there are also lower-order terms that do not have an easy description.
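The statistic singled out here, the number of $i$ with $a_{i+1} \equiv a_i \pmod{q}$, is simple to compute for any pattern. The sketch below groups all patterns of a given length by that statistic. It does not evaluate the conjectured secondary terms, and the names `count_repeats` and `group_by_repeats` are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import gcd


def count_repeats(a, q):
    """Number of indices i with a[i+1] congruent to a[i] modulo q."""
    return sum(1 for x, y in zip(a, a[1:]) if (y - x) % q == 0)


def group_by_repeats(q, r):
    """Group all r-tuples of reduced residues mod q by their repeat count."""
    reduced = [a for a in range(1, q) if gcd(a, q) == 1]
    groups = defaultdict(list)
    for pattern in product(reduced, repeat=r):
        groups[count_repeats(pattern, q)].append(pattern)
    return dict(groups)


# Patterns of length 3 modulo 3, bucketed by number of repeated residues.
for repeats, patterns in sorted(group_by_repeats(3, 3).items()):
    print(repeats, patterns)
```

Bucketing patterns in this way gives a natural first cut when comparing observed counts, since patterns in the same bucket share the dominant factor described above.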