$\begingroup$

The complex-analytic proof of the prime number theorem can help inform the elementary one.

The von Mangoldt function $\Lambda$ is related to the Riemann zeta function $\zeta$ by the formula $$ \sum_n \frac{\Lambda(n)}{n^s} = -\frac{\zeta'(s)}{\zeta(s)},$$ at least in the region $\mathrm{Re}(s) > 1$. The right-hand side has a pole of residue 1 at the simple pole $s=1$ of $\zeta$, and a pole of residue -1 at each zero $s=\rho$ of $\zeta$. Applying Perron's formula carefully, one eventually arrives at the von Mangoldt explicit formula

$$ \sum_{n \leq x} \Lambda(n) = x - \sum_\rho \frac{x^\rho}{\rho} + \dots$$ where the remaining terms $\dots$ (as well as the way in which the infinite sum over zeroes is interpreted) will be ignored for this discussion. This formula strongly suggests that zeroes $\rho$ with real part strictly less than one will eventually only make a lower order contribution to $\sum_{n \leq x} \Lambda(n)$, whereas zeroes $\rho$ with real part equal to one would likely destroy the prime number theorem. In fact one can pursue this line of reasoning more carefully and show that the prime number theorem is equivalent to the absence of zeroes of zeta on the line $\mathrm{Re}(s)=1$.

But what if there was a way to make the contribution of the zeroes of zeta smaller than those of the pole, even when said zeroes lie on the line $\mathrm{Re}(s)=1$? One way to do this is to replace the log-derivative $-\frac{\zeta'(s)}{\zeta(s)}$ by the variant $\frac{\zeta''(s)}{\zeta(s)}$. Assuming for simplicity that all zeroes are simple, this function still has a simple pole at every zero $\rho$ (with residue $\frac{\zeta''(\rho)}{\zeta'(\rho)}$), but now has a double pole at $s=1$ (with residue $2$). As it turns out, this variant also has a Dirichlet series representation $$ \sum_n \frac{ \Lambda_2(n) }{n^s} = \frac{\zeta''(s)}{\zeta(s)}$$ which by Perron's formula suggests an asymptotic of the form $$ \sum_{n \leq x} \Lambda_2(n) = 2 x \log x + \sum_\rho \frac{\zeta''(\rho)}{\zeta'(\rho)} \frac{x^\rho}{\rho} + \dots$$ Now, even if there are zeroes on $\mathrm{Re}(s)=1$, their contribution should still be smaller than the main term, and one may now be led to predict an asymptotic of the form $$ \sum_{n \leq x} \Lambda_2(n) = 2 x \log x + O(x) \quad (4)$$ and furthermore this asymptotic should be easier to prove than the prime number theorem, as it is not reliant on preventing zeroes at the edge of the critical strip.

The functions $\Lambda_2$ and $\Lambda$ are related via the identity $$ \frac{\zeta''(s)}{\zeta(s)} = -\frac{d}{ds} (-\frac{\zeta'(s)}{\zeta(s)}) + (-\frac{\zeta'(s)}{\zeta(s)})^2$$ which after inverting the Dirichlet series leads to $$ \Lambda_2(n) = \Lambda(n) \log n + \sum_{d|n} \Lambda(d) \Lambda(\frac{n}{d}) \quad (5)$$ and from this and (4) we soon obtain (2) and hence (1).

One can also think about this from a "music of the primes" standpoint. The explicit formula is morally resolving $\Lambda$ into "harmonics" as $$ \Lambda(n) \approx 1 - \sum_\rho n^{\rho-1} + \dots \quad (6)$$ where I am deliberately being vague as to how to interpret the symbols $\approx$, $\sum$, and $\dots$. The Dirichlet convolution of $1$ with itself resembles $\log n$, while the Dirichlet convolution of $-n^{\rho-1}$ with itself resembles $n^{\rho-1} \log n$; also, cross-terms such as the convolution of $1$ with $-n^{\rho-1}$ experience significant cancellation and should be lower order. Thus the right-hand side of (5) amplifies the "1" harmonic of $\Lambda$ by a factor of about $2 \log n$, while damping out the contributions of each of the zeroes $\rho$ due to cancellation between the two terms, leading one to (1) or (2) as a smoothed out version of the prime number theorem. Conversely, (2) combined with (5) can be viewed as an elementary way to encode (the largest terms of) the explicit formula (4), without explicit mention of zeroes.

The most difficult part of the Erdos-Selberg elementary proof of the PNT is the passage from (1) to the prime number theorem, as here we must actually eliminate the scenario in which there is a zero on $\mathrm{Re}(s)=1$. The key to doing so is to exploit the non-negativity of $\Lambda$; morally, the expansion (6) prevents $\rho$ (and hence also $\overline{\rho}$) from lying on $\mathrm{Re}(s)=1$ as this would make the right-hand side RHS negative "on average" for certain sets of $n$. There are several ways to make this intuition precise; the standard complex-analytic proofs often proceed by testing $\Lambda$ against $n^{\pm it}$ and $n^{\pm 2it}$, where $1+it$ is a hypothetical zero of $\zeta$. Very roughly speaking, the idea of the elementary proof is to exploit not only the non-negativity of $\Lambda$, but also upper bounds coming from sieve-theoretic bounds such as the Brun-Titchmarsh inequality, which roughly speaking tell us that $\Lambda$ behaves "on average" as if it lies between $0$ and $2$. (Actually, the Selberg formula (2) already gives an adequate substitute for this inequality for the purposes of this argument.) On the other hand, from (6) we see that if there was a zero $\rho = 1+it$ on the line $\mathrm{Re}(s)=1$, then $\Lambda-1$ should have an extremely large correlation with $\mathrm{Re} n^{\rho-1} = \cos(t \log n)$, and these two facts can be placed in contradiction with each other, basically because the average value of $|\cos(t \log n)|$ is strictly less than one. However, because the zeroes $\rho$ are not explicitly present in the Selberg formula (2), one has to work quite a bit in the elementary proof to make this argument rigorous. In this blog post of mine I use some Banach algebra theory to tease out the zeroes $\rho=1+it$ from the Selberg formula (they can be interpreted as the "Gelfand spectrum" of a certain Banach algebra norm associated to that formula) to make this connection more explicit.