Ben Krause, Mariusz Mirek, and I have uploaded to the arXiv our paper Pointwise ergodic theorems for non-conventional bilinear polynomial averages. This paper is a contribution to the decades-long program of extending the classical ergodic theorems to “non-conventional” ergodic averages. Here, the focus is on pointwise convergence theorems, and in particular looking for extensions of the pointwise ergodic theorem of Birkhoff:

Theorem 1 (Birkhoff ergodic theorem) Let be a measure-preserving system (by which we mean is a -finite measure space, and is invertible and measure-preserving), and let for any . Then the averages converge pointwise for -almost every .

Pointwise ergodic theorems have an inherently harmonic analysis content to them, as they are closely tied to maximal inequalities. For instance, the Birkhoff ergodic theorem is closely tied to the Hardy-Littlewood maximal inequality.

The above theorem was generalized by Bourgain (conceding the endpoint , where pointwise almost everywhere convergence is now known to fail) to polynomial averages:

Theorem 2 (Pointwise ergodic theorem for polynomial averages) Let be a measure-preserving system, and let for any . Let be a polynomial with integer coefficients. Then the averages converge pointwise for -almost every .

For bilinear averages, we have a separate 1990 result of Bourgain (for functions), extended to other spaces by Lacey, and with an alternate proof given, by Demeter:

Theorem 3 (Pointwise ergodic theorem for two linear polynomials) Let be a measure-preserving system with finite measure, and let , for some with . Then for any integers , the averages converge pointwise almost everywhere.

It has been an open question for some time (see e.g., Problem 11 of this survey of Frantzikinakis) to extend this result to other bilinear ergodic averages. In our paper we are able to achieve this in the partially linear case:

Theorem 4 (Pointwise ergodic theorem for one linear and one nonlinear polynomial) Let be a measure-preserving system, and let , for some with . Then for any polynomial of degree , the averages converge pointwise almost everywhere.

We actually prove a bit more than this, namely a maximal function estimate and a variational estimate, together with some additional estimates that “break duality” by applying in certain ranges with , but we will not discuss these extensions here. A good model case to keep in mind is when and (which is the case we started with). We note that norm convergence for these averages was established much earlier by Furstenberg and Weiss (in the case at least), and in fact norm convergence for arbitrary polynomial averages is now known thanks to the work of Host-Kra, Leibman, and Walsh.

Our proof of Theorem 4 is much closer in spirit to Theorem 2 than to Theorem 3. The property of the averages shared in common by Theorems 2, 4 is that they have “true complexity zero”, in the sense that they can only be only be large if the functions involved are “major arc” or “profinite”, in that they behave periodically over very long intervals (or like a linear combination of such periodic functions). In contrast, the average in Theorem 3 has “true complexity one”, in the sense that they can also be large if are “almost periodic” (a linear combination of eigenfunctions, or plane waves), and as such all proofs of the latter theorem have relied (either explicitly or implicitly) on some form of time-frequency analysis. In principle, the true complexity zero property reduces one to study the behaviour of averages on major arcs. However, until recently the available estimates to quantify this true complexity zero property were not strong enough to achieve a good reduction of this form, and even once one was in the major arc setting the bilinear averages in Theorem 4 were still quite complicated, exhibiting a mixture of both continuous and arithmetic aspects, both of which being genuinely bilinear in nature.

After applying standard reductions such as the Calderón transference principle, the key task is to establish a suitably “scale-invariant” maximal (or variational) inequality on the integer shift system (in which with counting measure, and ). A model problem is to establish the maximal inequality

whereranges over powers of two andis the bilinear operator

The single scale estimate

or equivalently (by duality) is immediate from Hölder’s inequality; the difficulty is how to take the supremum over scales

The first step is to understand when the single-scale estimate (2) can come close to equality. A key example to keep in mind is when , , where is a small modulus, are such that , is a smooth cutoff to an interval of length , and is also supported on and behaves like a constant on intervals of length . Then one can check that (barring some unusual cancellation) (2) is basically sharp for this example. A remarkable result of Peluse and Prendiville (generalised to arbitrary nonlinear polynomials by Peluse) asserts, roughly speaking, that this example basically the only way in which (2) can be saturated, at least when are supported on a common interval of length and are normalised in rather than . (Strictly speaking, the above paper of Peluse and Prendiville only says something like this regarding the factors; the corresponding statement for was established in a subsequent paper of Peluse and Prendiville.) The argument requires tools from additive combinatorics such as the Gowers uniformity norms, and hinges in particular on the “degree lowering argument” of Peluse and Prendiville, which I discussed in this previous blog post. Crucially for our application, the estimates are very quantitative, with all bounds being polynomial in the ratio between the left and right hand sides of (2) (or more precisely, the -normalized version of (2)).

For our applications we had to extend the inverse theory of Peluse and Prendiville to an theory. This turned out to require a certain amount of “sleight of hand”. Firstly, one can dualise the theorem of Peluse and Prendiville to show that the “dual function”

can be well approximated inby a function that has Fourier support on “major arcs” ifenjoycontrol. To get the required extension toin theaspect one has to improve the control on the error fromto; this can be done by some interpolation theory combined with the useful Fourier multiplier theory of Ionescu and Wainger on major arcs. Then, by further interpolation using recentimproving estimates of Han, Kovac, Lacey, Madrid, and Yang for linear averages such as, one can relax thehypothesis onto anhypothesis, and then by undoing the duality one obtains a good inverse theorem for (2) for the function; a modification of the arguments also gives something similar for

Using these inverse theorems (and the Ionescu-Wainger multiplier theory) one still has to understand the “major arc” portion of (1); a model case arises when are supported near rational numbers with for some moderately large . The inverse theory gives good control (with an exponential decay in ) on individual scales , and one can leverage this with a Rademacher-Menshov type argument (see e.g., this blog post) and some closer analysis of the bilinear Fourier symbol of to eventually handle all “small” scales, with ranging up to say where for some small constant and large constant . For the “large” scales, it becomes feasible to place all the major arcs simultaneously under a single common denominator , and then a quantitative version of the Shannon sampling theorem allows one to transfer the problem from the integers to the locally compact abelian group . Actually it was conceptually clearer for us to work instead with the adelic integers , which is the inverse limit of the . Once one transfers to the adelic integers, the bilinear operators involved split up as tensor products of the “continuous” bilinear operator

on, and the “arithmetic” bilinear operator

on the profinite integers , equipped with probability Haar measure. After a number of standard manipulations (interpolation, Fubini’s theorem, Hölder’s inequality, variational inequalities, etc.) the task of estimating this tensor product boils down to establishing animproving estimate

for some. Splitting the profinite integersinto the product of the-adic integers, it suffices to establish this claim for eachseparately (so long as we keep the implied constant equal tofor sufficiently large). This turns out to be possible using an arithmetic version of the Peluse-Prendiville inverse theorem as well as an arithmeticimproving estimate for linear averaging operators which ultimately arises from some estimates on the distribution of polynomials on the-adic field, which are a variant of some estimates of Kowalski and Wright