We developed a mathematical model of cancer incidence based on two assumptions: first, that potentially cancerous cells arise with equal probability at any age, and, second, that there exists an immune escape threshold (IET), proportional to T cell production, above which immunogenic cells can overwhelm the immune system and result in clinically detectable disease (Fig. 1 and Fig. S3). Because the model can also describe the age-related incidence of infectious diseases, the immunogenic cells could be either mutated somatic cells or a population of infectious pathogens. We do not specify the biological interaction between the T cell pool and the nascent tumor or infection; however, the concept of declining immune competence is consistent with several known mechanisms: for instance, both T cell repertoire diversity and the proliferative capacity of naive T cells decrease with age (9). The model is derived as follows. Once immunogenic cells arise, their population changes over time, giving stochastic dynamics in population size, clonal diversity, and potentially other properties. The simplest way to capture these dynamics is through a birth–death process, which to a first approximation can be modeled as a biased random walk (10). Fig. 1 provides a schematic view of the model dynamics in terms of population size. If the random walk exceeds the IET, the immune system can no longer respond effectively and immune escape occurs.

If the random walk for the immunogenic cells is unbiased (e.g., for the random walk describing population size, if cell division and cell death are equally likely), then the probability that an immunogenic cell population reaches a threshold K is 1/K (Methods). Because the IET is proportional to T cell production, which declines exponentially with age at rate α, K falls as exp(−αt) and 1/K rises as exp(αt). This gives a first-approximation prediction that the risk of immune escape, which we denote by R, rises exponentially with age at the same rate at which T cell production declines. This defines a model for disease incidence with one fitting parameter, an overall prefactor (Table 1). If the random walk is biased (e.g., if the rates of cell division and cell death are not equal), a similar calculation gives a more general prediction for incidence with one additional parameter (Table 1). We refer to these one- and two-parameter models as immune model I (IM-I) and immune model II (IM-II), respectively. The additional fitting parameter of IM-II can be interpreted as a “pivot age,” which marks a transition from very low to much higher risk (Methods). We stress that α is not a fitting parameter but the empirically derived rate of thymic involution, 0.044 y⁻¹, which we use throughout our analysis.
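As a minimal sketch of the random-walk picture (an illustration, not the authors' code), the immunogenic population can be treated as a walk on the integers that starts at one cell, gains or loses one cell per step, and stops at extinction (0) or at the threshold K. The classical gambler's ruin result gives the probability of reaching K first as (r − 1)/(r^K − 1), where r is the ratio of death to birth probabilities per step; this tends to 1/K as r → 1. For a small bias r = 1 + ε it is approximately ε/(exp(εK) − 1). The one-cell starting population and the names `hit_probability`, `small_bias_approximation`, and `simulate_hit_probability` are hypothetical choices made for this sketch.

```python
import math
import random


def hit_probability(K: int, r: float) -> float:
    """Probability that a walk started at 1 reaches K before 0 (gambler's ruin).

    r is the ratio of death to birth probabilities per step; r == 1 is the
    unbiased case, for which the answer is 1/K.
    """
    if K < 1:
        raise ValueError("K must be a positive integer")
    if math.isclose(r, 1.0):
        return 1.0 / K
    return (r - 1.0) / (r**K - 1.0)


def small_bias_approximation(K: int, eps: float) -> float:
    """Approximation eps / (exp(eps * K) - 1) for r = 1 + eps with small eps > 0."""
    return eps / math.expm1(eps * K)


def simulate_hit_probability(K: int, r: float, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability, used to check the formula."""
    rng = random.Random(seed)
    p_birth = 1.0 / (1.0 + r)  # chosen so that p_death / p_birth == r
    hits = 0
    for _ in range(trials):
        n = 1  # hypothetical assumption: the population starts from a single cell
        while 0 < n < K:
            n += 1 if rng.random() < p_birth else -1
        if n == K:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for r in (1.0, 1.2):
        print(r, hit_probability(10, r), simulate_hit_probability(10, r))
```

Running the script prints the exact and simulated probabilities side by side; for the unbiased walk both should be close to 1/K = 0.1. The small-bias form is worth noting because, once K is made to shrink with age, the bias ε supplies exactly one extra parameter.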

Fig. 2. Infectious disease incidence. Log-linear plots of incidence (per 100,000 person-years) by age group for all ABC bacterial infections, West Nile virus (WNV) disease, and influenza A, ordered from best fit to worst. Bacterial and viral diseases are shaded yellow and green, respectively. The two-parameter IM-II is shown in red and the one-parameter IM-I in orange. Incidence often decreases initially from birth because the immune system is underdeveloped in infants; therefore, models are fitted only to data points for ages greater than 18 y. Error bars show 99% CIs for all diseases.

For most infectious diseases, the increase in risk with age is believed to be due to changes in the immune system, so these diseases provide a good first test of our model. Here, the assumption that immunogenic cells arise with equal probability at any age amounts to assuming constant exposure across age groups. We found that six of the seven bacterial infections monitored by the Active Bacterial Core (ABC) surveillance program (Data Sources) fit IM-II well (R² > 0.9), with the best fits for incidence curves based on higher incidence and larger population sizes, which therefore have smaller relative uncertainty [i.e., smaller confidence intervals (CIs)]. Turning to viral diseases, the incidence of West Nile virus (WNV) disease is particularly well fit by IM-II (and indeed by IM-I). However, influenza A is not fit well, instead rising exponentially at a faster rate than the model predicts (Fig. 2). The prevalence of tuberculosis infection in Cambodia also fits the model well (Fig. S4). Indeed, apart from influenza A, even the one-parameter IM-I fits these infectious diseases very well, which confirms the importance of the thymic involution timescale and gives us confidence in applying our approach further.
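Because α is fixed at 0.044 y⁻¹, fitting IM-I reduces to estimating the single prefactor A in I(t) = A·exp(αt). A minimal sketch, assuming least squares on log incidence (the text does not state which error model or R² definition underlies the reported fits): with α fixed, the optimal ln A is simply the mean of ln I − αt, so no iterative optimizer is needed. `fit_im1` and its `min_age` cutoff, which mirrors the 18 y restriction in the figure captions, are hypothetical names.

```python
import math

ALPHA = 0.044  # per year: thymic involution rate from the text, not fitted


def fit_im1(ages, incidence, alpha=ALPHA, min_age=18.0):
    """Fit I(t) = A * exp(alpha * t) with alpha held fixed.

    Only points with age > min_age are used. Returns (A, r_squared), where
    R^2 is computed on log incidence (an assumption of this sketch).
    """
    points = [(t, y) for t, y in zip(ages, incidence) if t > min_age]
    if len(points) < 2:
        raise ValueError("need at least two data points above min_age")
    logs = [math.log(y) for _, y in points]
    # Closed-form least squares for the only free parameter, ln A.
    log_a = sum(ly - alpha * t for (t, _), ly in zip(points, logs)) / len(points)
    mean_log = sum(logs) / len(logs)
    ss_res = sum((ly - (log_a + alpha * t)) ** 2 for (t, _), ly in zip(points, logs))
    ss_tot = sum((ly - mean_log) ** 2 for ly in logs)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return math.exp(log_a), r_squared


if __name__ == "__main__":
    ages = [10, 20, 30, 40, 50, 60, 70, 80]
    # Synthetic curve (not real data) rising exactly at rate ALPHA;
    # the age-10 point is deliberately off-model and is excluded by min_age.
    synthetic = [500.0] + [5.0 * math.exp(ALPHA * t) for t in ages[1:]]
    print(fit_im1(ages, synthetic))
```

A curve that rises faster than α is penalized by this fixed-rate fit: for example, a synthetic curve rising at 0.08 y⁻¹ gives R² = 1 − (0.036/0.08)² ≈ 0.80, which is the kind of misfit the text reports for influenza A.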

Cancer Incidence.

We next tested our model against cancer incidence curves across 101 cancer types under the ICD-O-3/WHO 2008 classification (11). Fitting IM-II to the incidence curves gave a median R² of 0.956, with 57 cancer types fitting very well (R² > 0.95). Because IM-II has the same number of fitting parameters as the widely used power law model (PLM) of cancer incidence, a direct comparison is possible. The PLM performs slightly worse overall (R² > 0.95 for 48 cancer types, median R² = 0.947; R² and associated fitting measures for each cancer type are given in Dataset S1). Cancers whose incidence rises exponentially, such as chronic myeloid leukemia (CML) and brain cancer, fit IM-I and IM-II better than the PLM. Many cancer types, including colon and gallbladder, fit both the PLM and IM-II very well (Fig. 3). There are no examples of the PLM fitting well and notably better than IM-II. The ability of IM-II to capture the power law behavior seen in cancer incidence curves is an unexpected feature of the model and is discussed further in SI Theory [where we show that IM-II exhibits an apparent power law with power e/(e − 2) ≈ 3.78 in the age range of 33–82 y]. We note that, of the 10 best-fitting cancers, the 9 carcinomas have pivot ages tightly clustered between 56.3 and 60.5 y (Dataset S1), suggesting that the mid- to late fifties may be an age of particular clinical importance for screening and intervention. In contrast, the PLM is by definition “scale-free” and thus has no associated age range of particular clinical importance.
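Because IM-II and the PLM each have two free parameters, they can be compared on an equal footing. The sketch below is an illustration under stated assumptions, not the fitting procedure behind Dataset S1. It assumes IM-II has the form I(t) = A/(exp(exp(−α(t − t0))) − 1), which is what the gambler's ruin probability for a walk with small bias ε gives when K = K0·exp(−αt) and t0 = ln(εK0)/α; this form reduces to a pure exponential as t0 → −∞ and switches from very low to rapidly rising risk around t0, matching the description of the pivot age. It also assumes least squares in log space (closed form for ln A, grid search over t0) and the least-squares AIC, n·ln(RSS/n) + 2k. All function names are hypothetical.

```python
import math

ALPHA = 0.044  # per year, fixed (not fitted)


def log_im2_shape(t, t0, alpha=ALPHA):
    """ln of the assumed IM-II shape 1 / (exp(exp(-alpha * (t - t0))) - 1)."""
    return -math.log(math.expm1(math.exp(-alpha * (t - t0))))


def fit_im2(ages, incidence, t0_grid):
    """Grid search over t0; for each t0 the best ln A has a closed form."""
    logs = [math.log(y) for y in incidence]
    best = None
    for t0 in t0_grid:
        shape = [log_im2_shape(t, t0) for t in ages]
        log_a = sum(ly - s for ly, s in zip(logs, shape)) / len(logs)
        rss = sum((ly - log_a - s) ** 2 for ly, s in zip(logs, shape))
        if best is None or rss < best[2]:
            best = (math.exp(log_a), t0, rss)
    return best  # (A, t0, rss)


def fit_plm(ages, incidence):
    """Power law I = A * t**k, fitted by linear regression of ln I on ln t."""
    xs = [math.log(t) for t in ages]
    ys = [math.log(y) for y in incidence]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    log_a = my - k * mx
    rss = sum((y - log_a - k * x) ** 2 for x, y in zip(xs, ys))
    return math.exp(log_a), k, rss


def aic(rss, n, k_params=2):
    """Least-squares AIC; constant terms shared by both models are dropped."""
    return n * math.log(rss / n) + 2 * k_params


if __name__ == "__main__":
    ages = list(range(20, 90, 5))
    # Synthetic (not real) data: IM-II with t0 = 58 y and alternating +/-1% noise.
    data = [100.0 * math.exp(log_im2_shape(t, 58.0)) * (1.01 if i % 2 == 0 else 0.99)
            for i, t in enumerate(ages)]
    grid = [-50.0 + 0.1 * i for i in range(1701)]  # t0 from -50 to 120 y
    a2, t0_hat, rss2 = fit_im2(ages, data, grid)
    a_p, k_hat, rss_p = fit_plm(ages, data)
    print(f"IM-II: t0={t0_hat:.1f} y, AIC={aic(rss2, len(ages)):.1f}")
    print(f"PLM:   k={k_hat:.2f}, AIC={aic(rss_p, len(ages)):.1f}")
```

On synthetic IM-II data the grid search recovers a pivot age near 58 y and IM-II has the lower AIC. On real registry data the two models can fit comparably well, as the text notes for colon and gallbladder cancer, so this toy case only shows the mechanics of the comparison.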

Fig. 3. Cancer incidence. Log-linear plots of incidence (per 100,000 person-years). Data taken from SEER (11). (A and B) Some cancer types rise exponentially and fit IM-I (A), while others rise like power laws but can still be fit by IM-II (B). Fitting curves for IM-II and the PLM are shown in red and green, respectively. (C) The 20 best-fitting incidence curves for IM-II, as measured by the Akaike Information Criterion (AIC). (D and E) Universal scaling functions for all cancers with defined pivot ages (84 of 101 cancer types), plotted in gray with the top 20 incidence curves highlighted, for both genders combined (D) and for gender-separated data (E); dotted lines show the model predictions for IM-I and IM-II. The gender-separated curves are fitted with higher, independently determined values of α for males than for females, reflecting the gender bias in T cell production (Methods). A purely exponential incidence curve would correspond to a pivot age of negative infinity; for plotting, we therefore set a minimum pivot age of −50 y. Models are fitted only for ages greater than 18 y. Error bars show 95% CIs for all diseases.

From the form of the IM-II equation, we can see that, up to a shift in age and an overall multiplicative factor, all incidence curves should follow the same function (Methods and Fig. 3D). This “universal scaling function” shows the range of behaviors possible within the model. The data collapse of incidence curves onto the universal scaling function for IM-II is excellent, giving strong support to our model and highlighting the cancer types that fit it particularly well. One such cancer, CML, is characterized by a single translocation event that forms the Philadelphia chromosome (12). This is a good candidate for the type of initiating event featured in our model. If this translocation can happen with equal probability at any age, then, neglecting the IET, one might expect incidence to be approximately constant with age. Instead, incidence doubles every 16 y (ln 2/α ≈ 15.8 y), mirroring the exponential decay of T cell production, consistent with our model.
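The data collapse and the doubling time can both be checked numerically. A minimal sketch, assuming IM-II has the form I(t) = A/(exp(exp(−α(t − t0))) − 1) (our assumption, chosen because it has the properties described for IM-II: a prefactor A, a pivot age t0, and a pure exponential in the limit t0 → −∞): dividing by A and shifting age by t0 maps every curve onto the single function F(u) = 1/(exp(exp(−αu)) − 1), with u = t − t0. In the exponential limit the doubling time is ln 2/α ≈ 15.75 y, the "doubles every 16 y" behavior quoted for CML. `im2` and `scaling_function` are hypothetical names.

```python
import math

ALPHA = 0.044  # per year


def im2(t, A, t0, alpha=ALPHA):
    """Assumed IM-II incidence at age t (overflows if t0 - t is very large)."""
    return A / math.expm1(math.exp(-alpha * (t - t0)))


def scaling_function(u, alpha=ALPHA):
    """Universal curve F(u), with u = t - t0 the age relative to the pivot age."""
    return 1.0 / math.expm1(math.exp(-alpha * u))


doubling_time = math.log(2) / ALPHA  # years for a pure exponential at rate ALPHA

if __name__ == "__main__":
    # Two hypothetical cancers with different prefactors and pivot ages.
    for A, t0 in [(3.0, 56.0), (40.0, 70.0)]:
        collapsed = [im2(t, A, t0) / A for t in range(20, 90, 10)]
        expected = [scaling_function(t - t0) for t in range(20, 90, 10)]
        print(all(math.isclose(c, e) for c, e in zip(collapsed, expected)))
    print(f"doubling time: {doubling_time:.2f} y")
```

Plotting I/A against t − t0 for many fitted curves is what produces the gray bundle in a universal scaling plot; curves that stray from F(u) are the ones the model fits poorly.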

Examining which incidence curves fit poorly can give insight into the underlying diseases (Fig. S11; R² < 0.9 for 28 cancer types with IM-II and for 34 with the PLM). For example, breast and thyroid cancer both rise rapidly and then plateau from middle age onward, possibly owing to the significant hormonal influences on these cancers. Many cancer types show a plateau or even a dip in incidence around age 80. This cannot be explained by either IM-II or the PLM, since both predict incidence that increases strictly with age. One can speculate that this decrease might be explained by declining tissue turnover. If this were the case, one would expect cancer of the developing T cell population itself (T cell lymphoblastic leukemia) to have an approximately constant risk profile with age: because the production of these cells declines at the same rate α that drives immunosenescence, the increasing risk from immunosenescence and the decreasing risk from reduced cell production would cancel exactly. This behavior is indeed observed in adults over 18 y (Fig. S5). This finding supports the idea that both immune decline and decreasing tissue turnover contribute significantly to changes in cancer risk with age.
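The cancellation argument can be written out in one line. Assume, for illustration, that new potentially cancerous developing T cells arise at a rate proportional to T cell production P(t) = P_0 e^{−αt}, and that each escapes with the unbiased (IM-I) probability 1/K(t), where the IET K(t) = c P(t) is proportional to the same production. Here P_0 and c are hypothetical constants introduced only for this sketch:

```latex
I_{\mathrm{T}}(t) \;\propto\;
\underbrace{P_0\, e^{-\alpha t}}_{\text{rate of new at-risk cells}}
\times
\underbrace{\frac{1}{c\, P_0\, e^{-\alpha t}}}_{\text{escape probability } 1/K(t)}
\;=\; \frac{1}{c},
```

which is independent of age. Under the biased (IM-II) walk the escape probability is not exactly 1/K, so the cancellation would then hold only approximately.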

Our model has the potential to provide clinical insight into differences between cancer types. For example, cancer types with a higher pivot age could be linked to tissues with a higher IET (Methods). This would decrease the probability of cancer initiation per se but would also imply that such cancers are larger or more advanced at the point of immune escape. One would therefore expect pivot age to be negatively correlated with survival, which we indeed observe (r = −0.6, P < 10⁻⁸; Fig. S7).

To further test our model, we compared groups that differ measurably in T cell production and disease incidence. Although disease incidence is known to increase in immunocompromised groups, a comparison of males and females in the general population is easier to quantify. There is a gender bias in the quantity of T cell receptor excision circle (TREC) DNA with age (4), which can be used to infer differences in naive T cell production between males and females. Cancer is more common overall in males than in females, by a factor of 1.33 (13). We calculate that the TREC measurements show a similar gender bias, with females having 1.46 ± 0.31 (mean ± SD) times more TREC DNA overall (4). In addition to this difference in overall TREC counts, the rate of decline differs, with male TRECs falling faster (4). Consistently, the incidence data show that 70 of the 87 cancers with gender separation (i.e., observed in both genders) rise more steeply in males (Methods). To illustrate this bias, we constructed universal scaling functions for each gender (Fig. 3E and Fig. S9D), each showing good data collapse. Interestingly, a similar gender bias is found in the average mutation burden in cancer biopsies, which increases more steeply with age in males (14). WNV, the only infectious disease in our dataset with gender separation, also shows a steeper increase in risk for males (P < 0.01).
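As a rough check that a 70-of-87 split is unlikely to arise by chance, one could apply a one-sided sign test. If a steeper rise in males and in females were equally likely and the cancer types were independent (both assumptions of this sketch, not claims made in the text), the number of cancers rising more steeply in males would follow a Binomial(87, 1/2) distribution. This is an illustration only, not the statistical analysis described in Methods; `sign_test_p_value` is a hypothetical helper.

```python
from math import comb


def sign_test_p_value(successes: int, n: int) -> float:
    """One-sided exact sign test: P(X >= successes) for X ~ Binomial(n, 1/2)."""
    if not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n")
    tail = sum(comb(n, k) for k in range(successes, n + 1))
    return tail / 2**n  # exact integer arithmetic until this final division


if __name__ == "__main__":
    print(f"P(X >= 70 | n = 87) = {sign_test_p_value(70, 87):.2e}")
```

The resulting tail probability is far below conventional significance thresholds, consistent with reading the 70/87 split as a real gender difference, although the independence assumption is questionable for closely related cancer types.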