Assembling the CO 2 compilation

The new CO 2 compilation here consists of 1,241 independent estimates coming from five proxy methods and 112 published studies. In putting this compilation together, we applied a uniform set of criteria described next to screen potential CO 2 records. By following these criteria, the CO 2 values associated with some records are revised, while other records are excluded all together. This standardization process helps ensure a high-quality compilation whose individual records can be more cleanly compared to one another. Of course, some of our criteria may be judged at some point to be incorrect. It is in this light, and in the spirit of full transparency, that we describe our criteria. We consider our compilation to be a ‘living document’, not only because in the future new records will be added but because existing records will be interpreted in new ways. The compilation can be found in Supplementary Data 1.

First, we excluded all goethite-based CO 2 estimates45,46,47,48 due to uncertainties in modelling some of the isotopic fractionation factors49. We also excluded all B/Ca-based estimates50 because the environmental controls on B/Ca are currently not well understood51. The nahcolite proxy is based on well-described mineral phase equilibria52, but we exclude the published estimates because it is not possible to assign mean values (all values within a defined CO 2 interval are equally probable). It is important to note, however, that the range described by these nahcolite estimates agree well with the other data in our compilation. We do not include the boron-based estimates of Pearson and Palmer53 owing to problems related to potential diagenesis, potential analytical issues, vital effects of extinct species and the evolution of seawater δ11B and alkalinity (as discussed in ref. 19). For the alkenone-based records of Pagani and colleagues54,55,56,57,58, we adopt the compilation of ref. 58 who imposed a uniform set of quality controls on the original data sets and used a TEX 86 temperature record (versus the δ18O of ref. 54). We update all Cenozoic liverwort-based estimates59 with the atmospheric δ13C model of ref. 60.

For estimates calculated from stomatal ratios (SR, where SR=ratio of stomatal frequency in nearest living equivalent to fossil), two transfer functions to CO 2 have been proposed: 1 SR=1 RCO 2 and 1 SR=2 RCO 2 , where RCO 2 =ratio of atmospheric CO 2 in the past to pre-industrial conditions (300 p.p.m.; refs 61, 62). Following Beerling and Royer63, we use these two functions to establish the uncertainty bounds in estimated CO 2 . Owing to the ad hoc nature of these functions, CO 2 estimates from stomatal ratios should be considered semiquantitative only. We modify many of the stomatal-based estimates of Retallack64. First, we only use estimates associated with five or more cuticle fragments, as this is the minimum sampling required in most cases for robust estimates65. We use the quantitative transfer function of Royer65 for estimates coming from Ginkgo adiantoides, the fossil species considered conspecific with extant G. biloba66. For estimates coming from other species within the Ginkgo genus we use the stomatal ratio approach, and we exclude estimates not coming from Ginkgo as these other groups are too distantly related to Ginkgo and have not been calibrated correctly67.

A recent development that we have incorporated is the lowering of CO 2 estimates from the pedogenic carbonate method29. This is because one of the key input parameters, the concentration of biologically derived CO 2 in the soil (S(z)), had been overestimated in most paleosols by a factor of two or more29,68. Here we adjust most CO 2 estimates assuming a S(z) of 2,000 p.p.m. (ref. 68). This is a simple correction that masks variability in S(z)69, but it is nonetheless a useful place to start and in most cases represents an improvement over previous estimates70. This is an area where we anticipate substantial revision in the near future. Along these lines, several proxies for S(z) have been proposed69,70,71,72,73,74,75,76, and we use estimates of S(z) derived from these methods when available. We also replace some of the pedogenic carbonate-based estimates of ref. 73 with those based on the same sediments74,75 because these newer studies provide better constraints on organic matter δ13C. Lastly, we collapse the high-resolution stomatal-based record of ref. 76 into one estimate calculated from the mean of the individual estimates.

Uncertainties in paleo-CO 2 estimates have been assessed in different ways. Most earlier uncertainties associated with the pedogenic carbonate method, for example, only incorporate uncertainty in S(z); potential uncertainties in other input variables are ignored. In addition, errors associated with stomatal-based methods that involve quantitative regression equations usually only reflect uncertainty in the regressions; uncertainties in the fossil measurements are ignored. This means that error bars can convey different meanings, which hinders comparisons across methods. In recent years, Monte Carlo simulations for propagating uncertainty in multiple input variables have been developed for all five proxy systems11,55,59,77. So long as the reported percentile ranges are identical, these simulations facilitate better cross-method comparisons.

We attempt here to standardize reported uncertainties so that they incorporate uncertainty in all input variables; we also standardize the percentile levels (16 and 84; equivalent to ±1 s.d. for a normally distributed population). For uncertainties derived from Monte Carlo simulations that propagate uncertainties in most or all input variables (this covers most estimates from the alkenone, boron and liverwort proxies, as well as most recent estimates from pedogenic carbonates and stomata), we adjust if needed to the 16 and 84 percentile levels: for example, if the 2.5 and 97.5 percentiles are reported, we assume the population has a normal distribution and cut the uncertainties in half. Because most paleo-CO 2 estimates have a right-skewed distribution, this means that we are overestimating the magnitude of the upper uncertainty and underestimating the magnitude of the lower uncertainty. For uncertainties with the pedogenic carbonate method derived by varying only S(z), we adopt for the 16 and 84 percentiles the generic recommendation of +100%/−50% of the median value calculated from multifactorial Monte Carlo simulations78; these uncertainties are larger and probably more representative than the reported uncertainties. Similarly, for stomatal estimates whose uncertainties only take into account variance in the transfer function, we assume 16 and 84 percentiles equal to +100%/−40% of the median value, a range that takes into account representative uncertainty in both the transfer function and the fossil measurements77.

All ages have been updated to the 2012 Geologic Time Scale79. Many CO 2 estimates from terrestrial proxies have biostratigraphic (for example, ‘Maastrichtian’), magnetostratigraphic or chronostratigraphic constraints. For records with biostratigraphic or magnetostratigraphic constraints, we update their ages following the new Time Scale. Many CO 2 estimates have no reported age uncertainties. In almost all cases, these estimates have excellent age control, the most common example being estimates from marine sediment cores (for example, all boron- and alkenone-based estimates). Some terrestrial-based data fall under this category too, for example, the Triassic–Jurassic sediments in the Newark Supergroup80. When no age uncertainty is reported we take the 1 s.d. to be 0.001 Myr for marine records and 4% for terrestrial records.

Monte Carlo sampling and LOESS fitting

Monte Carlo approach was used to generate 1,000 realizations of the multiproxy CO 2 time series with each data point randomly perturbed within its age and CO 2 uncertainty as given in Supplementary Data 1. For CO 2 estimates with asymmetric uncertainties, we force symmetry by assuming ±1 s.d. is equal to the average offset between the median and the 16 and 84 percentiles. Each realization of the time series was then interpolated to a regular 0.5 Myr spacing and LOESS curves were fitted to each interpolated realization using the programme R81. The optimal degree of smoothing was determined for each interpolated realization using generalized cross-validation30. At each 0.5 Myr time step the distribution of LOESS curves was evaluated and the probability distribution was determined. From this probability distribution the most likely value (or probability maximum) and the upper and lower limits corresponding to 68 and 95% confidence limits were identified. These variables are given in Supplementary Data 2.

Owing to the nature of the proxies used the make-up and coverage of our new CO 2 compilation varies considerably as a function of time. By grouping the data into 2.5 million year bins, Supplementary Fig. 1 shows that the number of observations in each 2.5 million year bin generally increases towards the present, with a notable exception of a peak at ∼200 million years ago at the Triassic–Jurassic boundary. Supplementary Fig. 1 also shows that the boron isotope and alkenone δ13C methods are restricted to the Cenozoic and liverwort δ13C proxy are concentrated in the middle part of the last 420 million years. Estimates based on fossil plant stomata appear focused in the last 200 million years, with only sporadic coverage over the early part of the interval. Only CO 2 estimates from pedogenic carbonate δ13C method are relatively evenly spread across the last 420 million years. To examine the influence of this uneven distribution of proxy types we performed the LOESS fitting a further two times but without one of the dominant proxy systems (no-pedogenic carbonate, no-stomata), these alternatives are shown with the complete data set in Supplementary Fig. 3. While subtle variations exist, overall the large-scale structure remains intact, although if the pedogenic carbonate estimates are discarded the early part of the Phanerozoic is largely under sampled. In terms of long-term CO 2 change, however, discarding all the estimates from either the pedogenic carbonate method or the stomatal method does not change our principal conclusions. For instance, the long-term CO 2 decline over the last 420 million years without the pedogenic carbonates is 2 p.p.m. per million years, and without the estimates from the plant stomata method it is 4 p.p.m. per million years. Both of which are similar to the CO 2 decline we determine for the complete data set (3.4 p.p.m. per million years), indicating that a long-term decline in CO 2 over the last 420 million years is a robust observation. In terms of ΔF CO2,sol the long-term trends are also similar being 0.008±0.001 Wm−2 (R2=0.03 and P<0.001) and 0.0001±0.001 Wm−2 (R2=<0.001, P=0.94) for no-stomata and no-pedogenic carbonate, respectively.

Data availability

The authors declare that all data supporting the findings of this study are available within the Supplementary Information files associated with this manuscript.