In 2012, the then much ballyhooed Australian temperature reconstruction of Gergis et al 2012 mysteriously disappeared from Journal of Climate after being criticized at Climate Audit. Now, more than four years later, a successor article has finally been published. Gergis says that the only problem with the original article was a “typo” in a single word. Rather than “taking the easy way out” and simply correcting the “typo”, Gergis instead embarked on a program that ultimately involved nine rounds of revision, 21 individual reviews, two editors and took longer than the American involvement in World War II. However, rather than Gergis et al 2016 being an improvement on or confirmation of Gergis et al 2012, it is one of the most extraordinary examples of data torture (Wagenmakers, 2011, 2012) that any of us will ever witness.

Also see Brandon S’s recent posts here and here.

The re-appearance of Gergis’ Journal of Climate article was accompanied by an untrue account at Conversation of the withdrawal/retraction of the 2012 version. Gergis’ fantasies and misrepresentations drew fulsome praise from the academics and other commenters at Conversation. Gergis named me personally as having stated in 2012 that there were “fundamental issues” with the article, claims which she (falsely) said were “incorrect” and supposedly initiated a “concerted smear campaign aimed at discrediting [their] science”. Their subsequent difficulty in publishing the article, a process that took over four years, seems to me to be as eloquent a confirmation of my original diagnosis as one could expect.

I’ve drafted up lengthy notes on Gergis’ false statements about the incident, in particular, about false claims by Gergis and Karoly that the original authors had independently discovered the original error “two days” before it was diagnosed at Climate Audit. These claims were disproven several years ago by emails provided in response to an FOI request. Gergis characterized the FOI requests as “an attempt to intimidate scientists and derail our efforts to do our job”, but they arose only because of the implausible claims by Gergis and Karoly to priority over Climate Audit.

Although not made clear in Gergis et al 2016 (to say the least), its screened network turns out to be identical to the Australasian reconstructions in PAGES2K (Nature 2013), while the reconstructions are nearly identical. PAGES2K was published in April 2013 and one cannot help but wonder at why it took more than three years and nine rounds of revision to publish something so similar.

In addition, one of the expectations of the PAGES2K program was that it would identify and expand available proxy data covering the past two millennia. In this respect, Gergis and the AUS2K working group failed miserably. The lack of progress from the AUS2K working group is both astonishing and dismal, a failure unreported in Gergis et al 2016 which purported to “evaluate the Aus2k working group’s regional consolidation of Australasian temperature proxies”.

Detrended and Non-detrended Screening

The following discussion of data torture in Gergis et al 2016 draws on my previous and similar criticism of data torture in PAGES2K.

Responding to then recent scandals in social psychology, Wagenmakers (2011 pdf, 2012 pdf) connected the scandals to academics tuning their analysis to obtain a “desired result”, which he classified as a form of “data torture”:

we discuss an uncomfortable fact that threatens the core of psychology’s academic enterprise: almost without exception, psychologists do not commit themselves to a method of data analysis before they see the actual data. It then becomes tempting to fine tune the analysis to the data in order to obtain a desired result—a procedure that invalidates the interpretation of the common statistical tests. The extent of the fine tuning varies widely across experiments and experimenters but is almost impossible for reviewers and readers to gauge… Some researchers succumb to this temptation more easily than others, and from presented work it is often completely unclear to what degree the data were tortured to obtain the reported confession.

As I’ll show below, it is hard to contemplate a better example of data torture, as described by Wagenmakers, than Gergis et al 2016.

The controversy over Gergis et al, 2012 arose over ex post screening of data, a wildly popular technique among IPCC climate scientists, but one that I’ve strongly criticized over the years. Jeff Id and Lucia have also written lucidly on the topic (e.g. Lucia here and, in connection with Gergis et al, here). I had raised the issue in my first post on Gergis et al on May 31, 2012. Closely related statistical issues arise in other fields under different terminology, e.g. sample selection bias, conditioning on a post-treatment variable, endogenous selection bias. The potential bias of ex post screening seems absurdly trivial if one considers the example of a drug trial, but, for some reason, IPCC climate scientists continue to obtusely deny the bias. (As a caveat, objecting to the statistical bias of ex post screening does not entail that opposite results are themselves proven. I am making the narrow statistical point that biased methods should not be used.)

Despite the public obtuseness of climate scientists about the practice, shortly after my original criticism of Gergis et al 2012, Karoly privately recognized the bias associated with ex post screening as follows in an email to Neukom (June 7, 2012; FOI K,58):

If the selection is done on the proxies without detrending ie the full proxy records over the 20th century, then records with strong trends will be selected and that will effectively force a hockey stick result. Then Stephen Mcintyre criticism is valid. I think that it is really important to use detrended proxy data for the selection, and then choose proxies that exceed a threshold for correlations over the calibration period for either interannual variability or decadal variability for detrended data…The criticism that the selection process forces a hockey stick result will be valid if the trend is not excluded in the proxy selection step.
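The effect Karoly describes can be illustrated with a small synthetic experiment. The sketch below is not a reconstruction of anyone's actual calculation: the "proxies" are pure AR(1) red noise containing no temperature signal, the "instrumental" target is a bare linear trend, and the names and parameters (`screening_demo`, `phi`, `r_crit` and so on) are assumptions chosen for illustration. The threshold 0.235 is roughly the two-sided 5% critical correlation for 70 points.

```python
import random

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def red_noise(n, phi, rng):
    """AR(1) noise with no temperature signal in it at all."""
    out, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        out.append(prev)
    return out

def screening_demo(n_proxies=500, n_years=300, calib=70, phi=0.5,
                   r_crit=0.235, seed=1):
    rng = random.Random(seed)
    target = list(range(calib))  # stand-in instrumental record: a pure trend
    kept = []
    for _ in range(n_proxies):
        proxy = red_noise(n_years, phi, rng)
        # Ex post screening on UNDETRENDED data over the calibration window.
        if pearson(proxy[-calib:], target) > r_crit:
            kept.append(proxy)
    # Simple average of the survivors (assumes at least one proxy passes).
    composite = [sum(p[t] for p in kept) / len(kept) for t in range(n_years)]
    shaft = sum(composite[:-calib]) / (n_years - calib)
    blade_start = sum(composite[-calib:-calib + 20]) / 20
    blade_end = sum(composite[-20:]) / 20
    return len(kept), shaft, blade_start, blade_end

n_kept, shaft, blade_start, blade_end = screening_demo()
print(n_kept, round(shaft, 2), round(blade_start, 2), round(blade_end, 2))
```

Under these assumptions the average of the screened noise is close to flat before the calibration window and rises within it, even though no individual series contains any signal. The rise is produced entirely by the selection step.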

Gergis et al 2012 had purported to avoid this bias by screening on detrended data, even advertising this technique as a method of “avoid[ing] inflating the correlation coefficient”:

For predictor selection, both proxy climate and instrumental data were linearly detrended over the 1921-1990 period to avoid inflating the correlation coefficient due to the presence of the global warming signal present in the observed temperature record. Only records that were significantly (p<0.05) correlated with the detrended instrumental target over the 1921-1990 period were selected for analysis. This process identified 27 temperature-sensitive predictors for the SONDJF warm season.
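The stated procedure (linearly detrend both series over the calibration window, then test the correlation at p<0.05) can be sketched as follows. This is a hedged illustration rather than the authors' script: the function names are hypothetical, the p-value uses a normal approximation to the t distribution, and no allowance is made for autocorrelation. The two toy series share a trend but have unrelated year-to-year wiggles.

```python
from statistics import NormalDist

def detrend(y):
    """Remove the ordinary least-squares straight line from a series."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sxx
    return [v - y_mean - slope * (t - t_mean) for t, v in enumerate(y)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def screen(proxy, target, detrended=True, alpha=0.05):
    """Return (r, approximate p-value, passed) for one proxy/target pair."""
    if detrended:
        proxy, target = detrend(proxy), detrend(target)
    r = pearson(proxy, target)
    n = len(proxy)
    t_stat = r * ((n - 2) / max(1e-12, 1 - r * r)) ** 0.5
    # Normal approximation to the t distribution (an assumption of this sketch).
    p = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return r, p, p < alpha

# Toy 70-year series standing in for 1921-1990: common trend, unrelated wiggles.
target = [0.02 * i + (0.3 if i % 2 == 0 else -0.3) for i in range(70)]
proxy = [0.02 * i + (0.3 if i % 4 < 2 else -0.3) for i in range(70)]

r_raw, p_raw, pass_raw = screen(proxy, target, detrended=False)
r_det, p_det, pass_det = screen(proxy, target, detrended=True)
print(pass_raw, pass_det)
```

In this toy case the proxy passes when the trend is left in and fails once both series are detrended, which is the distinction between what the article described and what was actually calculated.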

As is now well known, they didn’t actually perform the claimed calculation. Instead, they calculated correlation coefficients on undetrended data. This error was first reported by CA commenter Jean S on June 5, 2012 (here). Two hours later (nearly 2 a.m. Swiss time), Gergis coauthor Raphi Neukom notified Gergis and Karoly of the error (FOI 2G, page 77). Although Karoly later (falsely) claimed that his coauthors were unaware of the Climate Audit thread, emails obtained through FOI show that Gergis had sent an email to her coauthors (FOI 2G, page 17) drawing attention to the CA thread, that Karoly himself had written to Myles Allen (FOI 2K, page 11) about comments attributed to him on the thread (linking to the thread) and that Climate Audit and/or myself are mentioned in multiple other contemporary emails (FOI 2G).

When correlation coefficients were re-calculated according to the stated method, only a handful actually passed screening, a point reported at Climate Audit by Jean S on June 5 and written up by me as a head post on June 6. According to my calculations, only six of the 27 proxies in the G12 network passed detrended screening. On June 8 (FOI 2G, page 112), Neukom reported to Karoly and Gergis that eight proxies passed detrended screening (with the difference between his results and mine perhaps due to drawing from the prescreened network or to difference in algorithm) and sent them a figure (not presently available) comparing the reported reconstruction with the reconstruction using the stated method:

Dashed reconstruction below is using only the 8 proxies that pass detrended screening. solid is our original one.

This figure was unfortunately not included in the FOI response. It would be extremely interesting to see.

As more people online began to be aware of the error, senior author Karoly decided that they needed to notify Journal of Climate. Gergis notified the journal of a “data processing error” on June 8 and their editor, John Chiang, immediately rescinded acceptance of the paper the following day as follows, stating his understanding that they would redo the analysis to conform with their described methodology:

After consulting with the Chief Editor regarding your situation, my decision is to rescind the acceptance of your manuscript for publication. My understanding is that you will be redoing your analysis to conform to your original description of the predictor selection, in which case you may arrive at a different conclusion from your original manuscript. Given this, I request that you withdraw the manuscript from consideration.

Contrary to her recent story at Conversation, Gergis tried to avoid redoing the analysis. Instead, she tried to persuade the editor that the error was purely semantic (“error in words”), rather than a programming error, invoking support for undetrended screening from Michael Mann, who was egging Gergis on behind the scenes:

Just to clarify, there was an error in the words describing the proxy selection method and not flaws in the entire analysis as suggested by amateur climate skeptic bloggers…People have argued that detrending proxy records when reconstructing temperature is in fact undesirable (see two papers attached provided courtesy of Professor Michael Mann).

The Journal of Climate editors were unpersuaded and pointedly asked Gergis to explain the difference between the first email in which the error was described as a programming error and the second email describing the error as semantic:

Your latest email to John characterizes the error in your manuscript as one of wording. But this differs from the characterization you made in the email you sent reporting the error. In that email (dated June 7) you described it as “an unfortunate data processing error,” suggesting that you had intended to detrend the data. That would mean that the issue was not with the wording but rather with the execution of the intended methodology. would you please explain why your two emails give different impressions of the nature of the error?

Gergis tried to deflect the question. She continued to try to persuade the Journal of Climate to acquiesce in her changing the description of the methodology, as opposed to redoing the analysis with the described methodology, offering only to describe the differences in a short note in the Supplementary Information:

The message sent on 8 June was a quick response when we realised there was an inconsistency between the proxy selection method described in the paper and actually used. The email was sent in haste as we wanted to alert you to the issue immediately given the paper was being prepared for typesetting. Now that we have had more time to extensively liaise with colleagues and review the existing research literature on the topic, there are reasons why detrending prior to proxy selection may not be appropriate. The differences between the two methods will be described in the supplementary material, as outlined in my email dated 14 June. As such, the changes in the manuscript are likely to be small, with details of the alternative proxy selection method outlined in the supplementary material.

The Journal of Climate editor resisted, but reluctantly gave Gergis a short window of time (to end July 2012) to revise the article, but required that she directly address the sensitivity of the reconstruction to proxy selection method and “demonstrate the robustness” of her conclusions:

In the revision, I strongly recommend that the issue regarding the sensitivity of the climate reconstruction to the choice of proxy selection method (detrend or no detrend) be addressed. My understanding that this is what you plan to do, and this is a good opportunity to demonstrate the robustness of your conclusions.

Chiang’s offer was very generous under the circumstances. Gergis grasped at this opportunity and promised to revert by July 27 with a revised article showing the influence of this decision on resultant reconstructions:

Our team would be very pleased to submit a revised manuscript on or before the 27 July 2012 for reconsideration by the reviewers. As you have recommended below, we will extensively address proxy selection based on detrended and non detrended data and the influence on the resultant reconstructions.

Remarkably, this topic is touched on only in passing in Gergis et al 2016 and the only relevant diagram conceals, rather than addresses, its effect.

Gergis’ Trick to Hide the Discrepancy

We know that Neukom had sent Gergis a comparison of the “original” reconstruction to the reconstruction using the stated method as early as June 8, 2012. It would have been relatively easy to add such a figure to Gergis et al 2012 and include a discussion, if the comparison “demonstrate[d] the robustness of your conclusions”. This obviously didn’t happen and one has to ask why not.

Nor is the issue prominent in Gergis et al 2016. The only relevant figure is Figure S1.3 in the Supplementary Information. Gergis et al asserted that this figure suggested that “decadal-scale temperature variability is not highly sensitive to the predictor screening methods”. (In the following text, “option #4”, with “nine predictors”, is a variation of the network calculated using the stated G12 methodology.)

Figures S1.3-S1.5 compare the R28 reconstruction for just the PCR method presented in the main text, with the results based on the range of alternative proxy screening methods. They show that the variations reconstructed for each option are similar. The results always lie within the 2 SE uncertainty range of our final reconstruction (option #1), except for a few years for option #4 (Figure S1.3), which only uses nine predictors. This suggests that decadal-scale temperature variability is not highly sensitive to the predictor screening methods.

If, as Gergis et al say here, their results were “not highly sensitive” to predictor screening method – even the difference between detrended and non-detrended screening, then Gergis’ failure to comply with editor Chiang’s offer by July 31, 2012 is all the more surprising.

However, there’s a trick in Gergis’ Figure S1.3. On the left is Gergis’ original Figure S1.3. It gives a strong rhetorical impression of coherence between the four illustrated reconstructions. (“AR1 detrending fieldmean” corresponds to the reconstruction using the stated method of Gergis et al 2012). On the right is a blowup showing that one of the four reconstructions (“AR1 detrending fieldmean”) has been truncated prior to AD1600 when it is well outside the supposed confidence interval.

CA readers are familiar with this sort of truncation in connection with the trick to “hide the decline” in the IPCC AR4 chapter edited by Mann. One can only presume that earlier values were also outside the confidence interval on the high side and that Gergis truncated the series at AD1600 in order to “hide” the discrepancy.

Although I haven’t seen the “dashed” reconstruction in Neukom’s email of June 8, I can only assume that it also diverged upward before AD1600 and that Gergis et al had been unable to resolve the discrepancy within editor Chiang’s deadline of July 2012.

Torturing and Waterboarding the Data

In the second half of 2012, Gergis and coauthors embarked on a remarkable program of data torture in order to salvage a network of approximately 27 proxies, while still supposedly using “detrended” screening. Their eventual technique for ex post screening bore no resemblance to the simplistic screening of (say) Mann and Jones, 2003.

One of their key data torture techniques was to compare proxy data correlations not simply to temperatures in the same year, but to temperatures of the preceding year and following year.

To account for proxies with seasonal definitions other than the target SONDJF season (e. g., calendar year averages), the comparisons were performed using lags of -1, 0, and +1 years for each proxy (Appendix A).

This mainly impacted tree ring proxies. In their practice, a lag of -1 year meant that a tree ring series is assigned one year earlier than the chronology (+1 is assigned one year later.) For a series with a lag of -1 year (e.g. Celery Top East), ring width in the summer of (say) 1989-90 is said to correlate with summer temperatures of the previous year. There is precedent for correlation to previous year temperatures in specialist studies. For example, Brookhouse et al (2008) (abstract here) says that the Baw Baw tree ring data (a Gergis proxy), correlates positively with spring temperatures from the preceding year. In this case, however, Gergis assigned zero lag to this series, as well as a negative orientation.

The lag of +1 years assigned to 5 sites is very hard to interpret in physical terms. Such a lag requires that (for example) Mangawhera ring widths assigned to the summer of 1989-1990 correlate to temperatures of the following summer (1990-1991) – ring widths in effect acting as a predictor of next year’s temperature. Gergis’ supposed justification in the text was nothing more than armwaving, but the referees do not seem to have cared.

Of the 19 tree ring series in the 51-series G16 network, an (unphysical) +1 lag was assigned to five series, a -1 lag to two series and a 0 lag to seven series, with five series being screened out. Of the seven series with 0 lag, two had inverse orientation in the PAGES2K. In detail, there is little consistency for trees and sites of the same species. For example, New Zealand LIBI composite-1 had a +1 lag, while New Zealand LIBI composite-2 had 0 lag. Another LIBI series (Urewara) is assigned an inverse orientation in the (identical) PAGES2K and thus presumably in the CPS version of G16. Two LIBI series (Takapari and Flanagan’s Hut) are screened out in G16, though Takapari was included in G12. Because the assignment of lags is nothing more than an ad hoc after-the-fact attempt to rescue the network, it is impossible to assign meaning to the results.
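The lag convention described above (proxy year y compared with temperature year y + lag) and the practice of keeping whichever lag correlates best can be sketched as below. The function names and the toy data are assumptions for illustration, not taken from Gergis et al 2016; the toy proxy is constructed so that it matches next year's temperature exactly.

```python
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def lagged_correlation(proxy, temp, lag):
    """Correlate proxy year y with temperature year y + lag.

    proxy and temp are dicts keyed by year; lag = -1 pairs a ring width
    with the previous year's temperature, lag = +1 with the following year's.
    """
    years = [y for y in sorted(proxy) if (y + lag) in temp]
    return pearson([proxy[y] for y in years], [temp[y + lag] for y in years])

def scan_lags(proxy, temp, lags=(-1, 0, 1)):
    """Try every lag and report the one with the largest |r|."""
    results = {lag: lagged_correlation(proxy, temp, lag) for lag in lags}
    best = max(results, key=lambda lag: abs(results[lag]))
    return best, results

# Toy data: an arbitrary repeating temperature pattern, and a proxy that
# by construction equals the FOLLOWING year's temperature.
temp = {y: (4 * y) % 11 for y in range(1930, 1992)}
proxy = {y: temp[y + 1] for y in range(1931, 1991)}

best, results = scan_lags(proxy, temp)
print(best, {lag: round(r, 2) for lag, r in results.items()})
```

A scan of this kind always returns some "best" lag, including for series with no physical reason to lead or trail temperature, which is why the choice of lag needs to be specified before looking at the data.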

In addition, Gergis also borrowed from and expanded a data torture technique pioneered in Mann et al 2008. Mann et al 2008 had been dissatisfied with the number of proxies passing a screening test based on correlation to local gridcell, a commonly used criterion (e.g. Mann and Jones 2003). So Mann instead compared results to the two “nearest” gridcells, picking the highest of the two correlations but without modifying the significance test to reflect the “pick two” procedure. (See here for a contemporary discussion.) Instead of comparing only to the two nearest gridcells, Gergis expanded the comparison to all gridcells “within 500 km of the proxy’s location”, a technique which permitted comparisons to 2-6 gridcells depending both on the latitude and the closeness of the proxy to the edge of its gridcell:

As detailed in appendix A, only records that were significantly (p < 0.05) correlated with temperature variations in at least one grid cell within 500 km of the proxy’s location over the 1931-90 period were selected for further analysis.

As described in the article, both factors were crossed in the G16 comparisons. Multiplying three lags by 2-6 gridcells, Gergis appears to have made 6-18 detrended comparisons, retaining those proxies for which there was a “statistically significant” correlation. It doesn’t appear that any allowance was made in the benchmark for the multiplicity of tests. In any event, using this “detrended” comparison, they managed to arrive at a network of 28 proxies, one more than the network of Gergis et al 2012. Most of the longer proxies are the same in both networks, with a shuffling of about seven shorter proxies. No ice core data is included in the revised network and only one short speleothem. It consists almost entirely of tree ring and coral data.
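The cost of running 6-18 comparisons per proxy without adjusting the benchmark can be gauged with simple arithmetic. The sketch below assumes, for illustration only, that the comparisons are independent; in practice neighbouring gridcells and adjacent lags are correlated, so the true rate for a signal-free proxy would lie somewhere between the nominal 5% and these figures. The function names are hypothetical.

```python
def familywise_rate(n_tests, alpha=0.05):
    """Chance that a pure-noise proxy passes at least one of n_tests
    nominal-alpha tests, ASSUMING the tests are independent."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test threshold that would hold the overall rate near alpha."""
    return alpha / n_tests

for n in (1, 6, 18):
    print(n, round(familywise_rate(n), 3), round(bonferroni_alpha(n), 4))
```

Under the independence assumption, the chance of at least one "significant" correlation rises from 5% for a single test to roughly 26% for six tests and roughly 60% for eighteen.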

Obviously, Gergis et al’s original data analysis plan did not include a baroque screening procedure. It is evident that they concocted this bizarre screening procedure in order to populate the screened population with a similar number of proxies to Gergis et al 2012 (28 versus 27) and to obtain a reconstruction that looked like the original reconstruction, rather than the divergent version that they did not report. Who knows how many permutations, combinations and iterations were tested before they eventually settled on the final screening technique?

It is impossible to contemplate a clearer example of “data torture” (not even Mann et al 2008).

Nor does this fully exhaust the elements of data torture in the study, as torture techniques previously used in Gergis et al 2012 were carried forward to Gergis et al 2016. Using original and (still) mostly unarchived measurement data, Gergis et al 2012 had re-calculated all tree ring chronologies, except two, using an opaque method developed by the University of East Anglia. The two exceptions were the two long tree ring chronologies reaching back to the medieval period:

All tree ring chronologies were developed based on raw measurements using the signal-free detrending method (Melvin et al., 2007; Melvin and Briffa, 2008) …The only exceptions to this signal-free tree ring detrending method was the New Zealand Silver Pine tree ring composite (Oroko Swamp and Ahaura), which contains logging disturbance after 1957 (D’Arrigo et al., 1998; Cook et al., 2002a; Cook et al., 2006) and the Mount Read Huon Pine chronology from Tasmania which is a complex assemblage of material derived from living trees and sub-fossil material. For consistency with published results, we use the final temperature reconstructions provided by the original authors that includes disturbance-corrected data for the Silver Pine record and Regional Curve Standardisation for the complex age structure of the wood used to develop the Mount Read temperature reconstruction (E. Cook, personal communication, Cook et al., 2006).

This raises the obvious question why “consistency with published results” is an overriding concern for Mt Read and Oroko, but not for the other series, which also have published results. For example, Allen et al (2001), the reference for Celery Top East, shows the chronology at left for Blue Tier, while Gergis et al 2016 used the chronology at right for a combination of Blue Tier and a nearby site. Using East Anglia techniques, the chronology showed a sharp increase in the 20th century and “consistency” with the results shown in Allen et al (2001) was not a concern of the authors. One presumes that Gergis et al had done similar calculations for Mount Read and Oroko, but had decided not to use them. One can hardly avoid wondering whether the discarded calculations didn’t emphasize the desired story.

Nor is this the only ad hoc selection involving these two important proxies. Gergis et al said that their proxy inventory was a 62-series subset taken from the inventory of Neukom and Gergis, 2011. (I have been unable to exactly reconcile this number and no list of 62 series is given in Gergis et al 2016.) They then excluded records that “were still in development at the time of the analysis” (though elsewhere they say that the dataset was frozen as of July 2011 due to the “complexity of the extensive multivariate analysis”) or “with an issue identified in the literature or through personal communication”:

Of the resulting 62 records we also exclude records that were still in development at the time of the analysis .. and records with an issue identified in the literature or through personal communication

However, this criterion was applied inconsistently. Gergis et al acknowledge that the Oroko site was impacted by “logging disturbance after 1957” – a clear example of an “issue identified in the literature” but used the data nonetheless. In some popular Oroko versions (see CA discussion here), proxy data after 1957 was even replaced by instrumental data. Gergis et al 2016 added a discussion of this problem, arm-waving that the splicing of instrumental data into the proxy record didn’t matter:

Note that the instrumental data used to replace the disturbance-affected period from 1957 in the silver pine [Oroko] tree-ring record may have influenced proxy screening and calibration procedures for this record. However, given that our reconstructions show skill in the early verification interval, which is outside the disturbed period, and our uncertainty estimates include proxy resampling (detailed below), we argue that this irregularity in the silver pine record does not bias our conclusions.

There’s a sort of blind man’s buff in Gergis’ analysis here, since it looks to me like G16 may have used an Oroko version which did not splice instrumental data. However, because no measurement data has ever been archived for Oroko and a key version only became available through inclusion in a Climategate email, it’s hard to sort out such details.

PAGES2K

The precise timing of Gergis’ data torture can be constrained by the publication of the PAGES2K compilation of regional chronologies used in IPCC AR5. The IPCC First Order Draft had included a prominent graphic with seven regional reconstructions, one of which was the Australian reconstruction of Gergis et al, 2012 (cited as under review). The AR5 Second Order Draft, published in July 2012 after the withdrawal of Gergis et al 2012, included a more or less identical reconstruction, this time cited to PAGES2K, under review.

The PAGES2K compilation had been submitted to Science in July 2012, barely meeting the deadline. Remarkably, it was rejected. Mann, one of the reviewers, argued that it was impossible to review so many novel regional reconstructions and that they should be individually reviewed in specialist journals before attempting a compilation. This left IPCC in rather a jam. However, Nature stepped in and agreed to publish the rejected article. Keith Briffa, one of the Nature reviewers, “solved” the problem of trying to review so many novel reconstructions by suggesting that the article be published as a “Progress Article”, a type of article which had negligible peer review requirements. Everyone readily agreed to this diplomatic solution and thus the sausage was made (also see discussion by Hilary Ostrov here).

The Gergis contribution to PAGES2K screened the AUS2K proxy network down to 28 proxies – exactly the same selection as Gergis et al 2016, published three years later. The PAGES2K Paico reconstruction is identical to the G16 Paico reconstruction up to a slight rescaling: the correlation between the two versions is exactly 1. Their “main” reconstruction used principal components regression – a technique harking back to Mann et al 1998, which is commonly defended on the grounds that later articles use different techniques. The G16 version is nearly identical to the PAGES2K version, as shown below.

The PAGES2K article was mentioned on a variety of occasions in Gergis et al 2016, but I’m not sure how a reader of G16 could become aware of the identity of the networks and reconstructions.

Given that the PAGES2K network was accepted with no more than cursory peer review, it’s interesting that it took nine rounds of revision for the Journal of Climate to accept Gergis et al 2016 with its identical network and virtually identical reconstruction.

The Dismal Lack of Progress by the AUS2K Working Group

Despite the long-standing desire for more “long” SH proxies, the AUS2K working group provided Gergis with only three records (Law Dome d18O, Mt Read Tasmania tree rings, Oroko NZ tree rings) in the target geographical area which began prior to AD1100, with the Law Dome series being screened out. None of these are new records.

Closely related versions of all three series were used in Mann and Jones (2003), which also selected series by screening against gridcell temperatures, but with different results. Mann and Jones screened according to “decadal correlations”, resulting in selection of Tasmania (r=0.79) and Law Dome (r=0.76) and exclusion of Oroko (r=-0.25) – a different screening result than Gergis et al.

All three series have been discussed at Climate Audit from time to time over the years: tags tasmania oroko lawdome. Two of the three series (Mt Read, Oroko) were illustrated in AR4 (which didn’t show Oroko values after 1957), but AR4 lead authors snickered at my request that they also show Law Dome (see here.) The authors realized that the Law Dome series had very elevated values in the late first millennium (see figure below from Jones and Mann, 2004) and there was no way that they were going to show a series which “diluted the message”. Compare the two series used in Mann and Jones 2003 in the first figure below with the two series shown in AR4 in the second figure below.

Figure ^. Excerpt from Mann and Jones 2003, showing Law Dome and Mount Read series.

Figure ^. Excerpt from IPCC AR4, showing Oroko and Mount Read series.

Thus, despite any aspirations for AR5, Gergis et al 2016 contained no long series which had not been used in Mann and Jones 2003.

It is also obvious that long results from combining Law Dome and Mt Read will have a considerably different appearance than long results from combining Mt Read and Oroko. Although Gergis et al claimed that screening had negligible impact on results, Law Dome was excluded from all such studies.

Nor did Gergis et al actually use the Tasmania “Regional Curve Standardisation” series, as claimed. Cook archived two versions of his Tasmania chronology in 1998, one of which (“original”) was the RCS chronology, while the other (“arfilt”) was a filtered version of the RCS chronology. Gergis used the “arfilt” rather than “original” version – perhaps inheriting this version from Mann et al 2008, which also used the arfilt version. Cook’s original article (Cook et al 2000) also contained an interesting figure showing mean ring widths in Tasmania prior to adjustment (for juvenile growth). This data is plotted below (showing Cook’s figure as an insert). It shows a noticeable increase in 20th century ring widths, which, however, are merely returning to levels achieved earlier in the millennium and surpassed in the first millennium. High late first millennium values are also present in the Law Dome data.

Many of the Gergis series are very short – with coral series nearly all starting in the 18th and even late 19th centuries. To the extent that the Gergis reconstruction shows a 20th century hockey stick, it’s not because this is a feature that is characteristic of the long data, but through the splicing of short strongly trending coral data with the longer tree ring data. The visual result will depend on how the coral data is scaled relative to the tree ring data.

While Gergis and coauthors made no useful contribution to understanding past climate change in the Australasian region, in the interest of sounding a more positive note, large and interesting speleothem datasets have been recently published, though not considered by Gergis et al, including very long d18O series from Borneo and Indonesia (Liang Luar), both located in the extended G16 region. I find the speleothem data particularly interesting since some series provide data on both a Milankovitch scale and through the 20th century. For example, the Borneo series (developed by Jud Partin, Kim Cobb and associates) has very pronounced Milankovitch variability and comes right to the present. In ice age extremes, d18O values are less depleted (more depleted in warm periods.) Modern values do not appear exceptional. Results at Liang Luar are similar.

Conclusions

Gergis has received much credulous praise from academics at Conversation, but none of them appear to have taken the trouble to actually evaluate the article before praising it. Rather than the 2016 version being a confirmation of or improvement on the 2012 article, it constitutes as clear an example of data torture as one could ever wish. We know Gergis’ ex ante data analysis plan, because it was clearly stated in Gergis et al 2012. Unfortunately, they made a mistake in their computer script and were unable to replicate their results using the screening methodology described in Gergis et al 2012.

In order to get a reasonably populated network and a reconstruction resembling the Gergis et al 2012 reconstruction, Gergis and coauthors concocted a baroque and ad hoc screening system, requiring a complicated and implausible combination of lags and adjacent gridcells. A more convincing example of “fine tun[ing] the analysis to the data in order to obtain a desired result” (data torture) is impossible to imagine. None of the supposed statistical tests have any significance under the weight of such extreme data torture.
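
To see why adding lags and adjacent gridcells undermines a nominal significance test, consider a toy Monte Carlo sketch. It screens pure white-noise "proxies" against synthetic gridcell temperatures and counts how many pass in at least one lag/gridcell combination. Everything here is an assumption for illustration: the function `pass_rate`, the 70-year calibration length, the five correlated gridcells and the cutoff of 0.235 (roughly the two-sided 5% critical correlation for 70 points). It does not reproduce the actual G16 screening procedure.

```python
import numpy as np

def pass_rate(n_proxies, n_years, lags, n_cells, r_crit, rng):
    """Fraction of pure-noise proxies whose absolute correlation with
    temperature exceeds r_crit in at least one lag/gridcell combination."""
    # Hypothetical gridcell temperatures: shared regional signal plus local
    # noise. One spare year at each end leaves room for lags of -1 and +1.
    regional = rng.standard_normal(n_years + 2)
    cells = regional + 0.5 * rng.standard_normal((n_cells, n_years + 2))
    passed = 0
    for _ in range(n_proxies):
        proxy = rng.standard_normal(n_years + 2)  # no climate signal at all
        best = 0.0
        for cell in cells:
            target = cell[1:1 + n_years]
            for lag in lags:
                shifted = proxy[1 + lag:1 + lag + n_years]
                r = abs(np.corrcoef(shifted, target)[0, 1])
                best = max(best, r)
        passed += int(best > r_crit)
    return passed / n_proxies

rng = np.random.default_rng(42)
single = pass_rate(1000, 70, lags=(0,), n_cells=1, r_crit=0.235, rng=rng)
many = pass_rate(1000, 70, lags=(-1, 0, 1), n_cells=5, r_crit=0.235, rng=rng)
print(f"lag 0, one gridcell:       {single:.3f}")
print(f"three lags, five gridcells: {many:.3f}")
```

With one lag and one gridcell, about 5% of the noise series should pass, as the nominal test promises. Once each series gets multiple chances, the pass rate rises well above the nominal level, even though none of the series contains any temperature signal.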

Because IPCC AR5 had used results of Gergis et al 2012 in a prominent diagram that it was committed to using, and continued to use the results even after Journal of Climate rescinded acceptance of Gergis et al 2012 (see here), Gergis et al had considerable motivation, to say the least, to “obtain” a result that looked as much like Gergis et al 2012 as possible. The degree to which they subsequently tortured the data is somewhat breathtaking.

One wonders whether the editors and reviewers of Journal of Climate fully understood the extreme data torture that they were asked to approve. Clearly, there seems to have been some resistance from editors and reviewers – otherwise there would not have been nine rounds of revision and 21 reviews. Since the various rounds of review left the network unchanged by even one iota from the network used in the PAGES2K reconstruction (April 2013), one can only assume that Gergis et al eventually wore out a reluctant Journal of Climate, which, after four years of submission and re-submission, finally acquiesced.

As noted above, Wagenmakers defined data torture as succumbing to the temptation to “fine tune the analysis to the data in order to obtain a desired result” and diagnosed the phenomenon as being particularly likely when the authors did not “commit themselves to a method of data analysis before they see the actual data”. In this case, Gergis et al had, ironically, committed themselves to a method of data analysis not just privately, but in the text of an accepted article. They obviously didn’t like the results.

One can understand why Gergis felt relief at finally getting approval for such a tortured manuscript, but, at the same time, the problems were entirely of her own making. Gergis took particular umbrage at my original claim that there were “fundamental issues” with Gergis et al 2012, a claim that she called “incorrect”. But there is nothing “incorrect” about the actual criticism:

One of the underlying mysteries of Gergis-style analysis is that one seemingly equivalent proxy can be “significant” while another isn’t. Unfortunately, these fundamental issues are never addressed in the “peer reviewed literature”.

This comment remains as valid today as it was in 2012.

In her Conversation article, Gergis claimed that her “team” discovered the errors in Gergis et al 2012 independently of and “two days” before the errors were reported at Climate Audit. These claims are untrue. They did not discover the errors “independently” of Climate Audit or before Climate Audit. I will review their appropriation of credit in a separate post.



