Anthony Watts of WattsUpWithThat.com and SurfaceStations.org published a 30-page white paper in 2009 with the help of the Heartland Institute titled “Is the U.S. Surface Temperature Record Reliable?” His conclusion was that the temperature record was not reliable due to problems with where thermometers are located.

If Watts were correct, this would be a major problem. If the entire US temperature record were unreliable, then conclusions drawn from it could be similarly flawed. At a minimum, the scientific papers using the temperature record would have to be revisited. So a thorough investigation of Watts’ conclusion by scientists was warranted. And now a new peer-reviewed paper by scientists at the National Climatic Data Center (NCDC) has analyzed the temperature record and found that Watts’ conclusion of a flawed temperature record runs contrary to the actual data.

Let’s start by looking a little more closely at how Watts reached his conclusion that the US temperature data was unreliable.

According to Watts’ white paper, 89% of the surveyed temperature stations in the United States Historical Climatology Network (USHCN) do not meet new NOAA standards for proximity to heat sources, location away from shade and the crest of hills, and so on. Watts chose a station in Bainbridge, Georgia, as his main example (pictured at right). The photo shows that the thermometer is located about 9 feet from an air conditioning unit and in the shade rather than the desired 100 meters from any heat sources. Furthermore, the original thermometer enclosure can be seen just above the “14.3′” distance indicator in a much better, but still not ideal, location. Given the photographic evidence, it’s impossible to claim that the new thermometer location is ideal. As Watts points out, “the new station may report higher temperatures than the old station even if ambient temperatures remain unchanged.”

But this statement presents a problem. Watts says that the new station “may” report higher temperatures. But do we know for certain that it will? Determining what effect the AC unit and shade tree have on the temperature measurement requires an actual analysis of the temperature data from the new thermometer and location. Watts’ white paper has no such analysis. In fact, in the entire paper, Watts presents a brief analysis of only a single station’s temperature record, and it’s not this station. One station out of a total 865 stations that had been surveyed at the time of the white paper’s publication, and out of a total of 1221 USHCN stations in the continental United States, is not enough to cast doubt on the entire network no matter how bad the analysis turned out.

Watts uses words like “may” and “likely” and “could have” throughout his white paper. In fact, just about the only firm conclusion that Watts reaches is that the temperature record is unreliable. But he’s based that conclusion entirely on qualitative information known as “metadata” (information that may or may not affect the accuracy of a measurement) rather than on quantitative (mathematical) data analysis. With respect to thermometer measurements, the proximity of the thermometer to a heat source like an AC unit or an electrical transformer is metadata. So is the type of thermometer used. And the time of day that the temperature measurement was taken. And the color and composition of the thermometer enclosure. And whether or not the thermometer moved from one place to another. And so on.

The problem is that metadata is a tool to determine if there might be a problem in the real data, but it takes actual data analysis to establish whether there is one. And analyzing a single station (Watts used Lampasas, Texas) isn’t enough to draw any statistically valid conclusions, such as reliability or unreliability, about any other station or about the temperature monitoring network as a whole.

Watts makes a number of other mistakes in his white paper as well. One of the larger errors is that he claims, based exclusively on qualitative metadata, that “89% of the stations surveyed produce unreliable data by NOAA’s own definition (emphasis original).” It’s not possible to make that claim without a detailed mathematical analysis of the temperature record for the supposedly unreliable stations, and Watts shows no such analysis. Watts also claims that “the reported increase in temperature during the twentieth century falls well within the margin of error of the instrument record,” but doesn’t take into account the simple techniques that can be utilized to reduce error in a measurement – techniques like averaging multiple samples, correcting for known biases in equipment, filtering, homogenization of station errors, and so on.
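To make the point about averaging concrete, here is a minimal sketch in Python. It is not taken from Watts’ white paper or from the NCDC paper. It assumes each reading carries an independent, random error with a made-up standard deviation of 0.5 degrees C, and the function name and numbers are purely illustrative. Under those assumptions, the error in an average of 100 readings comes out roughly ten times smaller than the error in any single reading.

```python
import random
import statistics


def spread_of_network_mean(n_stations, trials=2000, reading_error_sd=0.5, seed=0):
    """Estimate how much the average of n_stations noisy readings scatters.

    Assumption (illustrative only): every reading has an independent,
    zero-mean Gaussian error with standard deviation reading_error_sd.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Simulate one set of reading errors and average them.
        errors = [rng.gauss(0.0, reading_error_sd) for _ in range(n_stations)]
        means.append(statistics.fmean(errors))
    # Standard deviation of the averaged error across all simulated trials.
    return statistics.pstdev(means)


single = spread_of_network_mean(1)     # error of one reading on its own
network = spread_of_network_mean(100)  # error of an average of 100 readings

print(f"one reading:  about {single:.3f} deg C")
print(f"100 readings: about {network:.3f} deg C")
```

Note that averaging only shrinks random, independent errors. A systematic bias, such as a change of instrument, does not average away, which is why the other techniques in the list (bias correction and homogenization) are needed as well.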

Watts does, however, make a couple of good recommendations in his white paper. One of them is that “a pristine dataset should be produced from the best stations and then compared to the remainder of the USHCN network to quantify the total magnitude of bias.” While this is something that Watts himself probably should have done before making a blanket declaration that the US temperature record was bad, it’s still necessary to quantitatively assess the impacts of all the metadata on real temperature measurements. And that analysis is what the NCDC team undertook in their new paper titled “On the reliability of the U.S. Surface Temperature Record”.

What the NCDC scientists found was that, contrary to Watts’ claim of unreliability, the difference between well sited and poorly sited thermometers was small and thus the US temperature record is reliable.

The NCDC scientists reached this conclusion by taking thermometer stations scattered around the continental US that were in the surfacestations.org database and dividing them into two groups, one for good thermometer siting and one for poor. Then the scientists calculated the monthly temperatures at each station and compared the results of the well sited stations to those of the poorly sited stations, both before and after adjusting for discontinuities (aka “homogenization”) in the records. When they did this, they discovered that, contrary to what Watts expected, the unadjusted data from poor sites showed cooler maximum temperatures and only slightly warmer minimum temperatures, while the adjusted data showed almost no difference whatsoever. This is shown in the image below.
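The basic shape of that comparison can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the NCDC team’s actual procedure: the station values below are invented toy numbers, the function names are hypothetical, and the sketch does no adjustment for discontinuities. It only shows the idea of averaging each group month by month and then subtracting one group from the other.

```python
from statistics import fmean

# Hypothetical toy data: (siting class, monthly mean temperatures in deg C).
# These numbers are invented for illustration and are not real USHCN values.
stations = [
    ("good", [10.0, 12.0, 15.0]),
    ("good", [11.0, 13.0, 16.0]),
    ("poor", [10.4, 12.6, 15.2]),
    ("poor", [10.8, 12.8, 15.6]),
]


def group_monthly_mean(stations, siting):
    """Average all stations of one siting class, month by month."""
    series = [temps for cls, temps in stations if cls == siting]
    # zip(*series) lines up the same month across every station in the group.
    return [fmean(month) for month in zip(*series)]


def poor_minus_good(stations):
    """Monthly difference between the poorly sited and well sited groups."""
    good = group_monthly_mean(stations, "good")
    poor = group_monthly_mean(stations, "poor")
    return [p - g for p, g in zip(poor, good)]


diff = poor_minus_good(stations)
print([round(d, 2) for d in diff])
```

A positive value means the poorly sited group read warmer that month, and a negative value means it read cooler. The point of the exercise is that the sign and size of the difference come out of the measurements themselves and cannot be read off a photograph of the site.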

Furthermore, when the scientists continued their analysis, they found that the vast majority of the difference in the unadjusted temperature came not as a result of the location of the thermometer as Watts had claimed, but rather from changes in the technology used to measure temperature (liquid-in-glass thermometers vs. electronic) and from a widespread change from taking measurements in the afternoon to taking them in the morning. In fact, these two changes represented 90% of the adjustment required for well sited temperature stations and 72% of the adjustment needed for poorly sited stations.

The fact that the two transitions mentioned above represent so much of the overall adjustment disproves another of Watts’ claims, namely that the homogenization process itself transferred hot temperatures from poorly sited stations to good stations. Had Watts’ claim been correct, the instrument and time of day adjustments would account for a much smaller percentage of the total adjustment. In fact, the data shows that these adjustments account for less of the total adjustment made to poorly sited stations (72% vs. 90% for well sited stations), suggesting that the good stations are actually correcting the poor ones.

Watts also claimed that the transition from liquid-in-glass (LiG) thermometers to electronic thermometers took too long to correct and wouldn’t show up in the data. The analysis in the NCDC paper shows that this claim is also incorrect. In fact, the transition occurred mostly in the mid-1980s, and it is clearly visible in the maximum temperature graphs of the figure below where the “adjusted maximum” crosses the red line (0.0 degrees C).

In addition, Watts’ surfacestations.org project classified USHCN temperature stations by using criteria developed for a new generation of climate monitoring stations, known as the United States Climate Reference Network (USCRN). The USCRN stations and the criteria by which they’re gauged as “good” or “poor” are newer and significantly more restrictive than the quality criteria for the USHCN. So Watts’ use of the CRN standards for USHCN stations is something of an apples/oranges comparison. However, the USCRN has 60 months of good data that can be compared to the most recent 60 months of USHCN data. The result is a statistical correlation (r²) of 0.998 and 0.996 for the maximum and minimum monthly temperatures respectively. While this is a short period of correlation, it shows that, at least recently, the USHCN data is clearly reliable. As the NCDC scientists point out,

the value of the USCRN as a benchmark for reducing the uncertainty of historical observations from the USHCN and other networks will only increase with time.
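For readers who want to see what an r² value measures, here is a minimal Python sketch that computes it for two series of equal length. The function name and the sample numbers are illustrative assumptions only. They are not the USCRN or USHCN data, and the sketch is not necessarily how the NCDC team produced the figures of 0.998 and 0.996.

```python
from statistics import fmean


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, at least 2 points")
    mean_x, mean_y = fmean(xs), fmean(ys)
    # Sums of cross products and squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov * cov / (var_x * var_y)


# Invented toy series standing in for two networks' monthly values.
network_a = [1.0, 2.0, 3.0, 4.0]
network_b = [2.0, 4.0, 6.0, 8.0]   # moves in perfect lockstep with network_a
network_c = [1.0, 3.0, 2.0]        # only loosely related to [1, 2, 3]

print(r_squared(network_a, network_b))
print(r_squared([1.0, 2.0, 3.0], network_c))
```

A value of 1 means one series tracks the other perfectly along a straight line, and a value near 0 means there is essentially no linear relationship. Values like 0.998 therefore indicate that the two networks rise and fall almost in lockstep over the months compared.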

The image below visually illustrates the close correlation of the USCRN (black dashes) data to the USHCN data.

Finally, Watts claimed that if the US surface temperature record was unreliable, then by extension, the entire global surface temperature record must be similarly unreliable, since “the U.S. temperature record is widely regarded as being the most reliable of the international databases.” While Watts offered no documentary support for this statement, if we accept his logic, then the results of the NCDC paper clearly show that the international records must be reliable because the US records have been shown to be. However, it’s certainly possible that the international databases are less reliable than the US database, and so the accuracy of Watts’ original statement is questionable at best.

Ultimately, the paper’s conclusion represents a clear rejection of Watts’ conclusions:

[O]ur analysis and the earlier study by Peterson 2006 illustrate the need for data analysis in establishing the role of station exposure characteristics on temperature trends no matter how compelling the circumstantial evidence of bias may be. In other words, photo and site surveys do not preclude the need for data analysis, and concerns over exposure must be evaluated in light of other changes in observation practice such as new instrumentation.

The NCDC scientists directly acknowledge Watts’ effort at documenting and categorizing the USHCN sites via the surfacestations.org project. And even though Watts’ conclusions in the Heartland Institute white paper cannot be supported, the work he organized and accomplished via a legion of volunteers at surfacestations.org represents a significant contribution to climate science and the surface temperature record in the United States. Unfortunately for Watts, he rushed his white paper to print before he had verified that his conclusions were justified by the measured data.

Ever since Watts and the Heartland Institute published Watts’ white paper, a large number of self-described climate disruption skeptics have been using the white paper as “proof” that temperature records are riddled with errors. These so-called skeptics claim that the qualitative metadata about the surface stations make strong conclusions about the state of the global climate impossible. The new paper authored by the NCDC scientists shows those claims to be wishful thinking. The temperature record clearly shows that the U.S. climate has warmed significantly over the last 130 years, and this paper serves as yet another proof of the robustness of that observation.

Image credits:

Journal of Geophysical Research – Atmospheres

surfacestations.org