A guest post from Ryan O. This post discusses some of our concerns about the global temperature data. I hope to continue this, or Ryan can, down the road. Ryan took a brute-force approach to looking at pre-homogenization trends in GHCN with a simple script. I think you’ll find the results interesting.



———————



So lately, a lot of people have been musing about the accuracy of the temperature indices. One of the oft-repeated things is that three independent indices (CRU, GISS, NOAA) all yield similar results. This is presented as confirmation that they cannot be that far off.

Of course, the world is not so simple. All three indices depend in large part on GHCN; they are not independent. If there is something wrong with GHCN, it will carry through to all three indices. The “raw” data for GHCN consists of 13,472 stations. You can download it here: ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean.Z Unpack it using WinZip or similar (I use 7-Zip since it’s free), import it into Excel to split the year from the station identifier, and save it as a tab-delimited file. At the end of this post, I will provide a script that will read that tab-delimited file into R.
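For those who would rather skip the Excel step, a fixed-width parser is straightforward. This sketch is in Python rather than the R used in the post, and the `parse_v2_line` name and exact column widths are my reading of the v2.mean layout (a 12-character station/duplicate ID, a 4-digit year, then twelve 5-character monthly means in tenths of a degree C, with -9999 marking missing months) – check the GHCN v2 README before relying on them:

```python
def parse_v2_line(line):
    """Parse one v2.mean record; return (station_id, year, monthly temps).

    Monthly values are converted from tenths of a degree C to degrees C;
    missing months (-9999) come back as None.
    """
    station = line[0:12]          # country code + WMO ID + modifier + duplicate digit
    year = int(line[12:16])
    temps = []
    for m in range(12):
        field = line[16 + 5 * m : 21 + 5 * m]
        value = int(field)
        temps.append(None if value == -9999 else value / 10.0)
    return station, year, temps
```

Looping this over every line of the unpacked v2.mean gives the same station/year/month table the Excel route produces.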

Of those 13,472 stations in GHCN, some go back as far as 1701. I arbitrarily selected 1900 as a cutoff date. You are, of course, free to select other dates. I then wanted to see what the data looked like. To do this, I simply calculated anomalies using the period of 1900-2009, averaged, and also plotted the station density. That yields this graphic:
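The anomaly-and-density step can be sketched like this (again Python rather than the post’s R; `simple_anomalies` and its `records` input – a hypothetical {station: {year: annual mean}} dict – are mine, and the actual analysis works with monthly values):

```python
def simple_anomalies(records, base=(1900, 2009)):
    """Naive anomaly average over a station network.

    records: {station_id: {year: annual mean temperature}}.
    Each station's anomaly is taken relative to its own mean over the
    base period, with no offset estimation for short records.
    Returns ({year: mean anomaly}, {year: reporting-station count}).
    """
    # Per-station baseline means (stations with no data in the base
    # period are dropped -- exactly the complication discussed below).
    offsets = {}
    for st, series in records.items():
        base_vals = [v for y, v in series.items() if base[0] <= y <= base[1]]
        if base_vals:
            offsets[st] = sum(base_vals) / len(base_vals)

    per_year = {}
    for st, series in records.items():
        if st not in offsets:
            continue
        for y, v in series.items():
            per_year.setdefault(y, []).append(v - offsets[st])

    means = {y: sum(a) / len(a) for y, a in per_year.items()}
    counts = {y: len(a) for y, a in per_year.items()}
    return means, counts
```

The `counts` dict is the red station-density line in the graphic; the jumpiness of `means` as stations enter and leave is the problem the next paragraph describes.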

Of course, this looks quite hokey (and most of you knew it would ahead of time). The reason is simple – you can’t calculate anomalies without figuring out offsets for records that are incomplete during the baseline period. Note how the shape changes greatly as stations are added or deleted (the red line maxes at ~8,000 stations). The purpose of showing this graph is to illustrate that, in order to use the greatest amount of data in the GHCN database, you MUST make adjustments. There’s nothing untoward about the fact that adjustments are made . . . if you want to work with anomalies and you want to use as much of the data as possible, you have to make adjustments.

However, you now have a problem: How do you make the adjustments?

To avoid this complication, I simply discarded all series that did not have at least 1,000 points since 1900 (or about 80% complete). This allows me to compute anomalies using a common baseline, and it avoids the complication of having the geographical weight changing over time. The locations in 1900 are the same as the locations in 2000. This won’t necessarily be representative of the whole world, as the network is heavily weighted toward US stations, but I’m just taking an initial look and not trying to come up with my own index yet. Just looking and thinking about what I see.

The 1,000 point cutoff yields 1,793 stations. Unfortunately, there is still another problem. A lot of stations dropped out after 1990. So unless I want to just look at 1900 – 1990, I have to cull some more. For this next cull, I required that from 1990 – 2009, stations had to have at least 180 points (or about 70% complete). That yields 894 stations. You could use other numbers; it doesn’t change the results much. Anyway . . . now that we have a group of long record length stations that are fairly complete, let’s take a look at what we see (red line maxes at ~890 stations):
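The two completeness cuts can be written as a single filter (a Python sketch, not the post’s R script; `cull` and the monthly {(year, month): temp} layout are hypothetical, but the thresholds are the ones described above):

```python
def cull(records, since=1900, min_total=1000, recent=(1990, 2009), min_recent=180):
    """Keep stations meeting both completeness requirements.

    records: {station_id: {(year, month): temperature}}.
    A station survives if it has at least min_total monthly points
    from `since` onward AND at least min_recent points inside the
    `recent` window (inclusive).
    """
    kept = []
    for st, series in records.items():
        total = sum(1 for (y, m) in series if y >= since)
        late = sum(1 for (y, m) in series if recent[0] <= y <= recent[1])
        if total >= min_total and late >= min_recent:
            kept.append(st)
    return kept
```

With the defaults this reproduces the 1,000-point and 180-point cuts; changing `min_total` or `min_recent` lets you test the claim that the exact numbers don’t change the results much.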

Hmm. Some interesting things to note. First, the overall warming from 1900 is only about 0.22 Deg C/Century, using the raw data. Now, as stated before, this is not a geographically representative sample. Would a representative sample yield another 1 Deg C of warming? Dunno. What is certainly interesting about this sample is that it has the same general shape as CRU/GISS/NOAA – but the 1930 – 1980 decline is more pronounced and the 1980 – present rise is less pronounced.
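A trend figure like the 0.22 Deg C/Century above is just an ordinary least-squares slope on the annual anomalies, rescaled to a century (a minimal Python sketch; the function name is mine):

```python
def trend_per_century(years, anomalies):
    """Ordinary least-squares slope of anomaly vs. year, in deg C/century."""
    n = len(years)
    mean_y = sum(years) / n
    mean_a = sum(anomalies) / n
    # Classic closed-form OLS slope: covariance over variance.
    num = sum((y - mean_y) * (a - mean_a) for y, a in zip(years, anomalies))
    den = sum((y - mean_y) ** 2 for y in years)
    return num / den * 100.0
```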

Also note that a large number of stations still drop out in ~2006. In fact, there are only about 100 stations from 2006 – 2009 that also have 1,000 points since 1900 and 180 points since 1990. I am not sure why, but as we know this can affect the analysis, let’s take a look at what happens if we just go through 2006:

More hmm. The warming trend is under a tenth of a degree C from 1900 and the 2006 point on the smooth is about the same as the 1936 point on the smooth. Perhaps the drop off in stations is causing a problem; perhaps not. Not enough information in this analysis to tell.

So what does this mean? I don’t know. It certainly doesn’t mean that the temperature indices are wrong because this is a really basic analysis that does not take into account geographical representation or station moves (when the same station identifier was kept). It also is interesting that the high warming trend is not present in this subset of raw GHCN data – meaning that, if the warming is due to adjustments, the three indices must be adjusting in very similar ways. I have no thoughts on how likely that would be at the moment. Regardless, it’s interesting enough to make me want to dig deeper. Maybe I will find that GISS/CRU/NOAA are fine. Maybe not.

Either way, I’m sure there will be more to come.

Script: