In his blog post, Giorgio Gilestro claims to show that the adjustments made by GHCN to temperature data do not introduce artificial trends into the data. His claim is based mainly on this histogram of adjustments.

Commenters on the blog argued that the temporal order of the adjustments makes a difference, and in fact these arguments seem to be correct: a histogram shows how large the adjustments are, but not when they were applied. I checked this through the following reasonably simple analysis:
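As an aside before getting to the data, a toy example shows why order matters. The sketch below uses made-up numbers (not GHCN data) and plain Python rather than R. The adjustments are symmetric around zero, so their histogram is centred on zero, yet they carry a clear trend in time. Reversing their order leaves the histogram unchanged and flips the sign of the trend. The helper `ols_slope` is a name introduced here for illustration.

```python
# Synthetic adjustments (made-up numbers, NOT GHCN data):
# negative in the early years, positive in the later years.
years = list(range(1900, 2000))
adjustments = [(y - 1949.5) * 0.005 for y in years]


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# What a histogram can tell us: the distribution, ignoring order.
mean_adjustment = sum(adjustments) / len(adjustments)

# What it cannot tell us: the trend over time.
slope = ols_slope(years, adjustments)

# Same values in reverse order: identical histogram, opposite trend.
reversed_adjustments = adjustments[::-1]
reversed_slope = ols_slope(years, reversed_adjustments)

print("mean adjustment:", round(mean_adjustment, 6))
print("trend per year:", round(slope, 6))
print("trend per year, order reversed:", round(reversed_slope, 6))
```

The mean is essentially zero in both cases, while the fitted trend is about 0.005 per year in one ordering and about -0.005 per year in the other.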

I downloaded the GHCN datasets that he used and read them into R line by line. The next step was to identify which lines in the adjusted set corresponded to specific lines in the raw data set. There was one obvious error in the adjusted set: the last 10 lines were identical and contained nothing but missing value codes. After fixing that, I found that 31 adjusted lines had no corresponding raw partners, including a full station record. These were also removed.
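The matching step can be sketched roughly as follows. This is an illustrative Python version, not the R script itself, and it assumes a fixed-width layout for each line: a 12-character station identifier, a 4-digit year, then twelve 5-character monthly values with -9999 as the missing value code. That layout is my assumption about the files, so check it against the GHCN documentation before relying on it. The function names and the sample station numbers are made up for the example.

```python
MISSING = -9999  # assumed missing value code


def parse_line(line):
    """Split one fixed-width line into (key, twelve monthly values).

    Assumed layout: characters 1-16 are station id + year,
    followed by twelve 5-character integer fields.
    """
    key = line[:16]
    values = [int(line[16 + 5 * i:21 + 5 * i]) for i in range(12)]
    return key, values


def index_records(lines):
    """Map key -> values, dropping lines that are entirely missing.

    Using a dict also collapses exact duplicate lines.
    """
    records = {}
    for line in lines:
        key, values = parse_line(line)
        if all(v == MISSING for v in values):
            continue
        records[key] = values
    return records


def match(raw_lines, adj_lines):
    """Pair adjusted lines with raw lines; report adjusted lines with no partner."""
    raw = index_records(raw_lines)
    adj = index_records(adj_lines)
    paired = {k: (raw[k], adj[k]) for k in adj if k in raw}
    unmatched = sorted(k for k in adj if k not in raw)
    return paired, unmatched


def make_line(key, values):
    """Build a synthetic fixed-width line for the demo below."""
    return key + "".join(f"{v:5d}" for v in values)


# Made-up demo data: one matching pair, one orphan, two all-missing duplicates.
raw_lines = [make_line("1016035500001950", [100] * 12)]
adj_lines = [
    make_line("1016035500001950", [105] * 12),
    make_line("1016039000001950", [90] * 12),
    make_line("1016035500001951", [MISSING] * 12),
    make_line("1016035500001951", [MISSING] * 12),
]

paired, unmatched = match(raw_lines, adj_lines)
print("paired keys:", sorted(paired))
print("adjusted lines without a raw partner:", unmatched)
```

In the demo, the all-missing lines are dropped, one adjusted line pairs with its raw partner, and one adjusted line is reported as having none.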

The individual differences were calculated for each month and station to produce a set of 5,068,104 values. Of these, 4% were not available and 32% were zeroes. The remainder were adjusted in one way or another, some by fairly large amounts. The resulting differences were averaged over each year for each station:

Nothing obvious here except for the size of some of the adjustments.
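For one station-year, the difference calculation and the bookkeeping of missing and zero values might look like the sketch below. It is in Python, not the original R, and it makes two assumptions that the post does not state: that values are stored in tenths of a degree C with -9999 for missing, and that the annual average is taken over whichever months are available. The numbers are invented.

```python
MISSING = -9999  # assumed missing value code


def monthly_differences(raw, adj):
    """Adjusted minus raw, in degrees C; None where either value is missing.

    Assumes the stored integers are tenths of a degree C.
    """
    return [
        None if r == MISSING or a == MISSING else (a - r) / 10.0
        for r, a in zip(raw, adj)
    ]


def summarize(diffs):
    """Return (fraction not available, fraction exactly zero)."""
    n = len(diffs)
    n_missing = sum(d is None for d in diffs)
    n_zero = sum(d == 0 for d in diffs if d is not None)
    return n_missing / n, n_zero / n


def annual_mean(diffs):
    """Mean of the available monthly differences, or None if there are none."""
    present = [d for d in diffs if d is not None]
    return sum(present) / len(present) if present else None


# Made-up station-year: three unadjusted months, eight adjusted, one missing.
raw = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, MISSING]
adj = [10, 20, 30, 45, 55, 65, 75, 85, 95, 105, 115, 125]

diffs = monthly_differences(raw, adj)
frac_missing, frac_zero = summarize(diffs)
year_mean = annual_mean(diffs)

print("fraction not available:", round(frac_missing, 3))
print("fraction zero:", round(frac_zero, 3))
print("annual mean adjustment:", round(year_mean, 3))
```

Run over every matched pair of lines, the same two fractions would correspond to the 4% and 32% figures quoted above.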

Finally, these averages were again averaged over the stations to calculate an average for each year:

The graph has some interesting features. First of all, there is a fairly linear trend from about 1900 to the present. The increase is about 0.25 °C. A second unexpected feature was the fact that there seems to be a fairly constant reduction of 0.10 °C from about 1990 to 2006! The expression “hiding the decline” comes to mind and I believe this would need some sort of explanation.
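The final averaging step, and a rough way to put a number on a linear trend, can be sketched as below. Again this is Python with invented values, not the R script or the GHCN data. It assumes an unweighted mean over whichever stations report in a given year, and an ordinary least-squares line. The function names are introduced here only for the example.

```python
from collections import defaultdict


def yearly_average(station_year_means):
    """Average the per-station annual means over stations, year by year.

    station_year_means maps (station, year) -> mean adjustment or None.
    Stations with no value in a year are simply left out of that year.
    """
    by_year = defaultdict(list)
    for (station, year), value in station_year_means.items():
        if value is not None:
            by_year[year].append(value)
    return {year: sum(vals) / len(vals) for year, vals in sorted(by_year.items())}


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up per-station annual mean adjustments, in degrees C.
station_year_means = {
    ("A", 1900): -0.2, ("A", 1950): 0.0, ("A", 2000): 0.1,
    ("B", 1900): 0.0, ("B", 1950): 0.0, ("B", 2000): None,
}

averages = yearly_average(station_year_means)
trend_per_century = 100 * ols_slope(list(averages), list(averages.values()))

print("yearly averages:", averages)
print("trend per century:", round(trend_per_century, 3))
```

Because stations drop in and out, a year's average here depends on which stations happen to report, which is one reason the number of stations per year is worth looking at alongside the adjustment plot.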

The R program is available as a pdf here.

Update: December 13, 2009

Since posting this yesterday, I have become aware of a post doing a similar analysis by blogger hpx83. His results are basically the same as the ones I have posted. However, using further information about the stations, he has also included a graph of the number of stations available in a given year:

What I find particularly interesting about this in relation to the adjustment plot directly above is the post-1990 portion, which parallels the sudden drop in the adjustments at that time. A further look is probably in order.
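A station count per year can also be approximated from the data lines themselves. The sketch below is Python with invented record keys, and it is not how hpx83 produced his graph, since he used additional station information. It assumes that the first 11 characters of each 16-character key identify the station, the 12th is a duplicate number, and the last four are the year.

```python
from collections import defaultdict


def stations_per_year(keys):
    """Count distinct stations reporting in each year.

    Assumed key layout: 11-character station id, 1-character
    duplicate number, 4-digit year.
    """
    seen = defaultdict(set)
    for key in keys:
        station = key[:11]
        year = int(key[12:16])
        seen[year].add(station)
    return {year: len(stations) for year, stations in sorted(seen.items())}


# Made-up keys: two stations in 1989 (one with two duplicates), one in 1991.
keys = [
    "1016035500001989",
    "1016035500011989",
    "1016039000001989",
    "1016035500001991",
]

counts = stations_per_year(keys)
print(counts)
```

Plotting these counts against the yearly average adjustments would be one way to take the further look suggested above.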

As well, Jean Demesure commented on the use of a pdf file for posting the R script. I appreciate the problem; however, quotes are not properly handled if the script is pasted into the blog post, and WordPress limits the types of files I can upload to the blog. The only extensions allowed for files containing text are doc, docx, odt, and pdf. Even txt files cannot be uploaded. I have just done a simple experiment which seems to work: a text file renamed with a .doc extension will upload. I have done this with the current script and it is now here: ghcnR . It opened without difficulty in a simple text editor. I will do this in the future.