I mentioned in this post that Ed Caryl’s “normalization” procedure was an erroneous way to align temperature station records when their time spans were not coincident. There seems to be some question about why.



You can read Caryl’s description here:



After putting all the stations into one spreadsheet with the total year span in the leftmost column, and each station with its own column, aligned with the correct years, each station column was averaged using the SUM of the column divided by the COUNT of the cells in each column with data. Then the average of all the columns was computed. This number is then the average of all the temperatures in all the stations over the whole time period. Call that the “table” average. The next step was to “normalize” the data for each station by subtracting the “table” average from each column average. This results in a normalization factor for each column. That normalization factor was then subtracted from each value in that column. The normalization factor will be different for each station.



Caryl’s procedure ensures that each station record, after adjustment, will have the same overall average. But that doesn’t mean they’re properly aligned! The reason is that if station records cover different time spans, then the relevant average (global, regional, whatever) may not be the same for those different time spans. In fact we rather expect that to be the case.
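To make the recipe concrete, here's a minimal Python sketch of the procedure as I read the description above. The function name `caryl_normalize`, the use of NaN for years without data, and the toy numbers are my own choices for illustration; none of them come from Caryl's spreadsheet.

```python
import numpy as np

def caryl_normalize(table):
    """Apply the quoted "normalization" to a year-by-station table.

    table: 2-D array, one row per year, one column per station,
           with NaN wherever a station has no data (an assumed convention).
    Returns the adjusted table and the per-station normalization factors.
    """
    table = np.asarray(table, dtype=float)
    col_means = np.nanmean(table, axis=0)   # SUM / COUNT of the cells with data
    table_mean = col_means.mean()           # the "table" average of the column averages
    factors = col_means - table_mean        # one normalization factor per station
    return table - factors, factors         # subtract each factor from its own column

# Made-up toy table: both stations rise by 1 per year, but cover different years.
toy = np.array([
    [1.0, np.nan],
    [2.0, 11.0],
    [3.0, 12.0],
    [np.nan, 13.0],
])
adjusted, factors = caryl_normalize(toy)
print(adjusted)
print(np.nanmean(adjusted, axis=0))
```

In this toy table both adjusted columns average 7, exactly as the procedure promises. But in the two years where the stations overlap, the first reads 7 and 8 while the second reads 6 and 7, so the records disagree with each other even though their averages match.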

Allow me to illustrate. Let’s take some artificial data for a hypothetical planet which is warming, consistently and uniformly, at a rate of 2 deg.C/century (0.02 deg.C/yr). We’ll use three stations, the first in the far north, very cold, covering the time span 1900 to 1950. The second is at midlatitudes, covering the time span 1925 to 1975. The third is tropical, covering the time span 1950 to 2000. This is an imaginary planet (not the actual earth), so the temperature records show pure trend with no noise. And here’s the raw data:

Each record shows exactly the same trend — increase at 2 deg.C/century — so clearly this limited data set indicates overall warming also at 2 deg.C/century, which is consistent throughout the century.

If we didn’t align the records at all, instead simply estimating the trend from the raw data, we’d get a whopping 25.7 deg.C/century! That would probably be fatal, even on this imaginary planet. But it’s obviously not right — the different stations are at different locations, and the fact that the coldest station reported earliest while the warmest reported last is purely accidental.
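If you'd like to try this yourself, here's a rough Python sketch that builds three noise-free records like the ones described and fits a single straight line through all of the raw values. The baseline temperatures are assumed values chosen for illustration, and the names `STATIONS`, `make_records` and `pooled_trend` are hypothetical helpers, not part of any library.

```python
import numpy as np

RATE = 0.02  # warming rate in deg.C/yr (2 deg.C/century)

# (first year, last year, temperature in 1900). The baselines are assumptions.
STATIONS = {
    "north":    (1900, 1950, -5.0),
    "mid":      (1925, 1975,  4.0),
    "tropical": (1950, 2000, 13.0),
}

def make_records(stations=STATIONS, rate=RATE):
    """Return {name: (years, temps)} with pure trend and no noise."""
    records = {}
    for name, (start, end, base) in stations.items():
        years = np.arange(start, end + 1)
        records[name] = (years, base + rate * (years - 1900))
    return records

def pooled_trend(records):
    """Least-squares slope through all points lumped together, in deg.C/century."""
    years = np.concatenate([y for y, _ in records.values()])
    temps = np.concatenate([t for _, t in records.values()])
    slope = np.polyfit(years, temps, 1)[0]
    return 100.0 * slope

records = make_records()
raw_trend = pooled_trend(records)
print(f"raw pooled trend: {raw_trend:.1f} deg.C/century")
```

With these assumed baselines (an 18 deg.C spread between the coldest and warmest station) the pooled fit comes out near 25.7 deg.C/century, even though every individual record rises at exactly 2. Change the baselines and the spurious trend changes with them, which is the point: it reflects where the stations are, not what the planet is doing.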

Instead, let’s align them by Caryl’s method. This will reset them so that all station records have the same average, and that produces this:

This too is not right, and using artificial noise-free data makes that obvious. The station records should not have the same average value, because the “planet” was not at the same temperature during the different time spans they cover. Incidentally, trend analysis of this misaligned data indicates warming at a mere 0.68 deg.C/century.
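The suppressed trend is easy to reproduce. The sketch below rebuilds three noise-free records with the time spans described above, forces every station to the same average, and fits one line through the pooled result. The baseline temperatures are assumed values (they cancel out in the normalization anyway), and the variable names are mine.

```python
import numpy as np

years = np.arange(1900, 2001)
spans = [(1900, 1950), (1925, 1975), (1950, 2000)]
baselines = [-5.0, 4.0, 13.0]   # assumed temperatures in 1900, for illustration only

# Year-by-station table, NaN where a station has no data.
table = np.full((years.size, len(spans)), np.nan)
for j, ((start, end), base) in enumerate(zip(spans, baselines)):
    mask = (years >= start) & (years <= end)
    table[mask, j] = base + 0.02 * (years[mask] - 1900)

# Normalization as quoted: column average minus "table" average, subtracted per column.
col_means = np.nanmean(table, axis=0)
adjusted = table - (col_means - col_means.mean())

# Pool every available value and fit a single straight line.
year_grid = np.repeat(years[:, None], len(spans), axis=1)
has_data = ~np.isnan(adjusted)
slope = np.polyfit(year_grid[has_data], adjusted[has_data], 1)[0]
caryl_trend = 100.0 * slope
print(f"trend after normalization: {caryl_trend:.2f} deg.C/century")
```

This gives about 0.68 deg.C/century. Each station still rises at 2 deg.C/century within its own record, but sliding the later records down to match the earlier ones' averages removes most of the warming between them.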

The right way is to align them so that different station records have the best match to each other during their period of overlap. Using the “Berkeley method” gives this:

Note that they’re aligned so well that the data points from different stations end up being plotted right on top of each other, since the planet is warming uniformly and there’s no noise in these data. And, just as it should, this aligned data set indicates warming at a rate of 2 deg.C/century.
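For the curious, here's a simplified stand-in for overlap-based alignment. To be clear, this is not the Berkeley code: it's a bare-bones least-squares fit that models each observation as a common yearly value plus a constant station offset, with the first station's offset pinned at zero. The baselines are assumed values and all names are my own.

```python
import numpy as np

spans = [(1900, 1950), (1925, 1975), (1950, 2000)]
baselines = [-5.0, 4.0, 13.0]   # assumed temperatures in 1900, for illustration only
first_year, last_year = 1900, 2000
n_years = last_year - first_year + 1
n_stations = len(spans)

# One row per observation: which year, which station, what temperature.
obs_year, obs_station, obs_temp = [], [], []
for k, ((start, end), base) in enumerate(zip(spans, baselines)):
    for year in range(start, end + 1):
        obs_year.append(year)
        obs_station.append(k)
        obs_temp.append(base + 0.02 * (year - 1900))
obs_year = np.array(obs_year)
obs_station = np.array(obs_station)
obs_temp = np.array(obs_temp)

# Model: temp = common_value[year] + offset[station], with offset[0] fixed at 0.
rows = np.arange(obs_temp.size)
design = np.zeros((obs_temp.size, n_years + n_stations - 1))
design[rows, obs_year - first_year] = 1.0
later = obs_station > 0
design[rows[later], n_years + obs_station[later] - 1] = 1.0

solution = np.linalg.lstsq(design, obs_temp, rcond=None)[0]
offsets = np.concatenate([[0.0], solution[n_years:]])

# Remove each station's offset, then fit one line through everything.
aligned = obs_temp - offsets[obs_station]
aligned_trend = 100.0 * np.polyfit(obs_year, aligned, 1)[0]
print("offsets relative to first station:", np.round(offsets, 3))
print(f"trend after alignment: {aligned_trend:.2f} deg.C/century")
```

Because the offsets are determined only by how the stations compare during the years they share, the fit recovers the fixed differences between locations and leaves the common 2 deg.C/century trend intact. With real, noisy data the fit wouldn't be exact, but the principle is the same.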

Caryl’s method will generally tend to suppress trends, whether warming or cooling, biasing different time spans to have the same average value when that may not be the case. It’s only a problem when different data sets don’t cover exactly the same times of observation — but that seems to be a rather ubiquitous condition for real temperature data.