The question arose on another blog, when analyzing the Berkeley data, why not use weighted least squares with weights determined by the uncertainty levels listed in the Berkeley data file?

Weighted least squares uses knowledge of the noise level in the data to improve the precision of least-squares regression. If the noise level of each individual data point is σ_i, then we should assign weights given by w_i = 1/σ_i², and our regression will be more precise. That’s great! But it requires knowing the noise level for each individual data point. The problem is, the uncertainties listed with the Berkeley data are not those noise levels. They’re only part of the noise — the part we could call measurement error or estimation error.
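In code, the weighting scheme looks something like this (a minimal numpy sketch; the function name `wls_slope` is my own, not from any Berkeley code):

```python
import numpy as np

def wls_slope(x, y, sigma):
    """Weighted least-squares slope, using weights w_i = 1/sigma_i**2."""
    w = 1.0 / sigma**2
    xbar = np.sum(w * x) / np.sum(w)          # weighted mean of x
    ybar = np.sum(w * y) / np.sum(w)          # weighted mean of y
    return np.sum(w * (x - xbar) * (y - ybar)) / np.sum(w * (x - xbar)**2)
```

On noiseless data any choice of weights recovers the true slope exactly; the weights only matter once noise enters.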

But there’s more noise in temperature data than just estimation error. There’s also physical noise, which has nothing to do with the uncertainty in instruments or area-weighted averages. It’s noise that arises as an actual physical phenomenon. Even if our instruments were perfectly precise and we had total area coverage of the planet, so that all the uncertainty levels listed were zero, there would still be noise in the data. When you do weighted least squares but omit a major part of the noise, you can get into some very serious trouble.

Let’s create some artificial data. We’ll start with 35 years of data with a perfectly linear trend at a rate of 0.02 deg.C/yr — climate change! Then we’ll add physical noise — weather! — and just to keep things simple we’ll make it Gaussian white noise with standard deviation 0.2 deg.C. That gives us this:

The thin black line gives the actual temperature — not an estimate. But as you can see, it’s noisy.
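That simulation can be sketched in a few lines of numpy (the seed and variable names are my own choices, made only for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(42)             # fixed seed so the example repeats

years = np.arange(35)                       # 35 years of data
trend = 0.02 * years                        # 0.02 deg.C/yr -- climate change!
weather = rng.normal(0.0, 0.2, size=35)     # physical noise, sd 0.2 deg.C
true_temp = trend + weather                 # the actual temperature: real, but noisy
```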

However, there’s more noise because our measurements are imperfect. This is the noise indicated by the “uncertainty” numbers in the Berkeley data file. Let’s simulate that by adding more Gaussian white noise — not weather, but estimation error. However, let’s suppose that the final two data points are much more precise than those which precede them. Maybe we used superior thermometers, or had much better spatial coverage, but the estimation error in those last two data points is far less than in the others, only 0.001 deg.C. This gives us the measured data:
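Continuing the sketch, we layer estimation error on top of the true temperatures. The sd of 0.1 deg.C for the earlier years is an illustrative number of my own, not a value from the Berkeley file:

```python
import numpy as np

rng = np.random.default_rng(42)
years = np.arange(35)
true_temp = 0.02 * years + rng.normal(0.0, 0.2, size=35)   # trend + weather

# Estimation error: assume sd 0.1 deg.C for most years (illustrative),
# but only 0.001 deg.C for the final two, far more precise, points.
sigma_est = np.full(35, 0.1)
sigma_est[-2:] = 0.001
measured = true_temp + rng.normal(0.0, sigma_est)          # the "measured" data
```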

We can estimate the trend by ordinary least squares (not weighted). In fact we get a good estimate, indicated by the fact that the estimated trend (the red line) is so close to the actual trend (the green line):

But if we estimate it by weighted least squares using only the estimation error to determine the weights, we get this:
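Both fits can be compared directly. With weights of 1/σ², the two precise points get weight 1/0.001² = 10⁶ each, against 1/0.1² = 100 for everyone else, so they carry over 99% of the total weight and the fit essentially pivots on them (all numbers here are the illustrative ones assumed above):

```python
import numpy as np

def wls_slope(x, y, w):
    """Least-squares slope with weight array w (all-ones gives ordinary LS)."""
    xbar = np.sum(w * x) / np.sum(w)
    ybar = np.sum(w * y) / np.sum(w)
    return np.sum(w * (x - xbar) * (y - ybar)) / np.sum(w * (x - xbar)**2)

rng = np.random.default_rng(42)
years = np.arange(35)
sigma_est = np.full(35, 0.1)        # assumed estimation error (illustrative)
sigma_est[-2:] = 0.001              # last two points far more precise
measured = 0.02 * years + rng.normal(0, 0.2, 35) + rng.normal(0, sigma_est)

ols = wls_slope(years, measured, np.ones(35))       # ordinary least squares
wls = wls_slope(years, measured, 1.0 / sigma_est**2)  # wrong: ignores weather
```

With most seeds the weighted estimate lands far from the true 0.02, because it is essentially the slope through two points whose weather noise (sd 0.2) dwarfs the 0.001 precision the weights pretend they have.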

That’s not just less precise. It’s wrong.

The root of the problem is that the last two data points get so much more weight than the others. That would be OK if their low error levels were truly reflective of the noise in those data points. But they’re not, because even though those final points have far less measurement error, they have just as much physical noise as all the others.

Weighted least squares really is better than unweighted if you know the true noise level. But if you use the wrong noise level, then weighted least squares is just wrong.
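If we do know both noise components, the fix is to weight by the total variance, physical plus measurement. A sketch under the same illustrative assumptions as above:

```python
import numpy as np

def wls_slope(x, y, w):
    xbar = np.sum(w * x) / np.sum(w)
    ybar = np.sum(w * y) / np.sum(w)
    return np.sum(w * (x - xbar) * (y - ybar)) / np.sum(w * (x - xbar)**2)

rng = np.random.default_rng(42)
years = np.arange(35)
sigma_phys = 0.2                    # weather noise, same for every year
sigma_est = np.full(35, 0.1)        # assumed estimation error (illustrative)
sigma_est[-2:] = 0.001
measured = 0.02 * years + rng.normal(0, sigma_phys, 35) + rng.normal(0, sigma_est)

# Proper weights come from the TOTAL noise variance, physical + measurement:
w = 1.0 / (sigma_phys**2 + sigma_est**2)
slope = wls_slope(years, measured, w)
```

Because the weather term dominates the variance for every point, the "precise" final points now get only about 25% more weight than the rest, not 10,000 times more, and the trend estimate behaves sensibly again.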