There has been a great deal of discussion on a recent CA thread on the efficacy of screening proxies for use in reconstructions by selecting on the size of the correlation between the proxy and the temperature during the calibration time period. During the discussion I asked Nick Stokes the following questions in a comment:

Do you think that it would be appropriate to use correlation to screen tree rings in a particular site or region when doing a temperature reconstruction for that site? Would you not be concerned that the process could bias the result even if the trees did contain actual information about the temperature?

His reply was

Roman, let me counter with a question, “bias the result” Bias from what? You’re using the language of population statistics. But what is the population? And why do you want to know its statistics? What is it biased from? But to answer yours, yes I do. The proxy result is going to track instrumental in the training period. That’s intended, and means it isn’t independent information. But how does it bias what you want, which is the pre-training signal?

The Example

In order to examine whether Nick is correct or not, I thought it might be a nice idea to get some data for which we could compare the effects of screening to those of using all of the proxies. However, it should be clear that no real proxy data exists for doing this, so I decided we could use pseudo-proxies generated for exactly the purpose of testing the paleo reconstruction methodology. A paper (DISCUSSION OF: A STATISTICAL ANALYSIS OF MULTIPLE TEMPERATURE PROXIES: ARE RECONSTRUCTIONS OF SURFACE TEMPERATURES OVER THE LAST 1000 YEARS RELIABLE? by By Gavin A. Schmidt, Michael E. Mann and Scott D. Rutherford) written in the Annals of Applied Statistics to rebut criticism of their previous work by McShane et al (discussed on CA starting here) used pseudo-proxies generated from modeled temperature series with various forms or noise. The data from that paper is available in a 26 MB zipped supplement.

For my example, I chose to use a set of 59 proxies based on the NCAR CSM model temperatures which had generated “temperature” values from 850 to the year 1980. The “noise” is apparently autoregressive, but I don’t know the specific details of the parameters. Other sets of proxies along with another model (GKSS) temperature series are available in the supplement and people should be able to use the function in the R script to carry out further tests on the other pseudo-proxies or on their own generated proxies should they wish to do so.

For a reconstruction method, I wrote a simple script to perform CPS as described in the 2008 paper, Proxy-based reconstructions of hemispheric and global surface temperature variations over the past two millennia (Michael E. Mann, et al). Two methods are available for the calibration step – simple scaling and regression. Since this post is not intended to evaluate which reconstructions are better than others, I though CPS would suffice.

First, the CSM series looks like this:

and the first 10 pseudo-proxies:

The calibration period was chosen (for no particular reason) to be 1901 to 1980. I ran the comparison with a cutoff correlation of .12, mainly to ensure that there were sufficient “screened” proxies to create a reasonable reconstruction. The actual critical value used is not germane to the discussion below – the character of the impact of the screening would be the same with only the magnitude of the effect increasing as the critical value increased.

The reconstruction results for CPS with simple scaling:

The reconstruction results for CPS with regression:

The graphs clearly show that all of the reconstructions are biased, but as I mentioned before, that is not the point of the post. If one looks at mean squared errors of the reconstructions, the screened proxies have done a poorer job:

MSE(All, Scale) = 0.07511459

MSE(Screen, Scale) = 0.15536520

MSE(All, Regression) = 0.1329907

MSE(Screen, Regression) = 0.1855703

Why is this the case? Well, one reason might lie in a plot of the correlation of the proxies with temperature in the Calibration time period and in the pre-calibration era:

It is quite obvious from the plot that a higher correlation in the calibration period is no guarantee of a similarly high correlation prior to calibration. In this case, to categorically state that it is always a good idea to select only proxies which have a correlation higher than some cutoff value is clearly unwarranted here. However, to understand why, we need to do some mathematics.

The Mathematics

Suppose that we have a homogeneous set of proxies that respond linearly to the same temperature series. We can describe the average relationship with a statistical model:

Y k (t) = A + B T(t) + e(t)

where Y k (t) = the value of the kth proxy at time t, T(t) = the temperature at time t, e(t) = random noise at time t, and A and B are constants which relate the numerical effect of the temperature to the proxy. This model might be a simple description of (age adjusted) tree ring sequences from a collection of trees from a single species at a given site or in a small region.

The model would be set up this way (rather than with the roles of Y and T reversed) because it more naturally reflects the reality that the tree rings are reacting to temperature, not the other way around. Without loss of generality, the error (noise) e(t)’s are assumed to average out to zero to make the definition of the constant A unique. The e(t)’s are also assumed to be independent of the temperature sequence so that the relationship between the proxies and the temperature is encapsulated by the linear equation A + B T(t).

Now, some simple mathematics can show that B (which is the slope of the linear relationship) can be expressed as

B = Correlation(T, Y k ) * sqrt(Variance(Y k (t)) / Variance(T(t))

This implies two things. First, the slope B is a simple multiple of the correlation. In fact, if the temperature and the proxies are standardized (i.e. rescaled to all have mean zero and standard deviation 1), B is exactly equal to the correlation. The second is that, since B is the same for all of the proxies, the correlation with temperature (and calculated slopes) for our model should be exactly the same for all proxies.

Despite this fact , in practice, taking a sample of proxies from this model would produce a variety of results for the different proxies. Why? Because some of these proxies are better than others … or is it because in this case all of the variation is due to the noise component of the proxy? It is easy to show that the calculated least squares slope estimate for proxy k can be written as B k = B + B e,k where B e,k is the slope of the noise regressed on the temperatures.

It is also important to note that the coefficient B has a physical interpretation. It represents the change in the proxy due to a unit change in the temperature. Thus, in order to get a valid reconstruction, it is necessary to be able to properly estimate the value of B. Suppose that we form the average of all the slope estimates. Then, it is easy to see that:

Ave(B k ) = B + Ave(B e,k )

so that as we use more and more proxies, due to the independence of the noise and the temperature, Ave(B e,k ) becomes consistently closer to zero and our estimate converges to the correct value B to be used in the calibration.

However, when the proxies are screened using the correlation coefficient, this is not the case. The screening process removes all of the proxies whose correlation with temperature is below a specific threshold. From our observation above that the slope is a multiple of that correlation, this is tantamount to removing all of the proxies for which B e,k is below a particular level. Now, Ave(B e,k ) will converge to a positive non-zero value and our estimate of B will be a biased overestimate of the correct value. This statistical problem is not easily correctable.

So what is the resulting effect on a reconstruction? In order to estimate the temperatures in the past, we need to invert the relationship. If B is known, we can easily remove A from the equation by subtracting the means from both the temperature and the proxy series. Thus we get Y k (t) = B T(t) + e(t) or T(t) = (1/B) Y k (t) – (1/B)e(t). If B is overestimated, then 1/B is too small. Thus the reconstructed temperature series is flattened toward the mean of the temperatures in the calibration period.

In a model where the proxies are theoretically not identical, the effect will still be the same even though the magnitude of the bias may vary. Choosing proxies with higher correlation will invariably increase the proportion of proxies with higher spurious noise-temperature correlation and reduce the opposite. This will contribute toward producing a flatter straighter shaft as in the example.

UPDATE – June18, 2012

Maybe an analogy might give some insight.

Suppose that you want to estimate the mean weight of a population of fish in a body of water. We will use nets to catch fish and weigh them. To avoid dealing with an specious argument from NS (wink,wink), we will assume that each fish is equally likely to enter a net and the weight of a fish is unrelated to where the fish is found. The only problem is that the holes in the net are quite large and all of the fish below a critical size escape from the net and are not caught. Our sample is the set of fish that do not escape.

Is the sample of fish reasonable representative of the entire population? Obviously not because we have no information on the distribution of the fish below the critical “escape” value.

Is the average of the weights of fish in our sample a reasonable estimate of the mean of the entire population? Under the assumptions we have made, the average weight of all of the fish that enter a net is known to have good properties. It is unbiased (i.e. will not systematically over- or under-estimate the mean and larger samples will generally have averages closer to the mean of the population. However, the average weight of the fish that do not escape will be larger than the average weight of the fish who have entered the net since all of the “critically small” fish will be removed. The difference between the two averages will remain approximately the same regardless of how large the number of fish entering the net will be.

How does this relate to our situation?

Each proxy contains information from which we can calculate an estimate B prox of the slope B (which relates the size of the change of this type of proxy to a unit change in the temperature). Due to the random noise, this estimate can be larger than B or smaller. Because B prox is calculated using least squares, it is known to be unbiased and the sample of B prox ‘s is similar to the sample of fish who enter the net. The screening procedure removes all proxies which have a correlation lower than the critical value, but because the B-value is directly related to that correlation, all of the “small” proxies “escape” leaving an unrepresentative sample from which to calculate the reconstruction relationship.

Why is this important? The correct value to use in the reconstruction is the population parameter B in the equations above – any deviation from that value will produce errors. With our screened sample, we cannot get a proper estimate of B and the effect of this is passed on to the reconstruction methodology.

If the proxies are a mix of different types with in some cases only one of a given type, the problem still remains in the same way that the fish can be of different species. Whales will not escape and their information contribution to the average will be unaffected. However, the information from species whose size is closer to the hole size (proxies whose actual correlation is closer to the critical value) will be filtered to appear more temperature responsive.



