Guest Post by Willis Eschenbach

Loehle and Scafetta recently posted a piece on decomposing the HadCRUT3 temperature record into a couple of component cycles plus a trend. I disagreed with their analysis on a variety of grounds. In the process, I was reminded of work I had done a few years ago using what is called “Periodicity Analysis” (PDF).

A couple of centuries ago, a gentleman named Fourier showed that any signal could be uniquely decomposed into a number of sine waves with different periods. Fourier analysis has been a mainstay analytical tool since that time. It allows us to detect any underlying regular sinusoidal cycles in a chaotic signal.

Figure 1. Joseph Fourier, looking like the world’s happiest mathematician

While Fourier analysis is very useful, it has a few shortcomings. First, it can only extract sinusoidal signals. Second, although it has good resolution as short timescales, it has poor resolution at the longer timescales. For many kinds of cyclical analysis, I prefer periodicity analysis.

So how does periodicity analysis work? The citation above gives a very technical description of the process, and it’s where I learned how to do periodicity analysis. Let me attempt to give a simpler description, although I recommend the citation for mathematicians.

Periodicity analysis breaks down a signal into cycles, but not sinusoidal cycles. It does so by directly averaging the data itself, so that it shows the actual cycles rather than theoretical cycles.

For example, suppose that we want to find the actual cycle of length two in a given dataset. We can do it by numbering the data points in order, and then dividing them into odd- and even-numbered data points. If we average all of the odd data points, and we average all of the even data, it will give us the average cycle of length two in the data. Here is what we get when we apply that procedure to the HadCRUT3 dataset:

Figure 2. Periodicity in the HadCRUT3 global surface temperature dataset, with a cycle length of 2. The cycle has been extended to be as long as the original dataset.

As you might imagine for a cycle of length 2, it is a simple zigzag. The amplitude is quite small, only plus/minus a hundredth of a degree. So we can conclude that there is only a tiny cycle of length two in the HadCRUT3.

Next, here is the same analysis, but with a cycle length of four. To do the analysis, we number the dataset in order with a cycle of four, i.e. “1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4 …”

Then we average all the “ones” together, and all of the twos and the threes and the fours. When we plot these out, we see the following pattern:

Figure 3. Periodicity in the HadCRUT3 global surface temperature dataset, with a cycle length of 4. The cycle has been extended to be as long as the original dataset.

As I mentioned above, we are not reducing the dataset to sinusoidal (sine wave shaped) cycles. Instead, we are determining the actual cycles in the dataset. This becomes more evident when we look at say the twenty year cycle:

Figure 4. Periodicity in the HadCRUT3 dataset, with a cycle length of 20. The cycle has been extended to be as long as the original dataset.

Note that the actual 20 year cycle is not sinusoidal. Instead, it rises quite sharply, and then decays slowly.

Now, as you can see from the three examples above, the amplitudes of the various length cycles are quite different. If we set the mean (average) of the original data to zero, we can measure the power in the cyclical underlying signals as the sum of the absolute values of the signal data. It is useful to compare this power value to the total power in the original signal. If we do this at all possible frequencies, we get a graph of the strength of each of the underlying cycles.

For example, suppose we are looking at a simple sine wave with a period of 24 years. Figure 5 shows the sine wave, along with periodicity analysis in blue showing the power in each of the various length cycles:

Figure 5. A sine wave, along with the periodicity analysis of all cycles up to half the length of the dataset.

Looking at Figure 5, we can see one clear difference between Fourier analysis and periodicity analysis — the periodicity analysis shows peaks at 24, 48, and 72 years, while a Fourier analysis of the same data would only show the 24-year cycle. Of course, the apparent 48 and 72 year peaks are merely a result of the 24 year cycle. Note also that the shortest length peak (24 years) is sharper than the longest length (72-year) peak. This is because there are fewer data points to measure and average when we are dealing with longer time spans, so the sharp peaks tend to broaden with increasing cycle length.

To move to a more interesting example relevant to the Loehle/Scafetta paper, consider the barycentric cycle of the sun. The sun rotates around the center of mass of the solar system. As it rotates, it speeds up and slows down because of the varying pull of the planets. What are the underlying cycles?

We can use periodicity analysis to find the cycles that have the most effect on the barycentric velocity. Figure 6 shows the process, step by step:

Figure 6. Periodicity analysis of the annual barycentric velocity data.

The top row shows the barycentric data on the left, along with the amount of power in cycles of various lengths on the right in blue. The periodicity diagram at the top right shows that the overwhelming majority of the power in the barycentric data comes from a ~20 year cycle. It also demonstrates what we saw above, the spreading of the peaks of the signal at longer time periods because of the decreasing amount of data.

The second row left panel shows the signal that is left once we subtract out the 20-year cycle from the barycentric data. The periodicity diagram on the second row right shows that after we remove the 20-year cycle, the maximum amount of power is in the 83 year cycle. So as before, we remove that 83-year cycle.

Once that is done, the third row right panel shows that there is a clear 19-year cycle (visible as peaks at 19, 38, 57, and 76 years. This cycle may be a result of the fact that the “20-year cycle” is actually slightly less than 20 years). When that 19-year cycle is removed, there is a 13-year cycle visible at 13, 26, 39 years etc. And once that 13-year cycle is removed … well, there’s not much left at all.

The bottom left panel shows the original barycentric data in black, and the reconstruction made by adding just these four cycles of different lengths is shown in blue. As you can see, these four cycles are sufficient to reconstruct the barycentric data quite closely. This shows that we’ve done a valid deconstruction of the original data.

Now, what does all of this have to do with the Loehle/Scafetta paper? Well, two things. First, in the discussion on that thread I had said that I thought that the 60 year cycle that Loehle/Scafetta said was in the barycentric data was very weak. As the analysis above shows, the barycentric data does not have any kind of strong 60-year underlying cycle. Loehle/Scafetta claimed that there were ~ 20-year and ~ 60-year cycles in both the solar barycentric data and the surface temperature data. I find no such 60-year cycle in the barycentric data.

However, that’s not what I set out to investigate. I started all of this because I thought that the analysis of random red-noise datasets might show spurious cycles. So I made up some random red-noise datasets the same length as the HadCRUT3 annual temperature records (158 years), and I checked to see if they contained what look like cycles.

A “red-noise” dataset is one which is “auto-correlated”. In a temperature dataset, auto-correlated means that todays temperature depends in part on yesterday’s temperature. One kind of red-noise data is created by what are called “ARMA” processes. “AR” stands for “auto-regressive”, and “MA” stands for “moving average”. This kind of random noise is very similar observational datasets such as the HadCRUT3 dataset.

So, I made up a couple dozen random ARMA “pseudo-temperature” datasets using the AR and MA values calculated from the HadCRUT3 dataset, and I ran a periodicity analysis on each of the pseudo-temperature datasets to see what kinds of cycles they contained. Figure 6 shows eight of the two dozen random pseudo-temperature datasets in black, along with the corresponding periodicity analysis of the power in various cycles in blue to the right of the graph of the dataset:

Figure 6. Pseudo-temperature datasets (black lines) and their associated periodicity (blue circles). All pseudo-temperature datasets have been detrended.

Note that all of these pseudo-temperature datasets have some kind of apparent underlying cycles, as shown by the peaks in the periodicity analyses in blue on the right. But because they are purely random data, these are only pseudo-cycles, not real underlying cycles. Despite being clearly visible in the data and in the periodicity analyses, the cycles are an artifact of the auto-correlation of the datasets.

So for example random set 1 shows a strong cycle of about 42 years. Random set 6 shows two strong cycles, of about 38 and 65 years. Random set 17 shows a strong ~ 45-year cycle, and a weaker cycle around 20 years or so. We see this same pattern in all eight of the pseudo-temperature datasets, with random set 20 having cycles at 22 and 44 years, and random set 21 having a 60-year cycle and weak smaller cycles.

That is the main problem with the Loehle/Scafetta paper. While they do in fact find cycles in the HadCRUT3 data, the cycles are neither stronger nor more apparent than the cycles in the random datasets above. In other words, there is no indication at all that the HadCRUT3 dataset has any kind of significant multi-decadal cycles.

How do I know that?

Well, one of the datasets shown in Figure 6 above is actually not a random dataset. It is the HadCRUT3 surface temperature dataset itself … and it is indistinguishable from the truly random datasets in terms of its underlying cycles. All of them have visible cycles, it’s true, in some cases strong cycles … but they don’t mean anything.

w.

APPENDIX:

I did the work in the R computer language. Here’s the code, giving the “periods” function which does the periodicity function calculations. I’m not that fluent in R, it’s about the eighth computer language I’ve learned, so it might be kinda klutzy.

#FUNCTIONS

PI=4*atan(1) # value of pi

dsin=function(x) sin(PI*x/180) # sine function for degrees

regb =function(x) {lm(x~c(1:length(x)))[[1]][[1]]} #gives the intercept of the trend line

regm =function(x) {lm(x~c(1:length(x)))[[1]][[2]]} #gives the slope of the trend line

detrend = function(x){ #detrends a line

x-(regm(x)*c(1:length(x))+regb(x))

}

meanbyrow=function(modline,x){ #returns a full length repetition of the underlying cycle means

rep(tapply(x,modline,mean),length.out=length(x))

}

countbyrow=function(modline,x){ #returns a full length repetition of the underlying cycle number of datapoints N

rep(tapply(x,modline,length),length.out=length(x))

}

sdbyrow=function(modline,x){ #returns a full length repetition of the underlying cycle standard deviations

rep(tapply(x,modline,sd),length.out=length(x))

}

normmatrix=function(x) sum(abs(x)) #returns the norm of the dataset, which is proportional to the power in the signal

# Function “periods” (below) is the main function that calculates the percentage of power in each of the cycles. It takes as input the data being analyzed (inputx). It displays the strength of each cycle. It returns a list of the power of the cycles (vals), along with the means (means), numner of datapoints N (count), and standard deviations (sds).

# There’s probably an easier way to do this, I’ve used a brute force method. It’s slow on big datasets

periods=function(inputx,detrendit=TRUE,doplot=TRUE,val_lim=1/2) {

x=inputx

if (detrendit==TRUE) x=detrend(as.vector(inputx))

xlen=length(x)

modmatrix=matrix(NA, xlen,xlen)

modmatrix=matrix(mod((col(modmatrix)-1),row(modmatrix)),xlen,xlen)

countmatrix=aperm(apply(modmatrix,1,countbyrow,x))

meanmatrix=aperm(apply(modmatrix,1,meanbyrow,x))

sdmatrix=aperm(apply(modmatrix,1,sdbyrow,x))

xpower=normmatrix(x)

powerlist=apply(meanmatrix,1,normmatrix)/xpower

plotlist=powerlist[1:(length(powerlist)*val_lim)]

if (doplot) plot(plotlist,ylim=c(0,1),ylab=”% of total power”,xlab=”Cycle Length (yrs)”,col=”blue”)

invisible(list(vals=powerlist,means=meanmatrix,count=countmatrix,sds=sdmatrix))

}

# /////////////////////////// END OF FUNCTIONS

# TEST

# each row in the values returned represents a different period length.

myreturn=periods(c(1,2,1,4,1,2,1,8,1,2,2,4,1,2,1,8,6,5))

myreturn$vals

myreturn$means

myreturn$sds

myreturn$count

#ARIMA pseudotemps

# note that they are standardized to a mean of zero and a standard deviation of 0.2546, which is the standard deviation of the HadCRUT3 dataset.

# each row is a pseudotemperature record

instances=24 # number of records

instlength=158 # length of each record

rand1=matrix(arima.sim(list(order=c(1,0,1), ar=.9673,ma=-.4591),

n=instances*instlength),instlength,instances) #create pseudotemps

pseudotemps =(rand1-mean(rand1))*.2546/sd(rand1)

# Periodicity analysis of simple sine wave

par(mfrow=c(1,2),mai=c(.8,.8,.2,.2)*.8,mgp=c(2,1,0)) # split window

sintest=dsin((0:157)*15)# sine function

plotx=sintest

plot(detrend(plotx)~c(1850:2007),type=”l”,ylab= “24 year sine wave”,xlab=”Year”)

myperiod=periods(plotx)

Share this: Print

Email

Twitter

Facebook

Pinterest

LinkedIn

Reddit



Like this: Like Loading...