There is a phrase that I hate: "Lies, damn lies, and statistics," which is generally used to disparage anyone presenting data that the speaker finds not to their taste. Classic examples of this turn up in response to epidemiological studies that show that vaccines and autism are not linked, and, of course, to studies showing that global warming is, indeed, anthropogenic.

That said, hating the phrase doesn't stop people from misusing statistics, but (and this is an important point) this misuse can be most effectively countered with the correct use of statistics. A good example has just turned up in Physical Review Letters, where a seeming link between short-term solar activity and longer-term temperature trends was shown to be the result of poor analysis.

Tracing links across time scales



Let's go back a bit and take a look at one of the original papers by Scafetta and West. We should note that these guys are not your average morons-with-an-Internet-connection, but respectable statistical physicists who are contributors, albeit minor ones, to the field. The basic idea behind the paper was that, although the sun has an 11-year solar cycle, sunspot activity is not smooth. Scafetta and colleagues wondered if these noncyclical fluctuations might be driving variations in the Earth's atmospheric temperature on time scales longer than 11 years.

That is a good question, and one worth asking. We might be tempted to think that, because the time scales are so different, the two cannot be linked, but in a chaotic system this is not necessarily true. If there is a real, physical, cause-and-effect link between the statistics of solar flares and global temperature trends, then they should exhibit similar scaling properties. That is, if solar activity changes on a daily basis but the effects on the Earth's atmosphere take place on a monthly basis, the statistics should look identical after speeding up the Earth's atmospheric dynamics by a factor of 30.

There is, however, an important caveat in here. The Earth's atmosphere is also driven by unrelated processes on different time scales. Those that occur at much shorter time scales appear to be noise—daily fluctuations in the global mean temperature would be an example—while those that occur on much longer time scales will appear as trends. To correctly identify if two phenomena might be related, one must be wary of both noise and trends.

Troubles with trends

The major problem with the analysis was found to be in dealing with long-term trends. You see, Scafetta and his colleagues had found that the statistical scaling properties were related on time periods ranging from a few weeks to a few months. If this is true, then the long-term trends in both the solar flare and the temperature data must be removed. That is, the solar flare data needs to have the 11-year cycle removed and the temperature data needs to have trends on the scale of 10-20 years removed. In the original work, the trend had not been properly removed from either data set.

In fact, the problems were more fundamental than that. If you are analyzing data over various time scales, which is what the researchers were doing, you face a problem: if the mean is different from one time scale to another, this artificially increases the variance in the data. The upshot is that any trend (be it up, down, sinusoidal, or some weird polynomial) will make it appear as if the two data sets depend upon each other. Dependence is measured by the Hurst exponent, which, in the presence of any long-term trends, will tend to unity whether the trends are causally linked or not. The analysis will indicate incorrectly that the two data sets are derived from processes that produce the same statistics.

You might argue that removing long-term trends from temperature data would remove recent warming. This is correct, but the point is that, if the scaling between solar flare activity and global temperatures is robust, it should survive when longer-term trends are removed.
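The way a trend pushes a scaling exponent toward unity can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the analysis from either paper: the "data" is plain white noise, the trend is an arbitrary linear drift, and the exponent is estimated crudely from how the spread of the running sum grows with the time lag. The function name `scaling_exponent` and all parameter values are hypothetical choices made for this example.

```python
import numpy as np

def scaling_exponent(series, lags):
    """Slope of log(std of displacement) versus log(lag) for the running sum."""
    path = np.cumsum(series)
    spreads = [np.std(path[lag:] - path[:-lag]) for lag in lags]
    slope, _intercept = np.polyfit(np.log(lags), np.log(spreads), 1)
    return slope

rng = np.random.default_rng(0)
n = 20_000
t = np.arange(n)
lags = np.array([4, 8, 16, 32, 64, 128, 256])

# Uncorrelated noise: theory says the exponent is 0.5.
noise = rng.standard_normal(n)

# The same noise plus a slow linear drift (slope chosen arbitrarily).
trended = noise + 1e-3 * t

# Remove the drift with a least-squares straight-line fit.
coeffs = np.polyfit(t, trended, 1)
detrended = trended - np.polyval(coeffs, t)

h_noise = scaling_exponent(noise, lags)
h_trended = scaling_exponent(trended, lags)
h_detrended = scaling_exponent(detrended, lags)

print(f"noise only:       {h_noise:.2f}")
print(f"noise plus trend: {h_trended:.2f}")
print(f"after detrending: {h_detrended:.2f}")
```

The trended series should report an exponent close to one even though the underlying fluctuations have no memory at all, and the exponent should fall back toward 0.5 once the straight line is subtracted. Real data would need a more careful estimator and a less convenient trend.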

To fly or walk

What the authors of the new paper, Rypdal and Rypdal, found in their work is that, once the solar flare data has the 11-year cycle removed, what results is a Lévy flight. The analysis by Scafetta and colleagues had produced a Lévy walk, instead.

It gets worse: Scafetta and colleagues had supposedly shown that the climate also followed Lévy walk statistics. But, once the long-term trend was removed, the climate followed something called fractional Brownian motion—essentially, this is a modified Brownian motion where a particle remembers where it has been for a period of time. This is illustrated in the paper by showing how the real data populate the two different probability density functions: they are completely and remarkably different.

To follow the consequences of this, let me introduce you to the world of Lévy walks and flights. Imagine that you are a pollen grain bouncing around in a liquid. You will find that, in a particular period of time, you will move a characteristic distance via a mean step size (if we have a fixed sampling interval, then the distance you move between samples is the step size), and the step size has a well-defined variation. Furthermore, if we examine the contribution of each step to how far you have moved (as the crow flies), then we will find that each step has contributed more or less equally.

This is the sort of diffusion that we see every day. But not all diffusion works like that.

Lévy flights are very different. A grain of pollen going through a Lévy flight will find that most of the distance it travels is dominated by a few very large jumps. In between, it makes lots of little steps. In this case, although there is a well-defined mean step size, the variance in step size is infinite. This is because the grain of pollen jumps instantly from place to place; that is, it never rests at a particular location.

Lévy walks relax this, allowing finite time at each location, resulting in a well-defined mean and variance, even though transport in a Lévy walk is still dominated by a few very large jumps. So the two are superficially similar, but the Lévy walk is much easier to deal with mathematically.
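The difference between everyday diffusion and jump-dominated transport can be made concrete with a small simulation. The sketch below is only an illustration under assumptions of my choosing: Gaussian steps stand in for ordinary Brownian motion, and Pareto-distributed step lengths with a tail index of 1.5 (finite mean, infinite variance) stand in for the heavy-tailed jumps of a Lévy flight. It does not model the waiting times that separate a Lévy walk from a Lévy flight, and the helper `share_of_largest` is a hypothetical name.

```python
import numpy as np

def share_of_largest(steps, fraction=0.01):
    """Fraction of the total path length covered by the largest steps."""
    lengths = np.sort(np.abs(steps))[::-1]      # longest steps first
    top = max(1, int(fraction * lengths.size))  # how many steps count as "largest"
    return lengths[:top].sum() / lengths.sum()

rng = np.random.default_rng(1)
n = 100_000

# Ordinary diffusion: Gaussian step sizes.
gaussian_steps = rng.standard_normal(n)

# Heavy-tailed jumps: Pareto step lengths with random direction.
alpha = 1.5  # assumed tail index, giving finite mean but infinite variance
signs = rng.choice([-1.0, 1.0], size=n)
heavy_steps = signs * (1.0 + rng.pareto(alpha, size=n))

share_gauss = share_of_largest(gaussian_steps)
share_heavy = share_of_largest(heavy_steps)

print(f"Gaussian steps: largest 1% cover {share_gauss:.1%} of the path")
print(f"Heavy tails:    largest 1% cover {share_heavy:.1%} of the path")
```

With Gaussian steps, the largest one percent of steps account for only a few percent of the distance covered. With the heavy-tailed steps, the same one percent should account for a much larger share, which is the "few very large jumps" behavior described above.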

To really show that the analysis used by Scafetta and colleagues could not distinguish between Lévy walk and fractional Brownian motion models, the researchers created a fake data set with the same mean, but used numbers extracted from a fractional Brownian motion model. That data produced exactly the same scaling behavior in the tests designed by Scafetta and colleagues. Clearly, that test could not tell the two models apart.
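A surrogate test of this kind is straightforward to sketch. The Python below is not the procedure from the paper; it is a minimal example, assuming a Hurst exponent of 0.7 and a series of 2,048 points, that generates fractional Gaussian noise (the increments of fractional Brownian motion) by Cholesky factorization of its covariance matrix and then runs a simple scaling test on it. The function names are hypothetical.

```python
import numpy as np

def fractional_gaussian_noise(n, hurst, rng):
    """Exact fGn sample via Cholesky factorization of its covariance matrix."""
    k = np.arange(n, dtype=float)
    # Autocovariance of unit-variance fractional Gaussian noise at lag k.
    gamma = 0.5 * ((k + 1) ** (2 * hurst)
                   - 2 * k ** (2 * hurst)
                   + np.abs(k - 1) ** (2 * hurst))
    index = np.arange(n)
    cov = gamma[np.abs(index[:, None] - index[None, :])]
    return np.linalg.cholesky(cov) @ rng.standard_normal(n)

def rms_scaling_exponent(increments, lags):
    """Slope of log(RMS displacement) versus log(lag) for the running sum."""
    path = np.cumsum(increments)
    rms = [np.sqrt(np.mean((path[lag:] - path[:-lag]) ** 2)) for lag in lags]
    slope, _intercept = np.polyfit(np.log(lags), np.log(rms), 1)
    return slope

rng = np.random.default_rng(3)
target_hurst = 0.7          # assumed value for the surrogate
lags = [1, 2, 4, 8, 16, 32]

surrogate = fractional_gaussian_noise(2048, target_hurst, rng)
h_est = rms_scaling_exponent(surrogate, lags)

print(f"target exponent:    {target_hurst:.2f}")
print(f"estimated exponent: {h_est:.2f}")
```

The scaling test should return an exponent close to the one built into the surrogate. The point is that the exponent alone is just a number: it carries no label saying which kind of process produced it, so a test that looks only at this scaling cannot separate one model from another that happens to scale the same way.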

Similarity via different statistics

I'm not too sure I understand the more detailed analysis correctly, but it seems that, because Rypdal and Rypdal found that the distribution of time between solar flares followed a Lévy flight, they performed an analysis that took advantage of a newly discovered property of this behavior.

For data with a finite number of samples, the variance is finite for a Lévy flight, as are all higher order moments (moments are all different measures of how the data varies from its mean value). The researchers were able to show that the scaling behavior of both the solar flare activity and the global temperature remained self-similar for very large changes in scale. Solar flare data was shown to be self-similar for changes in scale from one day to 10,000 days (~27 years). The climate, on the other hand, had very different statistics, but also showed self-similarity for periods ranging from one month through to something like 300 months (25 years).
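The statement that a finite sample always has a finite variance, even when the underlying distribution does not, is easy to check numerically. The sketch below is a toy under assumed parameters (a Pareto distribution with tail index 1.5 as the heavy-tailed case, Gaussian numbers as the well-behaved comparison) and says nothing about the specific moment analysis in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = 1.5                       # assumed tail index: infinite variance
sizes = [10**3, 10**4, 10**5, 10**6]

heavy = 1.0 + rng.pareto(alpha, size=sizes[-1])
gauss = rng.standard_normal(sizes[-1])

# Sample variance of the first m points, for growing m.
heavy_var = [float(np.var(heavy[:m])) for m in sizes]
gauss_var = [float(np.var(gauss[:m])) for m in sizes]

for m, hv, gv in zip(sizes, heavy_var, gauss_var):
    print(f"n = {m:>9,}: heavy-tailed variance = {hv:12.1f}, Gaussian variance = {gv:.3f}")
```

Every value printed is finite. The Gaussian column should settle near one as the sample grows, while the heavy-tailed column typically keeps lurching upward whenever a new extreme value arrives, which is what an infinite population variance looks like from inside a finite data set.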

So, if you assume both systems have the same properties, you can find degrees of self-similarity. The problem here is that your assumption is wrong: the two exhibit completely different statistical behavior, indicating that the link found by Scafetta and colleagues was generated by inadvisably applied statistics. Why did it turn out this way? Well, the statistical tests used in the first papers were unable to distinguish a Lévy walk from a Lévy flight with a trend. Nor could the test distinguish between Lévy walks and fractional diffusion. The upshot is that everything looked like a Lévy walk, even when it wasn't.

Out of all this, one thing still really bothers me. The work by Scafetta and colleagues claimed (incorrectly, as it has turned out) that the climate and solar flare data had similar statistical scaling properties over periods ranging up to a few months. I don't understand how that was turned into claims that solar flare activity could account for up to 60 percent of recent warming. If I were more naive, I would believe that Scafetta and company had been taken advantage of, but the cynic in me suggests that they were actively courting attention.

Physical Review Letters, 2010, DOI: 10.1103/PhysRevLett.104.128501

Physical Review Letters, 2003, DOI: 10.1103/PhysRevLett.90.248701
