November 12, 2009

Power laws and all that jazz, redux

Long-time readers will be very familiar with my interest in power-law distributions (for instance, here and here). So, I'm happy (and relieved) to report that my review article, with Cosma Shalizi and Mark Newman, on methods for fitting and validating power-law distributions in empirical data has finally appeared in print over at SIAM Review. Given that this project started for me back in late 2004, it's very pleasing to see the finished product in print. This calls for a celebration, for sure.

A. Clauset, C. R. Shalizi and M. E. J. Newman. "Power-law distributions in empirical data." SIAM Review 51(4), 661-703 (2009). (Download the code.)

Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution -- the part of the distribution representing large but rare events -- and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov–Smirnov (KS) statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data, while in others the power law is ruled out.
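For a rough feel of what "maximum-likelihood fitting" and the "KS statistic" in the abstract amount to, here is a minimal Python sketch. It is not the code linked above: the function names are my own, it covers only the continuous case, and it assumes the lower cutoff `xmin` is already known rather than estimated from the data. The exponent estimate is the standard closed form, alpha = 1 + n / sum(ln(x_i / xmin)), and the KS distance is the largest gap between the empirical CDF of the tail and the fitted power-law CDF.

```python
import math
import random


def fit_alpha(data, xmin):
    """Maximum-likelihood exponent for a continuous power law on x >= xmin.

    Returns (alpha_hat, n_tail). Assumes xmin is given, not estimated.
    """
    tail = [x for x in data if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    return 1.0 + len(tail) / log_sum, len(tail)


def ks_distance(data, xmin, alpha):
    """Largest gap between the empirical and fitted CDFs on the tail."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        # Fitted CDF of a continuous power law: 1 - (x / xmin)^(1 - alpha)
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(model - i / n), abs(model - (i + 1) / n))
    return d


def sample_power_law(n, alpha, xmin, rng):
    """Synthetic data by inverse-transform sampling (illustration only)."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


# Fit synthetic data with a known exponent and see how close we get.
rng = random.Random(42)
data = sample_power_law(10000, alpha=2.5, xmin=1.0, rng=rng)
alpha_hat, n_tail = fit_alpha(data, xmin=1.0)
d = ks_distance(data, 1.0, alpha_hat)
print(f"alpha_hat = {alpha_hat:.3f} from {n_tail} points, KS distance = {d:.4f}")
```

On real data the lower cutoff is not known in advance, and a small KS distance by itself does not show that a power law is a good model, which is why the abstract pairs the fitting step with goodness-of-fit tests and likelihood ratios.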

Here's a brief summary of the 24 data sets we looked at, and our conclusions as to how much statistical support there is in the data for them to follow a power-law distribution:

Good:

frequency of words (Zipf's law)

Moderate:

frequency of bird sightings

size of blackouts

book sales

population of US cities

size of religions

severity of inter-state wars

number of citations

papers authored

protein-interaction degree distribution

severity of terrorist attacks

With an exponential cut-off:

size of forest fires

intensity of solar flares

intensity of earthquakes (Gutenberg-Richter law)

popularity of surnames

number of web hits

number of web links, with cut-off

Internet (AS) degree distribution

number of phone calls

size of email address book

number of species per genus

None:

HTTP session sizes

wealth

metabolite degree distribution

posted November 12, 2009 08:19 AM in Complex Systems

Funny that wealth should not be distributed according to a power law, when so much research into this area has been inspired by Pareto's original research :-)

Posted by: Henrik at November 12, 2009 11:05 AM

The story for wealth is slightly more complicated: basically, there's too much structure in the distribution for it to be merely a power-law distribution. It's certainly highly skewed and heavy-tailed, but there's more going on there than a simple power-law hypothesis would lead you to believe.

Posted by: Aaron at November 12, 2009 01:03 PM

Aaron -

Thanks for the paper and the code - I got the pointer from Peter Mucha here at UNC. I used it to test bank sizes and holdings of credit derivatives (they don't follow a power law). Jesse

Posted by: Jesse Blocher at November 12, 2009 05:39 PM