Scientists delight in extracting order from chaos, finding patterns in the complexity of the real world that pull back the curtain and reveal how things work. Sometimes, though, those patterns create more head-scratching than excitement. Such is the case with Benford’s law. One might expect a collection of real-world data (say, the half-lives of various isotopes) to pretty much look like random numbers. And one might further expect the first (non-zero) digit of each of those numbers to also be random, with every digit from 1 to 9 turning up equally often (i.e., just as many 2s as 9s).

Oddly, one would (in many cases) be wrong. It turns out that 1s are more likely than 2s, which are more likely than 3s, and so on. Not only that, the probabilities match a logarithmic distribution, just like the spacing on a logarithmic scale. The number 1 will be the first digit about 30 percent of the time, 2 will occur nearly 18 percent of the time, all the way on down to 9 showing up only about 5 percent of the time.
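For the curious, those percentages come from a one-line formula: under Benford’s law, the chance that the first digit is d equals log10(1 + 1/d). Here is a minimal Python sketch that prints the full table. The function name is just an illustrative choice, not anything taken from the research discussed here.

```python
import math


def benford_probabilities():
    """Expected share of each leading digit d under Benford's law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


# Print the expected share of each leading digit as a percentage.
for digit, probability in benford_probabilities().items():
    print(f"{digit}: {probability:.1%}")
```

The nine shares add up to exactly 100 percent, because the terms (d + 1)/d multiply out to 10, and the logarithm of 10 is 1.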

Law-abiding citizens everywhere will be happy to know our planet also obeys Benford’s law, with the duration and size of volcanic eruptions showing the same sort of pattern.

This strange phenomenon was first described in 1881 by an astronomer named Simon Newcomb. While using printed tables of logarithms, he noticed that the pages containing numbers that start with 1 were much more worn than the others. After thinking it out, he proposed that the first digits of the numbers being looked up in the log tables themselves followed a logarithmic distribution.

In 1938, the physicist Frank Benford rediscovered this idea, explored it more fully, and formalized the equation that describes it. He analyzed a number of data sets and showed that the relationship existed in the real world. It’s obviously not universal—it won’t be true of numbers in a telephone book, for example, which share assigned area codes and prefixes. Still, Benford’s law has held good for a truly bewildering variety of data sets, including the surface area of rivers, the specific heat of chemical compounds, mathematical constants in physics, baseball stats, street addresses, populations of US counties, and a number of mathematical tables and series. (Try a few more for yourself.)
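One easy series to try at home is the powers of two. The sketch below is only an illustration (the function name and the choice of 1,000 terms are arbitrary); it counts how often each digit leads off 2^0, 2^1, and so on.

```python
from collections import Counter


def power_of_two_digit_counts(n_terms=1000):
    """Count the leading digits of 2**0, 2**1, ..., 2**(n_terms - 1)."""
    # Python integers are exact, so the first character of str() is the true leading digit.
    return Counter(int(str(2 ** k)[0]) for k in range(n_terms))


counts = power_of_two_digit_counts()
for digit in range(1, 10):
    print(digit, counts[digit])
```

With 1,000 terms, the digit 1 leads 301 of the values, right in line with the roughly 30 percent Benford’s law predicts.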

Perhaps most famously, Benford’s law has been used to detect financial fraud. Folks who cook the books assume that random numbers will look inconspicuous, not realizing that evenly spread leading digits are exactly what can make them look conspicuous. Dodgy rounding will also cause a data set to stick out like a sore thumb and get you caught red-handed. (You can hear about an example in this episode of WNYC’s Radiolab.) It’s often been suggested that Benford’s law should be applied to the results of suspicious elections, but the relationship can be unreliable unless the numbers span multiple orders of magnitude.

The burning question that can get some people downright irritated with the whole business of Benford’s law is “why the hell should this be true?” No explanation is completely satisfying (unless you’ve got the fortitude for some mathematical heavy lifting), but a couple come close to de-spookifying the idea in at least some circumstances.

Think of a number starting with a 1. What would it take for it to start with a 2? Well, you’d have to double it. Now consider a number starting with a 9. An increase of only about 11 percent (or less) will have it back to starting with a 1 again. And, of course, the process repeats: this number will have to be doubled again before it will start with a 2. Anything growing at a steady percentage rate therefore spends far more time with a leading 1 than with a leading 9. For this reason, financial growth (such as investments) will follow Benford’s law quite faithfully.
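The growth argument is easy to check with a toy simulation. The sketch below makes some arbitrary assumptions that come from nowhere but convenience: a starting value of 1, growth of 1 percent per step, and 10,000 steps. It simply tallies how many steps the value spends on each leading digit.

```python
from collections import Counter
from decimal import Decimal


def leading_digit(x):
    """First non-zero digit of a positive number, read from its exact decimal expansion."""
    return Decimal(x).as_tuple().digits[0]


def growth_digit_shares(rate=0.01, steps=10_000, start=1.0):
    """Grow `start` by `rate` each step; return the share of steps spent on each leading digit."""
    counts = Counter()
    value = start
    for _ in range(steps):
        counts[leading_digit(value)] += 1
        value *= 1 + rate  # steady percentage growth, like compound interest
    return {d: counts[d] / steps for d in range(1, 10)}


shares = growth_digit_shares()
for digit, share in shares.items():
    print(f"{digit}: {share:.1%}")
```

Because each step multiplies the value by the same factor, getting from a leading 1 to a leading 2 takes many more steps than getting from a leading 9 back to a leading 1, and the tallies come out close to the logarithmic shares.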

Several years ago, researchers in the Earth sciences began taking an interest in seeing whether our planet's behavior followed Benford’s law. A paper published in 2010 applied the analysis to things like the length of time between geomagnetic reversals (when the Earth’s magnetic “north” pole flips to the opposite geographical pole), the depth of earthquakes, greenhouse gas emissions by country, and even the numbers of infectious disease cases reported to the World Health Organization by each nation. All of them showed a decent fit to Benford’s law. (As did some things out of this world, like the rotation frequencies of pulsars and masses of exoplanets.)

In a recent paper published in Geology, a pair of Spanish researchers extend this to three more data sets: the area and ages of volcanic calderas and the duration of volcanic eruptions between 1900 and 2009. This is more than just a bit of fun with numbers, as we’re past the point where Benford’s law needs confirmation. The goal is to use it as a sort of simple truth-check on databases of geologic data. If these things don't follow Benford's law, then it could be a sign that a data set is unrepresentative of reality or contains some sort of pervasive error or bias.
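What might such a truth-check look like in practice? One common approach, and it is an assumption here rather than necessarily the test the authors used, is a chi-square comparison between the observed first-digit counts and the counts Benford’s law predicts. The helper names and the two sample data sets below are hypothetical.

```python
import math
from collections import Counter
from decimal import Decimal

# Chi-square critical value at the 5% level with 8 degrees of freedom (9 digits minus 1).
CHI2_CRITICAL_5PCT_DF8 = 15.507


def leading_digit(x):
    """First non-zero digit of a non-zero number, read from its exact decimal expansion."""
    return Decimal(x).as_tuple().digits[0]


def benford_chi_square(values):
    """Chi-square statistic comparing observed first digits with Benford's expected counts."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    observed = Counter(digits)
    statistic = 0.0
    for d in range(1, 10):
        expected = len(digits) * math.log10(1 + 1 / d)
        statistic += (observed[d] - expected) ** 2 / expected
    return statistic


# Hypothetical examples: one set spanning many orders of magnitude, one spanning just one.
wide = [2 ** k for k in range(1000)]
narrow = list(range(100, 1000))
for name, data in (("wide", wide), ("narrow", narrow)):
    stat = benford_chi_square(data)
    verdict = "consistent with" if stat < CHI2_CRITICAL_5PCT_DF8 else "deviates from"
    print(f"{name}: chi-square = {stat:.1f}, {verdict} Benford's law")
```

A statistic above the critical value flags a data set for a closer look; it doesn’t say what went wrong. Keep in mind that with very large samples, even tiny and harmless departures can push the statistic over the line.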

Benford’s law fit the eruption duration data very well. The fit for the caldera areas was pretty good, too, though a few digits differed just enough that the authors suspect some excessive rounding may have taken place. The caldera eruption ages, however, showed a marked deviation from Benford’s distribution. There were too many numbers starting with 2 and 3. When they looked closely, they saw this was due to a large number of North American calderas between 23 and 42 million years old.

As it turns out, this is a well-known anomaly. It’s not clear whether there was really an unusual cluster of calderas at that time or this is simply a case of one area being studied more intensely. Regardless, removing those calderas from the analysis returned the data set to harmony with Benford’s law. In essence, Benford’s law provided another way to show that those calderas are anomalous.

Because researchers often want to know whether the data they’re analyzing is a representative sample of the world at large, any technique that could help them do so is likely to get a serious look. The authors conclude, "Since the use of Benford’s law may serve as a simple and quick quality test of data, and provide new ways to detect anomalous signals in data sets, it could be used as a validity check on future databases related to volcanoes." In other words, before you go searching for patterns in a database, it might be prudent to make sure the database conforms to Benford’s pattern.

Geology, 2012. DOI: 10.1130/G32787.1