I’ve spent the last week or so reworking the first draft of my universality article for Mathematics Awareness Month, in view of the useful comments and feedback received on that draft here on this blog, as well as elsewhere. In fact, I ended up rewriting the article from scratch, and expanding it substantially, in order to focus on a more engaging and less technical narrative. I found that I had to use a substantially different mindset than the one I am used to having for technical expository writing; indeed, the exercise reminded me more of my high school English assignments than of my professional work. (This is perhaps a bad sign: English was not exactly my strongest subject as a student.)

The piece now has the title “E pluribus unum: from complexity, universality”. This is a somewhat US-centric piece of wordplay, but Mathematics Awareness Month is, after all, a US-based initiative, even though awareness of mathematics certainly transcends national boundaries. Still, it is a trivial matter to change the title later if a better proposal arises, and I am sure that if I send this text off for publication, the editors will have some suggestions in this regard.

By coincidence, I moved up and expanded the other US-centric item – the discussion of the 2008 US presidential elections – to the front of the paper to play the role of the hook. I’ll try to keep the Commonwealth spelling conventions, though. :-)

I decided to cut out the discussion of the N-body problem for various values of N, in part due to the confusion over the notion of a “solution”; there is a nice mathematical story there, but perhaps one that gets in the way of the main story of universality.

I have added a fair number of relevant images, though some of them will have to be changed in the final version for copyright reasons. The narrow column format of this blog means that the image placement is not optimal, but I am sure that this can be rectified if this article is published professionally.

— E pluribus unum: from complexity, universality —

A brief tour of the mysteriously universal laws of mathematics and nature.

Nature is a mutable cloud, which is always and never the same. (Ralph Waldo Emerson, “History”, from Essays: First Series, 1841)

1. Prologue: the 2008 US presidential election and the law of large numbers.

Do I contradict myself?

Very well then I contradict myself,

(I am large, I contain multitudes.) (Walt Whitman, “Song of Myself”, 1855)

The US presidential election of November 4, 2008 was a massively complicated affair. Over a hundred million voters from fifty states cast their ballots, with each voter’s decision influenced in countless ways by the campaign rhetoric, the media coverage, rumours, personal impressions of the candidates, or discussions of politics with friends and colleagues. There were millions of “swing” voters who were not firmly committed to either of the two major candidates; their final decisions would be unpredictable, and perhaps even random in some cases. The same uncertainty existed at the state level: while many states were considered safe for one candidate or the other, at least a dozen states were considered “in play”, and could conceivably have gone either way.

In such a situation, it would seem impossible to forecast the outcome of the election accurately in advance. Sure, there were electoral polls, hundreds of them, but each poll surveyed only a few hundred or a few thousand likely voters, a tiny fraction of the entire electorate. And the polls often fluctuated wildly and disagreed with each other; not all polls were equally reliable or unbiased, and no two polling organisations used exactly the same methodology.

Nevertheless, well before election night was over, the polls had predicted the outcome of the presidential election, and of most of the other elections taking place that night, quite accurately. Perhaps the most spectacular instance of this was the set of predictions by the statistician Nate Silver, who used a weighted analysis of all existing polls to correctly predict the outcome of the presidential election in 49 out of 50 states, as well as the outcome of all 35 US Senate races. (The lone exception was the presidential election in Indiana, which Silver called narrowly for McCain, but which eventually favoured Obama by just 0.9%.)

The theoretical basis for polling is a mathematical law known as the law of large numbers. Roughly speaking, this law asserts that if one draws samples via some random method, then as the sample size grows larger and larger, the average outcome of those samples will almost always converge to a single number, known as the expected value of that random method. For instance, if one flips a fair coin a thousand times, then one can use this law to show that the proportion of heads obtained will usually be quite close to the expected value of 50%; indeed, it will be within 3% of 50% (i.e. between 470 and 530 heads out of 1000) about 95% of the time.
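The 95% figure can be checked directly. The following Python sketch (the function name `prob_heads_between` is just an illustrative choice) adds up the exact binomial probabilities of getting between 470 and 530 heads in 1000 fair flips, rather than relying on any approximation:

```python
from math import comb

def prob_heads_between(n: int, lo: int, hi: int) -> float:
    """Exact probability that n fair coin flips give between lo and hi heads (inclusive)."""
    # Each of the 2**n equally likely flip sequences with k heads is counted by comb(n, k).
    favourable = sum(comb(n, k) for k in range(lo, hi + 1))
    return favourable / 2**n

p = prob_heads_between(1000, 470, 530)
print(f"P(470 <= heads <= 530) = {p:.3f}")
```

Running it should give a value a little under 0.95, consistent with the “about 95%” quoted above.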

In a very similar vein, suppose one selects a thousand voters at random and in an unbiased fashion (so that each voter is equally likely to be selected by the poll), and finds out whom they would vote for, out of two choices such as Obama and McCain (assume for simplicity that third-party votes are negligible). Then the outcome of this poll has a margin of error of about 3% at a 95% confidence level, which means that 95% of the time that such a poll is conducted, its result will be within 3% of the true result of the election.

One of the remarkable things about the law of large numbers is that it is universal. Does the election involve a hundred thousand voters, or a hundred million voters? It doesn’t matter: the margin of error for the poll will still be about 3%. Is it a state that favours McCain over Obama 55% to 45%, or Obama over McCain 60% to 40%? Again, it doesn’t matter: the margin of error will still be about 3%. Is the state a homogeneous bloc of (say) affluent white urban voters, or is it instead a mix of voters of all incomes, races, and backgrounds? It still doesn’t matter: the margin of error will still be about 3%. And so on and so forth. The only factor that really makes a significant difference[1] is the size of the poll: the larger the poll, the smaller the margin of error (the margin shrinks in proportion to the square root of the number of voters polled, so quadrupling the size of a poll roughly halves its margin of error).
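The universality of the 3% figure can be seen from the textbook formula for the 95% margin of error of a simple random sample, 1.96·√(p(1−p)/n), where n is the poll size and p is the true share of one candidate. The sketch below is an idealised model that assumes perfectly random sampling; the optional finite-population correction is a standard refinement added here for illustration, and real pollsters make many further adjustments (weighting, likely-voter screens) that it ignores. `margin_of_error` is an illustrative name, not any polling library’s API.

```python
import math
from typing import Optional

Z_95 = 1.96  # z-score for a two-sided 95% confidence level (normal approximation)

def margin_of_error(n: int, p: float = 0.5, population: Optional[int] = None) -> float:
    """Approximate 95% margin of error for a simple random sample of n voters.

    p is the true share for one candidate. If population is given, the
    finite-population correction for sampling without replacement is applied.
    """
    moe = Z_95 * math.sqrt(p * (1 - p) / n)
    if population is not None:
        moe *= math.sqrt((population - n) / (population - 1))
    return moe

# Vary the electorate size and the true split while keeping the poll size at 1000.
for population in (100_000, 100_000_000):
    for p in (0.45, 0.5, 0.6):
        moe = margin_of_error(1000, p, population)
        print(f"population {population:>11,}  share {p:.0%}  margin {moe:.1%}")

# Quadrupling the poll size halves the margin of error.
print(f"poll of 4000: margin {margin_of_error(4000):.1%}")
```

Every margin printed for a poll of 1000 comes out at roughly 3%, whether the electorate has a hundred thousand or a hundred million voters and whatever the split; only the poll size n moves it appreciably.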

In 2008, reliably accurate meta-polls were still something of a novelty. But they seem to be here to stay, particularly in high-profile elections in which hundreds of polls are conducted; expect to see more of them in the future.

2. Approaching normal: bell curves and other universal laws.

I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the “Law of Frequency of Error”. The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshaled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along. (Sir Francis Galton, Natural Inheritance, 1889, describing what is now known as the central limit theorem.)

The law of large numbers is one of the simplest and best understood of the universal laws in mathematics and nature, but it is by no means the only one. Over the decades, many such universal laws have been found that govern the behaviour of wide classes of complex systems, regardless of what the components of those systems are, or even how they interact with each other. In this article we will take a quick tour of some of these remarkable laws.

After the law of large numbers, perhaps the next most fundamental example of a universal law is the central limit theorem. Roughly speaking, this theorem asserts that if one takes a statistic that is a combination of many independent and randomly fluctuating components, with no one component having a decisive influence on the whole, then that statistic will be approximately distributed according to a law called the normal distribution (or Gaussian distribution), more popularly known as the bell curve; some examples of this mathematical curve already appeared above in Figure 3. The law is universal because it holds regardless of exactly how the individual components fluctuate, or how many components there are (although the accuracy of the law improves as the number of components increases); it can be seen[2] in a staggeringly diverse range of statistics, from the incidence rate of accidents, to the variation of height, weight, or other vital statistics amongst a species, to the financial gains or losses caused by chance, to the velocities of the component particles of a physical system. The size, width, location, and even the units of measurement of the distribution vary from statistic to statistic, but the bell curve shape can be discerned in all cases.

This convergence arises not because of any “low-level” or “microscopic” connection between such diverse phenomena as car crashes, human height, trading profits, or stellar velocities, but because in all of these cases the “high-level” or “macroscopic” structure is the same, namely a compound statistic formed from a combination of the small influences of many independent factors. This is the essence of universality: the macroscopic behaviour of a large, complex system can be almost totally independent of its microscopic structure.
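A quick simulation makes the central limit theorem concrete. The sketch below assumes, purely for illustration, that each component is an exponential random variable (a lopsided distribution that looks nothing like a bell curve), adds up 50 of them, and checks how often the standardised sum lands within one or two standard deviations of its mean; for an exact bell curve those proportions are about 68.3% and 95.4%. The function names are illustrative only.

```python
import random

def standardized_sums(num_terms: int, num_trials: int, seed: int = 0) -> list[float]:
    """Standardised sums of num_terms independent exponential(1) variables."""
    rng = random.Random(seed)
    # Each exponential(1) term has mean 1 and variance 1, so the sum has
    # mean num_terms and standard deviation sqrt(num_terms).
    mean, sd = num_terms, num_terms ** 0.5
    return [
        (sum(rng.expovariate(1.0) for _ in range(num_terms)) - mean) / sd
        for _ in range(num_trials)
    ]

def fraction_within(values: list[float], k: float) -> float:
    """Fraction of standardised values lying within k standard deviations of 0."""
    return sum(abs(v) <= k for v in values) / len(values)

z = standardized_sums(num_terms=50, num_trials=10_000)
print(f"within 1 sd: {fraction_within(z, 1):.3f}  (bell curve: 0.683)")
print(f"within 2 sd: {fraction_within(z, 2):.3f}  (bell curve: 0.954)")
```

Setting `num_terms=1` (a single exponential, far from bell-shaped) gives about 86% within one standard deviation instead, so it is the combination of many components that produces the bell curve.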

The universal nature of the central limit theorem is tremendously useful in many industries, allowing them to manage what would otherwise be an intractably complex and chaotic system. With this theorem, insurers can manage the risk of, say, their car insurance policies, without having to know all the complicated details of how car crashes actually occur; astronomers can measure the size and location of distant galaxies, without having to solve the complicated equations of celestial mechanics; electrical engineers can predict the effect of noise and interference on electronic communications, without having to know exactly how this noise was generated; and so forth. It is important to note, though, that the central limit theorem is not completely universal; there are important cases in which the theorem does not apply, giving statistics with distributions quite different from the bell curve. I’ll return to this point later.

There are also some distant cousins of the central limit theorem that are universal laws for slightly different types of statistics. For instance, Benford’s law is a universal law for the first few digits of a large statistic, such as the population of a country or the size of an account; it makes a number of counterintuitive predictions, for instance that a statistic occurring in nature is more than six times as likely to start with the digit 1 as with the digit 9. Among other things, this law (which can be explained by combining the central limit theorem with the mathematical theory of logarithms) has been used to detect accounting fraud, since numbers that are simply made up, as opposed to arising in nature, often do not obey it.
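Benford’s law assigns the leading digit d the probability log₁₀(1 + 1/d), which is where the “more than six times” figure comes from: log₁₀ 2 / log₁₀(10/9) ≈ 6.58. The sketch below computes these probabilities and compares them with the leading digits of the first thousand powers of 2, a sequence chosen here only because it is a standard example known to follow the law; it is not real-world data, and the helper names are illustrative.

```python
import math
from collections import Counter

def benford_probability(d: int) -> float:
    """Benford's law: probability that the leading digit is d (1 <= d <= 9)."""
    return math.log10(1 + 1 / d)

def leading_digit(n: int) -> int:
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

ratio = benford_probability(1) / benford_probability(9)
print(f"Benford: a leading 1 is {ratio:.2f} times as likely as a leading 9")

# Leading digits of 2, 4, 8, ..., 2**1000: a standard example that follows Benford's law.
counts = Counter(leading_digit(2**k) for k in range(1, 1001))
for d in range(1, 10):
    print(f"digit {d}: observed {counts[d] / 1000:.3f}, Benford {benford_probability(d):.3f}")
```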

In a similar vein, Zipf’s law is a universal law governing the largest statistics in a given category, such as the largest country populations in the world, or the frequencies of the most common words in the English language. It asserts that the size of a statistic is usually inversely proportional to its ranking; thus, for instance, the tenth largest statistic should be about half the size of the fifth largest. (The law tends not to work so well for the top two or three statistics, but becomes more accurate after that.) Unlike the central limit theorem and Benford’s law, this law is primarily empirical: it is observed in practice, but mathematicians still do not have a fully satisfactory and convincing explanation of how the law comes about, or why it is so universal.
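One common way to compare a list of statistics with Zipf’s law is to plot log(size) against log(rank) and fit a straight line; a slope close to −1 indicates inverse proportionality. The sketch below does this by ordinary least squares, with an optional `skip_top` parameter (a hypothetical convenience, not a standard API) for discarding the top few ranks, where the law is said to work less well. The data are an idealised Zipf list, not real populations or word counts.

```python
import math

def zipf_exponent(sizes: list[float], skip_top: int = 0) -> float:
    """Least-squares slope of log(size) versus log(rank).

    Sizes are ranked from largest (rank 1) to smallest; the top `skip_top`
    ranks are ignored. A slope near -1 is what Zipf's law predicts.
    """
    ranked = sorted(sizes, reverse=True)
    pairs = [(math.log(rank), math.log(size))
             for rank, size in enumerate(ranked, start=1) if rank > skip_top]
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    return cov / var

# Idealised Zipf data: the size of the k-th item is exactly proportional to 1/k.
ideal = [1_000_000 / k for k in range(1, 101)]
print(ideal[9] / ideal[4])                          # tenth largest / fifth largest → 0.5
print(round(zipf_exponent(ideal, skip_top=3), 6))   # fitted slope → -1.0
```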

3. At the threshold: the universality of phase transitions.

There is nothing so stable as change. (Bob Dylan, 1963. From “No Direction Home”, by Robert Shelton)

We’ve been discussing universal laws for individual statistics: complex numerical quantities that arise as the combination of many smaller and independent factors. But universal laws have also been found for more complicated objects than mere numerical statistics. One example of this is the laws governing the complicated shapes and structures that arise from phase transitions in physics and chemistry.

As we learn in high school science classes, matter comes in various states, including the three classic states of solid, liquid, and gas, but also a number of exotic states such as plasmas or superfluids. Ferromagnetic materials, such as iron, also have magnetised and non-magnetised states; other materials become electrical conductors at some temperatures and insulators at others. What state a given material is in depends on a number of factors, most notably the temperature and, in some cases, the pressure. (For some materials, the level of impurities is also relevant.) For a fixed value of the pressure, most materials tend to be in one state (e.g. a solid) for one range of temperatures, and in another state (e.g. a liquid) for another range. But when the material is at or very close to the temperature dividing these two ranges, interesting phase transitions occur, in which the material is not fully in one state or the other, but tends[3] to split up into beautifully fractal shapes known as clusters, each of which embodies one or the other of the two states.

There are countless different materials in existence, each with a different set of key parameters (such as the boiling point at a given pressure). There are also a large number of different mathematical models that physicists and chemists use to model these materials and their phase transitions, in which individual atoms or molecules are assumed to be connected to some of their neighbours by a random number of bonds, assigned according to some probabilistic rule. At the microscopic level, these models can look quite different from each other. For instance, the figures below display the small-scale structure of two typical models of this kind: a site percolation model on a hexagonal lattice, in which each hexagon (or site) is an abstraction of an atom or molecule randomly placed in one of two states, with the clusters being the connected regions of a single colour; and a bond percolation model on a square lattice, in which the edges of the lattice are abstractions of molecular bonds that each have some probability of being activated, with the clusters being the connected regions formed by the active bonds.
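The second model described above, bond percolation on a square lattice, is simple to simulate. In the sketch below (an illustration, not the code used to produce the figures), each bond between neighbouring sites is opened independently with probability p, clusters are tracked with a union-find structure (an implementation choice made here for brevity), and the function reports the cluster sizes and whether any cluster spans the grid from left to right. For this model the critical value is known to be p = 1/2, and it is near that value that the fractal clusters described next appear.

```python
import random

def bond_percolation(n: int, p: float, seed: int = 0) -> tuple[list[int], bool]:
    """Bond percolation on an n x n square grid of sites.

    Each nearest-neighbour bond is independently open with probability p.
    Returns the cluster sizes (largest first) and whether some cluster
    connects the left column to the right column.
    """
    rng = random.Random(seed)
    parent = list(range(n * n))  # union-find forest over the n*n sites

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    for row in range(n):
        for col in range(n):
            site = row * n + col
            if col + 1 < n and rng.random() < p:  # bond to the right neighbour
                union(site, site + 1)
            if row + 1 < n and rng.random() < p:  # bond to the neighbour below
                union(site, site + n)

    sizes: dict[int, int] = {}
    for site in range(n * n):
        root = find(site)
        sizes[root] = sizes.get(root, 0) + 1

    left_roots = {find(row * n) for row in range(n)}
    right_roots = {find(row * n + n - 1) for row in range(n)}
    return sorted(sizes.values(), reverse=True), bool(left_roots & right_roots)

sizes, crosses = bond_percolation(50, 0.5, seed=1)
print(f"largest clusters: {sizes[:5]}, spans left to right: {crosses}")
```

Well below 1/2 (say p = 0.2) a 30 × 30 grid almost never has a spanning cluster, and well above it (say p = 0.8) it almost always does; the interesting, scale-free behaviour is concentrated near the threshold.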

If, however, one zooms out to a more macroscopic scale, and looks at the large-scale structure of clusters when one is at or near the critical value of parameters such as temperature, the differences in microscopic structure fade away, and one begins to see a number of universal laws emerging. While the clusters have a random size and shape, they almost always have a fractal structure, which roughly speaking means that if one zooms in a little on any portion of a cluster, the resulting image resembles the cluster as a whole. Basic statistics such as the number of clusters, the average size of the clusters, or how often a cluster connects two given regions of space, appear to obey some specific universal laws, known as power laws (which are somewhat similar to, though not quite the same as, Zipf’s law, mentioned earlier). These laws seem to arise in almost every mathematical model that has been put forward to explain (continuous) phase transitions, and have also been observed many times in nature. As with other universal laws, the precise microscopic structure of the model or the material may affect some basic parameters, such as the phase transition temperature, but the underlying structure of the law is the same across all such models and materials.
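To make “power law” slightly more concrete: such a law says that some statistic N(s) (for instance, the number of clusters of size about s) behaves like a constant multiple of a fixed power of s. In the display below, C and τ are generic placeholders rather than values for any particular model; the two consequences shown are what link power laws to the self-similarity of fractal clusters and to straight lines on log–log plots.

```latex
N(s) \approx C\, s^{-\tau}
\quad\Longrightarrow\quad
N(\lambda s) \approx C\,(\lambda s)^{-\tau} = \lambda^{-\tau}\, N(s)
\ \text{ for every scale factor } \lambda > 0,
\qquad
\log N(s) \approx \log C - \tau \log s .
```

Rescaling s by λ multiplies N by the same factor λ^{−τ} at every scale, so no scale is singled out; and on a log–log plot the law is a straight line of slope −τ.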

In contrast to more classical universal laws such as the central limit theorem, our understanding of the universal laws of phase transitions is still incomplete. Physicists have put forward some compelling heuristic arguments that explain or support many of these laws (based on a powerful, but not fully rigorous, tool known as the renormalisation group method), but a completely rigorous proof of these laws has not yet been obtained in all cases. This is very much a current area of research; for instance, in August 2010 a Fields Medal (one of the most prestigious prizes in mathematics) was awarded to Stanislav Smirnov for his breakthroughs in rigorously establishing the validity of these universal laws for some key models (such as percolation models on a triangular lattice).

4. Nuclear resonance and the music of the primes: the universality of spectra.

Even without Hollywood hyperbole, however, the chance encounter of Montgomery and Dyson was a genuinely dramatic moment. Their conversation revealed an unsuspected connection between areas of mathematics and physics that had seemed remote. Why should the same equation describe both the structure of an atomic nucleus and a sequence at the heart of number theory? And what do random matrices have to do with either of those realms? In recent years, the plot has thickened further, as random matrices have turned up in other unlikely places, such as games of solitaire, one-dimensional gases and chaotic quantum systems. Is it all just a cosmic coincidence, or is there something going on behind the scenes? (Brian Hayes, “The spectrum of Riemannium”, American Scientist, 2003).

We are nearing the end of our tour of universal laws, and I’ll now turn to another example of this phenomenon that is closer to my own area of research. Here, the object of study is not a single numerical statistic (as was the case for the central limit theorem) or a shape (as was the case for phase transitions), but a discrete spectrum: a sequence of points (or numbers, or frequencies, or energy levels) spread out along a line.

Perhaps the most familiar example of a discrete spectrum is the radio frequencies emitted by local radio stations; this is a sequence of frequencies in the radio portion of the electromagnetic spectrum, which one can of course access by turning a radio dial. These frequencies are not evenly spaced, but usually some effort is made to keep any two station frequencies separated from each other, to reduce interference.

Another familiar example of a discrete spectrum is the spectral lines of an atomic element that come from the frequencies that the electrons in the atomic shells can absorb and emit, according to the laws of quantum mechanics. When these frequencies lie in the visible portion of the electromagnetic spectrum, they give individual elements their distinctive colour, from the blue light of argon gas (which, confusingly, is often used in neon lamps, as pure neon emits orange-red light) to the yellow light of sodium. For simple elements, such as hydrogen, the equations of quantum mechanics can be solved relatively easily, and the spectral lines follow a regular pattern; but for heavier elements, the spectral lines become quite complicated, and not easy to work out just from first principles.

An analogous, but less familiar, example of a spectrum comes from the scattering of neutrons off atomic nuclei, such as the uranium-238 nucleus. The electromagnetic and nuclear forces of a nucleus, when combined with the laws of quantum mechanics, predict that a neutron will pass through a nucleus virtually unimpeded at some energies, but will bounce off that nucleus at other energies, known as scattering resonances. The internal structures of such large nuclei are so complex that it has not been possible to compute these resonances either theoretically or numerically, leaving experimental data as the only option.

These resonances have an interesting distribution; they are not independent of each other, but instead seem to obey a precise repulsion law that makes it quite unlikely that two adjacent resonances are too close to each other, somewhat in analogy to how radio station frequencies tend to avoid being too close together, except that the former phenomenon arises from the laws of nature rather than from government regulation of the spectrum. In the 1950s, the renowned physicist and Nobel laureate Eugene Wigner investigated these resonance statistics and proposed a remarkable mathematical model to explain them, an example of what we now call a random matrix model. The precise mathematical details of these models are too technical to describe here, but roughly speaking one can view such models as a large collection of masses, all connected to each other by springs of various randomly selected strengths. Such a mechanical system will oscillate (or resonate) at a certain set of frequencies; and the Wigner hypothesis asserts that the resonances of a large atomic nucleus should resemble those of such a random matrix model. In particular, they should experience the same repulsion phenomenon. Since it is possible to rigorously prove repulsion of the frequencies of a random matrix model, this gives a heuristic explanation for the same phenomenon that is experimentally observed for nuclei.
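The repulsion effect can already be seen in the smallest possible random matrix model. The sketch below draws random real symmetric 2 × 2 matrices (diagonal entries with variance 1 and off-diagonal entry with variance 1/2, the usual Gaussian orthogonal ensemble normalisation) and records the gap between their two eigenvalues; this 2 × 2 computation is essentially the one behind what is known as the Wigner surmise. For comparison it also draws gaps between independently scattered points, which are exponentially distributed. The function names are illustrative only.

```python
import math
import random

def eigenvalue_gap(a: float, b: float, d: float) -> float:
    """Gap between the two eigenvalues of the symmetric matrix [[a, b], [b, d]]."""
    # Eigenvalues are (a + d)/2 ± sqrt(((a - d)/2)**2 + b**2).
    return math.sqrt((a - d) ** 2 + 4 * b ** 2)

def random_matrix_gap(rng: random.Random) -> float:
    """Eigenvalue gap of a random 2 x 2 real symmetric (GOE-style) matrix."""
    a, d = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    b = rng.gauss(0.0, math.sqrt(0.5))
    return eigenvalue_gap(a, b, d)

def small_gap_fraction(gaps: list[float], threshold: float = 0.1) -> float:
    """Fraction of gaps smaller than `threshold` times the average gap."""
    mean = sum(gaps) / len(gaps)
    return sum(g < threshold * mean for g in gaps) / len(gaps)

rng = random.Random(1)
matrix_gaps = [random_matrix_gap(rng) for _ in range(20_000)]
independent_gaps = [rng.expovariate(1.0) for _ in range(20_000)]  # no interaction
print(f"random matrix model: {small_gap_fraction(matrix_gaps):.4f}")
print(f"independent points:  {small_gap_fraction(independent_gaps):.4f}")
```

Theory predicts that only about 0.8% of the random-matrix gaps fall below a tenth of the average gap, against roughly 9.5% for independent points, so very small gaps are more than ten times rarer in the random matrix model.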

Now, of course, an atomic nucleus does not actually resemble a large system of masses and springs (among other things, it is governed by the laws of quantum mechanics rather than those of classical mechanics). Instead, as we have since discovered, Wigner’s hypothesis is a manifestation of a universal law that governs many types of spectral lines, including those that ostensibly have little in common with atomic nuclei or random matrix models. For instance, the same spacing distribution has been found in the waiting times between buses arriving at a bus stop: