So far, we have considered a model with a simplified dissonance function and assumed octave-wise periodicity, treated in the mean field approximation. One can explore other assumptions for the periodicity as well, simply by redefining $x = \log_b(f/f_{\mathrm{ref}})$ for $b$ other than 2. For example, one finds that for tones with odd harmonic partials (ϕ n = 1, 3, 5, …), taking $b = 3$ (so “octaves” are separated by a factor of 3 in frequency) yields a 13-fold phase known as the Bohlen-Pierce scale (18) with lower free energy than with $b = 2$ (see fig. S2). One could simplify further in a Potts-type model (19), in which the tones $t_i$ may take on only a discrete set of $q$ pitches, $x_i = n/q$, $n = 0, 1, \ldots, q - 1$. This model allows further analytic calculation and lends itself to other methods from statistical mechanics. These models serve to demonstrate that phase transitions can occur from disordered sound to ordered arrangements of pitches that bear notable resemblance to familiar musical systems.
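The change of base described above can be illustrated with a minimal sketch. The function names are hypothetical and chosen only for this example; the only content taken from the text is the definition of the pitch coordinate as a base-b logarithm and the discrete pitches n/q of the Potts-type model.

```python
import math

def pitch_coordinate(f, f_ref, b=2.0):
    """Pitch x = log_b(f / f_ref); b = 2 measures octaves, b = 3 measures factor-of-3 periods."""
    return math.log(f / f_ref) / math.log(b)

def equal_division_ratios(q, b=2.0):
    """Frequency ratios of the q discrete pitches x = n/q, n = 0..q-1, within one period of base b."""
    return [b ** (n / q) for n in range(q)]

# 13 equal steps of the factor-of-3 period (the division associated with the Bohlen-Pierce scale)
bohlen_pierce = equal_division_ratios(13, b=3.0)

# 12 equal steps of the octave, for comparison
twelve_fold = equal_division_ratios(12, b=2.0)
```

With b = 3, a frequency three times the reference sits at x = 1, in the same way that a doubled frequency sits at x = 1 when b = 2.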

The mean field model assumes a fixed value of w c , although experimentally w c is found to depend on the root pitch (the lower of the two pitches). The values of w c that correspond to root pitches from middle C (C4) to the highest C on the piano (C8) are indicated in Fig. 4. Each ordered phase occupies a fairly small range of root pitches, with the 12-fold phase occurring only at relatively high pitch, likely because of the gross simplifications of the mean field approximation.

d∣p k ∣/dT is plotted with color indicated by the color bar, superimposing all values of k. Regions of a single color indicate phases with one or few dominant p k ; white regions indicate phases with many significant p k . The lines show −d k /2 versus w c for values of k corresponding to the color bar. The values of w c corresponding to root pitches from C4 (middle C) to C8 are indicated.

The dependence of P(x) on both T and the fixed value of w c can be visualized via a color map of d∣p k ∣/dT, with different values of k represented by superimposing different colors (see Methods). The resulting plot is essentially a phase diagram with transitions made apparent by large values of d∣p k ∣/dT. Figure 4 shows the phase diagram for the sawtooth timbre used above, with T swept down at each value of w c . [See fig. S1 for plots of P(x) versus T at several values of w c , as well as a comparison of the phase diagrams with T swept up and down. As expected for a second-order transition, T c1 is the same for T swept up and down, up to a small computational error. The lower transition shows significant hysteresis in T c2 , illustrating that mode coupling leads to a first-order transition.] The lines on the plot show −d k /2 versus w c , with the value of k indicated by the same color scale. When T is swept down from the disordered phase, the first transition is well described by the single-mode, mean field prediction. The single value of k 0 can be seen clearly just below T c1 by a distinct color. As more modes become mixed in, the color fades, suddenly becoming bright white after the transition at T c2 . While the 12-fold phase that appears for 0.022 < w c < 0.042 reproduces the 12-fold division of the octave used in Western music, other ranges of w c exhibit phases with other values of k 0 , such as 5, 7, 19, and 31. Suggestively, these values of k 0 are among those used in some non-Western music traditions (15, 16) or in modern music (17). The particular values of k 0 that appear, and their relative stabilities, depend on the choice of timbre (see fig. S1, C and D, for phase diagrams of timbres featured in non-Western music that do not use a 12-fold octave division). This mean field model therefore provides an avenue for optimizing the timbre of instruments to achieve a desired stability (or instability) for an existing or new musical system.

When more than one $p_{k \geq 1} \neq 0$, there are additional terms in the free energy arising from interactions between the modes. The coupling yields more complex behavior where multiple modes undergo transitions at the same temperature $T_{c2}$, reducing the symmetry to only the assumed octave-wise (“onefold”) symmetry. The spontaneous symmetry breaking randomly chooses one of the 12 peaks to become dominant over the others. This symmetry breaking calls to mind a tuning system that favors a particular key, as was common in medieval music through the time of J. S. Bach. For example, “just intonation” (JI) tuning places pitches at small-integer ratios relative to the root pitch. In Fig. 3, we compare the maxima of P(x) from the mean field model with the pitches used in the JI and ET systems. Above $T_{c2}$, all pitches are treated equally by symmetry, so we must have the even 12-fold division of ET. When the symmetry is broken below $T_{c2}$, a single pitch can be designated as the root, and the pitches tend to align with the JI scheme. Just below $T_{c2}$, the peaks reflect a compromise between ET and JI, which is similar to tuning systems that have also been used historically, such as the sixth-comma meantone system (14).
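To make the difference between ET and JI concrete, the following sketch computes how far each JI pitch lies from the nearest ET pitch, in cents (1/1200 of an octave). The particular set of small-integer ratios is an assumption: it is one common 12-tone JI choice and is not necessarily the set used for Fig. 3.

```python
import math
from fractions import Fraction

# One common 12-tone just intonation scale (an assumed choice, not taken from the text)
JI_RATIOS = [
    Fraction(1, 1), Fraction(16, 15), Fraction(9, 8), Fraction(6, 5),
    Fraction(5, 4), Fraction(4, 3), Fraction(45, 32), Fraction(3, 2),
    Fraction(8, 5), Fraction(5, 3), Fraction(9, 5), Fraction(15, 8),
]

def ji_minus_et_cents(ratios=JI_RATIOS):
    """Deviation of each JI pitch x = log2(ratio) from the ET pitch n/12, in cents."""
    deviations = []
    for n, ratio in enumerate(ratios):
        x_ji = math.log2(float(ratio))   # pitch in octaves above the root
        x_et = n / 12.0                  # equal-tempered pitch with the same index
        deviations.append(1200.0 * (x_ji - x_et))
    return deviations

deviations = ji_minus_et_cents()
for n, dev in enumerate(deviations):
    print(f"pitch index {n:2d}: JI - ET = {dev:+6.2f} cents")
```

In this scale the fifth (index 7) differs from ET by about 2 cents, while the major third (index 4) is roughly 14 cents flat of its ET counterpart, which is why the thirds are the intervals most affected by the choice between the two systems.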

The free energy only depends on the modulus of p k 0 , with the complex phase randomly chosen by spontaneous symmetry breaking. This symmetry breaking is seen in Fig. 2 as the continuous symmetry of the disordered phase changes to the 12-fold translational symmetry below T c1 . The randomly chosen phase of p 12 gives rise to the particular positions of the 12 peaks in P(x). In musical terms, a benchmark pitch is chosen arbitrarily and is fixed only by convention (e.g., A440 at 440 Hz). In this phase, all 12 peaks are equivalent and equally spaced, calling to mind the equal temperament (ET) tuning that has been commonly used in Western music for the last several hundred years ( 14 ).

The transition between the high-T disordered solution and the 12-fold solution can be understood from Landau theory. In the neighborhood of $T_{c1}$, we can expand F in powers of a single order parameter $p_{k_0}$ (see Methods). We find that there is a second-order phase transition at $T_{c1} = -d_{k_0}/2$ from the disordered phase to a phase with $k_0$-fold order, where $d_{k_0}$ is the most negative Fourier coefficient. In Fig. 1C, $d_{12}$ is the most negative Fourier coefficient, so the transition to the 12-fold phase occurs at $T = -d_{12}/2$, as indicated by the dashed line in Fig. 2B.
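The single-mode prediction can be sketched numerically: sample the periodic dissonance on a uniform grid, extract its Fourier coefficients, and read off the most negative one. Two things here are assumptions rather than the paper's definitions: the normalization of the coefficients (taken as cosine-series coefficients of an even, periodic function) and the toy dissonance curve, which is constructed by hand so that the answer is known in advance.

```python
import numpy as np

def cosine_coefficients(dp_samples, k_max):
    """Coefficients d_1..d_kmax in D_p(x) = d_0 + sum_k d_k cos(2*pi*k*x).

    dp_samples must be uniform samples of an even, periodic D_p on [0, 1).
    This normalization is an assumed convention for illustration.
    """
    m = len(dp_samples)
    spectrum = np.fft.rfft(dp_samples)
    return 2.0 * spectrum.real[1:k_max + 1] / m

def first_transition(dp_samples, k_max=20):
    """Return (k0, T_c1) with k0 the index of the most negative d_k and T_c1 = -d_k0 / 2."""
    d = cosine_coefficients(dp_samples, k_max)
    k0 = int(np.argmin(d)) + 1          # +1 because d[0] holds d_1
    return k0, -d[k0 - 1] / 2.0

# Toy periodic dissonance (hypothetical): strongest negative component at k = 12
x = np.arange(512) / 512
toy_dp = 1.0 - 0.8 * np.cos(2 * np.pi * 12 * x) - 0.3 * np.cos(2 * np.pi * 7 * x)

k0, t_c1 = first_transition(toy_dp)
print(k0, t_c1)
```

For this hand-built curve the routine recovers k0 = 12 and a transition temperature of 0.4, that is, half the magnitude of the k = 12 coefficient.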

Figure 2B shows ∣p k (T)∣ for k = 0 to 20 for the data shown in Fig. 1A. As expected, the high-temperature solution has p 0 = 1 and all other p k = 0. As T decreases, a single component p 12 is the first to become nonzero at T c1 = 20.2. After a further decrease of T, the other p k become nonzero at T < T c2 = 16.2.

Figure 2A shows the solutions P(x) for several values of T using the first 10 partials of the sawtooth timbre and w c = 0.03 [see fig. S1A for a plot of P(x) over a continuous range of T]. At high T, P(x) = 1 is the stable solution, and the system is completely disordered. At low T, P(x) approaches a single delta-function spike at an arbitrary pitch. As T increases from zero, additional peaks emerge, eventually leading to a transition to a P(x) with 12 equal peaks. We will see below that the high- and low-T behaviors are separated by phase transitions from random, disordered sound to the ordered, discrete tuning systems used in music.

We first apply a mean field approximation and study the equilibrium distribution of pitches. In the limit $N_t \to \infty$ with all $a_{ij} = a_0$ constant, the pitches within the set are described only by their probability distribution $P(x)$, where the pitch $x = \log_2(f/f_{\mathrm{ref}})$ is specified by the ratio of its frequency $f$ to an arbitrary reference frequency $f_{\mathrm{ref}}$. $P(x)$ represents the probability with which a pitch $x$ is used in a system of music. We make two further simplifying assumptions: (i) We take a fixed value of $w_c$, so $D$ is a function of $x$ only, and (ii) we assume the periodicity $P(x) = P(x + 1)$. That is, the pitch distribution is the same in every octave. With these assumptions, we define
$$D_p(x) = \sum_{n=-\infty}^{\infty} D(x + n)$$
and restrict our attention to a single octave $x \in [0,1)$. Integrating over $P(x)$, we obtain the total dissonance
$$D_{\mathrm{tot}} = \frac{1}{2} \int_0^1 \int_0^1 P(x)\, D_p(x - y)\, P(y)\, dy\, dx \qquad (1)$$
and entropy
$$S = -\int_0^1 P(x) \ln P(x)\, dx \qquad (2)$$
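As a rough numerical illustration of Eqs. 1 and 2, the integrals can be evaluated on a uniform grid over one octave, with the periodic double integral computed as a circular convolution. The grid size, the toy periodic dissonance, and the combination F = D_tot − T·S (the standard Helmholtz form, assumed here because the free energy is not written out in this passage) are all assumptions of this sketch.

```python
import numpy as np

def total_dissonance(p, dp):
    """Eq. 1 on a uniform grid: (1/2) * double integral of P(x) D_p(x - y) P(y)."""
    dx = 1.0 / len(p)
    # Circular convolution gives sum_j D_p(x_i - x_j) P(x_j) for periodic D_p
    conv = np.real(np.fft.ifft(np.fft.fft(dp) * np.fft.fft(p))) * dx
    return 0.5 * np.sum(p * conv) * dx

def entropy(p):
    """Eq. 2 on a uniform grid; grid points with P = 0 contribute nothing."""
    dx = 1.0 / len(p)
    safe = np.where(p > 0, p, 1.0)      # log(1) = 0 where P vanishes
    return -np.sum(p * np.log(safe)) * dx

def free_energy(p, dp, temperature):
    """Assumed Helmholtz form F = D_tot - T * S."""
    return total_dissonance(p, dp) - temperature * entropy(p)

m = 512
x = np.arange(m) / m
toy_dp = 1.0 - 0.8 * np.cos(2 * np.pi * 12 * x)     # hypothetical periodic dissonance
p_flat = np.ones(m)                                  # disordered state, P(x) = 1
p_12 = 1.0 + 0.5 * np.cos(2 * np.pi * 12 * x)        # 12-fold modulated state

for t in (0.2, 0.6):
    print(t, free_energy(p_flat, toy_dp, t), free_energy(p_12, toy_dp, t))
```

The modulated distribution lowers the dissonance term but also lowers the entropy, so whether it is favored over the flat distribution depends on T, which is the competition behind the transitions described in the text.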

The tone lattice

To go beyond the mean field model, we turn to numerical simulation of a system of interacting tones. Using this method, we will be able to relax the unrealistic assumption that each tone interacts equally with all other tones. Instead, each tone will interact with a subset of the other tones. This can be accomplished by placing the tones on a lattice, with interaction only with a set of nearest neighbors. This lattice can be interpreted as existing in an abstract space, where tones on nearby lattice sites have a stronger harmonic relationship than more distant ones. Although many lattices with different dimension and different interactions may be studied, here, I choose one of the simplest cases that can be expected to produce rich behavior: tones on a two-dimensional (2D) square lattice, with only short-range interactions. Given octave-wise periodicity of interactions, this system can be described using the well-studied XY model (20, 21), where the pitch x stands in for the angle θ. The XY model on a 2D lattice is known to produce particularly rich behavior. As in superfluid thin films (22) and nematic liquid crystals (23), one expects to observe a Kosterlitz-Thouless transition from a disordered state with free vortices to an ordered state exhibiting quasi–long-range order (24–26). A vortex in this system is a region surrounding a topological defect, where the pitches along a closed path enclosing the defect traverse one or more octaves. Following a quench of the system from the disordered to the ordered state, metastable states are observed with bound vortices and antivortices (27–29). In this case, moreover, interactions tend not only to align neighboring pitches but also to favor sudden jumps by particular intervals Δx ≈ 3/12, 4/12, 5/12, 7/12, and 9/12.
This type of XY model, where the energy has multiple minima, has found application, for example, in describing stacking domains in bilayer graphene (30), ferroelectric domains in hexagonal manganites (31), and axion-based cosmological models (32). For sufficiently small vortices, we expect not a continuous variation of x around a vortex but, instead, a number of domain boundaries where x changes by one of the more consonant intervals.

To study the 2D XY behavior that emerges from a set of tones on a lattice, we consider interactions only with a set of nearest neighbors and simulate the resulting stochastic dynamics (see Methods). As before, tones with pitches f i and f j interact because of their mutual dissonance D(f i , f j ). For maximum generality, we make no assumption about the periodicity of the pitch distribution and allow the dissonance width w c to vary with the root pitch min(f i , f j ) so that intervals involving lower pitches have broader dissonance curves, as observed experimentally. [Figure S3 shows D(Δx) for several root pitches.] We then use a combination of Langevin dynamics and Metropolis Monte Carlo (MC) to simulate the behavior of the tone lattice at temperature T.
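The Metropolis part of such a simulation can be sketched as follows. This is a deliberately simplified stand-in, not the procedure described in Methods: it omits the Langevin dynamics, uses periodic boundary conditions, and replaces the timbre-dependent dissonance with a placeholder cosine interaction that is periodic in the octave and has a fixed width. All function names and parameter values are hypothetical.

```python
import numpy as np

def toy_dissonance(dx):
    """Placeholder pair interaction (NOT the paper's D): minimal when pitches agree mod 1."""
    return 1.0 - np.cos(2.0 * np.pi * dx)

def local_energy(x, i, j, value):
    """Dissonance between a pitch `value` at site (i, j) and its four nearest neighbors."""
    n = x.shape[0]
    neighbors = (x[(i + 1) % n, j], x[(i - 1) % n, j],
                 x[i, (j + 1) % n], x[i, (j - 1) % n])
    return sum(toy_dissonance(value - nb) for nb in neighbors)

def total_energy(x):
    """Sum of the pair interaction over all nearest-neighbor bonds (periodic boundaries)."""
    return float(np.sum(toy_dissonance(x - np.roll(x, 1, axis=0))
                        + toy_dissonance(x - np.roll(x, 1, axis=1))))

def metropolis_sweep(x, temperature, rng, step=0.1):
    """One sweep: n*n single-site trial moves with Metropolis acceptance."""
    n = x.shape[0]
    accepted = 0
    for _ in range(n * n):
        i, j = rng.integers(0, n, size=2)
        trial = x[i, j] + rng.normal(0.0, step)
        d_e = local_energy(x, i, j, trial) - local_energy(x, i, j, x[i, j])
        if d_e <= 0 or rng.random() < np.exp(-d_e / temperature):
            x[i, j] = trial
            accepted += 1
    return accepted / (n * n)

rng = np.random.default_rng(0)
lattice = rng.uniform(0.0, 4.0, size=(8, 8))   # random pitches spanning four octaves
energy_before = total_energy(lattice)
for _ in range(100):                           # quench to a low temperature
    metropolis_sweep(lattice, temperature=0.05, rng=rng)
energy_after = total_energy(lattice)
print(energy_before, energy_after)
```

Replacing `toy_dissonance` with a dissonance function that has multiple minima and a root-pitch-dependent width would be the step needed to move from this generic XY-like sketch toward the tone lattice described in the text.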

Figure 5A shows the histogram of pitch classes x i mod 1 following a simulated quench from a disordered state to selected T (see fig. S4A for the full pitch histograms). At T = 5.5 and above, no long-range order of the pitches is observed. At T = 5 and below, the pitches show clear ordering. At T = 5 and T = 4, we see five- and sevenfold division of the octave, as was also seen in the mean field result. At T = 3.5, the 12-fold division of the octave has emerged, reproducing the Western musical system (for example, see fig. S4B showing the 12-fold division of the octave in both the frequency spectrum of the T = 3.5 tone lattice and the spectrum of Bach’s Prelude in D major, BWV850). Now, we can delve deeper by studying the spatial arrangement of these pitches on the lattice. [See fig. S5 for plots of the distribution C(Δx, r) of intervals Δx = x i − x j versus distance r between sites i and j, at T = 5.5 and T = 3.5.]

Fig. 5 Tone lattice simulation results. (A) Histograms of pitches x mod 1 in a metastable configuration of the tone lattice following a quench to temperature T. At T = 6, no ordering is observed. At T = 5, 4, and 3.5, ordering is observed, with the octave divided into 5, 7, and 12, respectively. (B) Pitches on the tone lattice at T = 3.5. Pitch domains are labeled with pitch indices 0 to 11. Major or minor triads are marked with triangles. Junctions of more than three pitch domains are marked with circles. (C) The Tonnetz (edges shown in blue) with connections between neighboring pitch domains at T = 3.5 shown in red.

Figure 5B shows a visualization of the pitches on the lattice in a metastable configuration following a quench to T = 3.5. Each pixel corresponds to one lattice site colored according to a hue/saturation/value triplet. The hue (ranging from blue to red, as shown on the color bar) represents the pitch class x i mod 1, with the saturation and value adjusted to represent ⌊x i ⌋, the octave in which that pitch lies. Darker (lighter) colors represent lower (higher) octaves. The image is characterized by domains of the same hue, indicating a single pitch class within that domain (pitch classes are labeled by indices 0 to 11). Repeated runs of the simulation result in final states that differ in the details but show the same general behavior, such as the same division of the octave, and similar correlations. Figure S6 shows simulation results at T = 4, 5, and 6.
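A coloring of this kind could be implemented along the following lines. Only the general scheme comes from the text (hue from the pitch class, running from blue to red; saturation and value from the octave, with darker colors for lower octaves). The exact numerical mapping, the number of octaves, and the function name are assumptions made for this sketch.

```python
import colorsys
import math

def pitch_to_rgb(x, n_octaves=4):
    """Map a pitch x (in octaves) to an RGB triplet with components in [0, 1]."""
    octave = math.floor(x)
    pitch_class = x - octave                      # x mod 1
    hue = (2.0 / 3.0) * (1.0 - pitch_class)       # blue at pitch class 0, toward red near 1
    # Position of the octave within the displayed range: 0 = lowest, 1 = highest
    frac = min(max(octave, 0), n_octaves - 1) / (n_octaves - 1)
    value = 0.4 + 0.6 * frac                      # lower octaves are darker
    saturation = 1.0 - 0.5 * frac                 # higher octaves are lighter (less saturated)
    return colorsys.hsv_to_rgb(hue, saturation, value)

# Same pitch class in a low and a high octave
low = pitch_to_rgb(0.25)
high = pitch_to_rgb(3.25)
```

Applying `pitch_to_rgb` to every lattice site yields an image in which domains of constant pitch class share a hue and differ only in brightness between octaves.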

The quenched tone lattice can be viewed as a metastable configuration of bound vortices and antivortices, as expected for the 2D XY model. Because pitches can vary continuously only at a high cost of dissonance, it is more favorable to find domains of nearly constant pitch class separated by domain boundaries, where the pitch class changes by a consonant interval. The most common interval across a boundary is a fourth or fifth (change in pitch index of 5 or 7 mod 12). Major and minor thirds are also common (change in pitch index of 3 or 4 mod 12). Places where two or more domain boundaries meet at a point constitute a topological defect, with the surrounding domains forming a vortex around it. The simplest vortex consists of three domains that meet at a point (see points marked with a triangle in Fig. 5B). The pitches of these domains consist of a root pitch, a major or minor third above the root, and a fifth above the root. This combination of three pitches forms a major or minor triad—the type of chord that forms the backbone of Western music. Vortices with more domain boundaries are indicated by circles. (See the Supplementary Materials for sound clips of selected regions of the tone lattice.)

The arrangement of pitch domains on the lattice shown in Fig. 5B reflects elements of musical harmony. The Tonnetz (Fig. 5C), originally described by Euler (33, 34), is a graph in which the nodes are pitch indices and the edges (shown in blue) represent fifths, major thirds, and minor thirds connecting those pitch classes. In the ET approximation, the Tonnetz can be viewed as a graph on a torus, with the connections around the torus indicated by the gray circles. Moving continuously to the right, we increase by fifths, cycling around the “circle of fifths.” Moving along one diagonal direction changes by minor thirds, and the other diagonal direction changes by major thirds. Each set of three adjacent nodes forming a triangle represents a major or minor triad. The pitch classes on the tone lattice in Fig. 5B that share a border can be represented by a similar graph. We construct this graph first by identifying all pitch domains, defined as contiguous regions with at least five lattice sites having the same pitch, rounded to the nearest of the 12 pitch classes. Pitch domains that share a border are then identified as those that have at least two adjacent lattice sites in one domain bordering two adjacent lattice sites in the other domain. The resulting graph of all neighboring domains on the tone lattice (red lines in Fig. 5C) maps directly onto the Tonnetz. All possible fifths are present (for example, the ellipse in Fig. 5B traverses the circle of fifths), with many of the thirds present as well.
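The Tonnetz itself is simple to build as a graph on the 12 pitch indices, which gives a direct way to test whether a pair of bordering pitch domains corresponds to a Tonnetz edge. The sketch below assumes the 12-fold ET labeling of pitch classes used in the text; the function and variable names are illustrative.

```python
from itertools import combinations

# Interval sizes (in pitch-index steps) joined by Tonnetz edges:
# minor third (3), major third (4), fifth (7); inversions are covered by symmetry
TONNETZ_INTERVALS = {3, 4, 7}

def is_tonnetz_edge(a, b):
    """True if pitch indices a and b differ by a third or a fifth (or its inversion) mod 12."""
    d = (a - b) % 12
    return d in TONNETZ_INTERVALS or (12 - d) in TONNETZ_INTERVALS

tonnetz_edges = {frozenset(pair) for pair in combinations(range(12), 2)
                 if is_tonnetz_edge(*pair)}

def is_triangle(chord):
    """True if every pair of pitch indices in the chord is joined by a Tonnetz edge."""
    return all(is_tonnetz_edge(a, b) for a, b in combinations(chord, 2))

major_triads = [(r, (r + 4) % 12, (r + 7) % 12) for r in range(12)]
minor_triads = [(r, (r + 3) % 12, (r + 7) % 12) for r in range(12)]

def fraction_on_tonnetz(observed_pairs):
    """Fraction of observed neighboring-domain pairs that are Tonnetz edges."""
    pairs = list(observed_pairs)
    return sum(is_tonnetz_edge(a, b) for a, b in pairs) / len(pairs)
```

Given a list of pitch-index pairs for domains that share a border, `fraction_on_tonnetz` reports how much of the observed neighbor graph lies on Tonnetz edges.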

The major or natural minor diatonic scales used in traditional Western music consist of seven contiguous pitch classes around the circle of fifths (or a trapezoid on the Tonnetz), whereas the anhemitonic pentatonic scales used in other musical traditions consist of five contiguous pitches on the circle of fifths. Therefore, all such scales may be constructed from contiguous regions of the tone lattice, with major and minor triads within these scales appearing in these regions. Furthermore, as the tones on a finite lattice relax through the quenched metastable states toward equilibrium, some of the 12 pitch classes begin to grow and others shrink, possibly eventually disappearing. Because the domains are most often separated by fifths or fourths, it is likely that the domains that grow and do not disappear will be contiguous on the circle of fifths. Therefore, the pitches on the tone lattice are expected to converge toward a major or minor diatonic scale (and, upon further relaxation, to an anhemitonic pentatonic scale). Figure 6A shows a histogram of pitch classes from the T = 3.5 tone lattice, arranged in order of the circle of fifths. One can see that the first seven pitch classes are all fairly common (in the shaded box), and the three least common pitch classes are contiguous on the circle of fifths and outside the box. This type of pitch distribution is common in Western music, where many of the pitches are drawn from a major or minor diatonic scale (and thus from seven contiguous pitch classes on the circle of fifths), but with some exceptions. For example, the pitch distribution from Bach’s BWV850 is shown in Fig. 6B, in which all 12 pitch classes are present, but with the first 7 pitch classes (corresponding to the D major scale) occurring more frequently than the latter 5. 
In other runs of the simulation, or at different times within one run, one might observe peaks in the pitch distribution with more equal or less equal height (or some missing), which would agree better with other styles of music.
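The statement that diatonic and anhemitonic pentatonic scales are contiguous on the circle of fifths can be checked directly. In this sketch, pitch indices follow the usual convention C = 0, C♯ = 1, …, B = 11, which is an assumption about labeling; the helper names are illustrative.

```python
def circle_of_fifths(start=0):
    """The 12 pitch indices in order of ascending fifths (7 steps each), beginning at `start`."""
    return [(start + 7 * n) % 12 for n in range(12)]

def contiguous_on_fifths(start, count):
    """Sorted pitch indices of `count` contiguous pitch classes on the circle of fifths."""
    return sorted(circle_of_fifths(start)[:count])

def arrange_by_fifths(histogram, start=0):
    """Reorder a 12-bin pitch-class histogram (indexed 0..11) by ascending fifths."""
    return [histogram[pc] for pc in circle_of_fifths(start)]

# Seven contiguous pitch classes starting from G (7) give the D major scale
d_major = contiguous_on_fifths(start=7, count=7)

# Five contiguous pitch classes starting from C (0) give the C major pentatonic scale
c_pentatonic = contiguous_on_fifths(start=0, count=5)

print(d_major)       # D major: D E F# G A B C#
print(c_pentatonic)  # C D E G A
```

`arrange_by_fifths` performs the reordering used for histograms such as those in Fig. 6, so that the pitch classes of a diatonic scale appear as one contiguous block.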

Fig. 6 Comparison of the pitch class distribution on the tone lattice and in a selected piece of music. (A) Histogram of pitch classes 12x mod 12 rounded to the nearest pitch class on the T = 3.5 tone lattice and arranged by ascending fifths. (B) Histogram of pitch classes appearing in Bach’s Prelude and Fugue in D major, BWV850, arranged by ascending fifths. The shaded area corresponds to the notes of a diatonic scale.

The arrangement of triads on the tone lattice gives rise to the same elements of music theory that have been extracted from the Tonnetz, such as a geometric interpretation of tonality (35), and progressions of triads that produce parsimonious voice leading, the concept that allows the combination of melody and harmony known as counterpoint (36). By varying the quench temperature T, pitch distributions with different numbers of pitch classes are observed. Inspecting the arrangement of pitch domains on lattices at different T, one observes certain commonly occurring intervals across domain boundaries. One can then propose “Tonnetzes” in these musical systems by analogy to the usual Tonnetz with 12 pitch classes, where the most commonly occurring interval forms the horizontal edges and the less common intervals are represented by diagonal edges (see fig. S6).

The fact that the behavior of the tone lattice in two dimensions replicates many features of traditional Western music raises the question: What happens in other dimensions? No phase transition is expected in 1D (unless the interactions are extended to long range). In 3D, the expected phase transition of a system with XY symmetry has been studied extensively in the context of cosmology. In this case, the phase transition is not a Kosterlitz-Thouless transition but, instead, a normal first- or second-order transition. Still, though, one expects the emergence of topological defects in the ordered phase if the cooling takes place at a finite rate by the so-called Kibble-Zurek mechanism (37). In these topological defects, the point-like vortex cores seen in the 2D lattice are now extended along 1D paths, originally referred to as cosmic strings, and later observed in condensed matter systems (31, 38). It is possible that the extension of the tone lattice to three dimensions may give rise to a 3D Tonnetz-like structure, as has been previously proposed in the spiral array model (35), or in generalized Tonnetzes (3).

The results presented here demonstrate that many patterns in music can emerge from this statistical mechanics framework. Undoubtedly, further insights into the structure of music can be gained by studying related models and bringing all of the tools of statistical mechanics to bear. The historical development of thermodynamics provides a parallel to the present case. Starting in the 17th century, empirical laws (Boyle’s law, Charles’s law, etc.) were postulated to explain observations of the behavior of gases. This top-down approach to thermodynamics proved quite useful. However, the development of statistical mechanics yielded a bottom-up theory, and this fundamental understanding led to an array of new discoveries in the 20th century and beyond. Likewise, top-down music theory has proven quite useful for composing and understanding music, but a bottom-up theory provides a foundational understanding, perhaps leading to new ways of composing, understanding, and enjoying music.