arXiv:1202.4212v1 [cs.SD] 20 Feb 2012

Harmony Explained:

Progress Towards A Scientific Theory of Music

The Major Scale, The Standard Chord Dictionary, and The Difference of Feeling Between The Major and Minor Triads Explained from the First Principles of Physics and Computation; The Theory of Helmholtz Shown To Be Incomplete and The Theory of Terhardt and Some Others Considered

Daniel Shawcross Wilkerson

Begun 23 September 2006; this version 19 February 2012.

Abstract and Introduction

Most music theory books are like medieval medical textbooks: they contain unjustified superstition, non-reasoning, and funny symbols glorified by Latin phrases. How does music, in particular harmony, actually work? We present an answer as a real, scientific theory of music.

In particular we derive from first principles of Physics and Computation the following three fundamental phenomena of music:

the Major Scale,

the Standard Chord Dictionary, and

the difference in feeling between the Major and Minor Triads.

While the Major Scale has been independently derived before by others in a similar manner as we do here [Helmholtz1863, p. 300], [Birkhoff1933, p. 92], I believe the derivation of the Standard Chord Dictionary as well as the difference in feeling between the Major and Minor Triads to be an original contribution to science and art. Further, we think our observations should convert straightforwardly into an algorithm for classifying the basic aspects of tonal music in a manner similar to the way a human would.

Further, we examine the theory of the heretofore agreed-upon authority on this subject, 19th-century German Physicist Hermann Helmholtz [Helmholtz1863], and show that his theory, while making correct observations, and while qualifying as scientific, fails to actually explain the three observed phenomena listed above; Helmholtz isn't really wrong, he just fails to be really right, and considers only physical and not computational phenomena. We also consider the more recent and more computational theory of Terhardt [Terhardt1974-PCH] (and others) and show that, while his approach (and, it seems, that of others following in his thread) also attempts a computational explanation and derives some observations that seem to resemble some of those of the initial part of our analysis, we seem to go further.

I intend this article to be satisfying to scientists as an original contribution to science (as a set of testable conjectures that explain observed phenomena), yet I also intend it to be approachable by musicians and other curious members of the general public who may have long wondered at the curious properties of tonal music and been frustrated by the lack of satisfying, readable exposition on the subject. Therefore I have written in a deliberately plain and conversational style, avoiding unnecessarily formal language; Benjamin Franklin and Richard Feynman often wrote in a plain and conversational style, so if you don't like it, to quote Richard Feynman, "Don't bug me man!"

Table of Contents

1 The Problem of Music
  1.1 Modern "Music Theory" Reads Like a Medieval Medical Textbook
  1.2 What is a Satisfactory, Scientific Theory?
  1.3 Music "Theory" is Not a Scientific Theory of Anything
  1.4 Can we Make a Satisfactory Theory of Music?
  1.5 Physical Science: Harmonics Everywhere
    1.5.1 Timbre: Systematic Distortions from the Ideal Harmonic Series
  1.6 Computational Science: as Fundamental as Physical Science
    1.6.1 Algorithms are Universal

2 Living in a Computational Cartoon
  2.1 Searching for Harmonics
    2.1.1 Virtual Pitch: Hearing the Harmonic Series Even When it is Not There
    2.1.2 Using Greatest Common Divisor as the Missing Fundamental
    2.1.3 Even Animals Seem to Compute the Ideal Harmonic Series
  2.2 Artifacts of Optimization
    2.2.1 Relative Pitch: Differences Between Sounds
    2.2.2 Octaves: Sounds Normalized to a Factor of Two
  2.3 Harmony: Sweetness is the Ideal
    2.3.1 Recreating an Ideal Harmonic Series using Instruments having Systematically-Distorted Timbre
    2.3.2 Harmony Induces Two Kinds of Intervals: Horizontal Within the Note and Vertical Across the Notes
    2.3.3 Vertical Intervals Have Pure Ratios
    2.3.4 Vertical Intervals Have Balanced Amplitudes
    2.3.5 Vertical Intervals Are All The Same Ratio
    2.3.6 Harmony is Sweeter Than Sweet
  2.4 Interestingness: Just Enough Complexity
    2.4.1 The Simplicity of Theme
    2.4.2 The Complexity of Ambiguity
  2.5 Recognition: Feature Vectors
    2.5.1 Soft Computing
    2.5.2 False Recognition
    2.5.3 Cubism: Partial Recognition Due to Redundant, Over-Determined Feature Vectors

3 Harmonic Music Explained
  3.1 The Major Triad
  3.2 The Major Scale
    3.2.1 Interlocking Triads
    3.2.2 Using Logarithms to Visualize Distances Between Tones/Notes
    3.2.3 The Keyboard Revealed
  3.3 Scales and Keys
    3.3.1 Changing Key: Playing Other Groups of Triads
    3.3.2 Key Changes Break Harmony
    3.3.3 Just versus Equal Tuning
  3.4 The Minor
    3.4.1 The Minor Triad
    3.4.2 The Minor as Auditory Cubism
    3.4.3 Minor Scales
  3.5 Chords
    3.5.1 The Standard Chord Dictionary
    3.5.2 How to Turn Sweetness into Mud: Over-Using Octaves
    3.5.3 Chords from the Harmonic Series
    3.5.4 Chords Inducing Ambiguity
    3.5.5 Chords Using the Minor Triad
    3.5.6 Chords Preserving Intervals but not Harmonics

4 Miscellaneous Objections
  4.1 But what about the Circle of Fifths!
    4.1.1 Fifths make a Circle
    4.1.2 The Circle of Fifths is Just a Combinatorial Coincidence
    4.1.3 The Circle of Fifths Allows for Cool Chord Transitions
    4.1.4 The Symmetries of the Circle of Fifths are a Terrible Red Herring
  4.2 But Other Cultures Have Different Musical Scales!
    4.2.1 A Culture May Simply not be Fully Exploiting All of the Universal Harmonic Features
    4.2.2 But The Nasca People Of Peru Use A Linear, Not A Logarithmic, Scale!
  4.3 But You Can Make a Piece of Music Based Entirely on That Utterly Un-Harmonic Interval, the Augmented Fourth!
  4.4 But I've Been a Musician All My Life / Studied Music In College and I've Never Heard Any of This Before!

5 Helmholtz Fails to Fully Explain Harmony
  5.1 Helmholtz's Theory Relies Only On Interfering Overtones, But Harmony Is Something More
  5.2 Helmholtz's Theory Doesn't Imply Virtual Pitch
  5.3 Helmholtz's Theory is that Pleasure is Only the Absence of Pain
    5.3.1 Harmony is Rapture
  5.4 Helmholtz's Theory Fails to Fully Explain the Qualitative Difference Between the Major and Minor Triads
  5.5 Helmholtz Isn't Really Wrong, He Just Fails To Be Really Right

6 Other Modern Theories, such as Terhardt and 'Fusion or pattern matching' Theory
  6.1 Terhardt Recognizes that the Brain is Listening For Something
  6.2 Terhardt Does Not Explain Sustained and Minor Chords

7 Future Work: Towards A Unifying Theory of Music
  7.1 Melody as Arpeggio
    7.1.1 Scale As Theme: Melodic Association From Harmonic Association
    7.1.2 Streaming: Multiple Similar Phenomenon Occurring Consecutively Are Explained By The Brain As One Thing Moving
    7.1.3 Melody can Easily Create Interesting Ambiguities
  7.2 The Role of Narrative Generally
  7.3 Embodiment and Emotion
  7.4 A Proposal For A Unifying Physical and Computational Theory of Music

8 Acknowledgements

9 References

1 The Problem of Music

People push different keys on a piano; some combinations and patterns sound good; others do not. How does that work? Looking at a piano, we see that it is laid out in the following pattern (w=white, b=black):

... wbwbw wbwbwbw wbwbw wbwbwbw ...

Hmm, the white and black keys mostly just alternate, yet these alternating regions last for 5 and then 7 keys and then that 5/7 region-pair repeats, and where these regions meet there are two adjacent white keys. There seems to be a pattern, but it is quite an odd one.

The piano keyboard seems really weird and ad-hoc.

Further, this weirdness is not specific just to the piano: the key layout reflects the Major Scale [maj] which is the basis of all Western music. Is that black-white pattern somehow fundamental to sound and music itself? Or are the combinations that sound good really just a cultural coincidence, combinations of sounds that we have heard over and over since infancy and been trained to associate with different emotions? Is something fundamental to the ear and to sound itself going on here or not?
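As a small sketch of the layout just described, the pattern can be regenerated from a single octave of twelve keys. The set of white-key positions below is an assumption not stated in the text so far: it is the usual assignment of the white keys to the Major Scale starting on C. The function name `keyboard_pattern` is hypothetical.

```python
# Positions (0..11, counting keys upward from C) that are white keys.
# Assumption: the octave starts on C, so the white keys are C D E F G A B.
WHITE_OFFSETS = {0, 2, 4, 5, 7, 9, 11}


def keyboard_pattern(octaves: int = 2) -> str:
    """Return the w/b key pattern, with a space wherever two white keys touch."""
    keys = ["w" if k % 12 in WHITE_OFFSETS else "b" for k in range(12 * octaves)]
    out = [keys[0]]
    for prev, cur in zip(keys, keys[1:]):
        if prev == "w" and cur == "w":
            out.append(" ")  # boundary between two alternating regions
        out.append(cur)
    return "".join(out)


print(keyboard_pattern(2))  # prints: wbwbw wbwbwbw wbwbw wbwbwbw
```

The regions of 5 and 7 keys fall out of where the two white-white boundaries sit within the twelve keys; the sketch only reproduces the pattern and does not explain it.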

1.1 Modern "Music Theory" Reads Like a Medieval Medical Textbook

These questions have bothered me literally for decades (starting when I was about ten, looking at our piano keyboard and asking "what?!"; I basically wrote the above Section 1 "The Problem of Music" at that time). Consulting "music theory" never helped me either, as

Reading a music theory book is like reading a medieval medical textbook: such books are full of unjustified superstition, non-reasoning, and funny symbols glorified by Latin phrases.

For example, here is the first page from a famous book on Jazz Theory, "Jazz Improvisation 1: Tonal and Rhythmic Principles" by John Mehegan [Mehegan1959]. Recall, this is the first page of Lesson 1 of Section 1 of Book 1, the very first thing the student reads!

"Each of the twelve scales is a frame forming the harmonic system."

What is a "scale"? Where do they come from? For what purpose are there or how does it emerge that there are twelve exactly? What is a "harmonic system" and what does it mean to say a scale "frames" it?

"Diatonic harmony moves in two directions: Horizontal and Vertical."

Really?! They both look pretty diagonal to me. Oh, but it's Diatonic! That sounds Latin so I guess these people are smart.

"By combining these two movements... we derive the scale-tone seventh chords in the key of C."

What is a "chord"? What is a "key"? WHAT THE HECK ARE THEY TALKING ABOUT!

You can't start a science textbook like that. You have to start with simple observations humans can make. You have to build up complex structures from simple ones. You have to motivate your distinctions.

Even if you say "A chord is 3 or more notes played together" that's also almost the definition of a "key" as well; for what purpose do we have this distinction? You could say "well the notes of a key are played together but not at the same time," but that also is true of an arpeggio-ed chord; again what's the distinction? Even if you say "a C major chord is C-E-G" there is no motivation as to how it is that C-E-G sound good together and other combinations of notes do not.

This "music theory" reminds me a bit of Richard Feynman's description of a science textbook he reviewed for the California school board as told in '"Surely You're Joking, Mr. Feynman!": Adventures of a Curious Character', [Feynman1985, p. 270-271], (emphasis in the original):

For example, there was a book that started out with four pictures: first there was a wind-up toy; then there was an automobile; then there was a boy riding a bicycle; then there was something else. And underneath each picture it said, "What makes it go?" I thought, "I know what it is: They're going to talk about mechanics, how the springs work inside the toy; about chemistry, how the engine of the automobile works; and biology, about how the muscles work." It was the kind of thing my father would have talked about: "What makes it go? Everything goes because the sun is shining." And then we would have fun discussing it:

"No, the toy goes because the spring is wound up," I would say.

"How did the spring get wound up?" he would ask.

"I wound it up."

"And how did you get moving?"

"From eating."

"And food grows only because the sun is shining. So it's because the sun is shining that all these things are moving." That would get the concept across that motion is simply the transformation of the sun's power. I turned the page. The answer was, for the wind-up toy, "Energy makes it go." And for the boy on the bicycle, "Energy makes it go." For everything, "Energy makes it go." Now that doesn't mean anything. Suppose it's "Wakalixes." That's the general principle: "Wakalixes makes it go." There's no knowledge coming in. The child doesn't learn anything; it's just a word!

1.2 What is a Satisfactory, Scientific Theory?

Further, a scientific theory of something is expected to have a certain "explanatory power". But what is "explanatory power"? Is it just whatever we like? Consider the old explanations of disease; here is one: evil spirits inhabit you [dem]. Well, did anyone ever see these spirits? Were the experiences of these spirits universal across humankind? Were there some general rules of how the spirits behaved? How many there were? What would appease them?

Another theory was Humorism [hum]: that there were four different fluids in the body: blood, black bile, yellow bile, and phlegm; when they got out of balance, you had a disease. Ok, this is better than arbitrary spirits, but did anyone measure the relative levels of these fluids? Could someone predict sickness by observing these fluids get out of balance? Could you make someone better by, say, draining blood from them? "Treatment" based on this theory seems to have been long practiced, but did anyone measure to see if draining blood really made people better versus a control group that did not have their blood drained?

Now we have a new theory called modern medicine. It is much more complex, but let's take a subset of it: there are little creatures called bacteria that live everywhere. Certain kinds can live in your body and the results of their activity, such as their excretions, get your body out of normal working order, and thus you become sick. If you give chemicals to a person that are more toxic to the bacteria than the person, does the person get better? Yes [Mobley-antibiotics]! Even when compared to a control group? Yes! Can we see these little bacteria in a microscope? Yes! Ok, this is much more satisfactory as a scientific theory.

Now, let us step back and consider what makes us more satisfied with this theory. What is going on such that it is a better theory?

For one thing, the theory is mechanical: we have some mechanism, consistent with our understanding of inanimate matter today (physics and chemistry) such that the operation of the mechanism corresponds with what we observe (Scientific Method) [sci].

Further, this mechanism is deterministic and precise: there isn't much arbitrariness in the mechanism: we can compute rather well how sick someone will get and how much toxin we have to give them to kill the bacteria and not the person.

This mechanism is universal: there is no appeal to beliefs or cultural norms: people throughout the world get sick in the same way and the medicines work on them, with but small differences that can be further explained by another mechanism called genetics.

This mechanical explanation is simple and minimal (Occam's razor) [occ]. We can see the parts working.

Lastly, the mechanism is factored -- made up of independent parts -- and the complexity of the observed phenomena is emergent -- arising naturally from the operation of the parts. That is, these parts of the explanation of disease all operate independently: (1) how the body works such that the bacterial excretions disrupt it, (2) how bacteria work such that the toxin kills them, (3) how the toxicity to the human depends on the size of the human, etc.

Physicist Richard Feynman gave a series of lectures where he attempted to encapsulate the basic nature of how science is done and the kind of results it produces; these were published as "The Character of Physical Law" [Feynman1965]. Here is a brilliant paragraph on how to know when you have finally found the truth. [Feynman1965, p. 171] (underlining added, not in the original):

One of the most important things in this 'guess -- compute consequences -- compare with experiment' business is to know when you are right. It is possible to know when you are right way ahead of checking all the consequences. You can recognize truth by its beauty and simplicity. It is always easy when you have made a guess, and done two or three little calculations to make sure that it is not obviously wrong, to know that it is right. When you get it right, it is obvious that it is right -- at least if you have any experience -- because usually what happens is that more comes out than goes in. Your guess is, in fact, that something is very simple. If you cannot see immediately that it is wrong, and it is simpler than it was before, then it is right. The inexperienced, and crackpots, and people like that, make guesses that are simple, but you can immediately see that they are wrong, so that does not count. Others, the inexperienced students, make guesses that are very complicated and it sort of looks as if it is all right, but I know it is not true because the truth always turns out to be simpler than you thought.

Using Computer Science terminology, I summarize Feynman's point as follows.

The more factored a theory and the more emergent the observed phenomena from the theory, the more satisfying the theory.

The Ptolemaic [ptol] model of the solar system puts the earth at the center. This explanation really does explain the movements, especially when epicycles [epi] are added, but it is rather complex and ad hoc: how does it emerge that we need epicycles? The Copernican [cop] system is another explanation of the solar system, one that puts the sun at the center. This second explanation only requires Newton's laws of motion plus gravity. The consequences of Newton's laws are complex and even hard to simulate, even on a modern computer, but the laws themselves are quite simple and independent and mechanical and factored and observable etc. Even further, the notation used in this theory easily reflects the underlying understanding in the theory: it allows for easy calculations when making predictions of the theory. All in all, the Copernican system is quite a satisfying explanation, or theory, of the motions of planets in the solar system because, not only does it explain the observed phenomena, it is factored into simple parts and the observed phenomena are emergent from the interactions of those parts. Consequently, we use the Copernican system today (adjusted for relativity and other more recent observations).

1.3 Music "Theory" is Not a Scientific Theory of Anything

Music "theory" as we find in books today contains none of the properties of a modern theory that we find satisfying. At the start we are presented with the odd white-black-white-WHITE-black keyboard or Major Scale as a given. We are sometimes told for example that the Major Scale comes from the Ancient Greeks. We are sometimes told it is arbitrary and it only sounds good because we have heard it since childhood.

Nothing in music "theory" counts as a scientific theory of anything.

We are told that certain combinations of notes sound good; these combinations are called "chords" and the fact that these combinations sound good is also arbitrary. We are told lots of strange names for intervals between notes and these names make no sense. The Standard Chord Dictionary of common chords simply consists of a list of note combinations we are told are good to play together and will feel a certain way when heard. Nowhere is there any notion of how we would predict the feeling each chord engenders from the construction of the chord.

Sometimes I have encountered vague explanations offering "pairs of notes having low whole-number ratios" as the reason some notes sound good together, and have then been told that no one really knows how that works. In Section 5 "Helmholtz Fails to Fully Explain Harmony" we address a well-known theory of Helmholtz where he attempts an explanation of how it is that notes with frequencies that are in low whole-number ratios to one another should sound good together. We will show that his theory has problems.

If we make any attempt to actually compute note ratios, the notation actually gets in the way of our understanding: the notation for the notes and their distances really does not convey very well the actual ratios of the notes. For example, in the Major Scale, sometimes going up to the next note (space to line above it or line to space above it) goes up one whole "step", a ratio of 2^(1/6) = 1.122 (the sixth root of 2), and sometimes only a "half-step" (or "semi-tone"), a ratio of 2^(1/12) = 1.059 (the twelfth root of 2), which is half as much on a logarithmic scale: two half-steps make one whole step. (For more on logarithms and exponentials, see Section 3.2.2 "Using Logarithms to Visualize Distances Between Tones/Notes".) (To those unfamiliar with musical notation, we will explain the numbers later.) The difference between these whole and half steps can only be discerned by looking way over to the left of the page of music and doing complex computations with sharps and flats in order to compute the "key" of the music; and that whole process is designed to defeat the sometimes-half/sometimes-whole steps (for the arbitrary key of C) that are baked into the notation itself. This notation may make music easy to play, but it does not make it easy to understand.
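The two step ratios quoted above can be checked with a few lines of arithmetic. This is a minimal sketch under two assumptions that this paragraph does not itself state: the notes are equally tuned (twelve equal half-steps per factor of two), and the Major Scale follows the usual step pattern whole, whole, half, whole, whole, whole, half. The names `HALF_STEP`, `WHOLE_STEP`, `MAJOR_SCALE_STEPS`, and `major_scale_ratios` are chosen here for illustration.

```python
# Frequency ratios of the two step sizes, assuming equal tuning.
HALF_STEP = 2 ** (1 / 12)   # twelfth root of 2
WHOLE_STEP = 2 ** (1 / 6)   # sixth root of 2

# Assumed Major Scale pattern, measured in half-steps between adjacent notes.
MAJOR_SCALE_STEPS = [2, 2, 1, 2, 2, 2, 1]


def major_scale_ratios() -> list:
    """Frequency ratio of each scale note relative to the first note."""
    ratios = [1.0]
    semitones = 0
    for step in MAJOR_SCALE_STEPS:
        semitones += step
        ratios.append(2 ** (semitones / 12))
    return ratios


print(round(HALF_STEP, 3), round(WHOLE_STEP, 3))  # prints: 1.059 1.122
print([round(r, 3) for r in major_scale_ratios()])
```

Two half-steps multiply to one whole step, and the seven steps together multiply to exactly 2, which is what "half as much" means when distances are measured as ratios.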

This music "theory" has all the properties of preventing understanding, not promoting it. It fits the description of pseudo-science pretty well. Let's try to do better.

1.4 Can we Make a Satisfactory Theory of Music?

I simply refuse to believe that something so fundamental to human life and so satisfying to so many people is so arbitrary and so un-explainable. I have attempted to come up with something better and I think I have succeeded.

As we build up this theory, we want to make sure that we make as few assumptions as possible, and that these assumptions are founded upon actual experimentally-derived facts -- just as we now demand of the rest of science. In particular we would like a real, scientific theory of music to be universal and not appeal to cultural relativism that says "it's all just arbitrary"; no explanation that says such things is a real scientific theory of anything.

Given that sound and instruments exist in reality and music only sounds like something because a human brain is computing the listening to it, we build our theory from two foundations:

physics and computation.

The brain is central to our theory. Not knowing how the brain really works, we therefore have a hole to fill in our explanation. We proceed by telling a story to explain the known properties of music; along the way we assume certain conjectures about the structure of the brain where we need them. We make these conjectures as reasonable as possible, given the assumption that

The brain is a machine optimized by evolution to compute human survival.

That is, being a machine, the brain is likely to be subject to properties that computer scientists and engineers have observed across many computational systems and that these properties will be driven by evolutionary optimization. In the end, the test of our theory will depend on (1) how well it explains the observed phenomenon called music, and (2) how well the conjectures hold up under testing. In this essay we do (1) and we leave (2) for future work by cognitive/brain scientists.

1.5 Physical Science: Harmonics Everywhere

Physical science is about as rock-solid a theory of the world as anything. This is a good place to start. As Catherine Schmidt-Jones writes [Schmidt-waves]:

For the purposes of understanding music theory, however, the important thing about standing waves in winds is this: the harmonic series they produce is essentially the same as the harmonic series on a string. In other words, the second harmonic is still half the length of the fundamental, the third harmonic is one third the length, and so on.

We can either compute or observe (using, say, high-speed cameras) the properties of the stable vibrations that occur when a string or a column of air is excited:

There is one frequency (the "fundamental") at which the string or air will vibrate; there are also other vibrations (the "harmonics" or "overtones") at which the string or air will also vibrate, having higher frequencies that are 2, 3, 4, 5, 6, 7, etc. times the fundamental.

These harmonics can be demonstrated by two people holding a long jump-rope: (1) If they swing the rope slowly, the whole rope makes a single wave. (2) However, if they go twice as fast and out of phase (one goes up while the other goes down) then half of the rope will be up and the other half down and the positions of up and down will switch twice as fast; further, the very middle of the rope will not move at all (a "node"). (3) A similar effect happens with three waves if they go even faster. For a picture, see [Schmidt-waves, Figure 2]. When a string is plucked, all of these waves are happening at the same time. That is, plucking generates all waves, but only those whose half-wavelength divides the length of the string evenly will bounce back and forth and reinforce each other and persist; other frequencies will die out. From [Schmidt-waves]:

In order to get the necessary constant reinforcement, the container has to be the perfect size (length) for a certain wavelength, so that waves bouncing back or being produced at each end reinforce each other, instead of interfering with each other and cancelling each other out. And it really helps to keep the container very narrow, so that you don't have to worry about waves bouncing off the sides and complicating things. So you have a bunch of regularly-spaced waves that are trapped, bouncing back and forth in a container that fits their wavelength perfectly. If you could watch these waves, it would not even look as if they are traveling back and forth. Instead, waves would seem to be appearing and disappearing regularly at exactly the same spots, so these trapped waves are called standing waves.
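The "perfect size" condition can be written out for the simplest case. The sketch below assumes an idealized string fixed at both ends, so that a whole number n of half-wavelengths must fit in the length L, giving wavelength 2L/n and frequency n·v/(2L) for wave speed v. The function name and the numeric values are hypothetical, chosen only for illustration.

```python
def standing_wave_modes(length_m: float, wave_speed_m_s: float, count: int) -> list:
    """Return (n, wavelength, frequency) for the first `count` standing waves.

    Assumes an ideal string fixed at both ends: n half-wavelengths fit
    exactly into the string length.
    """
    modes = []
    for n in range(1, count + 1):
        wavelength = 2.0 * length_m / n          # n half-waves span the string
        frequency = wave_speed_m_s / wavelength  # f = v / wavelength
        modes.append((n, wavelength, frequency))
    return modes


# Hypothetical string: 1 m long, waves travelling at 100 m/s.
for n, wavelength, frequency in standing_wave_modes(1.0, 100.0, 4):
    print(f"mode {n}: wavelength {wavelength:.3f} m, frequency {frequency:.1f} Hz")
```

Under these assumptions the surviving frequencies come out as 1, 2, 3, 4, ... times the lowest one, which is the series of harmonics described above.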

We will call each single sine-wave at a single frequency a "tone", whereas the collection of frequencies that occur together due to a single physical process (such as a vocal utterance or the striking of a piano key) we will call a "note". (A tone can be expressed simply as (1) a wave "frequency" in Hertz (Hz), the number of cycles per second, (2) a wave "amplitude", the wave peak height, and (3) a wave "phase", where the wave is in its cycle compared to other waves; we won't discuss amplitude and phase much.)

This sequence of tones forming a note is called the "Harmonic Series" [har] or "Overtone Series" of the fundamental. Herein we speak of "the (ideal) Harmonic Series" when we mean an abstract computational ideal and speak of "an overtone series" when we mean what is actually produced in reality by a particular actual instrument (which may be quite different from the ideal); note that others quoted here may not follow this same convention. (Further, throughout we pluralize "series" as "series-es" because in a technical discussion it is very important to avoid the ambiguity between a single series of multiple tones and multiple series-es of multiple tones.)

There are two conventions for numbering overtones/harmonics; we use the convention where the fundamental or "Root" tone is called "harmonic 1", the tone vibrating twice as fast is called "harmonic 2", the tone vibrating three times as fast is called "harmonic 3", etc.
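To make the vocabulary concrete, here is a minimal sketch of a "tone" and of the ideal Harmonic Series of a "note", using the numbering convention just described (harmonic 1 is the fundamental). The class and function names are hypothetical, and the amplitudes and phases are left at placeholder values because the text does not specify them here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tone:
    """A single sine wave: frequency, amplitude, and phase."""
    frequency_hz: float
    amplitude: float = 1.0  # placeholder value, not specified by the text
    phase: float = 0.0      # placeholder value, not specified by the text


def ideal_harmonic_series(fundamental_hz: float, count: int) -> List[Tone]:
    """Tones of harmonics 1..count, where harmonic n has n times the fundamental frequency."""
    return [Tone(frequency_hz=n * fundamental_hz) for n in range(1, count + 1)]


note = ideal_harmonic_series(220.0, 7)
print([tone.frequency_hz for tone in note])
# prints: [220.0, 440.0, 660.0, 880.0, 1100.0, 1320.0, 1540.0]
```

A real instrument's overtone series would replace the placeholder amplitudes, and possibly the exact frequencies, with values particular to that instrument.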

1.5.1 Timbre: Systematic Distortions from the Ideal Harmonic Series

The timbre of a sound is the principal feature that distinguishes the growl of a lion from the purr of a cat, the crack of thunder from the crash of ocean waves,.... Timbral discrimination is so acute in humans that most of us can recognize hundreds of different voices. We can even tell whether someone close to us -- our mother, our spouse -- is happy or sad, healthy or coming down with a cold, based on the timbre of that voice. Timbre is a consequence of the overtones.... When you hear a saxophone playing a tone with a fundamental frequency of 220 Hz, you are actually hearing many tones, not just one. The other tones you hear are integer multiples of the fundamental: 440, 660, 880, 1100, 1320, 1540, etc. The different tones -- the overtones -- have different intensities, and so we hear them as having different loudnesses. The particular pattern of loudnesses for these tones is distinctive of the saxophone, and they are what give rise to its unique tonal color, its unique sound -- its timbre. A violin playing the same written note (220 Hz) will have overtones at the same frequencies, but the pattern of how loud each one is with respect to the others will be different. Indeed, for each instrument, there exists a unique pattern of overtones. For one instrument, the second overtone might be louder than in another, while the fifth overtone might be softer. Virtually all of the tonal variation we hear -- the quality that gives a trumpet its trumpetiness and that gives a piano its pianoness -- comes from the unique way in which the loudnesses of the overtones are distributed. Each instrument has its own overtone profile, which is like a fingerprint. It is a complicated pattern that we can use to identify the instrument. Clarinets, for example, are characterized by having relatively high amounts of energy in the odd harmonics -- three times, five times, and seven times the multiples of the fundamental frequency, etc. (This is a consequence of their being a tube that is closed at one end and open at the other.) Trumpets are characterized by having relatively even amounts of energy in both the odd and the even harmonics (like the clarinet, the trumpet is also closed at one end and open at the other, but the mouthpiece and bell are designed to smooth out the harmonic series). A violin that is bowed in the center will yield mostly odd harmonics and accordingly can sound similar to a clarinet. But bowing one third of the way down the instrument emphasizes the third harmonic and its multiples: the sixth, the ninth, the twelfth, etc.

Besides introducing us to timbre, Levitin points out:

Most real instruments systematically produce tones having amplitudes distinct from that of the ideal Harmonic Series.

Michael O'Donnell points out that the effects of timbre on the overtone series go even further [O'Donnell, 14 January 2009]:

I suggest that you check into the importance of approximate harmonic series. E.g., the overtones on a piano string are measurably and audibly higher in frequency than the harmonics that they approximate. Both the nearness to harmonics, and the perceptible difference, appear to be important.... You mentioned the way that the harmonic series of frequencies occurs naturally in air columns, as in strings. But, on soft strings (such as guitar, violin---little resistance to bending) the natural series of resonant frequencies is very accurately harmonic. In wind instruments, the natural resonances of the air column approximate the harmonic series rather poorly. In the brass, the approximation is so poor that the numbers of the harmonics don't even match between the natural resonances and the notes as played. While the conical shape of many reeds is designed to improve the harmonicity of the resonances, the bell on the brass is actually designed to increase the inharmonicity of the natural resonances, which produces a better match in the misaligned overtones. It is phase locking between vibrational modes, caused by the highly nonlinear feedback in the excitation mechanisms (reeds, lips, bow scraping) that makes the overtone series so accurately harmonic, not the natural resonances.

That is, O'Donnell points out:

Most real instruments systematically produce tones having frequencies distinct from that of the ideal Harmonic Series.

Therefore whatever our theory of harmony it should work for sounds where the overtone series differs from the ideal Harmonic Series by (1) altered amplitudes and (2) altered frequencies. However, notice that both of these distortions of the ideal Harmonic Series have one important property:

The distortions made by the overtone series of a given instrument to the ideal Harmonic Series are a predictable, systematic function of the instrument kind.

That is, two notes (series-es of overtones) made by the same (kind of) instrument will be distorted from the ideal Harmonic Series in the same (or similar) way. This must be the case in order for an instrument or instrument kind to have a uniform, recognizable timbre. We will use this below.
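One way to picture a "systematic" distortion is as a fixed rule, belonging to the instrument, that is applied to every note it plays. The sketch below is a deliberately simplified model, not a measurement: it assumes the distortion depends only on the harmonic number, and the particular stretch and amplitude formulas in `hypothetical_instrument` are made up for illustration.

```python
from typing import Callable, List, Tuple

# An instrument model maps harmonic number n to
# (frequency multiplier relative to the fundamental, relative amplitude).
Instrument = Callable[[int], Tuple[float, float]]


def ideal_instrument(n: int) -> Tuple[float, float]:
    """The ideal Harmonic Series: exact multiples (amplitude is a placeholder)."""
    return (float(n), 1.0)


def hypothetical_instrument(n: int) -> Tuple[float, float]:
    """Made-up distortion: overtones slightly sharp, higher ones quieter."""
    return (n * (1.0 + 0.001 * n), 1.0 / n)


def overtone_series(instrument: Instrument, fundamental_hz: float,
                    count: int) -> List[Tuple[float, float]]:
    """Apply the instrument's fixed distortion rule to one note."""
    series = []
    for n in range(1, count + 1):
        multiplier, amplitude = instrument(n)
        series.append((fundamental_hz * multiplier, amplitude))
    return series


low = overtone_series(hypothetical_instrument, 200.0, 5)
high = overtone_series(hypothetical_instrument, 300.0, 5)
for (f_low, a_low), (f_high, a_high) in zip(low, high):
    # Same rule for both notes: every overtone pair keeps the ratio 300/200.
    print(round(f_high / f_low, 6), a_low == a_high)
```

In this model, two notes from the same instrument are distorted identically, so corresponding overtones of the two notes stay in the same ratio as their fundamentals even though neither series is ideal.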

1.6 Computational Science: as Fundamental as Physical Science

I think part of the reason the theory we develop here might not have been described before is that there aren't many people who think about both the physical and the computational understanding needed to derive it.

The properties, or laws, of computation are just as fundamental as the physical laws.

Computation is everywhere -- you live in a sea of it.

You may see a cup, but computational engineers see an idiom for managing liquids by getting them stuck in a local optimum.

You may think of ownership as a basic human right, but engineers think of it as a distributed decision-making algorithm.

You may enjoy a field full of bumblebees pollinating flowers, but engineers enjoy it as an information distribution network.

You may think it is polite to not talk on top of other people at dinner, but engineers think it is optimal to use a back-off algorithm to resolve a network packet collision.

I wrote that list off of the top of my head as fast as I can type and edit text: the examples are myriad.

Consider for a moment that perhaps you are computation: that you are the computational activity of your brain. Some people say that this reduces the wonder of life to simple mechanism; I say it simply elevates mechanism to the wonder of life. While you need not adopt this All-Is-Computation point of view as your personal understanding of life or of yourself, a computational understanding of the brain has amazing explanatory power, so please consider it at least for the rest of this essay.

1.6.1 Algorithms are Universal

Finding good ways to solve a problem with fewer resources is a basic pursuit of those who study computation. A general method for solving a problem is called an "algorithm" [alg]. New algorithms that solve common problems well are rare and highly valued. When a solution is "reduced to the simplest and most significant form possible without loss of generality" we say it is "canonical" [canon]. An algorithm is a canonical method.

Many tricks in engineering seem not to be merely the artifacts of human cleverness, but instead the result of fundamental properties of the medium of computing. Algorithms invented by different species to solve the problem called staying alive often resemble each other in ways that cannot be explained by any other means than "that's the only way to do it" (or one of only a few ways). From [cutt]:

The organogenesis of cephalopod eyes differs fundamentally from that of vertebrates like humans. Superficial similarities between cephalopod and vertebrate eyes are thought to be examples of convergent evolution.

The human eye and the cuttlefish eye both address the problem of extracting information at a distance from light. Both evolved separately and yet they both end up at a very similar solution. Biologists call this phenomenon "convergent evolution" [conv]; architects call it "timeless pattern" [Alexander1979]; storytellers call it "archetype" [archetype]; clothiers call it "classical style"; computer scientists call it "algorithm". When humans tried to find a mechanical solution to the same problem, they invented the camera which is just an eye again. We should therefore not be surprised if

Conjecture One: Computational laws/idioms/patterns/algorithms are universal: The brain works using a combination of simple computational algorithms of which we are likely already aware.

2 Living in a Computational Cartoon

"I'm not bad, I'm just drawn that way." -- Jessica Rabbit [ Jessica-bad ]

Jessica Rabbit [Jessica-pout] is one of the sexiest characters in Hollywood, elected 88th of The 100 Greatest Movie Characters of All Time by Empire Magazine [Jessica-great]. Sadly, she is just a drawing and a voice. Despite the powerful illusion to the contrary, we do not see or hear the world; we see and hear the world that our brains compute. Like the characters in "Who Framed Roger Rabbit?" [WFRR-1988], we live in a cartoon. Music is not what the world does; it is what we do with the world.

A friend of mine, Joel Auslander, used to intern at Pixar; his job was to make physics simulator tools for the animators. He wanted to make simulators that were accurate to the real physics, but he said that the animators told him that people don't want to watch real physics, people want to watch cartoon physics: even though it is not as accurate as real physics, cartoon physics is somehow more satisfying [Auslander, c. 1996].

Conjecture Two: The brain uses cartoon physics, that is, physics that is easy to compute, but not necessarily faithfully accurate to reality.

We suggest that both the use of cartoon physics and the inaccuracy of cartoon physics are due to the simple fact that the brain is computationally limited.

Here is a cartoon physics effect in vision. When taking a drawing class our teacher pointed out some useful visual effects to us: (1) To make an object look round, shade the object the more its face bends away from the viewer and (2) put highlights where the light source would reflect off of it. Now think what pantyhose do to women's legs. (1) When the mesh of the hose is straight on, it is not very dark, but as the leg bends away and the mesh is seen on edge, the threads line up and the grid rapidly appears to darken. (2) Pantyhose are shiny and so naturally produce reflection highlights. That is, pantyhose fire the recognizers in your brain for the features of roundness harder than a real round leg could: her leg looks rounder than round, impossibly round. See Section 2.5 "Recognition: Feature Vectors" for more on this phenomenon.

We suggest that the brain is using cartoon physics when processing sounds as well. That is, explanations of auditory effects based on the physical properties of actual overtones of different instruments (such as the piano or the trumpet) are beside the point (or at least beside the primary point) when it comes to the brain. As we will see in Section 5 "Helmholtz Fails to Fully Explain Harmony", this point of view is the essential point where our theory differs from that of Helmholtz. What primarily distinguishes this essay from previous attempts to explain music is that our whole approach is oriented primarily not from the external world of physics, but from the internal world of the computation by our brains that is us, from the computational cartoon in which we live and from which we think we experience the world, but which is not the world, but instead only ourselves.

2.1 Searching for Harmonics

As Levitin pointed out in Section 1.5.1 "Timbre: Systematic Distortions from the Ideal Harmonic Series", finding the difference between what we hear and the ideal Harmonic Series is a valuable tool for recognizing people and determining their emotional state. Many sounds are made by vibrating strings or columns of air, but perhaps more importantly, the human voice is made up of vibrating "chords" and a "windpipe" of air. Given that sounds associated to a single source would tend to be arranged in a Harmonic Series, and especially given how important the voice is to humans, it would not be surprising if perhaps

Conjecture Three: Finding harmonics is a common and important problem, so the brain has hardware for recognizing the Harmonic Series.

You can hear a demonstration of this, and of many other interesting auditory phenomena, on the "Auditory Demonstrations" CD from the Institute for Perception Research, Eindhoven, The Netherlands and the Acoustical Society of America [acoustical-demo, Demo 1], "Cancelled Harmonics":

[Twenty tones in the same Harmonic Series are all played together.] When the relative amplitudes of all 20 harmonics remain steady (even if the total intensity changes), we tend to hear them holistically. However, when one of the harmonics is turned off and on, it stands out clearly. The same is true if one of the harmonics is given a "vibrato" (i.e. its frequency, its amplitude, or its phase is modulated at a slow rate).

I recall my voice teacher Andrea Fultz saying the goal was to get me to sing so that my voice resonated in my "mix": in both my head and chest voice at the same time [Fultz, c. 2006]. She was trying to get me to have a more ringing or sweeter voice by making sure all the overtones were present by ensuring that somewhere in my body some resonator of the right size was amplifying it (see Section 2.3 "Harmony: Sweetness is the Ideal" below).

2.1.1 Virtual Pitch: Hearing the Harmonic Series Even When it is Not There

There is a reliable acoustic phenomenon called "Virtual Pitch": if the Harmonic Series is processed to remove the Root or Fundamental tone and then played to a person, that person will hear the note, including the Root tone, even though it is not played [miss-fund]. The "Auditory Demonstrations" CD again [acoustical-demo, Demo 20], "Virtual pitch":

A complex tone consisting of 10 harmonics of 200 Hz having equal amplitude is presented, first with all harmonics, then without the fundamental, then without the two lowest harmonics, etc. Low-frequency noise (300-Hz lowpass, -10dB) is included to mask a 200-Hz difference tone that might be generated due to distortion in playback equipment.

As they say, in the demo overtones are subtracted one at a time, from the fundamental on up. Amazingly, the note being played seems to stay the same; however it does get more buzzy or annoying to the point where a fellow listener Simon Goldsmith thought that he would no longer call the last example the same note [Goldsmith, c. 2010].
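The sequence of tones in Demo 20 can be sketched in a few lines of Python. This is only an illustration: the function names `harmonics` and `common_divisor` are mine, and treating the heard pitch as the exact greatest common divisor of the remaining components is an idealizing assumption that Section 2.1.2 "Using Greatest Common Divisor as the Missing Fundamental" takes up.

```python
from functools import reduce
from math import gcd


def harmonics(fundamental_hz, count, dropped=0):
    """Return the first `count` harmonics of `fundamental_hz`,
    omitting the `dropped` lowest ones (as in Demo 20)."""
    return [fundamental_hz * n for n in range(dropped + 1, count + 1)]


def common_divisor(frequencies_hz):
    """Greatest common divisor of integer frequencies (an idealization)."""
    return reduce(gcd, frequencies_hz)


# Remove the lowest harmonics one at a time, from the fundamental on up.
for dropped in range(4):
    remaining = harmonics(200, 10, dropped)
    print(dropped, remaining[0], common_divisor(remaining))
```

In this sketch the lowest remaining component rises from 200 Hz to 800 Hz while the common divisor stays at 200 Hz, which is one way to picture why the note seems to stay the same.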

Virtual pitch is what allows engineers to fake bass notes on small speakers: they don't play the low tones, as often the speaker is too physically small to make the fundamental frequency anyway; instead they play the overtones and rely on your brain to reconstruct the whole Harmonic Series. However, as we noted above, you will hear that small, cheap speakers sound, well, cheap or "tinny"; the bass just doesn't sound as good as it does when played on sub-woofers. That said, don't forget how remarkable it is that you can still "hear" the non-existent fundamental tone at all (which helpfully prevents the need for people to jog with sub-woofers attached to their ears). From [miss-fund]:

For example, when a note (that is not a pure tone) has a pitch of 100 Hz, it will consist of frequency components that are integer multiples of that value (e.g. 100, 200, 300, 400, 500.... Hz). However, smaller loudspeakers may not produce low frequencies, and so in our example, the 100 Hz component may be missing. Nevertheless, a pitch corresponding to the fundamental may still be heard.

(Note that virtual pitch is a special case of (1) the feature vector understanding that we give in Section 2.5 "Recognition: Feature Vectors" and (2) the concomitant effect of false recognition that we speak of in Section 2.5.2 "False Recognition", where here virtual pitch is the false recognition of the Harmonic Series.)

(See Section 6.2 "Terhardt Does Not Explain Sustained and Minor Chords" for an illustration by Coren [Coren1972] (as quoted by Terhardt [Terhardt1974-PCH]) which shows standard visual illusions as a metaphor with virtual pitch.)

("How to Play From a Fake Book" [Neely1999] says that when playing a chord, you can drop not only the Root of the chord, but also the Fifth and the listener will still hear the chord; see Section 3.5.4 "Chords Inducing Ambiguity". We should point out that here we speak of omitting one note from a chord, a collection of multiple notes, or multiple series-es of tones, whereas virtual pitch is a phenomenon of omitting one tone from a single Harmonic Series of tones of a single note. However we argue later in Section 2.3.2 "Harmony Induces Two Kinds of Intervals: Horizontal Within the Note and Vertical Across the Notes" that these two situations are closely related and therefore the fact that it works to omit the Root or Fifth of a chord is actually the phenomenon of virtual pitch again and is thus more evidence for our theory that the brain is listening for the Harmonic Series.)

2.1.2 Using Greatest Common Divisor as the Missing Fundamental

What is the means by which the brain determines the missing fundamental? From [acoustical-demo, Demo 21], "Shift of Virtual Pitch":

A tone having strong partials with frequencies of 800, 1000, and 1200 Hz will have a virtual pitch corresponding to the 200 Hz missing fundamental, as in Demonstration 20. If each of these partials is shifted upward by 20 Hz, however, they are no longer exact harmonics of any fundamental frequency around 200 Hz. The auditory system will accept them as being "nearly harmonic" and identify a virtual pitch slightly above 200 Hz (approximately 1/3 * (820/4 + 1020/5 + 1220/6) = 204 Hz in this case). The auditory system appears to search for a "nearly common factor" in the frequencies of the partials.

There is a simple algorithm for finding the Root of a partial overtone series:

Given a set of tones, hear the (approximate) Greatest Common Divisor (gcd) of the tones as the fundamental.
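The arithmetic quoted from Demo 21 can be reproduced with a short Python sketch. It is not a model of the auditory system: the function name `estimate_virtual_pitch` and the `rough_fundamental_hz` starting guess are assumptions of mine, used only to assign each partial its nearest harmonic number before averaging, as the quoted formula does.

```python
def estimate_virtual_pitch(partials_hz, rough_fundamental_hz):
    """Average the fundamentals implied by each partial.

    Each partial is assigned the nearest whole harmonic number relative to
    `rough_fundamental_hz` (a hypothetical starting guess) and is then
    divided by that number; the results are averaged.
    """
    estimates = []
    for partial in partials_hz:
        harmonic_number = round(partial / rough_fundamental_hz)
        if harmonic_number < 1:
            raise ValueError("partial lies too far below the rough fundamental")
        estimates.append(partial / harmonic_number)
    return sum(estimates) / len(estimates)


exact = estimate_virtual_pitch([800, 1000, 1200], 200)
shifted = estimate_virtual_pitch([820, 1020, 1220], 200)
print(exact, round(shifted, 1))
```

For the unshifted partials the estimate is 200 Hz; for the partials shifted up by 20 Hz it is roughly 204 Hz, matching the figure in the quotation.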

2.1.3 Even Animals Seem to Compute the Ideal Harmonic Series

This conjecture on the brain creating virtual pitch seems to hold even for non-humans, as pointed out in "This is Your Brain on Music" by Daniel J. Levitin [Levitin2006, p. 41] (emphasis in the original):

When I was in graduate school, my advisor, Mike Posner, told me about the work of a graduate student in biology, Petr Janata.... Peter [sic] placed electrodes in the inferior colliculus of the barn owl, part of its auditory system. Then, he played the owls a version of Strauss's "The Blue Danube Waltz" made up of tones [by "tones" here he means what we are calling "notes": each note is an entire series of overtones] from which the fundamental frequency [what we are calling the fundamental tone of the overtone series] had been removed. Petr hypothesized that if the missing fundamental is restored at the early levels of auditory processing, neurons in the owl's inferior colliculus should fire at the rate of the missing fundamental. This was exactly what he found. And because the electrodes put out a small electrical signal with each firing -- and because the firing rate is the same as a frequency of firing -- Petr sent the output of these electrodes to a small amplifier, and played back the sound of the owl's neurons through a loudspeaker. What he heard was astonishing; the melody of "The Blue Danube Waltz" sang clearly from the loudspeakers: ba da da da da, deet deet, deet deet. We were hearing the firing rates of the neurons and they were identical to the frequency of the missing fundamental. The harmonic series has an instantiation not just in the early levels of auditory processing, but in a completely different species.

Michael O'Donnell pointed out to me that there is an ambiguity here [O'Donnell, 14 February 2009]:

[The above story] doesn't allow one to distinguish whether the Owl, or the human listener, is experiencing the virtual pitch.

I passed this on to Daniel J. Levitin; his response [Levitin, 24 May 2010]:

You're absolutely right that these two possibilities need to be distinguished. The electrodes that were placed in the brain of the owl (in the inferior colliculus) were analyzed using specotrograms[sic] and fourier[sic] analysis. It was clear that the signal itself coming from the owl's brain had replaced the missing fudnamental[sic]. It was only after this analysis that Petr thought to hook it all up to play the signal over loudspeakers (so that humans could hear the output) as a cool demonstration.

Female Mosquitoes only mate when the rate of the wing-beats of the male harmonizes at a Perfect Fifth above the rate of her wing-beats (we start introducing musical terminology such as the Perfect Fifth in Section 3.1 "The Major Triad"). From "Mosquitoes make sweet love music" [Mosquito-harmony]:

The familiar buzz of a flying female mosquito may be irritating to humans, but for her male counterpart, it is an irresistible mating signal. Males and females each have their own characteristic flight tone - which they create by beating their wings. But when scientists from Cornell University listened in on a male Aedes aegypti pursuing his mate, they were surprised to hear a new kind of "music" playing.... The amorous couple began to beat their wings together at a matching frequency - 1,200 hertz. This love song is a "harmonic", or multiple, of their individual frequencies - 400 Hz for the female and 600 Hz for the male.... "So we're trying to discover what makes a male more attractive. It's a mystery. It could be his odour[sic], or his bright black and white markings. "But we think females are assessing the fitness of males based on how well they can sing."
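The numbers in this quotation can be checked with a little arithmetic, sketched here in Python. The variable names are mine; the only inputs are the 400 Hz and 600 Hz flight tones quoted above.

```python
from fractions import Fraction
from math import gcd

female_hz, male_hz = 400, 600

# Interval between the two flight tones as an exact ratio.
interval = Fraction(male_hz, female_hz)

# Lowest frequency that is a whole multiple of both flight tones.
shared_harmonic_hz = female_hz * male_hz // gcd(female_hz, male_hz)

print(interval, shared_harmonic_hz)
print(shared_harmonic_hz // female_hz, shared_harmonic_hz // male_hz)
```

The ratio comes out as 3/2, the Perfect Fifth, and the lowest shared harmonic is 1,200 Hz: the third harmonic of the female's tone and the second harmonic of the male's.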

2.2 Artifacts of Optimization

The brain has constrained resources. Evolution has no time to waste and therefore these resources are likely used in an optimal way -- or at the very least any easy optimizations will have been done for a given organization of a brain. (That is, evolution will drive a machine into a local optimum, even if it gets stuck there and does not reach a global optimum.)

Having separate hardware in the brain for recognizing each combination of tones that co-occur in nature is sub-optimal and it would just be an expensive way to use up neurons. The algorithm every engineer resorts to in this situation, and what I suspect the brain does also, is to find a way to "re-use code": to solve the problem by generalizing the hardware a little so the same "code" can be used in many more situations. Here, we want one Harmonic Series recognizer that works for all the different overtone series-es we may encounter.

Further, the problem that the brain is solving when listening to music is recognizing sounds that are important to it, such as perhaps the nuances of a human voice against a background of noise. In order to recognize something, it is ok to simplify the input or throw away information if it makes the problem easier, as long as enough information is retained to complete the task.

We now consider two different tricks for greatly simplifying the computation the brain must do in order to recognize the harmonic series. We will also conjecture some computational artifacts of the way the brain computes that should result from these optimizations, resulting in well-known universal features of music: relative pitch and octaves.

2.2.1 Relative Pitch: Differences Between Sounds

Again, most engineers would tell you that, given the problem of designing a brain to recognize the Harmonic Series, their intuition would tell them to build one, single Harmonic Series recognizer, not a different one for every possible note. The way to accomplish this would be to make the machine recognize only that which is the same (or mostly the same) in all overtone series-es and ignore that which changes. While the tones of different Harmonic Series-es differ, conveniently the ratio of their frequencies to their fundamental frequency does not. Therefore we consider it very likely that

Conjecture Four: The brain normalizes tones by dividing tones to get tone ratios.

Recognizing ratios of tones (and notes) more strongly than the absolute tones themselves is a phenomenon called "Relative Pitch" [rel]. A ratio of a pair of tones (or notes) is called an "interval".
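As a small illustration of Conjecture Four, the following Python sketch divides each tone of an ideal Harmonic Series by its fundamental. The function name `to_ratios` and the two example fundamentals (100 Hz and 150 Hz) are assumptions chosen for the example, and the first tone in each list is assumed to be the fundamental.

```python
from fractions import Fraction


def to_ratios(tones_hz):
    """Divide every tone by the first tone (assumed to be the fundamental)."""
    root = tones_hz[0]
    return [Fraction(tone, root) for tone in tones_hz]


# Two ideal Harmonic Series on different fundamentals.
note_a = [100 * n for n in range(1, 6)]
note_b = [150 * n for n in range(1, 6)]

print(to_ratios(note_a))
print(to_ratios(note_a) == to_ratios(note_b))
```

The absolute tones of the two notes differ, but after normalization both reduce to the same list of ratios 1, 2, 3, 4, 5, so a single recognizer working on ratios would serve for both.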

2.2.2 Octaves: Sounds Normalized to a Factor of Two

Processing sound requires operating on frequencies over several orders of magnitude. If these frequencies could be made to "wrap-around" then we have another opportunity for code re-use.

When the police take a mug shot of a criminal, their goal is to take the photo in such a way as to maximize the recognizability of the subject in the future given the photo. They employ a common trick used in the recognition problem: they photograph the subject in standard positions (front and profile), under standard lighting conditions, against a standard backdrop, and after removing any obscuring clothing. We say they normalize the photograph: they remove information irrelevant to the thing to be recognized and put it in a standard form; doing this helps recognize the thing later.

Consider the conceptually straightforward process of the brain halving or doubling the frequency of a wave until it is within a particular range. Now the brain only needs a Harmonic Series recognizer for tones within a frequency range of a single factor of two, not across the whole spectrum of sound. Breaking the problem into two parts like this, (1) normalization followed by (2) recognition, greatly simplifies the resulting frequency recognizer. We therefore consider it likely that

Conjecture Five: The brain normalizes tones by halving or doubling them until within a particular frequency range spanned by a factor of two.

The individual computational units of the brain are not as fast as those in modern electronics; however, those of the brain operate in "massive parallel": many operations may be computed at once and all that is needed is that one find the answer. To the intuition of anyone who has seen hardware designed it seems very likely that the brain is halving/doubling frequencies by many different powers of two in parallel and then running all of the results through the frequency recognizer at once. If any one matches, the harmonic has been found.
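The normalization of Conjecture Five can be written as a short sequential loop. This Python sketch is only an illustration: the function name `normalize_to_octave` and the choice of the range [low, 2 * low) are assumptions, and a loop is of course not the parallel hardware suggested above.

```python
def normalize_to_octave(freq_hz, low_hz=1.0):
    """Halve or double `freq_hz` until it lies in [low_hz, 2 * low_hz)."""
    if freq_hz <= 0 or low_hz <= 0:
        raise ValueError("frequencies must be positive")
    while freq_hz >= 2 * low_hz:
        freq_hz /= 2
    while freq_hz < low_hz:
        freq_hz *= 2
    return freq_hz


# Tones a factor of two apart land on the same normalized value.
print(normalize_to_octave(220.0, 220.0))
print(normalize_to_octave(440.0, 220.0))
print(normalize_to_octave(880.0, 220.0))
print(normalize_to_octave(110.0, 220.0))
```

All four calls return the same value, which is the sense in which tones differing by a factor of two would be indistinguishable to a recognizer placed after this step.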

If this were so, then tones (and notes) that differ from each other by a factor of two would sound very much alike. The range of notes that are all within one factor of two is called in music an "Octave" [oct]. ("Octo" is Latin for eight, not two; the relationship to the number eight will become clear later.) Levitin again from "This is Your Brain on Music" [Levitin2006, p. 29]:

Here is a fundamental quality of music. Note names repeat because of a perceptual phenomenon that corresponds to the doubling and halving of frequencies. When we double or halve a frequency, we end up with a note that sounds remarkably similar to the one we started out with. This relationship, a frequency ratio of 2:1 or 1:2, is called the octave. It is so important that, in spite of the large differences that exist between musical cultures -- between Indian, Balinese, European, Middle Eastern, Chinese, and so on -- every culture we know of has the octave as the basis for its music, even if it has little else in common with other musical traditions.

Again, according to Levitin, the Octave interval occurs in every musical tradition in the world. This observation is the first of many to suggest that the musicality of sound depends on something universal about human beings, rather than simply being learned from culture.

2.3 Harmony: Sweetness is the Ideal

Recall from Section 1.5.1 "Timbre: Systematic Distortions from the Ideal Harmonic Series" that the brain uses differences from the ideal/cartoon model as a kind of "personality" or, in this case, "timbre". Recall from the same section that Levitin suggests that we use this timbre to solve the important problem of recognizing people and their emotional state. But being perfect makes this recognition hard; from "What Caricatures Can Teach Us About Facial Recognition" [Austen-caricature] (see Section 2.5.2 "False Recognition" for more):

[W]hen you talk to these artists about their process, you realize that the psychologists have gotten the basics down pretty well. When Court Jones, the 2005 Golden Nosey winner, describes how he teaches the craft to younger artists, he lays out exactly the algorithm that vision scientists believe humans use to identify faces. Students, he says, should imagine a generic face and then notice how the subject deviates from it: "That's what you can judge all other faces off of." Also, just as a vision scientist would predict, symmetrical faces -- those close to our internal average -- are especially difficult to caricature. People at the convention mention struggles with Katy Perry and Brad Pitt; the animator Bill Plympton, a guest speaker at the convention, tells me that Michael Caine has long been a bête noire. The same principle explains why the person at the convention with maybe the least symmetrical of faces appears by week's end in no fewer than 33 works of art on the ballroom walls.

I don't think I need a citation to claim that Katy Perry and Brad Pitt are considered to be very beautiful people. This suggests another conjecture.

Conjecture Six: Absence of distortion (or personality or timbre) is sweetness.

2.3.1 Recreating an Ideal Harmonic Series using Instruments having Systematically-Distorted Timbre

In Section 1.5.1 "Timbre: Systematic Distortions from the Ideal Harmonic Series" above we saw that the overtone series of a single instrument is easily distorted by myriad physical effects. However, recall that for the same (kind of) instrument, those distortions were systematic and reliable. Therefore by playing

multiple notes,

on instruments having the same (or similar) timbre,

and relying on Relative Pitch to subtract the differences for us,

from distorted overtone series-es we can magically recreate parts of the ideal Harmonic Series!

2.3.2 Harmony Induces Two Kinds of Intervals: Horizontal Within the Note and Vertical Across the Notes

Suppose we play two notes on the piano that are a Fifth (a factor of 3/2) apart. Per O'Donnell's comment in Section 1.5.1 "Timbre: Systematic Distortions from the Ideal Harmonic Series" above, since piano strings are not the strings of ideal physics, they don't make an ideal Harmonic Series. Instead, each tone in the series is moved by being multiplied by some fudge factor. However notice that strings on the piano are made of the same stuff, at least nearby strings, and this fudge factor should therefore be somewhat consistent across strings. That is, two corresponding tones at the same point in the overtone series of two different notes should get multiplied by the same fudge.

Tones of 1st note:    1   --->  (1 * 2 * fudge2)    --->  (1 * 3 * fudge3)    ...
                      |                 |                         |
                      v                 v                         v
Tones of 2nd note:   3/2  --->  (3/2 * 2 * fudge2)  --->  (3/2 * 3 * fudge3)  ...

Now notice that there are two kinds of intervals of tone pairs:

"horizontal": intervals made by pairs of tones within the one series of tones generated by one note, and

"vertical": intervals made by pairs of tones across the two series-es of tones generated by the two different notes, especially those of corresponding overtones.

2.3.3 Vertical Intervals Have Pure Ratios

As O'Donnell points out above in Section 1.5.1 "Timbre: Systematic Distortions from the Ideal Harmonic Series", real instruments can systematically produce overtones at frequencies different from those of the ideal Harmonic Series; one such instrument is the piano which produces stretched overtones. However, these distortions from the ideal Harmonic Series affect these horizontal and vertical intervals differently:

Horizontal intervals are fudged: the ratio of overtone 3 of the 2nd note to overtone 1 of the 2nd note has fudge in it:

(3/2 * 3 * fudge3) / (3/2) = 3 * fudge3,

Vertical intervals are pure: the ratio of overtone 3 of the 2nd note to overtone 3 of the 1st note is pure:

(3/2 * 3 * fudge3) / (3 * fudge3) = 3/2 (pure!).
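The cancellation of the fudge factor in the vertical intervals can be checked numerically. In this Python sketch the stretch factors in `FUDGE` are made-up values, not measurements of any real piano, and the names `FUDGE` and `overtones` are hypothetical; the only assumption carried over from the text is that both notes share the same fudge factor at each overtone number.

```python
from fractions import Fraction

# Hypothetical stretch factor for each overtone number (made-up values).
FUDGE = {
    1: Fraction(1),
    2: Fraction(1001, 1000),
    3: Fraction(1003, 1000),
    4: Fraction(1006, 1000),
}


def overtones(fundamental):
    """Overtone n of a note is fundamental * n * FUDGE[n]."""
    return {n: fundamental * n * fudge for n, fudge in FUDGE.items()}


first = overtones(Fraction(1))      # 1st note
second = overtones(Fraction(3, 2))  # 2nd note, a Fifth above

# Horizontal: overtone n against overtone 1 within the 2nd note.
horizontal = {n: second[n] / second[1] for n in FUDGE}
# Vertical: overtone n of the 2nd note against overtone n of the 1st note.
vertical = {n: second[n] / first[n] for n in FUDGE}

print(horizontal)
print(vertical)
```

Every horizontal ratio still carries its fudge factor (for example 3 * fudge3), while every vertical ratio comes out as exactly 3/2, whatever values are placed in `FUDGE`.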

However, I would be remiss if I did not point out here [acoustical-demo, Demo 31], "Tones and Tuning with Stretched Partials" from the "Auditory Demonstrations" CD, quoted in Section 5.1 "Helmholtz's Theory Relies Only On Interfering Overtones, But Harmony Is Something More". In Demo 31, a piece by Bach is played on a computer-generated piano (part 1) having normal overtones and (part 4) having overtones where an Octave is stretched from a factor of 2 to a factor of 2.1. Taken naively, our theory that the purity of vertical intervals matters to the brain suggests that these should both harmonize; however the normal one (part 1) certainly sounds better. We suggest therefore that if the horizontal intervals are distorted grossly enough, then the fact that the vertical intervals are pure cannot save the harmony from being destroyed by the dissonance of the horizontal intervals.

2.3.4 Vertical Intervals Have Balanced Amplitudes

As Levitin points out above in Section 1.5.1 "Timbre: Systematic Distortions from the Ideal Harmonic Series", real instruments can systematically produce overtones at amplitudes different from those of the ideal Harmonic Series; one such instrument is the clarinet which emphasizes the odd overtones. Again however, these distortions of the ideal Harmonic Series affect these horizontal and vertical intervals differently:

Horizontal intervals are sometimes made by a pair of tones having unbalanced amplitudes: for example, with the clarinet the ratio of an odd overtone to an even overtone will be an interval between a loud tone and a soft tone.

Vertical intervals are always made by a pair of tones having balanced amplitudes: again, the amplitude variations are systematic, so the tones that are paired up vertically will have the same amplitude variations.

2.3.5 Vertical Intervals Are All The Same Ratio

Further, these two kinds of intervals are going to show up very differently to the relative pitch detector:

Horizontal intervals are only one of each kind, a Whitman's Sampler: while there is sweetness in one voice, especially that of a trained singer, as in the horizontal intervals of that voice there is one instance of each interval of the Harmonic Series (albeit with the fudge we mentioned above of horizontal intervals).

Vertical intervals are all of the same kind, an entire box of chocolate almond cherry: on the other hand when two voices are sung, say, a Fifth apart, there is an entire wall of the same kind of sweetness, a wall of many Fifths coming at you, namely the vertical intervals above, each of which is a Fifth.

(Again, for an introduction to musical intervals such as the Fifth, see Section 3.1 "The Major Triad".)

2.3.6 Harmony is Sweeter Than Sweet

Therefore we see that note ratios induce a set of the same tone ratios. Further these tone ratios are pure, have balanced amplitudes, and are all of the same interval.

This harmonic effect works best if the two notes of an interval are played on the same instrument having therefore the same distortions from the ideal Harmonic Series. My Men's Chorale teacher Bill Ganz told us that to have our voices harmonize, we should sing the same vowels, which supports this theory as the same vowels will have closer timbres [Ganz, c. fall 1991] (Bill says this is a known effect, not something he independently observed; a cursory search does not produce a better reference, so I cite him). Notice that this effect allows instruments making tones that are not anywhere near the Harmonic Series to still harmonize with each other (at least up to a point where the horizontal intervals interfere too much; see the point about [acoustical-demo, Demo 31] in Section 2.3.3 "Vertical Intervals Have Pure Ratios").

The wall of vertical intervals hammers the same relative pitch sensor with a wall of the same pure interval, one of the features that the cartoon-physics ideal Harmonic Series recognizer of your brain is looking for. Recall from the introduction to Section 2 "Living in a Computational Cartoon" the effect of pantyhose making a leg look rounder than round; again more on this effect in Section 2.5 "Recognition: Feature Vectors". Harmony is sweeter than sweet. It's impossibly sweet -- impossible for one voice anyway -- which is just what the theory predicts.

2.4 Interestingness: Just Enough Complexity

Anticipation and prediction are among the fundamental operations of the brain. We suggest that there is an art to balancing simplicity and complexity: if understanding and predicting a storyline are too easy, then it is boring, and if too hard, then it is noise, but if just right, then it is interesting. As we discuss below, (1) simplicity comes from data having a "theme" and (2) ambiguity is the absence of a single explanation or theme and therefore a good way to rapidly produce complexity. See Section 7.2 "The Role of Narrative Generally" for how theme and ambiguity are unified to make narrative.

2.4.1 The Simplicity of Theme

People frequently experience that, before receiving information, having an expectation as to the context of that information, its theme, helps considerably in the processing of it. For example, people who speak more than one language sometimes have the experience of hearing words (1) in a language that they know, but (2) that they were not expecting, and therefore not understanding those words until they "listen" to them again in their mind from within the context of the language in which those words were spoken. There are myriad examples of context influencing how something occurs to someone.

Surprise Reduction: The technical name for the amount of expected information one gets from a situation is the entropy [ent] [Wilkerson-entropy]. Some call the entropy of a measurement the amount of surprise one expects to get out of it. Clearly, if one knows more about what to expect in a situation, the amount of surprise can be greatly reduced. Since it is work to process information, we suggest that the brain likes to have reliable expectations in order to minimize the amount of surprise it is dealing with all day.
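To make "surprise reduction" concrete, here is a minimal sketch that assumes the entropy meant here is Shannon's, measured in bits. The function name entropy_bits and the two example distributions are my own illustrative assumptions, not something taken from the references.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Four equally likely outcomes: nothing is known in advance.
no_expectation = [0.25, 0.25, 0.25, 0.25]
# The same four outcomes, but context makes one of them very likely.
strong_expectation = [0.97, 0.01, 0.01, 0.01]

print(entropy_bits(no_expectation))      # 2.0 bits of expected surprise
print(entropy_bits(strong_expectation))  # far less than 2 bits
```

Knowing what to expect does not change what can happen, only how much surprise one expects to process.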

Model Inference: Life is full of situations where we may observe the consequences of a situation but are not told explicitly what is the state of the situation. There is nothing left to do but to infer a model of the state of affairs from observation of many details, and therefore inference is likely a constant activity of the brain. For example, people often infer the rules of a game from observation and without reading the rules.

Have you ever seen someone color-coordinate their clothes or even their room? Have you ever been to a "theme party" where everyone was to dress and act from a given era or situation? How about a "theme restaurant" or "theme park"? Having a theme for all of the elements of a given situation

(surprise reduction) reduces the amount of new information or "surprise" that each one introduces, and

(ease of inference) allows the brain to construct a whole from the parts.

Conjecture Seven: The brain wants input to have a theme. That is, the brain both infers themes from input and uses themes as context when processing input.

2.4.2 The Complexity of Ambiguity

From [Feldman2006, p. 307, 308]:

Please read the following sentence aloud slowly, word by word:

The horse raced past the barn fell. Sentences like these are called garden-path sentences because, in slow reading, we often notice that we have followed an analysis path that turned out to be wrong.... But why are people surprised in garden-path situations? The brain is a massively parallel information processor and is able to retain multiple active possibilities for interpreting sentence, scene, and so on. Well, there must be a cutoff after which some possible interpretations are deemed so unlikely as to be not worth keeping active. The final piece of their [referring to a model given by other researchers] model was an assumption that a hypothesis was abandoned if its belief net score was less than 20% of that of its rival. We experience surprise when the analysis needed for a full sentence is one that was deactivated earlier as unlikely. This is a complex computational model, but nothing simpler can capture all the necessary interactions.

The input the brain gets as we live life is inherently and often wildly ambiguous. Alternatives multiply and so the number of possible ambiguities in a situation can easily grow exponentially. No machine can keep up with the demands of a problem the size of which grows that fast. Therefore:

Much of the brain is a massive disambiguation engine that is running all the time and is functioning at its computational limit.

Jokes are often of the form of an ambiguity of contexts/themes resolved by a punchline which evaluates one way in one context and another way in the other context (say true in one and false in the other); the story that precedes the punchline serves to amplify the weaker context, the weaker side of the ambiguity, so as to maximize the punch of the line by making it break symmetry between two almost equal contexts/themes. Story plots are often of this form as well, in particular mysteries. The language of Shakespeare is full of double meanings and even perhaps a triple meaning here and there. These are all to the same purpose:

Conjecture Eight: The brain enjoys having its disambiguation engine teased.

2.5 Recognition: Feature Vectors

I need to introduce yet another computational idiom: the feature vector [feat]. It is actually a completely straightforward idea that you already use every day. Think of how you summarize a thing when you post an online ad to sell it. Suppose you are selling a car. You might very well put in the ad the total volume of the cylinders in the engine. You probably won't list the number of bolts in the engine. You probably will list how many miles the engine has driven. You probably will not list the number of hours the radio has been on (even if you knew it). The point is that

Humans naturally abstract; that is, they retain the features that are important for a given purpose and discard the rest.

All language is abstraction. Suppose I point at a chair and I say "what is that?" You say "that is a chair." I say "are you telling the complete truth?" You say "yes!" I lean down and look very closely and I say "yea, but you didn't mention this little scratch down here...." You roll your eyes in annoyance.

An abstraction is a reduced amount of information that still serves the purpose. In the context of recognizing a thing as a member of a class, an abstract adjective is called a "feature". Usually there is more than one, so we collect them together into a "vector", which just means a list where the elements are not interchangeable (that is, you can't swap the mileage and the year of a car without severely changing the meaning of the car ad).

Once we have described a class of inputs as a vector of features, we have a clear algorithm for recognizing a thing as being a member of that class:

Whenever we encounter a thing, for each feature (in parallel), check if that feature is present. If all (or most) of the features in the vector are present ("fire"), then recognize the thing as being in the class abstracted by the feature vector ("fire" the whole recognizer).
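The recognition algorithm just described can be sketched in a few lines. This is only an illustration under my own assumptions: the function recognize, the parameter required_fraction, and the toy car-ad detectors are all hypothetical names, and the loop runs the detectors one after another rather than in parallel.

```python
def recognize(feature_detectors, thing, required_fraction=1.0):
    """Fire the recognizer if enough of the feature detectors fire on `thing`."""
    fired = [detect(thing) for detect in feature_detectors]
    return sum(fired) >= required_fraction * len(fired)

# A toy feature vector for the class "car ad"; every detector is illustrative.
car_ad_features = [
    lambda ad: "engine_litres" in ad,
    lambda ad: "mileage" in ad,
    lambda ad: "year" in ad,
]

complete_ad = {"engine_litres": 2.0, "mileage": 80000, "year": 2005}
noisy_ad = {"engine_litres": 2.0, "year": 2005}  # the mileage is missing

print(recognize(car_ad_features, complete_ad))                      # True
print(recognize(car_ad_features, noisy_ad))                         # False
print(recognize(car_ad_features, noisy_ad, required_fraction=0.6))  # True
```

With required_fraction=1.0 the recognizer demands all of the features; lowering it gives the "or most" behavior, which tolerates a missing feature.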

Note that the second part above which looks for the conjunction of features may be realized by a more sophisticated mechanism than a simple AND gate that just fires its output when all of its inputs have fired: a simple conjunction mechanism would be too "brittle" in the face of the noisy input of the real world. For example, even plants such as the Venus Fly Trap can compute a rather sophisticated conjunction of features before recognizing a fly [venus-fly]:

The trapping mechanism is so specialized that it can distinguish between living prey and non-prey stimuli such as falling raindrops; two trigger hairs must be touched in succession within 20 seconds of each other or one hair touched twice in rapid succession, whereupon the lobes of the trap will snap shut in about 0.1 seconds.

Recall that in the case of virtual pitch, the feature recognition mechanism seems to find the greatest common divisor of the tones presented; that is, this recognizer uses a special holistic property of this particular set of features in order to work well in the face of missing features. Recall that a timbre amounts to the systematic absence of parts of the ideal Harmonic Series and that real sounds (in particular, voices) exhibit a range of timbres; thus the Harmonic Series recognizer must be able to robustly find the fundamental even when some of the tones are missing. See Section 2.1.1 "Virtual Pitch: Hearing the Harmonic Series Even When it is Not There", Section 2.1.2 "Using Greatest Common Divisor as the Missing Fundamental", and Section 1.5.1 "Timbre: Systematic Distortions from the Ideal Harmonic Series".
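As a small idealized sketch of the greatest-common-divisor idea, suppose the tones are exact whole numbers of Hz (an assumption; real tones are noisy, so whatever the brain does must be more tolerant than exact integer arithmetic). The function name missing_fundamental and the example frequencies are illustrative.

```python
from functools import reduce
from math import gcd

def missing_fundamental(tone_frequencies_hz):
    """Greatest common divisor of whole-number tone frequencies, in Hz."""
    return reduce(gcd, tone_frequencies_hz)

# Harmonics 2, 3 and 4 of a 100 Hz fundamental; the fundamental itself is absent.
print(missing_fundamental([200, 300, 400]))  # 100
# A timbre with only the odd harmonics 3, 5 and 7 present.
print(missing_fundamental([300, 500, 700]))  # 100
```

In both cases the same fundamental is recovered even though it is never played and different parts of the series are missing.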

2.5.1 Soft Computing

Machines are good at crisp, mechanical behavior, such as adding huge lists of numbers. This is fun for a while, but it can get old.

I don't often need huge lists of numbers added, but I really would like to go to an online auction site and find a car that is "sort of" like my ideal car which I might be willing to describe.

You will notice the use of the non-crisp or "soft" phrase "sort of" in the previous problem specification. Some people try to get machines to do this sort of soft reasoning that humans do so well. It can sometimes be done, at least within a very constrained context of, say, shopping for cars or plane tickets. Such a discipline is called Artificial Intelligence or Soft Computing or Machine Learning or Statistical Inference, depending on exactly how one goes about it and who is providing the research funding. The important thing for us is that describing problems using feature vectors is a very general and widely used technique. Recalling our conjecture that computational laws are universal, we would not find it surprising if

Conjecture Nine: The brain uses feature vectors for recognition.

2.5.2 False Recognition

To get the brain to (1) have the experience of the presence of a thing, it is not necessary to (2) present the actual thing to the brain. It is enough to just present anything that fires the feature vector in the brain assigned to recognize that thing. That is, if I want you to think "hamburger" I don't have to show you a hamburger, only a picture of one. Recall the example from Section 2 "Living in a Computational Cartoon" of pantyhose making a leg look rounder than round, impossibly round.

It is pretty easy to tell the difference between a photograph of something and the thing itself: you wouldn't accidentally eat a photograph of a hamburger. Yet at the same time the picture definitely says "hamburger" to your brain, often strongly enough that you are willing to part with some money to have a real one right now! But it gets even weirder.

Have you ever seen a cartoon of a person that looks more like the person than the person does?

Some political cartoonists are very good. They

pick some very unusual features of the person, and then exaggerate those features.

Amazingly, what can result is something that looks more like the person than the person. From [Harmon-art-brain]:

As someone who has worked in pen and ink for decades, cartoonist Jules Feiffer realizes that "what we see is often quite divorced from what is actually there," he noted. He calls the two-dimensional representations metaphors, noting that "the metaphor is often more understandable than the real thing." And research on the perception of faces reveals that the human brain and individual neurons are tuned to extreme representations, explained Margaret Livingstone, a professor of neurobiology at Harvard Medical School. Her research has shown that people are much quicker to recognize caricatures of people than documentary photographs, showing how the brain at work prizes the representative over the more factual.

From "What Caricatures Can Teach Us About Facial Recognition" [Austen-caricature] :

At the University of Central Lancashire in England, Charlie Frowd, a senior lecturer in psychology, has used insights from caricature to develop a better police-composite generator. His system, called EvoFIT, produces animated caricatures, with each successive frame showing facial features that are more exaggerated than the last. Frowd's research supports the idea that we all store memories as caricatures, but with our own personal degree of amplification. So as an animated composite depicts faces at varying stages of caricature, viewers respond to the stage that is most recognizable to them. In tests, Frowd's technique has increased identification rates from as low as 3 percent to upwards of 30 percent. . . . "A lot of people think that caricature is about picking out someone's worst feature and exaggerating it as far as you can," Seiler says. "That's wrong. Caricature is basically finding the truth. And then you push the truth."

The features can be anything that is important to the task of recognizing that person: a nose or lip shape, etc. -- technically, such a feature has a lot of "information". A good example of this I remember was a yellow smiley face that had a red blotch on its forehead -- everyone knew it was Mikhail Gorbachev [gorb].

While a thing may induce a feature vector in the brain for later use in recognizing the thing, some other things will also fire that vector, causing artificial recognition.

2.5.3 Cubism: Partial Recognition Due to Redundant, Over-Determined Feature Vectors

There is no rule that says that the features in a feature vector must be independent, that for every subset of features, there is some input that will fire those features and not the others. If the brain is doing all it can to recognize things as fast and cheaply as possible, it is going to use the most effective features it has and some redundant / over-determined sets of features can easily arise.

Hmm, what would you experience if some but not all of the features in a vector were to fire? Note that, while there may be no natural input that can cause this, that does not mean that there is no such art-ificial input. This leads to interesting phenomena that can be exploited by artists.

Cubism is a form of art from the early 20th century that has a certain particular quality:

the parts of an object may be rendered reasonably faithfully so that one recognizes them,

however they do not arrange into a whole in a coherent way.

This produces an interesting effect:

we recognize the object, as the features we require for recognition do fire,

although we still have an overall feeling that we are not seeing the thing in its natural form, but instead in a disturbed or unhappy or dreamy state.

You may say "Of course it looks disturbed! It's all messed up!" But think for a moment: if it is all messed up, how is it that it looks like anything at all? Again, per Section 2.5.2 "False Recognition", because the features are present.

Consider Picasso's "Head of a Woman" [Picasso1938] on the right. One eye is in profile and the other is straight ahead, a physical impossibility. Yet we have no trouble at all instantly recognizing a woman.

3 Harmonic Music Explained

What can we make of all of this? Do the above insights into physics and computation yet provide enough information for us to derive something that we recognize as music? For example, can we compute a set of notes that will sound good when played together?

Recall the observation of Section 2.3.2 "Harmony Induces Two Kinds of Intervals: Horizontal Within the Note and Vertical Across the Notes" that two notes induce parallel vertical series-es of overtones, all of the same ratio; this means that note ratio and tone ratio are intimately connected. That is, from now on, when speaking of two notes that are in a ratio, what we really mean is that the overtone series-es of the two notes make two series-es of vertical tones having that ratio. From now on we will omit reiterating this point and simply speak of "the ratio of two notes making an interval within the Harmonic Series".

3.1 The Major Triad

Let's try the simplest thing we can that will generate notes that the brain wants to hear together (recall from Section 2.1 "Searching for Harmonics" how much the brain wants to hear the Harmonic Series):

find the ideal Harmonic Series induced by, say, Middle C,

map it into one Octave by dividing by two whenever necessary,

replace tones with notes as, again, these notes will induce the same (vertical) intervals as the tones.

Note that in Scientific Pitch Notation [sci-pitch] the particular Octave that contains Middle C is "Octave 4", the next Octave up is "Octave 5", etc., where we increment the octave number each time we cross the note C. Starting at C4 the sequence of notes we get is as follows.

Factor of 1: The fundamental: C4.

Factor of 2: This is just C5 (up one Octave); dividing by 2 gives us 1 times C4 = C4 again, so no new note in the collection.

Factor of 3: This is G5; dividing once by 2 gives us 3/2 times C4 = G4. This is the first really "interesting" different note.

Factor of 4: This is C6; dividing twice by 2 gives us 1 times C4 = C4 again, which we have already in our collection.

Factor of 5: This is close to E6; dividing twice by 2 gives us 5/4 times C4 = E4. Ah, another new and "interesting" note.

Factor of 6: This is G6; dividing twice by 2 gives us 6/4 times C4 = 3/2 times C4 = G4, which we already have in our collection.

Let's stop here. (We stop at harmonic 6 in particular for a reason that will become clear later.) The starting tone/note of Middle C is arbitrary, but the ratios we get, namely 1, 5/4, and 3/2, times the fundamental, are not. Three notes in these ratios are called "The Major Triad". There are standard names for these notes (relative to the fundamental): reordering them from the harmonic order above to their numeric order when folded down into one Octave, the first note (1) is called the "Root", the second (harmonic 5, so in this Octave 5/4) is called the "Major Third", and the third (harmonic 3, so in this Octave 3/2) is called the "Perfect Fifth" (!). The weirdness of musical nomenclature is just beginning. Note further that it is unclear which terms should be capitalized; we treat as proper nouns any illusory Platonic ideal objects created by the mind: "Harmonic Series", "Major Triad", etc. Again, these names of the intervals reflect their position in the Major Scale, described below, and as you can see, confusingly do not correspond to their order in the Harmonic Series.

Major Triad

Root (harmonic 1): 1 = 1.0.

Major Third (harmonic 5): 5/4 = 1.25.

Perfect Fifth (harmonic 3): 3/2 = 1.5.
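The folding of harmonics 1 through 6 into one Octave can be checked mechanically. The following is a minimal sketch using exact fractions; the helper name fold_into_octave is my own.

```python
from fractions import Fraction

def fold_into_octave(ratio):
    """Divide or multiply by 2 until the ratio lies in [1, 2)."""
    ratio = Fraction(ratio)
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

# Harmonics 1 through 6, folded into one Octave; duplicates collapse in the set.
folded = sorted({fold_into_octave(h) for h in range(1, 7)})
print([str(r) for r in folded])  # ['1', '5/4', '3/2']
```

Six harmonics yield only three distinct notes, because harmonics 2 and 4 fold onto the Root and harmonic 6 folds onto harmonic 3.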

Note that according to our measure of interestingness in Section 2.4 "Interestingness: Just Enough Complexity", the intervals of the Fifth (factor of 3) and the Third (factor of 5) are, in that order, the most interesting intervals: (1) they are in the theme of the Harmonic Series, while also (2) they have some complexity resulting from not being a simple power of two times the Root (which if they were would make them subject to the octave effect tending to make two notes sound like one; that is, a factor of 2 is too boring).

If we pick C as the Root (as we did above) then the resulting Major Triad is called the "chord" of C Major. The starting note of "C" was arbitrary; however the resulting triad was not. Is it so surprising that this Major Triad is everywhere in music? It sounds rather nice to play notes in the C-Major Triad; try it. However, after a while it is a little boring, so we would like to add some variety. How little complexity can we add and yet still change something?

3.2 The Major Scale

The Major Scale [maj] is so fundamental to Western music that it is even "built into" the notation (the Major Scale is sometimes called the Diatonic Scale, although the term "Diatonic" seems to mean different things depending on who you ask, therefore instead I use the less ambiguous term "Major"): if you play notes by going up the white keys of a piano keyboard one step at a time, which is the same as going up the alternating lines and spaces of an unadorned musical score, you are playing the C Major Scale. Is this Major Scale arbitrary or is it somehow fundamental to the way the brain hears? If it is fundamental, we should be able to derive such a thing simply from first principles as we suggested in the introduction. Let's try it.

3.2.1 Interlocking Triads

Well, we like the Major Triad, so let's make another one, but starting with a different note as the fundamental. To preserve as much theme with the previous triad, let's start with the "closest" notes to the C that we have in our first triad: The first note other than C that we hit was 3/2 times the Root, also called the Perfect Fifth; therefore let's build a triad using 3/2 times C4 = G4 as the fundamental. Let's remember to divide by 2 when necessary to keep everything within the same Octave.

Major Triad Up by a Perfect Fifth

Root: 3/2 * 1 = 3/2 = 1.5.

Major Third: 3/2 * 5/4 = 15/8 = 1.875.

Perfect Fifth: 3/2 * 3/2 = 9/4, which is bigger than 2, so divide by 2, giving: 9/8 = 1.125.

Ok, that was so much fun let's go in the other direction as well. That is, let's make yet another Major Triad where the Perfect Fifth of that Triad is the Root of our first Triad. That means multiplying by 1/(3/2) = 2/3; therefore let's build a triad using 2/3 times C4 = F3 as the fundamental. Let's be sure to multiply by 2 when necessary to keep everything within the same Octave. (Note that throughout we use "~" (tilde) to mean "almost equals".)

Major Triad Down by a Perfect Fifth

Root: 2/3 * 1 = 2/3, which is smaller than 1, so multiply by 2: 4/3 ~ 1.333.

Major Third: 2/3 * 5/4 = 5/6, which is smaller than 1, so multiply by 2: 5/3 ~ 1.667.

Perfect Fifth: 2/3 * 3/2 = 1 = 1.0.

Note that the selection of three interlocking triads is suggested by our measure of interestingness from Section 2.4 "Interestingness: Just Enough Complexity". That is, using three overlapping Major Triads (1) maximizes the theme of the Harmonic Series while not requiring any harmonics beyond harmonic 5 (the interval called the Third), while also (2) having some complexity by not all being of one Harmonic Series.

Now we have three "interlocking" Triads: the Perfect Fifth of one is the Root of the next. How many notes is that? Three notes per triad times three triads is nine notes; however two of the notes where the triads interlock are counted twice, so there are 3 * 3 - 2 = 7 unique notes. Let's plot them on a line to see how far they are from one another.
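The count of seven unique notes can be confirmed with exact fractions. This is a minimal sketch; the names fold_into_octave, MAJOR_TRIAD, and TRIAD_ROOTS are illustrative choices of mine.

```python
from fractions import Fraction

def fold_into_octave(ratio):
    """Divide or multiply by 2 until the ratio lies in [1, 2)."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

MAJOR_TRIAD = [Fraction(1), Fraction(5, 4), Fraction(3, 2)]
# Roots of the three triads: down a Fifth, the base triad, up a Fifth.
TRIAD_ROOTS = [Fraction(2, 3), Fraction(1), Fraction(3, 2)]

all_notes = [fold_into_octave(root * step)
             for root in TRIAD_ROOTS for step in MAJOR_TRIAD]
unique_notes = sorted(set(all_notes))

print(len(all_notes), len(unique_notes))  # 9 7
print([str(n) for n in unique_notes])
# ['1', '9/8', '5/4', '4/3', '3/2', '5/3', '15/8']
```

The two notes that drop out are exactly the interlocking ones: 1 (the Root of the base triad) and 3/2 (its Perfect Fifth) each appear in two triads.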

3.2.2 Using Logarithms to Visualize Distances Between Tones/Notes

Wait... before we do that, when plotting notes, such a plot should "mean something to us". As we saw above, what makes sense would be for the ratios of the notes to have some regularity; that is, the multiplicative ratios of frequencies are what our brain is listening to, not the additive distances. For this plot to mean something, we would want equal ratios to show up equally on the plot. How do we turn (multiplicative) ratios into (additive) distances?

The function that does this is called the "logarithm" (or just "log") [log]. Explaining it is beyond the scope of this article, but basically if you count someone's income by how many digits they have in it, you are already familiar with logarithms. In general, a "six-figure income" is ten times that of a "five-figure income" (though a high five-figure income is usually pretty close to a low six-figure income, so the example isn't perfect). That is, by counting the number of figures, you have turned a (multiplicative) ratio of a factor of ten into an (additive) increment of one figure of income. You see,

After going through the logarithm, multiplicative factors turn into additive increments.

The logarithm we want turns factors into increments in the same way as the income example, except that we care about factors of 2 instead of 10, so we take logs "base 2" (instead of "base 10"): each multiplicative factor of 2 will be displayed as one unit of additive increment in the graph called an Octave. This means that the relative pitch of specific tone (or note) ratios shows up as specific distances between tone (or note) logarithms.

You can skip the next little section if your eyes are glazing over. All you need to remember is that on a logarithmic scale a multiplicative factor looks like an additive increment.

Some Technical Details on Logarithms and Exponents

*** Feel free to skip this section! ***

The "log base 2 of x" is usually denoted something like "log_2(x)"; to avoid visual clutter, we omit the parentheses around "x" when the meaning is unambiguous, writing "log_2 x". When computing the ratio of two logs, as we do in Section 3.5.3 "Chords from the Harmonic Series", the bases of the logs don't matter (they cancel out) and so we omit them, writing simply "log x / log y".

We denote the exponential, raising a base, b, to a power, y, as "b^y". (Note that your browser may or may not render the y as a superscript; in case it does not, I have also redundantly retained the "^" character.) The logarithm and exponential are inverses, so

b^(log_b x) = x and log_b(b^y) = y

You can use exponentials to think about logarithms. When computing the ratios of numbers, imagine each number represented as an exponential of a base, such as 2. Now think about what multiplication and division of that number do to the exponent as the numbers are divided or multiplied. That is, if we think of taking the ratio of two numbers represented as exponents we see that we are just subtracting their exponents (the logarithms of the original numbers); that is,

2^p / 2^q = 2^(p-q).
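For readers who would rather check this numerically than algebraically, here is a small sketch using the standard library's base-2 logarithm; the variable names are illustrative.

```python
import math

fifth = 3 / 2
octave = 2

# Multiplying ratios corresponds to adding their logarithms.
product_log = math.log2(fifth * octave)
sum_of_logs = math.log2(fifth) + math.log2(octave)
print(product_log, sum_of_logs)  # the two values agree up to rounding

# Dividing ratios corresponds to subtracting their logarithms.
print(math.log2(2**5 / 2**3))  # 2.0, that is, 5 - 3

# One Octave (a factor of 2) is exactly one unit on the base-2 log scale.
print(math.log2(octave))  # 1.0
```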

3.2.3 The Keyboard Revealed

We now plot all of the notes of the three above-derived interlocking triads. Due to the phenomenon of relative pitch, we express each note as a ratio to the Root of the base Triad. Due to the phenomenon of octaves, we multiply or divide by 2 to keep all the notes within one Octave. Again, since we want the plot to mean something, we take logarithms before we plot so that same multiplicative ratios map to same additive increments. We use 2 as the base of our logarithm so that a factor of one Octave, or 2, corresponds to an additive increment of 1, so all numbers will be between 0 (inclusive) and 1 (exclusive).

To review, the three fractions multiplied to obtain the fraction for the note are in order:

which Major Triad: 1/1 for C, 3/2 for up a Fifth, 2/3 for down a Fifth,

which element of that Major Triad: 1/1 for Root, 5/4 for the Third (harmonic 5), 3/2 for the Fifth (harmonic 3),

multiply or divide by more factors of 2 to keep the result in the same Octave.

Multiplying all of that together and taking log base 2 we get the following:

Base Major Triad

Root: (1/1) * (1/1) * (1/1) = 1/1; log_2 1/1 ~ 0.000.

Major Third: (1/1) * (5/4) * (1/1) = 5/4; log_2 5/4 ~ 0.322.

Perfect Fifth: (1/1) * (3/2) * (1/1) = 3/2; log_2 3/2 ~ 0.585.

Major Triad up by a Perfect Fifth

Root: (3/2) * (1/1) * (1/1) = 3/2; log_2 3/2 ~ 0.585.

Major Third: (3/2) * (5/4) * (1/1) = 15/8; log_2 15/8 ~ 0.907.

Perfect Fifth: (3/2) * (3/2) * (1/2) = 9/8; log_2 9/8 ~ 0.170.

Major Triad down by a Perfect Fifth

Root: (2/3) * (1/1) * (2/1) = 4/3; log_2 4/3 ~ 0.415.

Major Third: (2/3) * (5/4) * (2/1) = 5/3; log_2 5/3 ~ 0.737.

Perfect Fifth: (2/3) * (3/2) * (1/1) = 1/1; log_2 1/1 ~ 0.000.

Note that this may look more complicated than it really is. All that is going on numerically is playing with factors of 2, 3, and 5, in a rather systematic way, as follows:

The three Triads are each a factor of 3 (a "Fifth") apart.

Within each Triad, we have a factor of 3 (a "Fifth") and a factor of 5 (a "Third") from the Root.

We multiply or divide by 2 (an "Octave") enough times to keep everything in one Octave.

Notice that the importance of the numbers 2, 3, and 5 is not uniform: 3 is used most prominently, 5 is more secondary, and, going the other direction, 2 is so boring we just throw it in wherever we like. This reflects our observation from Section 3.1 "The Major Triad" that, of the different harmonics, a factor of 3 seems to have the right amount of complexity to be most interesting, so it gets top billing (see Section 2.4 "Interestingness: Just Enough Complexity" for more on interestingness in general).

Sorting and Plotting the Three Triads on One Line

Now we sort the logarithms and remove duplicates. Let's give them letter names and for some strange reason let's start at C instead of A and wrap around.

C: log_2 1/1 ~ 0.000.

D: log_2 9/8 ~ 0.170.

E: log_2 5/4 ~ 0.322.

F: log_2 4/3 ~ 0.415.

G: log_2 3/2 ~ 0.585.

A: log_2 5/3 ~ 0.737.

B: log_2 15/8 ~ 0.907.
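The sorting and naming step is easy to reproduce. In this minimal sketch the nine ratios are typed in directly from the three triads above, and the names scale_ratios and named_logs are my own.

```python
import math
from fractions import Fraction

scale_ratios = [Fraction(1), Fraction(5, 4), Fraction(3, 2),      # base triad
                Fraction(3, 2), Fraction(15, 8), Fraction(9, 8),  # up a Fifth
                Fraction(4, 3), Fraction(5, 3), Fraction(1)]      # down a Fifth

# Remove duplicates, sort by size, and hand out letters starting at C.
unique_sorted = sorted(set(scale_ratios))
named_logs = {letter: round(math.log2(ratio), 3)
              for letter, ratio in zip("CDEFGAB", unique_sorted)}

for letter, log_value in named_logs.items():
    print(f"{letter}: {log_value:.3f}")
```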

Now let's plot them on the unit interval to within 0.02 units.

C       D       E    F       G       A       B    C
+----+----+----+----+----+----+----+----+----+----+
0    1    2    3    4    5    6    7    8    9    0
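A plot like the one above can be drawn by a few lines of code. This sketch assumes one character column per 0.02 units, so the unit interval spans columns 0 through 50, and it adds the C one Octave up (ratio 2) at the right-hand end. The name UNITS_PER_COLUMN is illustrative.

```python
import math

# Each note as a ratio to the Root; the final C is the Octave above (ratio 2).
notes = [("C", 1), ("D", 9 / 8), ("E", 5 / 4), ("F", 4 / 3),
         ("G", 3 / 2), ("A", 5 / 3), ("B", 15 / 8), ("C", 2)]

UNITS_PER_COLUMN = 0.02  # plot resolution
row = [" "] * 51         # columns 0..50 cover the unit interval
for letter, ratio in notes:
    row[round(math.log2(ratio) / UNITS_PER_COLUMN)] = letter

print("".join(row))
print("+----" * 10 + "+")
print("    ".join("01234567890"))
```

The letters land in columns 0, 8, 16, 21, 29, 37, 45, and 50, so the gaps come in just two sizes: seven or eight columns, and four or five.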

Hmm, now that's interesting, if we were to fill in a few gaps they would look almost evenly spaced. Following music theory, we'll call the big gaps "tones" (yes, this is a different meaning of the word "tone") and the small gaps "semi-tones". (This meaning of "tone" and "semi-tone" will not occur very often, and below I try to use only semi-tone.) I'll fill in the big gaps with a hash sign (I'll omit computing any exact values for them) so all the gaps are now semi-tones.

C   #   D   #   E    F   #   G   #   A   #   B    C
+----+----+----+----+----+----+----+----+----+----+
0    1    2    3    4    5    6    7    8    9    0

Does that look familiar? If not, color the letters white and the hashes black and look again at the picture of the keyboard at the top of the article.

Notice that there was no resorting to the following arguments:

"Because the Ancient Greeks did it this way."

"Because if the notes were equally spaced your ear would lose its place."

"Because your culture has trained these notes into your ear since you were a baby."

After I made this derivation of the Major Scale from first principles, a friend of mine, Peter McCorquodale, pointed me to "Aesthetic Measure" by George D. Birkhoff [Birkhoff1933]. On page 92 in the section "The Natural Diatonic Scale", Birkhoff independently makes the same derivation of the Major Scale as we do above, albeit providing less detail and with no motivation from computer or brain science. Given the Major and Minor Triads, Helmholtz also seems to give the same theory of a key as interlocking chords [Helmholtz1863, p. 300] as we do above, though we argue below that he fails to explain how it is that we find the Major and Minor Triads compelling to listen to in the first place.

While my derivation of the Major Scale above is therefore not a completely original contribution, it is also certainly not well known. It was quite an effort for me to invent it, given my starting point of nothing but curiosity about the problem and disgust with all the books to which I had access. How is it that even music majors in college do not know this derivation of the Major Scale?

3.3 Scales and Keys

The notes above are known as the 12-(Semi-)Tone Western (Chromatic) Scale (you will hear people call it the "12-tone Western Scale", including in quotations below). The subset of lettered (or white) keys, omitting the hashes (or black keys), is called the Major Scale.

We can now explain some standard musical terminology. In the particular case of the C-triad, the note E is called the Major Third, as it is the third white key in the Major Scale. Similarly, the note G is called the Perfect Fifth, as it is the fifth white key. Deep huh? We defer discussion on how it is that one is called "Major" and the other "Perfect" until the section on Equal vs. Just Tuning below.

Explaining all musical conventions is beyond the scope of this article, but I mention a few basic ones we will need. Going up a step is called "sharp", denoted "#", and down "flat", denoted "b", so we now have two names for each black key; for example the black key between C and D is both "C#" and "Db".

The "key" of the scale is the Root note of what we called the base triad (the one in the middle of the three interlocking triads); that is, in the example above the key was C Major. It will turn out that there are more ways to build a scale than locking together those three Major Triads (F, C, G).

We could use a note other than C as the base of the center triad. We could use another kind of triad other than the Major Triad; we haven't talked about that yet.

3.3.1 Changing Key: Playing Other Groups of Triads

The three interlocking triads we came up with have the Roots C, F, and G. Lots of music uses just these three triads; in fact

The entire genre of music called "Twelve-Bar Blues" [twelve-bb] basically uses only these three triads!

In the hands of a skilled musician, these three triads of the Major Scale can actually be interesting for quite a long time. I once asked my then Jazz piano teacher Ben Stolorow to give me more interesting chord progressions to practice (so I wasn't just playing scales for hours). His response was that F/C/G was plenty interesting enough and he demonstrated by simply arpeggio-ing these three chords (playing the notes of a chord one at a time) while switching between different rhythms [Stolorow, c. 2006]; I remember saying "Wow, I would pay money just to listen to that and it's just three chords!".

If you enjoy playing notes that all lie within the three interlocking triads we made, you will be just fine and dandy with the group of notes above, which we call the Major Scale, and you will never need the black keys on your piano. However, after a while, just as you got bored with one Major Triad, you might get bored with three of them (though, per the above demo by my teacher, it might take longer). That is, if your melody is playing around in one Major Scale, you might want to do the same playing around but all within a different scale made starting with a different key note than C. Making this change is called a "key change".

For example, you might pick a key using two of the triads you have already, C and G, but making G the base triad (instead of C as we did above) and adding one more triad based at the Perfect Fifth above G (which is D). Uh, oh, we don't have all the notes for the D Major Triad in our C Major Scale (the white keys). You can repeat the construction of the C Major Scale above, this time starting from G, and discover that the missing note is F#. This is how it emerges that piano players playing in a Major Scale other than C still have to use some black keys.
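The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the twelve piano keys of one Octave are numbered 0 to 11 starting at C, a Major Triad is taken to be the Root plus the keys 4 and 7 positions above it, and the names `NOTE_NAMES`, `major_triad`, and `major_scale` are hypothetical helpers introduced here, not anything from the text.

```python
# Assumption: the 12 piano keys of one Octave, numbered 0..11 starting at C.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def major_triad(root):
    """Keys of the Major Triad on `root`: Root, Major Third (+4), Perfect Fifth (+7)."""
    return {root % 12, (root + 4) % 12, (root + 7) % 12}


def major_scale(key):
    """Union of three interlocking Major Triads: a Fifth below, the key, a Fifth above."""
    below = (key - 7) % 12
    above = (key + 7) % 12
    return major_triad(below) | major_triad(key) | major_triad(above)


c_major = major_scale(NOTE_NAMES.index("C"))
g_major = major_scale(NOTE_NAMES.index("G"))

# Notes needed for G Major that the white keys (C Major) do not supply.
missing = sorted(NOTE_NAMES[k] for k in g_major - c_major)
# Notes of C Major that G Major no longer uses.
dropped = sorted(NOTE_NAMES[k] for k in c_major - g_major)

print(missing)  # ['F#']
print(dropped)  # ['F']
```

Counting keys this way says nothing about the exact frequency ratios; it only shows which piano keys the new scale needs.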

3.3.2 Key Changes Break Harmony

Adding F# to our keys can be done; however there is a worse problem. Let's compute the ratios we get if we build a Major Triad starting at D, that is, using the notes D, F#, and A. In particular, if A is to be the Perfect Fifth above D, then their ratio should be 3/2. (I kept so many decimal places below as the fractional part is just too cool to omit.)

We got D as the Fifth above G, and G as the Fifth above C. (We divide by 2 to keep it in the same Octave:)

D = (3/2) * (3/2) / 2 = (9/4) / 2 = 9/8.

We got A as the Third above F, and F as the Fifth below C. (We multiply by 2 to keep it in the same Octave:)

A = (1/(3/2)) * (5/4) * 2 = (2/3) * (5/4) * 2 = (5/6) * 2 = 5/3.

A over D is therefore

A / D = (5/3) / (9/8) = (5*8)/(3*9) = 40/27 ~ 1.481.

Whereas a Perfect Fifth should be 3/2 = 1.5. The error is therefore

(PerfectFifth - (A/D)) / PerfectFifth = ((3/2) - (40/27)) / (3/2) = 1/81 ~ 0.0123456790123457 ~ 1.2%.
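The arithmetic above can be double-checked with exact fractions. The following is a minimal sketch, assuming C is the reference note with ratio 1 and that every note is folded into the Octave from 1 (inclusive) to 2 (exclusive); the helper `into_octave` and the variable names are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

PERFECT_FIFTH = Fraction(3, 2)
MAJOR_THIRD = Fraction(5, 4)


def into_octave(ratio):
    """Halve or double a ratio until it lies in the Octave [1, 2) above C."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


G = into_octave(PERFECT_FIFTH)        # Fifth above C
D = into_octave(G * PERFECT_FIFTH)    # Fifth above G, folded down an Octave
F = into_octave(1 / PERFECT_FIFTH)    # Fifth below C, folded up an Octave
A = into_octave(F * MAJOR_THIRD)      # Third above F

fifth_on_d = A / D
error = (PERFECT_FIFTH - fifth_on_d) / PERFECT_FIFTH

print(D, A, fifth_on_d, error)  # 9/8 5/3 40/27 1/81
print(float(error))
```

The relative error comes out to exactly 1/81, which is why its decimal expansion repeats the digits 012345679.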

So if we measure carefully, we notice that, even with the big gaps filled in, the intervals are not all exactly right for playing another key, such as D Major. That is:

If we want to do a key change, we can try (a) just using the same piano we derived for the key of C Major, but (b) playing whatever piano keys we find when we just move "up" a triad; that is, using the same notes as for C Major but making the triad rooted at D.

However, if we compute the note ratios carefully, we see that the ratios for the "triad" rooted at D will not be quite right. They also will not sound right. In fact, if we do more key changes, moving, say, repeatedly "up" by a Perfect Fifth again (beyond D), some of the other triads will be even less right and will start to sound really bad.

Uh, oh. Shou