This paper reveals the way in which musical pitch works as a peculiar form of cognition that reflects upon the organization of the surrounding world as perceived by the majority of music users within a socio-cultural formation. The evidence from music theory, ethnography, archeology, organology, anthropology, psychoacoustics, and evolutionary biology is plotted against experimental evidence. Much of the methodology for this investigation comes from studies conducted within the territory of the former USSR. To date, this methodology has remained confined to Russian-speaking scholars. A brief overview of pitch-set theory demonstrates the need to distinguish between vertical and horizontal harmony, laying out the framework for a virtual music space that operates according to the perceptual laws of tonal gravity. Brought to life by the bifurcation of music and speech, tonal gravity passed through eleven discrete stages of development until the onset of tonality in the seventeenth century. Each stage presents its own method of integrating separate musical tones into an auditory-cognitive unity. The theory of “melodic intonation” is set forth as a counterpart to the harmonic theory of chords. Notions of tonality, modality, key, diatonicity, chromaticism, alteration, and modulation are defined in terms of their perception, and categorized according to the way in which they have developed historically. Tonal organization in music and perspective organization in fine arts are explained as products of the same underlying mental process. Music seems to act as a unique medium of symbolic representation of reality through the concept of pitch. Tonal organization of pitch reflects the culture of thinking, adopted as a standard within a community of music users.
Tonal organization might be a naturally formed system of optimizing individual perception of reality within a social group and its immediate environment, setting conventional standards of intellectual and emotional intelligence.

The phenomenon of tonal organization in music has attracted the attention of scholars from numerous fields: music theory, history, ethnomusicology and, more recently, cognitive psychology. Each of these disciplines has elaborated its own framework of study, with its own taxonomy and terminology, making it hard to cross-relate findings from different areas of research. To add to the confusion, there is little correlation between theories that originated in countries of the former Soviet bloc and Western research. This paper attempts to bring the vast data to a common denominator, based on the framework of cognitive psychology, and to identify the principal models of tonal organization in the course of its evolution, from its origin to the rise of Western tonality.

Szabolcsi (1965) came closest to drafting this evolutionary outlook up until the twentieth century. However, he barely touched upon the earliest forms (crucial for the separation of music from speech), and limited his research to pentatonic, heptatonic, and chromatic systems, primarily from a musicological perspective based on melodic analysis. Significant gains of archeological (Morley, 2013) and ethnomusicological (Sheikin, 2002) research in the past half-century, as well as technological progress in sound analysis tools (Schneider, 2013), make it possible to draw a much finer picture of the typology of early music and to relate existing musical traditions to known prehistoric cultures. This paper identifies 12 known stages of tonal development, pentatony being chronologically the 7th in this order.

The emergence of biomusicology (Wallin, 1991) triggered interest in the origin of music (Wallin and Merker, 2001), bringing in the disciplines of evolutionary biology and neurophysiology (Altenmuller et al., 2015). Such input made it possible to reduce the controversy that had hindered the development of evolutionary theory in Western ethnomusicology, where scholars refused to accept the idea of a single “world music” passing through evolutionary changes and instead envisaged multiple “musics,” each following its own course of development (Blacking, 1974). In such a view, any imposition of cross-cultural categorization would misrepresent native music theories (Nettl, 2005, p. 112). Biological sciences have answered this objection by identifying features of musical perception shared by all humans. These features can establish the foundation for cross-cultural investigation of tonal organization. As such, typology of tonal perception can be linked to typology of tonal composition, materializing Riemann's “relational thinking” that governs the listener's ability to realize the coherence of melodic contours and intervals (Neuhaus, 2013), and Handschin's (1995) “tone-character” that enables the listener to distinguish one pitch from another.

People hear certain tones as matching each other by processing frequency in a particular way. These ways are finite in number, are cultivated within a particular social group, and are determined by the interaction of individuals within this group and with their environment. One such method, Western tonality, has been successfully investigated by cognitive science (Krumhansl, 1990). The methodology of its research can be adapted to study other methods of unification of musical sounds into a perceptual sonic ensemble. This paper drafts the foundation for such research and is broken into two parts: prehistoric and historic. The historic part is based on the examination of documented music theory (Christensen, 2008) and organology (Dumbrill, 2005), correlated with analysis of music samples wherever possible. The prehistoric part is based primarily on generalizations from comparative morphological analysis of multiple music samples by experts in a given folk culture. Here, despite speculation and the risk of misrepresenting the native music theory, reliance on experts' interpretation is inevitable. Comparative scientific study of different cultures is only possible when data is presented in terms of a coherent comprehensive music theory, and processed by uniform analytical procedures (Schneider, 2006).

Rational analysis and speculation have been the instruments of scientific investigation of music, kept in check by empirical examination of conclusions (Schneider, 2010). Ideally, the inferred tonal principles should be tested on native listeners to see whether or not they authenticate production of music according to the hypothesized rules (Arom, 2010). The established models of tonal unification can then be cross-examined to see if one is derived from another, as decided by the geographic (Zemcovskij, 2005) and ethnic distribution of certain musical features (Grauer, 2006) vs. estimation of the mental processing involved in the perception of that music. Finally, the discovered type of mental operation can be related to other cultural activities that have been dated by archeologists; thus, I compare tonal organization with spatial representation in art works. Appendix II (Supplementary Material) offers a novel method of inferring tonal organization from musical instruments based on the methodology by Beliayev (1990). Applied to archeological finds, this allows drafting an approximate timeline of the introduction of each of the tonal models. Similarity between social organization in a modern ethnic community and the one revealed by archeological research of the past suggests similarity between their music systems (Both, 2009). Tonal evolution can be as helpful for anthropology as the study of technological modes of manufacturing stone tools (Foley and Lahr, 2003).

Biological and physiological constraints, together with laws of psychoacoustics, determine commonality in music production across different synchronic and diachronic cultures. In this broadest sense, “music” can be defined as the arrangement of sounds in relation to their amplitude, frequency, duration, and spectral content, which entrains groups of people, and is used to convey intentions in order to emotionally stir the listener in a certain way by means of vocal and/or instrumental performance. Such a definition encompasses pitchless timbre-driven vocalizations that are still encountered in Siberia embedded in pitched music (Ojamaa, 2005), and allows culture-historic comparison of different “musics.”

The proposed stages of tonal organization should not be viewed as phenomenological laws, but as cognitive constructs similar to Piagetian stages of mental development, where each stage represents a particular style of integration of cultural data (Goodman, 1976, p. 11). The idea of applying Piaget's framework to the evolution of human intelligence was introduced by Wynn (1985) and accepted by many anthropologists as a useful means of interpretation, albeit without consensus regarding how exactly the prehistoric cultural periods correspond to Piaget's stages. The progression of “associational,” “logical,” and “hypothetical” stages in the culture of thinking (Parker and Jaffe, 2008, p. 188) roughly matches three general “ages” of music:

• indefinite pitch organization that supports timbre and articulation;

• elementary definite pitch organization limited to small sets;

• hierarchical organization that requires parallel top-to-bottom/bottom-to-top operation, exercised through frequent categorization assumptions and their confirmation/negation.

Tonal models appear to be cumulative—music representative of each of them can be encountered within the same culture (Alekseyev, 1986).

Arranged according to their lineage, stages of tonal organization provide a unique outlook on the development of human consciousness, and establish a frame of reference for understanding the role of music and language as biological markers of Homo sapiens. The opposition of language, as the bearer of cognitive dissonance, to music (Perlovsky, 2014), which then accepts the function of “cognitive consonance,” leads one to believe that “cognitive consonance” is that elusive adaptive value of music which has been sought after since Darwin and Spencer (Honing and Ploeger, 2012). Music's “consonant” function is evident in the mode of its default perception: we tend to integrate concurrent musical sounds, but segregate sounds of speech (Bregman, 1994, pp. 461–589), especially since phonetics involves heavy fission (Staun, 2013); we sing together, but take turns in speech (Brown, 2007).

1. Audio: Shagay Kharvakh, collective dice game with singing, Mandalgovi, Gobi desert. This example illustrates how dice players spontaneously vocalize by “tuning-in,” each in his own way reflecting upon the mental activity the group is engaged in. Today such “musicking” aloud has given way to audiation, but cognitive consonance still takes place in an act of “self-other merging” (Tarr et al., 2014). http://chirb.it/PL6PJO

I see pitch organization as a unique mechanism for simultaneous processing of a large number of signals with relative ease (McDermott et al., 2010a). The pitch medium is indispensable to optimizing cognitive schemes suitable for a particular environment, and to reinforcing the cultural reproduction of these schemes within the community (Cross, 2007).

Russian research is instrumental for building the pre-tonal timeline. The Soviet regime committed enormous resources to the investigation of folk cultures. During the 1940s, dedicated centers of folkloric studies were created at major conservatories, leading to the accumulation of substantial databases and scholarly research. The Moscow Conservatory collection alone contains over 140,000 units of folk recordings (Giliarova, 2010). All major musicologists active in the USSR territory from the 1930s wrote on folkloric music. All graduate students in musicology and composition were required to take an ethnomusicology course and participate in field studies.

I must underline that the goal of this paper is not to report on the theory of a particular Russian scholar in his exact terms, but to present such findings to English-speaking cognitive scientists in a format convenient for implementation in their own research. Since cognitive science resorts to the terminology of pitch set theory, I explain all forms of tonal organization that use definite pitch in terms of set theory.

Following Wiora (1962), I use ethnic music to illustrate prehistoric music. Audio examples illustrate points of tonal organization crucial for my presentation; for those interested in testing my claims experimentally, they indicate which music is suitable for testing. I look at my paper as a preliminary outline where many theoretical postulates might be corrected or found specific to certain conditions. Nevertheless, I feel it necessary to re-initiate in Western science the line of research that was interrupted after the 1960s (Nettl, 2010, p. 108).

The large scope of this paper leaves little room for detailed explanations; this is addressed by references to bibliographic sources with fuller information.

The Cognitive Science Framework of Study of Tonal Organization

At the foundation of cognitive study of tonal organization lies the concept of pitch set [PS]. It originates from the theory of atonal music (Babbitt, 1955). Allen Forte formalized the PS theory, defining PS as “any collection of unique pitches” (Forte, 1964). Although the original concept of PS was very specific in its reference to the order of appearance of 12 tones in an atonal composition, cognitive scientists have accepted this term in relation to any kind of music—understanding it as a set of tones used to constitute a particular music work (Balzano, 1982).

The adoption of PS elevated the importance of octave equivalence, since a set is assembled from pitches that are categorized into pitch-classes [PC]—presuming that all tones an octave apart represent the same pitch class. This principle sets forth another crucial concept—interval set [IS]: the distance between all pairs of PS tones within an octave. This distance is calculated in increments of the equal temperament semitone. Hence, the notion of PC is synonymous with pitch chroma (Hutchinson and Knopoff, 1978): division of an octave into 12 equal parts reduces each tone in a work to one of 12 tones, despite the original spelling of the tone in the score and its exact tuning in performance (enharmonic equivalence rule). Represented in this way, a PC defines an interval class [IC]—distance between two PCs reduced to a single representation (E/C = C/E).
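The reduction of tones to PCs and ICs described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes that pitches are encoded as MIDI note numbers (C4 = 60), which is a convenience of this sketch and not part of the PS theory, and the function names are hypothetical.

```python
def pitch_class(midi_note: int) -> int:
    """Reduce a pitch to one of 12 pitch classes (octave and enharmonic equivalence).

    Assumption of this sketch: pitches are MIDI note numbers, so C = 0, C#/Db = 1, etc.
    """
    return midi_note % 12


def interval_class(pc_a: int, pc_b: int) -> int:
    """Distance between two pitch classes in equal-tempered semitones,
    reduced to a single representation in the range 0..6 (so E/C equals C/E)."""
    distance = (pc_a - pc_b) % 12
    return min(distance, 12 - distance)


# C4, E4, and E5 as MIDI note numbers (assumed encoding)
c4, e4, e5 = 60, 64, 76

# Tones an octave apart fall into the same pitch class
print(pitch_class(e4) == pitch_class(e5))                # True

# The interval class does not depend on the order of the two tones
print(interval_class(pitch_class(c4), pitch_class(e4)))  # 4
print(interval_class(pitch_class(e4), pitch_class(c4)))  # 4
```

The modulo operation is what discards both the octave register and the spelling of a tone; exact tuning in performance is lost earlier, at the point where a sounding pitch is rounded to a semitone grid.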

PS can be transposed—thus, the sameness of IS between the original PS and its chromatic transposition forms pitch-class set [PCS]. Numerous music works can be based on the same PCS, and share the same interval-class content (Lewin, 1960)—which I prefer to call interval class set [ICS] (by the analogy with PCS). Such works are regarded as sharing the same tonal organization and expressive properties.
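The claim that a chromatic transposition preserves interval-class content can be checked with a short Python sketch. The representation of a set as integers 0..11 and the helper names `transpose` and `interval_class_content` are assumptions made for illustration; the tally of interval classes 1 to 6 follows the common set-theoretic convention and is not taken from a specific source cited here.

```python
from itertools import combinations


def transpose(pitch_classes, semitones):
    """Chromatic transposition of a set of pitch classes (integers 0..11)."""
    return frozenset((pc + semitones) % 12 for pc in pitch_classes)


def interval_class_content(pitch_classes):
    """Count how many pairs of tones in the set form each interval class 1..6."""
    counts = [0] * 6
    for a, b in combinations(sorted(set(pitch_classes)), 2):
        distance = (b - a) % 12
        ic = min(distance, 12 - distance)
        counts[ic - 1] += 1
    return counts


# Diatonic set on C (C D E F G A B) and its transposition up a whole tone
c_major = {0, 2, 4, 5, 7, 9, 11}
d_major = transpose(c_major, 2)

print(interval_class_content(c_major))  # [2, 5, 4, 3, 6, 1]
print(interval_class_content(d_major) == interval_class_content(c_major))  # True
```

Two works built on these two sets would thus share the same ICS in the sense used above, although their pitch content differs.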

Perhaps the biggest contribution of cognitive psychology to musicology is the identification of the principal factors that contribute to the “experience” of a key (Krumhansl, 1990, p. 60). Tones contrast with each other in stability, the sensation of a relative state of finality. Uniformity of distribution of stability/instability, with categorization of the tonic, dominant, mediant, the rest of the diatonic, and the chromatic tones into five stability ranks (Lerdahl, 2009), constitutes tonal hierarchy, and defines tonality. Hierarchic organization can vary substantially, making it necessary to distinguish the stability profile of a particular PS from that of a PCS (Bigand, 1997).
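The five stability ranks can be illustrated with a small lookup. The sketch below is a deliberately simplified reading of the hierarchy: it assumes a major key, integer pitch classes 0..11, and a rank scale of 1 (most stable) to 5 (least stable); the function name and the numeric scale are hypothetical and are not drawn from the cited studies.

```python
# Semitone offsets of the major-scale degrees above the tonic (assumed key type)
MAJOR_SCALE_STEPS = (0, 2, 4, 5, 7, 9, 11)


def stability_rank(pitch_class: int, tonic_pc: int) -> int:
    """Return an illustrative stability rank: 1 = tonic ... 5 = chromatic tone."""
    degree = (pitch_class - tonic_pc) % 12
    if degree == 0:
        return 1  # tonic
    if degree == 7:
        return 2  # dominant
    if degree == 4:
        return 3  # mediant (major key assumed)
    if degree in MAJOR_SCALE_STEPS:
        return 4  # the rest of the diatonic tones
    return 5      # chromatic tones


# Ranks of C, G, E, D, and F# in C major (tonic pitch class 0)
print([stability_rank(pc, 0) for pc in (0, 7, 4, 2, 6)])  # [1, 2, 3, 4, 5]

# The same pitch class D changes rank when the tonic changes to G
print(stability_rank(2, 0), stability_rank(2, 7))  # 4 2
```

The second call shows why a stability profile belongs to a particular PS with its tonal center, and cannot be read off the transposition-invariant PCS alone.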

Tonal hierarchy enables the perception of tonal melody in terms of fluctuations in tonal tension (Lerdahl and Krumhansl, 2007). Harmonic and melodic structures contained in music are responsible for the experience of tension in listeners (Lehne et al., 2013). Whenever unstable tones receive metric, rhythmic, dynamic, or textural stress, the listener perceives an increase in tension (Krumhansl, 1996). This tension is quite objective: a recent MRI study has identified the left lateral orbitofrontal cortex as the site responsible for it (Lehne et al., 2014). Metro-rhythmic leaning on stable tones decreases tension, which is perceived as momentary relaxation. Hence, unstable tones act as a driving force that raises “expectancy-tension” in the listener. Looking forward toward an unknown melodic continuation heightens attention for the subsequent events, which translates into an impression of greater forward-directedness in melody (Margulis, 2005). Fluctuations in tonal tension are experienced in terms of locomotor impulses.

Steve Larson's model of “musical forces” provides a detailed framework for describing tonal “locomotion.” Drawing the analogy between mechanical laws that govern the motion of a body and tonal laws that govern melodic motion from tone to tone (Larson and McAdams, 2004), Larson elaborates the “energetics” theory introduced by Ernst Kurth (Rothfarb, 1988). The tendency of an unstable tone to resolve into the closest stable tone Larson calls magnetism. Magnetism of unstable tones complements the gravity of stable tones, generating melodic motion with assistance from inertia: the tendency to proceed in the direction set by the resolution of an unstable tone into a stable one. Kurth's idea that instability charges melodic motion has received experimental support: Larson and Vanhandel (2005) found magnetism to present a greater force than gravity and inertia; Vega (2003) discovered that the tendency of unstable tones to move exceeded the tendency of stable tones to stand; Hubbard and Ruppel (2013) showed how gravity affects inertia.

Bharucha's (1996) notion of “anchoring” complements Larson's scheme by accounting for a harmonic grouping mechanism that binds an unstable tone with a stable tone that follows it. Music theory explains this by the integrating effect of “resolution.”

Distinction between Vertical and Horizontal Harmony

There is, however, an important distinction between “consonance” and “stability” (Kholopov, 1988, p. 22). Vertical harmony organizes simultaneous combination of tones, whereas horizontal harmony organizes succession of tones. Both types remain “harmony,” that is, a method of ordering the pitches according to a certain principle of euphony (pleasant-sounding combination of tones); however, each operates on a different plane. Thus, for horizontal intervals, timbral contrast between two successive tones presents an obstacle for their integration in the same perceptual unit, whereas for vertical intervals it poses no problems (Borchert et al., 2011). The specificity of a plane causes different processing: melodic intervals trace, i.e., the first tone leaves a perceptual after-sound that sums with the following tone, with the exception of the interval of a 2nd. Tiulin (1966, p. 49) was the first to note (1937) that a harmonic 2nd is a harsh dissonance, but a melodic 2nd is pleasant to the ear due to the peculiar short-term memory phenomenon of “erasing the trace.” Komar (1971) elegantly explained this as displacement of the resolving tone by the resolved tone.

Larson incorporated displacement in his “musical forces” model. When the melody leaps, the first tone perceptually protrudes and overlaps with the new tone. If the melody steps, the new tone completely eradicates the memory of the previous tone (Larson, 1997). Processing of melody involves the same harmonization bias (in most cultures) as processing of harmony. The melodic progression is euphonized when the gap between the two adjacent tones is smoothed by the mental prolongation of the first tone. Wider leaps are associated with stronger emotional connotations, perhaps based on the speech prototype (Johnson-Laird and Oatley, 2010, p. 107). Tracing might also serve the purpose of registering the exact size of a semantically important leap by caching the previous tone.

So, opposition of tracing and displacement in horizontal harmony should be viewed as the equivalent of the opposition of dissonance and consonance in vertical harmony. On the vertical plane, compliance of two tones in their harmonic spectrum determines their accord/discord (McDermott et al., 2010b). On the horizontal plane, stepwise progression of tones binds them into one stream of information, whereas leaps suggest bifurcation into two parallel streams (Bregman, 1994, p. 496). The leap then undergoes examination: whether it indeed marks the entrance of a new part, or it constitutes an “exclamation” within the same melodic part. Such discrimination makes all leaps “complex,” by definition, and associates them with melodic unease and tension (Rags, 1980, p. 19). “Displacement” serves as a sequential consonance in the progression of pitches—in contradistinction to “tracing” that works as a vertical buffer to compensate for disruption in the melodic smoothness (Tiulin, 1966, p. 33). Consonance is used more often than dissonance (Huron, 1994)—respectively, steps prevail over leaps (Zivic et al., 2013), especially in vocal music (Ammirante and Russo, 2015). Melodic 2nd is the principal binding agent in the music tissue (Tiulin, 1966, p. 49).

Melodically, large intervals contrast with the 2nd in their capacity for stability. Each non-chromatic 2nd, as a rule, contains a stable tone, whereas all other intervals can have both tones unstable. Therefore, the 2nd is inherently associated with resolution (stability), whereas other intervals are not. Displacement is crucial for cadences: in melody without rests, displacement works best for resetting the “pitch integration window” (Plack and Watkinson, 2010) to mark the ultimate resolution.

Consonance/dissonance define vertical harmony, while stability/instability—horizontal harmony. Since both serve the same purpose of harmonization, they stay interconnected. In Western tradition, horizontal harmony is processed through mediation by vertical harmony. Listeners infer vertical harmonic relations upon hearing melodic progressions, and surmise the “chords” implied by the melody—in an effort to anticipate the melody (Holleran et al., 1995). This might work as a harmonic error-correction tool in verifying perceived pitch contour (Povel and Jansen, 2002, p. 83).

Stability/instability guides the melodic assessment and is only adjusted for a consonance/dissonance relationship (Bytchkov, 1997). Musical texture, by contrast, is estimated primarily in terms of consonance/dissonance, and is only correlated with stability/instability where the intervallic content of melody mismatches the vertical harmony (as in dissonant non-chordal tones in embellishments).

Toward Taxonomy of Melodic Intervals

Melodic consonance can be defined as euphony of successive tones, and must be distinguished from harmonic consonance. Thus, for harmonic intervals, frequency-ratio discrimination depends on ratio simplicity: octave, 5th, and 4th are identified more easily than 7th. For melodic intervals ratio simplicity is found to have no effect (Bonnard et al., 2012). Dissonance of vertical intervals is determined by fusion. Dissonance of horizontal intervals originates from:

• the extent of melodic disruption;

• the capacity to mark the resolution.

Tones that fuse well necessarily appear melodically weak, since fusion reduces the autonomy of tones (Huron, 2001, p. 19). Unison is a primary harmonic consonance, but a secondary melodic consonance. Melodic unison often falls on an unstable degree, appearing weak and giving poor resolution, unlike the 2nd. That is why, despite greater smoothness in pitch, unison does not match the 2nd in its “gluing” power and capacity to mark a tonal center. Unison might be considered an “imperfect melodic consonance,” and the 2nd a “perfect melodic consonance.” This can be validated by listeners' general preference for the melodic 2nd (Dowling, 1967, p. 21) and their expectation for a melodic contour to be completed by a 2nd (Carlsen, 1981).

The phenomenon of “implied polyphony” presents the best measure of melodic consonance. Whenever the melody features frequent leaps up and down, listeners perceive two melodic lines: the upper line unites the crests of the leaping tones, and the lower line unites their bases. This effect is not specific to Western music: it is also used by Japanese koto players (Burnett, 1980). The melodic dissonance of an interval is revealed through its capacity to generate an alternative melodic stream. Such testing has been conducted and established the Temporal Coherence Boundary, above which segregation occurs (van Noorden, 1975, pp. 40–67). In slow tempo, the minor 3rd serves as the bifurcation point, while in very fast tempo the major 3rd can keep the integrity of the melodic line, delegating bifurcation to the 4th (Huron, 2001, p. 23).
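The splitting of a leaping melody into two implied lines can be imitated by a toy procedure. The sketch below is only an illustration of the idea of a bifurcation point, not a model of the experiments cited above: it ignores tempo and rhythm altogether, assumes pitches given as MIDI note numbers, and uses a hypothetical parameter `bifurcation_interval` (in semitones, with the minor 3rd as default) as the smallest interval that breaks a line.

```python
def split_into_streams(pitches, bifurcation_interval=3):
    """Greedily assign each tone to the melodic stream whose last tone is nearest.

    A tone continues a stream only if its distance from that stream's last tone
    is smaller than `bifurcation_interval` semitones; otherwise it opens a new
    stream. The threshold is an assumption of this sketch.
    """
    streams = []
    for pitch in pitches:
        # Find the existing stream whose most recent tone is closest in pitch
        nearest = min(streams, key=lambda s: abs(s[-1] - pitch), default=None)
        if nearest is not None and abs(nearest[-1] - pitch) < bifurcation_interval:
            nearest.append(pitch)
        else:
            streams.append([pitch])
    return streams


# A melody that leaps up and down by large intervals splits into two lines
print(split_into_streams([60, 72, 62, 74, 64, 76]))
# [[60, 62, 64], [72, 74, 76]]

# A stepwise melody stays a single line
print(split_into_streams([60, 62, 64, 65]))
# [[60, 62, 64, 65]]
```

Raising `bifurcation_interval` to 5 (a 4th) keeps an arpeggiated triad such as `[60, 64, 67]` in one stream, which loosely mirrors the shift of the bifurcation point to a larger interval.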

2. Audio: Bach J.S. - Prelude for cello BWV 1007. Melodic consonance and dissonance. http://bit.ly/1QQmkFt.

The major 2nd champions melodic consonance, followed by unison and the minor 2nd, all permanently consonant. The statistical analysis of folk samples of seven nations reveals that unison and major 2nd are by far the most frequently used intervals, followed by minor 2nd, major 3rd, 4th, and 5th (25). Vos and Troost (1989) obtained the same results for classical and popular music.

Minor and major 3rds are consonant in faster hemitonic music. They are permanently consonant in pentatony, where they can outnumber 2nds (Kolinski, 1967, p. 14). In passages, the 4th can become consonant. These intervals make a special class of intersonance: the state of being melodically unsteady, sometimes disruptive and sometimes not.

Larger intervals always disturb the melodic line. However, they differ in their capacity to terminate it. Octave and 5th provide a good cadence, making them an “imperfect dissonance.” Tritone, 6th, and 7th produce incomplete-sounding endings. They constitute “perfect dissonance,” including the melodic 6th, which listeners report as high in tension (Maher and Berlyne, 1982) and difficult to identify by ear (Hall and Hess, 1984).

The following ranking of melodic consonance seems plausible:

2nd, unison, 3rd, 4th, 5th, octave, 6th, tritone, 7th.

Consonant ranking is influenced by melodiousness of the corresponding melodic intonation, which is a cultural factor. However, the ability to distinguish melodic consonance/dissonance appears to have genetic roots—just as its harmonic counterpart—according to the EEG measurements during newborn infants' sleep (Stefanics et al., 2009). The newborns can segregate concurrent tones into separate audio streams by detecting inharmonic relations between the co-occurring sounds (Bendixen et al., 2015).

Musicians know that melodic intervals bring about stronger emotional reaction than do harmonic intervals. Music training includes teaching “well”-tuned melodic intervals. Performers and listeners consider a dissonant melodic interval well-tuned when it is slightly wider than that which is prescribed by music theory—and this discrepancy becomes greater for larger intervals—responsible for their association with tension, harshness, and irritability (Rags, 1980, p. 19). Tracing determines larger intervals' valence. The “trace” is subject to the same rules as vertical intervals. So, melodic tritone is usually considered harsher than 5th despite being smaller. The aggregate data of all the spectral content of a particular “musical moment” is collected and converted into a rate-based code in the brainstem (Plack et al., 2014). Therefore, contribution of harmonic consonance/dissonance to melodic categorization is perhaps inevitable.

Yet another principal difference between vertical and horizontal harmony is that the concept of ICS is not applicable to melodic intervals (Tiulin, 1966, p. 49). Inversion of a melodic interval does not retain its tonal properties. Thus, the 2nd is consonant while the 7th is dissonant, and the same holds for unison and octave; the 3rd can be consonant while the 6th is always dissonant, and the same holds for the 4th and 5th.
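The loss of inversional equivalence on the horizontal plane can be made explicit with a lookup table. The sketch below encodes the melodic categories proposed in this section by interval size in semitones; treating both qualities of the 3rd, 6th, and 7th alike, and the names `MELODIC_CATEGORY`, `inversion`, and `interval_class`, are simplifying assumptions made for illustration.

```python
# Melodic categories by interval size in semitones, as proposed in this section
MELODIC_CATEGORY = {
    0: "consonance",            # unison
    1: "consonance",            # minor 2nd
    2: "consonance",            # major 2nd
    3: "intersonance",          # minor 3rd
    4: "intersonance",          # major 3rd
    5: "intersonance",          # 4th
    6: "perfect dissonance",    # tritone
    7: "imperfect dissonance",  # 5th
    8: "perfect dissonance",    # minor 6th
    9: "perfect dissonance",    # major 6th
    10: "perfect dissonance",   # minor 7th
    11: "perfect dissonance",   # major 7th
    12: "imperfect dissonance", # octave
}


def inversion(semitones: int) -> int:
    """Inversion of an interval within one octave (a 2nd becomes a 7th, etc.)."""
    return 12 - semitones


def interval_class(semitones: int) -> int:
    """Interval class 0..6: an interval and its inversion share the same class."""
    distance = semitones % 12
    return min(distance, 12 - distance)


# Major 2nd, unison, and 4th compared with their inversions
for size in (2, 0, 5):
    inverted = inversion(size)
    same_class = interval_class(size) == interval_class(inverted)
    print(size, inverted, same_class,
          MELODIC_CATEGORY[size], "vs", MELODIC_CATEGORY[inverted])
```

In each of the three pairs the interval class is shared, yet the melodic category differs, which is the sense in which class-based reduction fails for melodic intervals.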

Virtual Music Space

Vertical and horizontal axes together define a virtual music space, where “musical forces” control the melodic and harmonic progressions within a music work. Although this reality remains “virtual” and exists only in the listener's mind, by no means should it be considered “subjective” in the sense that every listener imagines tonal tension in his own arbitrary way. Through a series of stem completion tasks, priming tasks, and continuation rating tasks, Larson (2012, pp. 212–310) was able to demonstrate uniformity in the estimation of musical gravity, magnetism, and inertia amongst listeners of tonal music. His findings are corroborated by the line of research on locomotor entrainment through music.

Musical sounds are not just abstract auditory signals: they are spatial constructs that exist in a 3-D space (time/pitch/texture) and specify fictional movement every time musical tones are bound together by tonal tension. Pitch changes generate melodic motion, where “pitch contour” and “distance” act as psychoacoustic correlates of “turn” and “displacement” in physical space (Ammirante and Thompson, 2012). Despite its illusory nature, melodic motion constitutes a fundamental aspect of music's impact and meaning (Clarke, 2001). Music is a motion-abstraction scheme that has a life of its own: “Music is an auditory fiction in which the sounds of voices or instruments are combined to produce sounds that never appear in nature” (Bregman and Woszczyk, 2004). In fact, the modus operandi of music opposes that of real-life sound: the default state for musical perception is fusion, whereas natural sounds usually trigger fission.

Music is a unique and peculiar form of constructing quasi-spatial relations between auditory objects—taking after the relations of physical objects. The entrainment mechanism links the musical and physical universes. Rhythm is not the only property that connects musical and physical organizations. Dynamics is also involved in musical modeling. Dynamics contributes to the impression of relative “mass,” relying on the synesthetic connection between the perceived “size” of a sound and the actual size of the object that produces it (Marks, 1978, p. 53). The cross-modal mapping of height-to-pitch and thickness-to-pitch is already observed in 4-month-old infants (Dolscheid et al., 2014). This percept can be titled “virtual mass”: humans selectively entrain specific parts of their body to music depending on the distribution of periodic metric stress—heavier pulses engage axial body parts, whereas lighter pulses act more on lighter distal parts (Toiviainen et al., 2010).

Musical gravity imitates physical gravity. However, their correspondence is not strict. Eitan and Granot (2006) established that listeners, in their spatial representation of music, relate pitch contour to verticality, and loudness to distance and energy. But a number of cross-modal correspondences were found to work asymmetrically: descending pitch contour was perceived as spatial descent, whereas ascending contour was not nearly as strongly associated with ascent. The correspondence of increase in velocity with intensification was equally asymmetric. Evidently, musical gravity only partially follows its physical analog (Hubbard and Courtney, 2010), influenced by cultural factors and perceptual differences between the senses of vision and hearing.

Musical “virtual space” should be regarded as a medium of autonomous organization that generalizes information known to an individual about the world in which he lives, and negotiates this generalization within the community of music users (Eitan, 2013). Through a series of cultural interactions music users form consensus on how their motion control and motor coordination are affected by observable physical laws—and take the established relationship as a prototype for relationship between musical tones in a PS (Gruhn, 1998).

Since musical gravity operates on principles that only partially imitate principles of physical gravity, dogmatic reliance on gravitational correspondence might lead to error. The recent theory of the evolutionary origin of tonality (Doğantan-Dack, 2013) leans on the universality of resolution, claiming that melodic motion is meant to end in a stable state, analogous to physical unstable states that are terminated by stable states. Even for Western tonality this is not necessarily the case. Ending on a stressed dissonant chord prevails in jazz/blues, setting a stereotype in popular music, together with the unstable “vamp” fade-out. In folk practice an unstable ending is just as good as a stable one.

3. Audio: Harvest Song, Bulgaria. Otglas (a break-off tone) marks the end by instantaneously throwing off the reference frame for stability (Kholopov, 2005). http://bit.ly/1IY0NV7

A folk song can stop on the leading tone. Performers do it deliberately: “as though I lost my track” (Rudneva, 1994, p. 171). An unstable ending often works similarly to an ellipsis in punctuation.

4. Audio: Olonkho Oso Tuigun, Sakha. Ending of music on unstable tone corresponds to the standard formula of ending in Yakut epic tale: “saying this, he departed.” http://chirb.it/bb59c5

Musical forces manifest themselves not so much in cadence, but in the choice and functionality of the tones—the uncovering of which is impossible by the PS theory alone and requires the modal theory.

Investigation of Melodic Harmony: Mode and Intonation

The concepts of PC, PS, and IS impose analytic restrictions which limit the scope of musical material that can be effectively investigated using these notions alone. Assumptions of PC are made based on harmonic analysis of a score. But folklore is oral. Many genres are characterized by continuous music-making (a Maghreb nūbah can last for a few days). Where does one “song” bridge to another? And where are the two contrasting sections of the same “song”? Even ethnomusicology has not yet coined a comprehensive definition of a song (Zemtsovsky, 1983). Structural features alone make it difficult to delineate song from speech (Mang, 2000; List, 2008).

The way of universally covering tonal organization is to incorporate melodic harmony in the notion of PS. Traditional musicology addresses this with the concept of “mode.” The Grove dictionary defines mode as the interaction of a certain hierarchy of pitch relationships with a certain melody type, which results in setting a compositional norm that can be understood as a “particularized scale” and/or a “generalized tune,” depending on the musical context (Powers et al., 2001). Despite its progress, this definition still has shortcomings. It reserves the possibility for a mode to be “a scale,” restricts it to a single central tone, and disregards intervallic typology. This leads to poor distinction between “scale” and “mode,” as well as “mode” and “key,” which becomes an issue when dealing with music of folk origin. In general, modes had little connection with scales until the High Middle Ages, and “then only in the minds of theorists” (Wulstan, 1971).

In Russian musicology, mode was not a prerogative of Medievalists, but a backbone for the study of any music, including folk and non-Western cultures, at least since 1908 (Yavorsky, 1908). Beliayev (1990, p. 225) formulated the most laconic definition:

• “mode is the generalization of types of melodic motion in relation to intervallic structure of these types.”

More elaborate definitions emphasize the organic coherence of tones in a mode. The Russian Musical Encyclopedia defines mode as a “pleasant to ear concordance of tones in their pitch” manifested in “systemic relations of pitches, united in a set by a central tone or a group of tones, as well as concrete combinations of tones that embody such systemic relations” (Kholopov, 1982). This definition puts forward the criteria of complex gravity, intervallic system, and characteristic melodic intonations.

“Intonatsiya” theory is another achievement of Russian musicology, poorly understood abroad. Although “intonatsiya” became associated with the name of Asafyev (Tull and Asafyev, 2000), who understood it as a complex semiotic and cultural phenomenon, the underlying concept of melodic “intonation” was introduced by Yavorsky (1908, p. 4) as the “elementary unit of music structure that binds its semantic content to similar verbal intonation.” Modern research generally confirms that melodic contour, interval, and tonal organization are analogous to linguistic direction, slope, and height (Bradley, 2013), and are engaged in emotional communication, where the “audio resolution” is quite high, down to the semitone level (Cook, 2002, p. 104).

Intonatsiya theory connected the abstract notion of mode to the concrete implementation of tonal order in a given music work—revealed by means of intonational analysis (Zemtsovsky, 1980) (see the sample analysis at the end of Appendix I). Asafyev (1952, p. 289) describes the structural aspect of intonatsiya—which I am going to call “intonation”—as a “tone-cell” that in its simplest form presents a 2-tone melodic interval, and possesses three attributes:

• intervallic distance;

• melodic direction;

• gradation in melodiousness.

The latter reflects the psycho-physiological ease of singing of a given interval, and a cultural preference for it.
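The first two attributes of a “tone-cell” lend themselves to a simple formal encoding. The sketch below is only an illustration, not Asafyev's own formalism: it assumes that a melody is given as a list of MIDI note numbers (that is, equal-tempered semitones), and the names `ToneCell` and `tone_cells` are hypothetical. The third attribute, gradation in melodiousness, is culture-dependent and is deliberately left out; non-tempered traditions would require distances in cents rather than semitones.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToneCell:
    """A 2-tone melodic interval (hypothetical encoding of a 'tone-cell')."""
    distance: int   # intervallic distance in semitones (absolute value)
    direction: int  # +1 ascending, -1 descending, 0 repeated tone


def tone_cells(midi_pitches: List[int]) -> List[ToneCell]:
    """Split a melody into successive overlapping 2-tone cells."""
    cells = []
    for first, second in zip(midi_pitches, midi_pitches[1:]):
        step = second - first
        direction = (step > 0) - (step < 0)  # sign of the melodic step
        cells.append(ToneCell(abs(step), direction))
    return cells


# Assumed example: C4 D4 E4 D4 C4 G4 as MIDI numbers.
melody = [60, 62, 64, 62, 60, 67]
cells = tone_cells(melody)
for cell in cells:
    print(cell)
```

A melody of n tones yields n - 1 cells, so that each tone (except the first and the last) participates in two adjacent intonations.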

Mazel (1982) elaborated the theory of “intonation” as the elementary structural unit in the organization of horizontal harmony—the counterpart of “chord” in vertical harmony. The succession of intonations comprises melody, and charges it with tension at points critical for expression. A single intonation represents a time-point in a “form-process” (the experience of changes in expression of music) while simultaneously serving as a brick in a “form-crystal” (a structure derived upon completing audition of a work)—something akin to “quantal element in musical experience” (Godøy, 2013). Thus, intonation “glues” musical structure to experience, opening gates to semantic interpretation, and mediating between memory and attention: the listener decodes melody by recognizing familiar intonations, while identifying and memorizing new ones.

Intonation charges the melodic contour with stability/instability values, pollinating the vertical harmony: traceless and tracing intervals interact with each other, creating zones of greater verticality (traces in melodic leaps) and greater horizontality (displacement in melodic steps). The contrast in traces of consonant and dissonant intervals further differentiates the melody. Music-users devise maps of melodic tension to navigate through music. The most common intonations comprise maps of standard reference within a given culture.

Musicians intuitively pick up on those intonations that are important in their social group. Use and re-use of the same pool of most common pitch contours forges melodic idioms (fixed patterns of melodic intervals placed in the metric and harmonic space), which obtain their semantic referents through association with specific genres (Orlova, 1984). Thus, an ascending anacrusis 4th characterizes a march, associated with determination and purposefulness, whereas a descending downbeat 3rd characterizes a lullaby, associated with comforting and supporting. Such correspondences were noticed by Cooke (1959, p. 89) and received some experimental confirmation (Maher and Berlyne, 1982).

Competent music users intuitively build their glossaries of musical intonations peculiar to a given cultural context. Those glossaries merge into a mega-glossary of conventions shared by all music-users within a social group (Shakhnazarova, 1966). Entire nations can be described in terms of “intonational culture”—and in fact, for music of numerous Siberian ethnicities that is the only rational way of description (Sychenko, 2009). Each historic formation can be characterized by an assortment of particular intonations (Szabolcsi, 1965, p. 205). And frequency of distribution of these intonations shapes a mode. The ultimate selection of tones for a composition is determined by a set of intonations most important for expression in a particular genre. Typology of content leads to typology of form—crystallizing a mode (Skrebkov, 1967)—which then, in turn, starts formatting the content.

Recent exploration of statistical methods in melodic analysis supports Asafyev's claim that certain styles of music can be defined by their intonation prevalence (Asafyev, 1971, p. 281). Zivic et al. (2013) report that Classicistic melodies are characterized by prevalence of double unison, which is rather rare in the Romantic repertoire. Eitan (1993) confirms marked differences in contour typology between historic styles. Different types of music use specific “theoretically important tones” more frequently than other tones, and guide listeners unfamiliar with a given style to the tonal organization (Castellano et al., 1984). Juhász (2012) analyzes pitch contours and segmentation of 30,000 melodies from 25 different cultures, and demonstrates significant differences between certain national types in their use of melodic intervals.

Asafyev's “tone-cell” is remarkably close to what Brown and Butler (1981) identified as a “cue-cell” in their experiments, when they discovered that listeners do not have to hear the tonic in order to detect the tonal center. Quinn and Mavromatis (2011) also concluded that “pairs of neighboring harmonic states, demarcated by note onsets, are sufficient as windows for key-finding.” They specified that harmonic dissonance made no contribution to stability; rather, the tonal center was defined by the fact that cadential progressions utilized few motifs that used the same few pitches, whereas other progressions used many motifs that were distributed across pitches transpositionally. Evidently, the knowledge of characteristic intonations helps listeners navigate across tonal maps, following the compass of tonal gravity. Huron (2006, p. 160) came closest to Asafyev when he inferred the scheme for typical scale-degree successions in the corpus of German folksongs. He calculated the probabilities for each of the major key degrees to proceed into other degrees, and identified those for which a single continuation dominated all other possibilities, calling them “tendency tones.” What Huron discovered were Asafyev's “tone-cells” that characterize the major key mode.
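The kind of calculation attributed to Huron above can be sketched as a first-order transition table over scale degrees. The following is a minimal illustration under stated assumptions, not Huron's actual procedure: the three melodies in `toy_corpus` are invented for demonstration (they are not drawn from the German folksong corpus), the function names are hypothetical, and the dominance threshold of 0.5 is an arbitrary choice.

```python
from collections import Counter, defaultdict


def transition_probabilities(melodies):
    """Estimate P(next degree | current degree) from melodies
    given as lists of scale degrees (1-7)."""
    counts = defaultdict(Counter)
    for melody in melodies:
        for current, following in zip(melody, melody[1:]):
            counts[current][following] += 1
    return {
        degree: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for degree, row in counts.items()
    }


def tendency_tones(probabilities, threshold=0.5):
    """Return degrees whose single most likely continuation exceeds
    the (assumed, hypothetical) dominance threshold."""
    result = {}
    for degree, row in probabilities.items():
        nxt, p = max(row.items(), key=lambda item: item[1])
        if p > threshold:
            result[degree] = nxt
    return result


# Invented toy corpus in a major key, for illustration only.
toy_corpus = [
    [1, 2, 3, 4, 3, 2, 1],
    [5, 4, 3, 2, 1, 7, 1],
    [3, 4, 5, 6, 5, 7, 1],
]
probs = transition_probabilities(toy_corpus)
tendencies = tendency_tones(probs)
print(tendencies)
```

In this toy material degrees 2, 4, 6, and 7 each have one dominating continuation (to 1, 3, 5, and 1 respectively), whereas degrees 1, 3, and 5 do not, which loosely mirrors the contrast between unstable and stable degrees described in the text.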

Modality vs. Tonality

Key is a mode, too: the unity of its tones is generated by melodic harmony as much as by vertical harmony, and “tendency tones” are no less important for perception of tonality than are the functions of implied chords. Temperley and Marvin (2008) put this to the test and discovered that listeners performed poorly in finding the key of a melody when it was generated by the distribution of PCS alone. Listeners needed structural cues produced by the ordering of tones within a sequence to successfully define a key.
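Why distribution alone may be insufficient can be shown with a small sketch: two sequences can share exactly the same pitch-class content while differing completely in their ordered successions. The example assumes MIDI note numbers; both sequences are invented for illustration and are not the stimuli used by Temperley and Marvin, and the function names are hypothetical.

```python
from collections import Counter


def pitch_class_distribution(midi_pitches):
    """Count how often each pitch class (0-11) occurs, ignoring order."""
    return Counter(pitch % 12 for pitch in midi_pitches)


def ordered_pairs(midi_pitches):
    """Count successive 2-tone progressions, which preserve order."""
    return Counter(zip(midi_pitches, midi_pitches[1:]))


# Invented example: a stepwise arch and a reshuffling of the same tones.
original = [60, 62, 64, 65, 67, 65, 64, 62, 60]
reordered = [64, 60, 67, 62, 65, 60, 62, 65, 64]

same_distribution = (
    pitch_class_distribution(original) == pitch_class_distribution(reordered)
)
same_successions = ordered_pairs(original) == ordered_pairs(reordered)
print(same_distribution, same_successions)
```

A distributional description cannot tell the two sequences apart, whereas any description based on successive 2-tone progressions can.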

The same key can host different modes: during the 1800s string players employed two tuning standards, Gamme europeenne and Gamme grecque (Barbieri and Mangsen, 1991), that differed in their treatment of the VII degree. Both gammes represented the same key, yet presented distinctly different modes. Evidently, the difference was determined by the prevalence of certain melodic progressions: prevalence of VI–VII turned the major key into Gamme grecque, while the prevalence of VI–V made it into Gamme europeenne. “Tendency tones” produced modal inflection.

Every key is a mode, but not every mode is a key (see Part-2). Hence, it is crucial to distinguish between modality and tonality, following Choron and Fayolle (1810), who opposed their contemporary “key” to the Greek mode (Blum, 1985).

Tonality (in a narrow historic sense) is a principle of organization in which all tones in a PCS are subordinated to the tonic and the tonic triad, and are categorized through their functional relations to one another, expressed in the formation of chords that execute functions of stability (tonic), instability (dominant), or neutrality (subdominant) in the distribution of harmonic tension. Such organization is typical for classical and popular Western music, as well as more recent folk music. Major and minor keys constitute tonality, which includes the natural, harmonic, and melodic modes of these keys.

Modality can be defined as a principle of tonal organization in which all tones in a PCS are united by melodic relations, that is, by frequency of occurrence of certain intonations and their melodic functionality: the capacity to initiate, finalize, or develop melodic phrases. Such organization is characterized by weak tonicity: it is normal for such music to have multiple anchoring tones of variable gravity. If tonality is compared to direct current (DC), then modality would be alternating current (AC): an unstable tone can turn into a stable one, or vice versa, and be attracted by a different tone. The fluidity of such alternation distinguishes modality from tonality. Just as tonality is characterized by permanence of tonic function and abundance of alterations (sharpened/flattened degrees), modality is characterized by permanence of scale (scarcity of alterations) and fluctuations in gravity (Kholopov, 1975). Western music prior to the seventeenth century, and most of world music, constitute the modal domain.

Modality and tonality can coexist. Examples of this are found in music composed in Church modes after the eighteenth century, as well as in modern jazz, rock, and post-tonal classical music. The share of tonality varies depending on whether it is harmonic or melodic consonance that governs organization. Modal gravity depends on melodic consonance (Kholopov, 2005). The leaning tones in characteristic intonations magnetize other tones. Modal gravity is a function of rhythm and meter, frequency of repetition, and sequential position in melodic phrases (especially in starting and ending points).
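The dependence of modal gravity on duration, repetition, and phrase position can be illustrated with a toy scoring scheme. This is a sketch under loudly stated assumptions and not a model proposed in the literature cited here: the function `anchoring_scores`, its three weights, and the two phrases are all hypothetical, and metric accent is not modeled at all.

```python
from collections import defaultdict


def anchoring_scores(phrases, duration_weight=1.0,
                     repetition_weight=1.0, boundary_weight=2.0):
    """Accumulate a crude 'gravity' score per tone.

    Each phrase is a list of (pitch_name, duration_in_beats) pairs.
    All weights are arbitrary assumptions chosen for illustration.
    """
    scores = defaultdict(float)
    for phrase in phrases:
        last = len(phrase) - 1
        for index, (pitch, duration) in enumerate(phrase):
            # Longer and more frequently repeated tones weigh more.
            scores[pitch] += duration_weight * duration + repetition_weight
            # Tones that start or end a phrase receive an extra bonus.
            if index == 0 or index == last:
                scores[pitch] += boundary_weight
    return dict(scores)


# Invented two-phrase melody, for illustration only.
phrases = [
    [("D", 1.0), ("E", 0.5), ("F", 0.5), ("E", 2.0)],
    [("G", 1.0), ("F", 0.5), ("E", 0.5), ("D", 2.0)],
]
scores = anchoring_scores(phrases)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

In this toy example two tones, D and E, obtain nearly equal scores, far above the rest, which corresponds to the situation of multiple anchoring tones of variable gravity rather than a single permanent tonic.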

Problem of Intervallic Typology

The difference between modality and tonality translates into a difference in intervallic priority: modality relies on melodic consonance, while tonality relies on harmonic consonance (Von Hornbostel, 1948). This difference is not obvious. Modal music has its own taxonomy of organization, different from tonality (see Appendix I). Early forms of modality in particular cannot be parsed in accordance with the Lerdahl/Jackendoff theory (Ojamaa and Ross, 2011).

The biggest obstacle to applying tonal methodology to modal material is the difference in intervallic typology: a principle used to define the reference pitch-points in a melodic contour (Kholopov, 1988, p. 115). Intervallic typology is influenced by the tuning system and the mode, but presents its own aspect of tonal organization, deliberately managed by the creator of music, at least since the Hellenic era (West, 1992, p. 162). Greeks distinguished between 3 types: diatonic, chromatic, and enharmonic, each associated with specific semantics (Pont, 2008). In addition to the 3 Greek types, there are 5 other types (see Appendix I in Supplementary Material), each characterized by its own expression.

The problem is that different models of tonal organization subscribe to different methods of tracking intervallic relations. Not every music system recognizes the concept of interval. Even a music system as sophisticated as Indian raga does not reserve a term for “interval”: in raga, the exact position of one tone in relation to another is processed not in terms of pitch-distance but as membership in a PCS combined with a numerical value of the degree within a mode (Rowell, 1981). Such thinking, in fact, prevails in early folk music.

That is why it is essential to account for pitch order in the PS of a mode (ascending, descending, symmetric). The three earliest forms of tonal organization use indefinite pitch and disallow application of the PS framework. Six stages of tonal organization are based on non-octave interval typology, requiring adaptation of the PS theory to account for other types of equivalence.

Octave equivalence must have been discovered during the Neolithic Era, limited to selective tones, and acquired formative power in tonal organization only by the Middle Ages. Contrary to the widespread belief based on confusion over the historic transformations of the term “mode” (Cazden, 1971), Ancient Greek music was built on equivalence not of the octave but of the 4th. Aristoxenus described modulation by an octave, which indicates octave inequivalence (Hagel, 2009, p. 4). Music systems that succeeded the Greek were non-octave in their design: Byzantine oktōēchos, Daseian notation, Persian dastgah, Mediterranean and Central Asian maqam all feature a non-octave naming scheme and a tetrachordal/trichordal principle of music-making.

Just like folk songs, Medieval art-music followed what Sachs (1960) terms the “chain principle”: their melody had a formative 3–4-tone kernel, which expanded whenever a singer became excited—adding a similar interval above/below the kernel's margin. This expansion disregarded octave equivalence, because the singer tended to leave out the distant tones and operate only on nearby pitches.

5. Audio: Samai. The chain principle: melody starts on the tetrachord Saba on D, ascending to tetrachord Hijaz on F, and further up to tetrachord Hijaz on C—where upper Db mismatches lower D-natural. http://bit.ly/1YCPVqZ

The chain principle often produces what appears to be a false relation according to Western music theory: a degree is permanently tuned noticeably higher or lower than its octave counterpart.

6. Audio: Maqam Saba. “False” relation between upper Db and lower D. In maqamat, relations between adjacent tetrachords tend to outweigh octave relationship, evident in practice of adding “false-related” leading tones at the tetrachord margins (Shumays, 2013). http://bit.ly/1KwKy5g

Unfortunately, there formed a trend in Western musicology to elevate octave equivalence to the rank of cognitive universal, and retroactively ascribe it to early stages of tonal organization, when music was governed primarily by the melodic harmony. Such are the evolutionary theories by Fink (2003) and Kolinski (1990), proposing spontaneous discovery of natural harmonics and the circle of 5ths by a hominid, following the Pythagorean lineage. Pythagoreanism is inherently achronic and therefore unsuitable for study of the evolution of musical perception (Cazden, 1958). The 5th and 4th are melodically difficult to intone and would have required a long time-line of development. To this day children acquire the ability to sing them in tune only after mastering 2nds and 3rds (Davidson, 1985). Until they do so they tend to scale down wide intervals to a size close to a 2nd (Kvitka, 1971, p. 235), a practice observable in infants' cry-melodies (Wermke and Mende, 2009) and first songs (McKernon, 1979), despite their ability to vocalize across a wider range (Fox, 1990). Gradual interval expansion characterizes both infant and “primitive” musics (Nettl, 1956). Hominids were unlikely to have vocally reproduced wide intervals with sufficient precision to establish the reference pitch and stability axis. And instrumental music usually follows vocal models (Kvitka, 1973, p. 21). Examples of dichordal and trichordal folk melodies based on the 4th, 5th, or octave are scarce, whereas there is no shortage of them for the 2nd and 3rd (Alekseyev, 1986, p. 119). Numerous archaic cultures employ scales narrower than a 4th (Jordania, 2006, p. 69, 73, 110–113, 146): e.g., Lamaholot duet singing in Flores uses no intervals larger than a 3rd (Rappoport, 2011).

Simple-ratio preference is a local Western feature and not a universal, contrary to some claims (Burns and Ward, 1999). Even amongst native Westerners, the ability to reliably identify intervallic relations is present mostly in musically trained listeners. Many non-musicians have difficulty distinguishing even between a vertical 3rd and 4th; instead, they process pitch changes primarily by melodic intervals (Smith, 1997).

Butler and Brown (1994) note that listeners “pick up information about tonal harmony from one or several tones at a time as the music unfolds perceptually across time”—lamenting that this phenomenon has received little attention. They identify two reasons for this:

• Assertion that harmony is intrinsically related to the harmonic spectrum of periodic tones.

• Excessive credit given to abstractions such as scale and chordal structures.

There is abundant evidence that melodic consonance rather than harmonic consonance determines concordance in music in many cultures across the globe. Such is Lithuanian sutartinė. Its setting includes 2-part polyphonic imitations in major 2nd: one part leans on C-E, whereas the other leans on D-F#. The vertical harshness, however, is apophatic: “sutartinė” means “fitting in agreement,” requiring great peacefulness and concurrence from female singers (Raciuniene-Vyciniene, 2006).

7. Audio: Sutartinė “Lioj liepa,” Lithuania. Musical apophasis: tender melody in harsh harmony. The singers are well-familiar with the standard Western harmony, yet carry their own style. http://bit.ly/1NXok0i

Equally apophatic is Papuan weii, with parallel minor 2nds, described by participants as nice and “bell-like.” Messner (1981) coined the term Schwebungsdiaphonie to refer to this dissonant music, whose wide spread spanned from Western Europe through the Balkans, Afghanistan, and Central East Africa to Indonesia, suggesting its origin in a vast archaic proto-culture (Brandl, 2008).

8. Audio: Oe Bala, weeding work-song, Flores Timur. Its cluster-based vertical harmony, voice quality, warbling technique, and melodic patterns, especially cadences, are surprisingly similar to Bulgarian (compare Ex.3), Bosnian, and Macedonian multi-part singing (Yampolsky, 1995a). http://chirb.it/cOLsKH

Apparently, such proto-culture prioritized melodic consonance over harmonic. Moreover, Messner (2006) emphasizes that Schwebungsdiaphonie often engages “maximal roughness” (80–165 cents) and the same contrasting functionality of parts.
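The cent values quoted above follow from the standard logarithmic measure of interval size, in which an octave equals 1200 cents. The sketch below converts frequency ratios to cents and checks them against the 80–165 cent range cited from Messner; the function names and the choice of test intervals (equal-tempered seconds and the 16:15 ratio) are assumptions added for illustration.

```python
import math

# Range of "maximal roughness" as quoted in the text (in cents).
ROUGHNESS_ZONE_CENTS = (80.0, 165.0)


def interval_in_cents(f_low, f_high):
    """Size of the interval between two frequencies, in cents."""
    return 1200.0 * math.log2(f_high / f_low)


def in_roughness_zone(cents, zone=ROUGHNESS_ZONE_CENTS):
    """True if the interval size falls within the quoted range."""
    return zone[0] <= cents <= zone[1]


# Illustrative intervals (assumed here; sung seconds need not match them).
minor_second = interval_in_cents(1.0, 2 ** (1 / 12))  # equal-tempered
major_second = interval_in_cents(1.0, 2 ** (2 / 12))  # equal-tempered
just_semitone = interval_in_cents(15.0, 16.0)         # ratio 16:15

for name, cents in [("minor 2nd", minor_second),
                    ("major 2nd", major_second),
                    ("16:15", just_semitone)]:
    print(f"{name}: {cents:.1f} cents, in zone: {in_roughness_zone(cents)}")
```

Under these assumptions the equal-tempered minor 2nd (100 cents) and the 16:15 semitone (about 112 cents) fall inside the quoted range, while the equal-tempered major 2nd (200 cents) lies above it.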

9. Audio: Teo Ne Wea-Dioe, Ngada wrestling music, West Flores. 3-part singing in parallel major and minor 2nd is learned by the participants, part by part, as accompaniment to the bass melody, where the upper part is supposed to keep the other two “in-tune” (Yampolsky, 1995b). http://bit.ly/1MrBLBd

The capacity to hear the difference between harmonic consonance and dissonance is most likely genetically embedded in primates (Koda et al., 2013); however, the notion of tension related to consonance/dissonance is exclusive to humans and depends on the culture. The necessity for harmonic dissonance to resolve into consonance is realized following the negative affect generated by the incongruence between pitch processing on the one hand, and melodic priming mechanisms on the other (McLachlan et al., 2013). When the melodic template (PS) heard in a piece of music does not match the modal template (PCS) known to the listener, he experiences cognitive dissonance and binds it with harmonic dissonance. That is why diaphony is possible in PCS based on 2nd and 4th.

Pre-mode

We know that there are folk cultures without instrumental music, but there are none without vocal music. Moreover, in many cultures instrumental folk music does not serve to conserve an implicit music theory, but merely imitates the vocal models (Kvitka, 1973, p. 21). The very mechanism of sound production in wind and string instruments imitates vocal production (Terhardt, 1987). The vocal tract is designed for tonality: the lungs and trachea work as a primary linear resonating system, non-linear coupling occurs in the glottis, and the entire vocal tract serves as a secondary linear resonating system. The human pinna, ear canal, and basilar membrane are all optimized for transmission of human vocalizations, suggesting that the sense of tonal integrity evolved in response to vocal sounds (Pierce, 1992). The most biologically relevant and frequently processed tonal stimuli are those that are produced by the representatives of the same species. And the human ear is remarkably effective in extraction of behaviorally relevant information from the sound of the human voice (e.g., speaker's gender, age, emotional state), testifying to the centrality of spectral data to human life (Bowling, 2012).

Anthropological evidence shows that Homo heidelbergensis had modern hearing capabilities as well as modern vocal anatomy, which sets the time-frame for the origin of music at 700,000–300,000 years ago (Wurz, 2010). Singing must have been the prime reason for the descent of the larynx, which enabled sustenance of pitch throughout vocalization, without dropping it as non-human primates do (Maclarnon and Hewitt, 2004).

Why did the hominids need to upgrade their vocalization to sonorous holding of a pitch? Isn't singing in the savanna dangerous for an animal that neither outruns nor overpowers predators, and is mediocre at hiding? Jordania (2011, p. 85) notes that out of 5400 species that can sing, Homo is the only land animal: most other “singers” live in trees, in relative safety, and do not sing when they are on the ground. Jordania suggests a good reason for learning to sing, namely safety: as soon as hominids left their shelters, they could keep their predators away by loud sounds collectively made by the entire tribe. Good syncing would have been a must to project the impression of a single big creature, forming the distinguishing hominoid trait (Merker, 2000).

10. Audio: Dance of the Elephant Mask, Côte d'Ivoire. Representation of the elephant by a masked dancer and a choir in a Baule village (Zemp, 1967). http://bit.ly/1bhwH6c

The counterpart of collective aggressive music-making was individual caretaking. A simple laryngeal vocalization, the grunt, found in most primates, is a good candidate for “lyrical” proto-music; it is also employed as the earliest form of vocal behavior in human newborns (McCune et al., 1996). Grunts are the artifacts of bodily movement and physical straining (Oller, 2000, p. 251). In this capacity, grunts likely accompanied the first forms of dance: McNeill (2008, p. 16) describes a group of chimpanzees jointly swaying and rocking to the sounds of rain. Grunting during grooming is a common behavior amongst baboons. Such behaviors could have become ritualized by hominids, with the accompanying vocalizations learned and reproduced in the absence of grooming motions (Dunbar, 2012). Then, reuse of the learned vocalization in new social settings, associated with a different emotional state, would promote abstraction of vocal expression, turning it into a symbol of a specific activity, and attaching to it a certain emotional denotation (Cross and Morley, 2009).

11. Audio: Tespeng Khoomei, Tuva. This introduction for a love song shows what “grunt intonation” could have sounded like. http://bit.ly/1bcHoXf

Jordania (2008) notes that humming vocalization is more wide-spread across modern population than is singing, and that this humming is probably the remnant of the grunt-like vocalizations (Mithen, 2005, pp. 221–245). Jordania explains that many animals lack a dedicated “danger call”—for them the sound of silence acts as a danger signal. For such species humming can serve as a “contact call,” signaling safety. Ability to hum with a closed mouth, even while eating, as well as the ease of humming, makes it favorable as a candidate for a universal safety signal. A semiotic stance obtained through contact-calling makes humming a probable prototype for musical vocalization. It is quite likely that the hominid motherese was initially hummed rather than sung—and only later developed into pitched vocalization, perhaps following suit of the caretaker in a proliferated tribe.

Rubtsov (1973) laid out the theory of song's genesis, emphasizing that it was neither physiological nor acoustic rules that brought to life tonal organization, but verbal intonation. Mode is nothing but a generalization of the practice of intoning by the majority within a community, sustained over an extended period of usage. And the source material for musical intonations comes from intonations of speech. The immediate cause for musical implementation must have been the need to engage a greater number of individuals in sharing the same emotional experience. By “speech” here is meant not only words, but also interjections and other utterances like weeping, capable of bearing emotional denotations without words.

• Sighing (care),

• shouting (aggression),

• narrative (neutral)

These three utterance types provided archetypes that are most contrasting to one another in their pitch contour, rhythm, and metric organization. Similar intonational prototypes are found in “cry melodies” of babies, pitch contours of which are typified by their native tongues (Mampe et al., 2009). The formative role here is played by vowels that map to similar sites in auditory cortex as pitch (Lidji et al., 2010; Gutschalk and Uppenkamp, 2011).

Initially, musical proto-intonations could be fixed to specific utterances, but then they obtained their own semantic significance and became re-texted. The moment the meaning of a vocalization was decided not by text but by typological melodic contour, was the birth of song (Rubtsov, 1962).

12. Audio: Funeral lament, Tuva. Melodic contour of indefinite pitch, which carries its dedicated emotional expression. http://bit.ly/1F3B40h

13. Audio: Kilamê ser, Yezidis, Armenia. The remnant of proto-language must be the tradition of “melodized speech,” that is reserved for expression of negative feelings amongst the Yezidis—in contrast to positive feelings expressed in songs (Bretèque, 2012). http://bit.ly/1e4P8Ms

Multiple folkloric traditions all over the world employ formulaic organization of melody independent of lyrics. In fact, some cultures do not employ lyrics at all (Abkhazian, Georgian, Chuvash, Udmurt); instead, they use meaningless syllables or base an entire song on a single word (Zemtsovsky, 1983).

14. Audio: Lullaby, Tuva. Use of vocables (hushabye) (Alekseyev and Levin, 1990). http://bit.ly/1O5Wyde

Such detachment of singing from speaking typifies a substantial stock of early folk music, and is still evident in the existing practice, found in many traditions, of re-texting the same melodic formula with different, completely unrelated lyrics. Thus, numerous Dagestani, Tartarian, and Evenki songs receive different lyrics every time a tune is performed (ibid.).

Repetition of familiar melodic formula, laid on unfamiliar text, is likely to create a semantic clash, when the semantic content associated with the music would push the interpretation of new verses of text in the direction away from their verbal meaning. Clashing, in fact, could very well be the re-texting goal: testing the power of melodic formula by imposing it on unrelated textual material.

Identification of a song by its melody rather than by its lyrics in such cultures confirms the prominence of melodic formula that should be viewed as musical implementation of ritual (Zemtsovsky, 1987). Any ritual is a culture of action—an algorithm of strict repetition in a prescribed order, applicable to histrionics, phonation, and religious thought. Fragmentation of a peculiar melodic contour and accurate reproduction of it from different pitch levels, and on different utterances, constituted an important achievement for human civilization. Ritualization of a melodic contour marked enculturation of semantic content peculiar to music—it was the birth of strictly musical cognitive typology, alternative to typology of speech, and a starting point in tonal organization—in the absence of fixed pitches.

15. Audio: Aije, Brazil. Sacred bull-roarer music of Bororo Indians, performed by Tugarege men as part of Death rite, while women and children are hiding in the huts (Canzio, 1989). http://bit.ly/1FYpqQj

An important reason for intonation to bifurcate into speech and music, evident in the opposite valence of high and low pitches for speech vs. music (Ilie and Thompson, 2006), must have been the issue of cognitive dissonance, as explained by Perlovsky (2012). Conceptually oriented, verbal language tends to bring to awareness discrepancies between interests of different language users, since linguistic processing occurs in terms of opposites (in order to define a concept we have to envisage what it is not). Music users, on the other hand, tend to share a common emotional state and the same mental attitude toward the goals of a musical behavior in which they are collectively engaged. Hence, linguistic semiosis is prone to generate cognitive dissonance, whereas musical semiosis—to resolve it. Music counterbalances language in pragmatics of communication: music focuses on “affective meaning,” whereas language only accounts for it (Gussenhoven, 2002).

Development of music complements the development of language. There is some experimental support for the “consonance effect” of music (Masataka and Perlovsky, 2012). Also, 6-month-old infants react differently to music vs. speech: they babble, point, and move in a way suggestive of an attempt to socialize in response to speech, but not in response to music, which causes them to quiet down and listen (Fais et al., 2010). Perhaps children are born with the knowledge of what constitutes sounds of speech and what constitutes sounds of music. Such a suggestion is not unreasonable (Papoušek, 1996), since the ability to discriminate between relevant and irrelevant sounds is essential for survival right from birth. The ability to distinguish speech from non-speech is functional at the time of birth (Winkler et al., 2003), and segregation of musical sounds seems to follow suit (Háden et al., 2015).

Yet another distinction is the disposition of language toward rapid change, vs. the conservative tendency of music: there are numerous examples of ethnicities that lost their original tongue yet retained their unique music—which should be explained by the music's power to continually reaffirm one's connection to the group (Grauer, 2007)—a form of “cognitive consonance.” Comparative musicology has revealed cultures where music traits remained essentially unchanged over extremely long periods of time, wide geographical areas, and different environments (Grauer, 2007).

Opposition of music to speech is manifested in the manner of sound production. Musical vocalization usually reserves a register and spectral characteristics that contrast with the phonetics of the singer's native language (Presentation 1 in Supplementary Material).

Equally contrasting is the manner of vocal articulation between the two: frequent caesuras and emphasis on phrasal ends in speech, vs. few caesuras, generous ornamentation, drastic timbral transformations, vibrato, and pronounced pitch-bending in early music (Graf, 1967).

16. Audio: The 4-year-old light tan horse, praising song, Mongolia. Deep throat singing. http://bit.ly/1DqAPad

The artificiality of sound production in such singing has prompted its characterization as “timbral” (Sheikin, 2002, p. 245), because of the prominent role of timbral inflections, often of an onomatopoeic nature.

17. Audio: Geese Katajjait, Canada. Vocal imitation of the geese cries. http://bit.ly/1O63ywe

Even inanimate objects could be imitated in sound.

18. Audio: Borbangnadyr, Tuva. Vocal imitation of the sound of the brook (Levin, 1999). http://bit.ly/1D36LSJ

Opposition of melodic intonation to speech was also achieved by deliberate flattening of the pitch contour and excessive rhythmicization.

19. Audio: Katajjait, Baffin Land. Monotonous style of singing on stressed rhythmic pattern of the vocables. http://bit.ly/1Ga2lja

Many ethnicities of Siberia and the Far East, as well as Amerindian tribes, use personal songs to spiritually represent an individual. Sheikin emphasizes that it is not the configuration of pitch and rhythm that makes such a song personal, but specifically the manner of vocalization, where timbre plays a pivotal role. The “owner” is recognized by his spectral signature, in the same way we recognize a familiar speaker, but expressed in an exaggerated style. Songs of Chukchis, Koryaks, Yukaghirs, Evens, Nganasans, Entses, Nenets, Mansies, and Khants are all personalized in this way, while reflecting the regional differences between different colonies. The Ancestor Cult, common across all of Siberia, contributes to the formation of musical styles, because one's individual song tends to stay close to his father's song.

Like a family name, individual songs were often inherited. Ojamaa (2002) describes how in infancy, along with the name, the Nganasan child receives from his parents a brief song descriptive of his personal traits. Upon reaching adulthood, every Nganasan youth creates an individual song that accompanies them throughout their life. Their acquaintances know that this melody represents its owner, and often sing that melody while thinking about him/her. In parallel, the adult Nganasan may use his parent's song as family memorabilia. Often such a song carries signs of the ethnicity or geographic origin of the family ancestors through its melodic features.

20. Audio: For Topahti, Nootka song of Kwakiutl origin. An inherited ceremonial song, given as a dowry, and permitted for performance only by its owner (Halpern, 1974). http://bit.ly/1DZ5TlS

“Personal song” appears to represent a virtual self: an imaginary twin-person used to emotionally examine the interaction between the self and the environment as though from the outside. A comparison of personal songs by the same performer recorded at different times shows great variability in text and emotional states, but permanence in melodic structure (Ojamaa and Ross, 2004), suggesting association between “self” and melody. Amongst a number of Siberian ethnicities, personal song functions like a “passport”: different melodies represent the same individual in childhood, adolescence and old age, often also carrying information about his family and birthplace (Novik, 2004, p. 80).

The initial division of proto-music into “militant” hunting vociferation and “lyrical” caretaking grunts upgraded into two proto-genres: collective “for-others” and individual “for-oneself” (Alekseyev, 1986, p. 12). Songs “for-others” were consumed collectively, and promoted the development of tonal organization. Songs “for-oneself” remained frozen in their morphology, as revealed by comparative analysis of Siberian field studies over the last century (Alekseyev and Nikolayeva, 1981). The reason for such conservation was the self-communication functionality: the singer remains half-conscious of his performance, humming a tune in spontaneous release of his emotional energy rather than trying to “convince” listeners. Sheikin (2002, p. 304) nicknames the personal singing tradition “Cartesian”: “I sing therefore I am.” The manner of such singing is reminiscent of the “safety signals” employed by social animals.

21. Audio: Xöömei on Horseback, Tuva. Spontaneous singing while riding. http://bit.ly/1JWKwm7

Little need for perfection in musical communication discourages variation and innovation, preserving the “song for oneself” in the state inherited from ancestors and making it a monument of early tonal organization.

Khasmatonal Mode

The main formative principle in early individualized singing appears to be khasmatonal interval organization (Wiora, 1959), characterized by stressed leaps (a 4th or larger), which are fixed for a particular registral span in a mode. Usually, a register with a cluster of close pitches opposes a register entered by a leap. Sometimes a mode includes two leaps.

22. Audio: High song, Bulgaria. Today there are no purely khasmatonal songs in use, and khasmatonal leaps are embedded in pitched context. http://bit.ly/1EhypRM

Russian ethnomusicology holds khasmatonal organization to be the first genuine type of tonal organization: tones half-spoken/half-sung, with intense timbral/pitch modifications.

23. Audio: Menerik Yryata. Trance-song, Sakha. This reproduction of a song of a psychotic woman, sung by her repeatedly in semi-conscious state must be representative of khasmatonal style—with its glissando, vibrato, leaps, talk (Alekseyev and Nikolayeva, 1981, p. 58). http://chirb.it/vmIwaf

MRI measurements demonstrate that while listening to a song the brain is sensitive to discrete pitch changes in singing as opposed to gliding pitch in speech (Merrill et al., 2012)—a likely mechanism to promote khasmatonal leaps.

It is arguable whether or not a strictly pitchless khasmatonal mode contains “degrees,” because every occurrence of the “same” (by lyrics and contour) musical tone is tuned differently. What constitutes “sameness” here is the successive order of a tone in a melodic contour which imposes a specific function of starting, terminating, climaxing, or supporting a particular tone within a melody—prompted by registral position (Alekseyev, 1976, p. 120). Therefore, khasmatonal tones are in fact correlated “in pitch,” which makes them a peculiar form of degrees.

The main idea behind khasmatonal melodies is timbral contrast and variation. The pitch here merely supports the timbre: melodic steps accompany timbral variation, while leaps accompany timbral contrast.

24. Audio: Night chant, Navajo. Falsetto contrast. http://bit.ly/1O68q4s

A noteworthy chasm occurs as a result of abrupt timbral/pitch change, and serves as the principal means of tonal organization. In the absence of fixed intervals and pitches, the contrast between registers remains the only strictly musical structural parameter usable for coordination of musical tones and their integration into a mode. The other two, rhythm and musical form, originate from lyrics. Syllabification of the melodic line has been confirmed to group tones together (Sundberg, 1992) by turning stressed syllables into tonal anchors.

Khasmatonal intonation was born the instant the majority in a hominid tribe began recognizing the same timbral color applied to the same melodic contour in the same vocal register, memorizing the spectral characteristics and the approximate frequency of that vocalization as a signal. Most likely this happened during the Middle Pleistocene, in parallel with the newly developed ability to recognize unusually shaped or marked stones as “special” (Dissanayake, 2013). Mammoth bones painted with ochre were found at Mousterian sites (Demay et al., 2012). Straight lines, engraved on stone tools, dated between 350,000 and 250,000 BP, are characterized by rhythmic distribution: equality of size, intervals, angles (Frolov, 1992, p. 74). The skill of turning “ordinary things” into “extra-ordinary” is no different from turning “ordinary” sounds into “extra-ordinary.” And shaping timbre, pitch, and rhythm works essentially in the same way ochre helped cover familiar objects with attractive ornaments.

Vocal music presided over the shaping of the musical mode at its cradle. Individual song must have set the standard for the musical use of voice, in contrast to speech. Primitive instruments readily available to hominids before the Middle Stone Age did not allow individualization of timbre across a range of pitches. Sheikin (2002, p. 46) surveys over 150 instrumental types used by 31 Siberian ethnicities, and infers two characteristic traits: commonality of objects used as musical instruments and their dispensable use. Tuvans insert a twig in their mouth akin to a Jew's-harp; Yakuts hold wood chips by their jaws; taiga ethnicities whistle through the bark (116). Such “instruments” are discarded after a single use (which explains the scarcity of archeological finds). Siberian folk instruments in modern use have changed little from the ones found in Neolithic settlements in the middle Lena region (Sheikin, 2002, p. 86). A similar indication comes from comparison of the records of the first ethnographers who visited the Siberian region with the current findings (Ojamaa, 2005).

First instruments were used to imitate sounds of nature—from “realistic” birdcalls or wind emulators, such as Tuvan xirlee, to more “abstract” xomuz.

25. Audio: Symysky call, Khakassia. Imitation of the cry of the male maral made by symysky—a piece of birch bark. http://chirb.it/8zt1tw

26. Audio: Pyrgy call, Khakassia. Imitation of the cry of the wild baby-goat made by pyrgy—a wooden cone. http://chirb.it/aegPcy

27. Audio: Xomuz imitating water stream. http://bit.ly/1DrvnDR

Commonality of an instrument and its timbral idiosyncrasy typify all archaic organology. Each object possesses, as it were, its own recognizable “voice,” discovered by accident, from everyday usage.

28. Audio: Sukute, Solomon Islands. Struck and occasionally blown bamboo tubes. http://bit.ly/1L9FJ5m

What keeps such an instrument alive is the uniqueness of its voice. Just as a person is recognized by the sound of his voice, archaic instruments are recognized by their “personal song.” When interviewed by ethnographers, instrument makers could not give their reason for the choice of specific size and makeup in construction of an instrument—they took common objects “as they were” (Sheikin, 2002, p. 160): a leaf, a stalk, a wooden chip made during cutting of a tree, or a common tool like a bow. This music seems to originate from “playing-for-oneself” just as in “singing-for-oneself”—half-consciously, and as self-entertainment. Once the unique voice of an object is discovered, it is preserved through reproduction on other dispensable objects of the same class—very much like the contour formula of a “self-song” is repeated by different singers from different pitches. Archaic instrumental music is as formulaic as the archaic song.

Similar to the two flavors of lyrical and militant vocal proto-music, instrumental proto-music also had its aggressive counterpart. Almost all the oldest instruments known amongst Siberian peoples were, in one way or another, originally related to hunting, and retained mythological connections to aggression. Lawergren (1988) explains that the earliest musical instruments either looked similar to weapons, served as signals between hunters, or were used to frighten animals and/or attract them in order to trap them. Jordania tells how musical instruments could be useful for scaring away predators in order to scavenge on the prey killed by them, revealing a common etiology between the hunting “instrument” and the music “instrument” (Jordania, 2011, p. 102).

Not all applications of hunt-related music had to be loud and scary. Mastering the art of imitation of an animal's sound meant gaining control over that animal. Also, for a human to be able to produce “non-human” sound was a form of “super”-natural experience. Quiet music representative of hunted animals could have easily been an object of cult similar to the petroglyphs of hunted animals: it is not accidental that the greatest number of pictures are found in the most resonant cave areas—in Paleolithic French (Reznikoff, 2008) and Neolithic Spanish caves. Furthermore, acoustic measurements suggest that the painted wall was intended as a sound-reflecting surface (Díaz-Andreu and García, 2012). Placement of open-air rock art also seems to comply with the sound design concerns, evident in Didima Gorge, South Africa (Mazel, 2011), and canyons in Utah and Arizona (Waller, 2006).

If a cave or a megalith was selected for its acoustics conducive to human vocalization, then music must have been part of important daily activities back then. Likely, it was music that inspired artistic expression: the earliest musical instruments predate the earliest known cave art (Morley, 2014). It seems that the generalization that a less artistic species, Neanderthals, was supplanted by a more artistic species, Homo sapiens, is in fact accurate (Pettitt, 2008). Greater proficiency in arts and music must have contributed to the development of social-cultural systems that put Homo sapiens at a biological advantage as compared to Neanderthals (Conard, 2011). Symbolically mediated social systems allowed social networks to expand, thereby reducing personal risk, and music performance helped build and calibrate mechanisms for emotional mediation between an individual and a social group.

Cave culture served as a powerful catalytic factor that contributed to the radical acceleration in the genesis of music. Living in near total darkness places much greater importance on hearing. Many archeological megalithic sites were found to exhibit a primary acoustic resonance peak at 110 Hz, which is close to the average fundamental frequency of an adult male voice (Devereux, 2006). Resonance and echo aid navigation in complex cave structures. Greater attention to auditory detail could have stimulated more intense tonal development. Reznikoff (2004), who conducted extensive research of cave culture around the world, is convinced that cavemen constantly used vocalization as a sonar method to guide locomotion in darkness, and placed marks on the walls in spots where resonance was most noticeable, which led to the emergence of cave art. Reznikoff rightly stresses that vocalizing in a chamber with strong echo would necessarily amplify the vertical harmonic aspect in horizontal harmony by extending the reverberation and increasing tracing in melodic intervals. Therefore, the intonations that were cultivated outdoors would have transformed their sonic properties: a consonant horizontal 2nd suddenly turned into a dissonant vertical 2nd. Echo would encourage leaps over steps, favoring such leaps as the harmonious 5th and octave. Echo could very well be the primary reason for the promotion of khasmatonal music.

Lithophone music could have provided the model for frequent continuous leaps in the melodic line—which are quite unnatural for speech. Many Paleolithic caves in France, Spain, and Portugal contain stalactites painted and covered by marks—which emit pitched tones once they are hit with a stick. It is very possible that cavemen accidentally discovered that rocks had a “voice,” too, and decided to use them to support their own singing. Most lithophones that are within reach of one another generate pitches separated by a leap.

Genesis of Pitch

Singing along with the lithophonic music would encourage singers to tune up their voice and match the stalactite pitch, following the same tuning instinct that governs vocal imitation in primates and cetaceans (Mercado et al., 2014). An fMRI study of singers' performance in response to an accompanying tone that shifted in frequency demonstrated that singers had voluntary control of their voice when the shift was over 200 cents (= 2 semitones), but engaged in an involuntary pitch-matching response when the shift was 25 cents (Zarate et al., 2010). It is possible that early humans had rougher discrimination of pitch, and involuntarily matched intervals in the order of a semitone (see Part-2).
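The cent values quoted above can be related to frequency with the standard logarithmic definition (1200 cents per octave, 100 cents per equal-tempered semitone). The following is a minimal sketch of that arithmetic only; the function names and the 220 Hz reference tone are illustrative assumptions and are not taken from Zarate et al. (2010).

```python
import math


def cents_between(f_low_hz, f_high_hz):
    """Interval size in cents between two frequencies (1200 cents = 1 octave)."""
    return 1200.0 * math.log2(f_high_hz / f_low_hz)


def shift_by_cents(f_hz, cents):
    """Frequency reached after shifting f_hz by the given number of cents."""
    return f_hz * 2.0 ** (cents / 1200.0)


# Hypothetical reference tone; any frequency would serve equally well.
reference_hz = 220.0

# The two shift sizes mentioned in the text: 200 cents and 25 cents.
large_shift_hz = shift_by_cents(reference_hz, 200)
small_shift_hz = shift_by_cents(reference_hz, 25)

# 100 cents make one equal-tempered semitone.
large_shift_semitones = 200 / 100
small_shift_semitones = 25 / 100
```

Under this definition the 25-cent shift is a quarter of a semitone, which gives a sense of how fine the involuntary pitch-matching response is compared with the semitone-sized matching suggested for early humans.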

Dams (1985) undertook a field study of “singing rocks,” and reported the following lithophone scales: F-C-Eb (Roucador), B-D-E-G (Cougnac), C-Eb-F-G-A-C (Nerja). Perhaps the hexatonic Nerja scale could be the result of human interference: carving the stalagmitic edge to tune a rock higher to one's liking. Lithophones could have triggered aspiration for mode making in humans, materializing the concept of pitch, and supplying non-vocal intonations.

Sheikin (2002, p. 30) believes that the first intonations were “psychophysiological”: “natural,” determined by human anatomy and cognitive algorithms that originated from everyday non-musical behaviors. The pre-modal singer discovered capacities of his voice by experimentation.

29. Audio: Assalalaa, Baffin Land. Children's game that involves singing until exhaustion of a single breath while heavily wiggling one's body (Nattiez, 1976). http://bit.ly/1FZ2a4J

He learned how to add whistling, growling, and hawking components to a sustained vocal tone (to differentiate it from speech).

30. Audio: Katajjait solo, Hudson Bay. Intense use of timbral variation. http://bit.ly/1F4PL35

These sounds were formatted according to the rhythms of heart-beat and respiration, inherent curves of acceleration/deceleration of the locomotor motions (Honing, 2003), and extraneous rhythms typical of the environment.

31. Audio: Marido paru, Brazil. Bororo work song illustrates rhythm of flint knapping as a prototype organizer of early music (Zubrow and Blake, 2006). http://bit.ly/1HJNHOS

Repertories of common vocal intonations were imitated on early instruments.

32. Audio: Xomuz, Tuva. Imitation of the Khoomei tune on the Jew's harp. http://chirb.it/efber4

The echolaliac instincts motivated attempts to imitate environmental sounds on instruments.

33. Audio: Igil Fantasy, Tuva. Imitation of horse's neighing and trotting on igil, a 2-string fiddle (Levin, 1999). http://chirb.it/1NDkpE

At this point, organophonic intonation—a “song” typical for the voice of particular instrument—was formed. New instrumental intonations were incorporated in an accompanied song.

34. Audio: Vocal imitation of animal calls, the sounds of chomuz and drum, along with instrumental accompaniment, Tuva. http://chirb.it/4w3Gge

In the reverse loop of influence, the brightest instrumental intonations prototyped the vocal ones. Thus, Croatian flat nasal tarankanje singing style imitates the sound of sopile (Boersma and Kovacic, 2006). Notable was the influence of chomuz on the Siberian and Mongolian singing styles (Alekseyev, 1976, p. 107). A resonant fundamental tone of chomuz must have modeled “tonicity” in Khoomei songs.

Ekmelic Mode

Kharlap (1972) traced the interaction of melodic line with folk lyrics and identified the influence of verbal rhyming on rhythmic parallelism. Rhyme's impact on rhythm shapes the intonation. Rhyme in itself contains an important musical component: reciting poetry differs from prosaic speech by expanding the vowels, especially in stressed words, using vibrato, and increasing harmonic periodicity in the spectral content of the voice, all of which are features typical of singing (Nazajkinskij, 1972, p. 261). Moreover, rhyming reproduces the same intonation at the end of the rhymed strophes. When musical intonation duplicates parallel rhyming of the lyrics, it marks the rhymes with the same pitch, making that pitch perceptually stand out. If intonational stress falls on a stand-alone rhymed syllable, the corresponding pitch obtains the quality of stability. Since cross rhyming is exceedingly common in folklore, musical mode inherits alternation from it as a formative principle: pitches in such early song, unlike in tonality, are united not by tonal subordination, but by tonal coordination. One stable tone serves to counterbalance another, each magnetizing a bunch of unstable satellite tones.

Western researchers of prosody also uncovered ties between intonations of speech and music in early monuments of epic poetry and religious chant, across different languages (Cable, 1975). Each language seems to have an assortment of a few rules for conversion of the phonological accents into the melodic pitch-formula, where syllables with greater linguistic stress are set to higher melodic tones. Then, fixation of selected tones in pitch, together with strict observance of 3–4 pitch classes throughout the narration, becomes a means of hierarchic tonal organization: a way of converting the metric order of words into pitch order of tones. In essence, epos and chant organically produce musical modes.

The most thorough theory of the origin of pitch organization in an early mode was laid out by Eduard Alekseyev. Based on his life-long research of his native Yakut music and neighboring Siberian cultures, Alekseyev identified what appears to be the earliest form of mode with an IS. Such a mode is characterized by unfixed tuning of all degrees, where some degrees show more permanence in their tuning, presenting fewer pitch variants upon their reproduction within a song than other degrees do.

Kholopov (1988, p. 117) proposed the term ekmelic to refer to a mode whose PCS includes tones that are unfixed or variable in pitch.

• Melodic consonance,

• scarcity of formulaic intonations, and

• close correspondence between rhythm of the lyrics and musical rhythm (limited sing-out)

—altogether generate a sense of unity that binds the tones of ekmelic song into a mode.

Rhythmic organization in ekmelic music is strictly regular, even monotonous—to compensate for looseness of pitches (Alekseyev, 1976, p. 52). Repetition of the same musical formula for each strophe of lyrics characterizes the oldest Yakut genre, monodic epic olonkho. However, repetitions affect only the melodic contour—exact pitches substantially vary. The very same performer, when repeating the same song, sets the same lyrics to varying pitches unaware of pitch discrepancies. When interviewed, he refers to multiple melodies as “the same” in music structure and musical meaning—and his listeners also share this conviction. Similar isomorphism was found by List (1987) amongst Hopi Indians.

The mathematical problem of defining unfixed ekmelic intervals is best resolved by counting not the absolute distance in pitch, but the numerical order within the mode (Alekseyev, 1976, p. 123). Below is my realization of Alekseyev's taxonomic idea.

Ekmelic unison is a reproduction of the “same” degree (with possible wandering up or down).

35. Audio: Song of praise to the horse, Mongolia. Unichordal song based on a single degree—probably due to the rhetoric effect of listing all the virtues of the horse that just won the race (Desjacques, 1991). http://bit.ly/1yJDVKI

Ekmelic 2nd is the complementary relationship between adjacent degrees, different in their melodic function (i.e., one leaning, and another supporting).

36. Audio: Old Woman's Song from olonkho Mighty Er Sogotokh, old epic Yakut style. 2-degree mode a 2nd apart (complementary relation). http://chirb.it/3cNa11

Ekmelic 3rd is the opposing relationship between two tones (adjacent or “over the tone”) of the same function (both leaning, or both supporting).

37. Audio: Baianai Yryata, Algys (invocation of taiga's spirit), dyiretii style (the oldest epic style of Yakut music). 2-degree mode with the ekmelic 3rd between adjacent degrees (opposing relation), responsible for shifting of the upper degree (Alekseyev and Nikolayeva, 1981, p. 67) http://chirb.it/JABE04

38. Audio: Usuiaana ebekkem (Song about Ust'Yan), a Sea chant from the coast of the Laptev Sea, old style. 3-degree mode with the ekmelic 3rd between I and III degrees (opposing), with II degree complementing the III (numeration proceeds in ascending pitch order). The II and III degrees keep shifting together (Alekseyev and Nikolayeva, 1981, p. 66). http://chirb.it/NpN5D5

Ekmelic 4th is the extreme relationship between non-adjacent degrees of different functions—unbound by resolution.

39. Audio: Bisik Yryata. 3-degree mode with the following intervallic set: ekmelic 2nd between II and III degrees (complementing relation), 3rd between I and II degrees (opposing), and 4th between I and III degrees (extreme)—especially the I degree strongly shifts down. http://chirb.it/zJGLkG

According to Alekseyev, ekmelic music hardly includes more than four fixed points, and therefore cannot present more than four functions (leaning, supporting, opposing, or extreme). There is no 5th in ekmelic ISC: when Yakuts encounter a 5th (filled up by 3 degrees) in Russian songs, they regard it as “foreign” (85).

Modal functions determine gravity in ekmelic mode. Complementing (supporting/leaning) and neutral (supporting/supporting) degrees retain their distances.

40. Audio: Bytta-bytta Maaryiabyn (“Beautiful Mary”), lyric song. 3-degree mode is made by adding two complementary 2nds—without forming the 3rd between the I and III degrees. As a result, none of the pitches shift. Ekmelic 3rd is not always equal to 2nd + 2nd. http://chirb.it/5mOz2N

Opposing leaning/leaning degrees become repelled, and tend to increase their distance throughout the song (126). The same applies to extreme supporting/leaning degrees.
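The taxonomy above names intervals by degree order and melodic function rather than by pitch distance, which lends itself to a small rule table. The sketch below is one possible reading of the definitions given in this section, not Alekseyev's own formalization: the names `Function`, `ekmelic_interval`, and `tends_to_widen` are hypothetical, the function labels assigned to the example mode are assumed for illustration, and same-function pairs more than two degrees apart are left unclassified because the text does not cover them.

```python
from enum import Enum


class Function(Enum):
    LEANING = "leaning"
    SUPPORTING = "supporting"


def ekmelic_interval(degree_a, func_a, degree_b, func_b):
    """Name an ekmelic interval from degree numbers and melodic functions."""
    span = abs(degree_a - degree_b)
    if span == 0:
        return "unison"  # reproduction of the "same" degree
    if func_a == func_b:
        # Opposing relation: adjacent or "over the tone", same function.
        return "3rd" if span <= 2 else None
    # Different functions: complementary if adjacent, extreme otherwise.
    return "2nd" if span == 1 else "4th"


def tends_to_widen(interval, func_a, func_b):
    """True for the pairs described as repelled: leaning/leaning and extreme."""
    if interval == "4th":
        return True
    return interval == "3rd" and func_a == func_b == Function.LEANING


# Assumed labels for a 3-degree mode shaped like audio example 39:
# I and II share a function, III differs.
mode = {1: Function.LEANING, 2: Function.LEANING, 3: Function.SUPPORTING}

interval_I_II = ekmelic_interval(1, mode[1], 2, mode[2])
interval_II_III = ekmelic_interval(2, mode[2], 3, mode[3])
interval_I_III = ekmelic_interval(1, mode[1], 3, mode[3])
```

With these assumed labels the three pairs come out as a 3rd, a 2nd, and a 4th, matching the intervallic set listed for example 39; a different assignment of functions to the same three degrees would yield a different set, which is the point of counting by function rather than by distance.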

Morphological and statistical analysis of such songs, conducted by Alekseyev, reveals the mechanism by which degrees become fixed in pitch, and subsequently shape the mode (129). It involves intonations that turn into formative motifs: they determine musical arrangements by virtue of articulating respiration and parsing of lyrics. A word or words sung on a single breath are perceived as a single morphological unit by the ekmelic singer. Fenk-Oczlon and Fenk (2009) confirm that the breath cycle shapes perception of both verbal and musical intonations.

Alekseyev identifies the two earliest types of motif-intonations: ascending and descending. The ascending type assigns stability to the initial tone because of the trochaic meter that predominates in Yakut songs.

41. Audio: Dyakhtary Tuoyuu, Love song, ascending inclination http://chirb.it/KMFzky

The descending type leans on the tone that marks the completion of the contour's fall, when it slightly rolls up.

42. Audio: Tuul Yryata, Song in sleep, descending inclination http://chirb.it/znFtxL

Change in melodic direction (in conjunction with metric stress) marks the anchor point, causing the singer to stress the corresponding tone by fixing its pitch (in contrast to the rest of the tones). The majority of ekmelic songs contain two anchors, because the overall melodic motion in a song follows a sinusoid curve, where intonations only differ in phase. The sinusoid shape of ekmelic melodies contrasts with the zigzag tendency of khasmatonal melodies. Ekmelic waves provide the most comfortable regulated manner of controlling the pitch. The ongoing oscillation by the same wavelength presents a predictable and manageable model for ordering the pitches.
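The link between a sinusoid contour and two anchors can be illustrated numerically. The sketch below idealizes a melody as a sampled sine wave and marks every change of melodic direction; the sampling density, the number of cycles, and the function name `reversal_points` are assumptions made for the illustration, and metric stress is left out of the model.

```python
import math


def reversal_points(contour):
    """Indices at which the melodic direction changes (local peaks and troughs)."""
    points = []
    for i in range(1, len(contour) - 1):
        step_in = contour[i] - contour[i - 1]
        step_out = contour[i + 1] - contour[i]
        if step_in * step_out < 0:  # rise followed by fall, or fall by rise
            points.append(i)
    return points


# Idealized contour: three cycles of a sine wave, eight tones per cycle.
tones_per_cycle = 8
contour = [
    math.sin(2 * math.pi * n / tones_per_cycle)
    for n in range(3 * tones_per_cycle + 1)
]

anchor_indices = reversal_points(contour)

# Distinct relative heights at which the reversals occur.
anchor_levels = sorted({round(contour[i], 6) for i in anchor_indices})
```

However many cycles are sung, the reversals in this idealized contour fall on only two heights, the peak and the trough, which is consistent with the observation that most ekmelic songs settle on two anchors.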

Each song consists of multiple cyclic repetitions of a stereotypical formula that usually corresponds to a phrase in the lyrics. There are three options for the formula's start: at the trough, at the peak, or slightly past the trough (respectively A, B, and C, Figure 1A). The ending points are also well defined (D, A1, and C1, Figure 1A). These points are likely to house fixed degrees of ekmelic mode. Most Yakut songs are built on the framework of two degrees, unless a longer formula leaves space for the third degree.

Figure 1. Sinusoid melodic line and phasing of the ascending/descending phrase-intonations. The horizontal dashed lines show the placement of the anchor tones in relation to the sine. (A) Typical starting and ending points for the melodic contour of the following varieties of melodic formulas: initial ascending A-B, A-D, A-A1, A-C1; initial descending B-A1, B-C1, B-B1, B-D1; initial wave figure C-C1 and C-B1. The letters for pitch points reflect functionality of pitches: letters A and B represent marginal pitches, and C and D - intermediate pitches in a 4-degree ekmelic mode. (B–D) Melodic contours of typical ekmelic motif-intonations that comprise phrase intonations indicated by black arrow. The vertical dashed lines indicate the margins between the motif-intonations: a, ascending motif; b, descending motif; c, concave wave motif. This figure is based on four figures from “The Problems of Genesis of Mode” by Alekseyev (1976, p. 134). Used by permission.

The reversal of direction defines the margins between the motif-intonations within a formula: a ascending, b descending, and c wave-like intonations (Figures 1B–D). The configurations a-b, b-c and a-b-c are most common.

43. Audio: Personal song about the native land, Amga region. Wave-like c-b-c formula. http://chirb.it/M7Betn

Greater expenditure of air and muscular effort in ascending singing ties up ascending type with buildup of tension, and descending—with relaxation. Their contrast generates melodic consonance/dissonance:

• Tones that follow a low leaning point (A) become associated with instability and tension.

• Tones that follow an upper leaning point (B) become associated with resolution.

• Leaning point of the ascending type (A) obtains greater gravitational value as compared to the leaning point of the descending type (B).

Alekseyev qualifies such functionality as the genesis of the first true modality, and speaks of ascending and descending intonations evolving into modal “inclinations,” in analogy to major and minor inclinations of a key. With the passage of time, the ascending inclination developed into the authentic mode, and the descending into the plagal, which opposed each other semantically. Each ekmelic inclination is determined by the opening of the melodic phrase, in direct contrast to tonality, where the ending determines whether the key is major or minor.

As singers developed a sense of coordination in pitch, they explored the idea of going over a degree. This produced a zigzagging melodic contour—which became affiliated with genres of dance, jocular song, and tongue-twister.

44. Audio: Song of the Virgin Abaasi (comically clamorous underworld spirit), from olonkho Urung Aiyy Toyon. Zigzagging formula. http://chirb.it/10PGGL

Next came the idea of skipping over two degrees, which is very different from khasmatonal leaps: it observed the sequential order of degrees rather than arbitrarily skipping into marginal registers. The energy contained within a leap favored the ascending direction. The extra effort expended on such a leap prompted an immediate fall in pitch. This is how the fifth melodic type came into being: an ascending leap followed by a descending fill-up. This completed the set of five melodic standards of ekmelic music (138).

45. Audio: Devil virgin's song, from olonkho Mighty Er Sogotokh. Leaps characterize the evil character (Alekseyev, 1996). http://chirb.it/47cHHO

It appears that each of these melodic contours is cross-modally connected to the spatial perception of vertical height, and associated with a particular emotional state (Hair, 1995). Two experimental studies of pictorial shapes (Lundholm, 1921; Poffenberger and Barrows, 1924) found that a gradually descending curve is associated with the adjectives sad/lazy/weak; a gradual horizontal curve with quiet/gentle; a medium rising curve with merry/playful; and a steep rising curve with agitating/furious.

The most distinctive feature of the ekmelic mode is that it is scalable (“unfolding”) (Alekseyev, 1976, p. 148): intervallic distances between tones can be proportionally increased or decreased, from a semitone to a tritone. Transposition of a song often invokes “logarithmic” scaling of intervals toward the upper register: when the singer is asked to sing the same song higher, he compresses its intervals to a smaller compass (Alekseyev, 2013).

46. Audio: Sae Dyige-dyige, comic love song of a woman who has many lovers. Two performances of the same song by the same singer: ambitus of (1) 4th and (2) 3rd. http://chirb.it/g36sC2
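The proportional scaling of an ekmelic contour can be illustrated with a short sketch. It assumes that pitches are expressed in semitones relative to a fixed reference tone and that all distances from that reference are multiplied by one common factor. The function names, the sample contour, and the factor 4/5 (a fourth of 5 semitones reduced to a third of 4 semitones) are hypothetical illustrations, not measurements of the recorded performances, and the sketch does not model the register-dependent “logarithmic” compression.

```python
def scale_contour(pitches, factor, center=None):
    """Scale each pitch's distance from a fixed center by the same factor.

    pitches: pitch heights in semitones (fractional values allowed).
    factor:  values below 1 compress the compass, values above 1 expand it.
    center:  reference tone that stays in place; defaults to the first pitch.
    """
    if center is None:
        center = pitches[0]
    return [center + factor * (p - center) for p in pitches]


def ambitus(pitches):
    """Compass of the contour: distance between its highest and lowest tone."""
    return max(pitches) - min(pitches)


# Hypothetical 4-degree contour spanning a fourth (5 semitones)
original = [0.0, 2.0, 5.0, 3.0, 0.0]

# The same contour compressed to the compass of a major third (4 semitones)
compressed = scale_contour(original, factor=4 / 5)
```

The order of the degrees and the ratios between their distances are preserved, while the absolute interval sizes change. This is what distinguishes scaling from ordinary transposition, which shifts all tones by the same amount and leaves the intervals intact.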

Many ekmelic melodic formulas demonstrate a tendency to gradually move the highest and lowest anchors in the singer's compass further away from the fixed center (Alekseyev, 1976, p. 50) (see the end of Appendix I). Alekseyev compares this effect with the absence of gravity in cosmic space (162): when the gravity of anchor points is weak, the tonal inertia can push the marginal tones “out of orbit.” Musical weightlessness manifests itself as a relative lack of tonal tension.

47. Audio: It Was a Very Lovely Day When the Water Was Calm, Inuit personal dance-song, Alaska (Boulton, 1955). A series of leaps reduces tonal tension. http://bit.ly/1J1APVV

Similar scalability is found in Nenets (Ojamaa, 2003) and Pueblo Indian music (List, 1985). Sachs (1962, p. 64) noted that shrinking/expanding steps characterized Amerindian music that had no scale-wise tuned instruments. Proportional expansion of ambitus was found in Aboriginal music (Will, 1997). Mpyemo use scales with “mobile degrees” that are re-assigned pitch values in the process of a song (Arom, 2004, p. 25). “Elastic scales” are described by Kubik (1985). Yasser (1932) conceptualized “sub-infra-diatonic scale” (142) based on three “regular” degrees 5th and 4th apart, and “auxiliary” scalable degrees filling in-between—as typolog