We present evidence that the geographic context in which a language is spoken may directly impact its phonological form. We examined the geographic coordinates and elevations of 567 language locations represented in a worldwide phonetic database. Languages with phonemic ejective consonants were found to occur closer to inhabitable regions of high elevation than languages without this class of sounds. In addition, the mean and median elevations of the locations of languages with ejectives were found to be comparatively high. The patterns uncovered surface on all major world landmasses, and are not the result of the influence of particular language families. They reflect a significant and positive worldwide correlation between elevation and the likelihood that a language employs ejective phonemes. In addition to documenting this correlation in detail, we offer two plausible motivations for its existence. We suggest that ejective sounds might be facilitated at higher elevations due to the associated decrease in ambient air pressure, which reduces the physiological effort required for the compression of air in the pharyngeal cavity, a unique articulatory component of ejective sounds. In addition, we hypothesize that ejective sounds may help to mitigate rates of water vapor loss through exhaled air. These explanations demonstrate how a reduction of ambient air density could promote the usage of ejective phonemes in a given language. Our results reveal the direct influence of a geographic factor on the basic sound inventories of human languages.

The most significant areas of high elevation on the earth’s inhabitable surface are located in major mountain ranges and associated plateaus. While many mountain peaks over 1500 m in height exist, the large inhabitable areas surrounding the associated mountains are often not themselves at high elevation. This is true, for instance, in the case of some peaks in the Alps and New Guinea. In fact, the vast majority of the world’s inhabitable high altitude surface area is found in six non-contiguous regions that include the world’s largest high elevation plateaus. These regions consist of (1) the North American cordillera, including the Rocky Mountains, Colorado plateau, and the Mexican altiplano, (2) the Andes and the Andean altiplano, (3) the southern African plateau, (4) the plateau of the east African rift and the Ethiopian highlands, (5) the Caucasus range and the associated Javakheti plateau, and (6) the massive Tibetan plateau and adjacent plateaus, most notably the Iranian plateau. In the case of the southern African plateau, one large region in the east and a smaller one in the west exceed 1500 m. Two large regions of the East African rift generally exceed 1500 m (though one of these is divided by a section below 1500 m), yielding a total of four main areas above 1500 m in the case of the African continent. While these are not the only regions of the earth with elevations greater than 1500 m, they represent the bulk of the high elevation surface area inhabited by humans and are readily apparent in charts of polygons exceeding 1500 m, for instance the one provided in [8].

As is noted in one prominent survey of world regions of high elevation, only approximately 15% of the world’s occupied surface area is located at high altitude, typically defined as elevation exceeding 1500 m above sea level [8]. Less than 10% of the world’s population resides in such high altitude areas, and the median person resides at 194 m [8]. Despite the fact that only 15.6% of inhabited land lies within 100 m elevation of the sea, some 33.5% of people live on lands below 100 m. This tendency has become even more pronounced since the publication of [8], as a majority of the world’s largest and growing metropolitan areas are found at or near sea level. Clearly humans tend to gravitate towards lower-lying areas, with relatively few people living in areas of high elevation.

In order to test the hypothesis that the presence of ejective consonants correlates positively and significantly with elevation, we analyzed the locations and elevations of all languages for which data are provided in Maddieson’s typological database of glottalized consonants including ejectives [7]. The database represents the most comprehensive survey of such sounds and was designed so as to fairly represent all world regions while avoiding overreliance on any particular language families. The geographic component of our data collection and analysis was carried out via Google Earth and ArcGIS v. 10.0, after importing the languages’ coordinates from the database into these programs.

We hypothesized that, if geographic factors do somehow directly impact phonemic inventories contra the common assumption in linguistics, the factor most likely to have such an impact would relate to atmospheric conditions. In particular, we speculated that atmospheric pressure might impact the production of non-pulmonic sounds, which do not rely on air egressed from below the larynx. More specifically, we generated the following heuristic conjecture: Ejective phonemes might be more likely to occur in areas of high elevation. This guiding hypothesis was based on simple physical modeling of the vocal tract, discussed below. In short, we speculated that the articulation of ejective consonants might be facilitated by reduced atmospheric pressure. These sounds are the only egressive non-pulmonic sounds in human languages, and involve the compression of air in the pharyngeal cavity, typically via the elevation of a closed glottis [6]. Since atmospheric pressure is reduced at higher elevation, we speculated that this compression would be more easily achieved in locations of relatively high elevation. The evidence we present is consistent with our initial hypothesis, though as we note below there are at least two plausible explanations for the geographic-phonetic correlation we have uncovered.
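The scale of the pressure reduction at issue can be approximated with a short calculation. The following is only a rough sketch: it assumes the standard-atmosphere barometric formula for the troposphere, which is a textbook approximation and not part of the analysis reported here, and the function name `pressure_at` is purely illustrative.

```python
# Standard-atmosphere constants for the troposphere (assumed, not from this study).
P0 = 101_325.0     # sea-level pressure, Pa
T0 = 288.15        # sea-level temperature, K
LAPSE = 0.0065     # temperature lapse rate, K/m
G = 9.80665        # gravitational acceleration, m/s^2
M = 0.0289644      # molar mass of dry air, kg/mol
R = 8.314462618    # universal gas constant, J/(mol K)


def pressure_at(elevation_m: float) -> float:
    """Approximate ambient pressure (Pa) at an elevation given in metres."""
    exponent = G * M / (R * LAPSE)
    return P0 * (1.0 - LAPSE * elevation_m / T0) ** exponent


# Compare sea level with the 1500 m threshold and two higher elevations.
for h in (0, 1500, 3000, 4000):
    p = pressure_at(h)
    print(f"{h:>5} m: {p / 1000:6.1f} kPa ({p / P0:.0%} of sea level)")
```

Under these assumptions, ambient pressure at the 1500 m threshold is roughly five sixths of its sea-level value, which gives a sense of the magnitude of the atmospheric difference the hypothesis appeals to.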

It is generally assumed that the worldwide variation of sounds in human languages is largely arbitrary [1], [2]. That is, cross-linguistic disparities in phonological patterns are assumed to be primarily due to stochastic variation in the phonetic gestures relied upon in particular languages. Diachronic influences resulting from linguistic affiliations, both areal and familial, do yield some tendencies in the regional distributions of phonological patterns. In addition, some linguistic sounds are more common due to their relative ease of articulation or perceptual salience. Nevertheless, cross-linguistic phonetic and phonological variation is presumed to be fundamentally arbitrary in the sense that it is not due to nonlinguistic influences such as the geographic context in which a language is produced. One recent strand of research, however, has challenged this basic assumption by offering compelling evidence that warmer climates correlate positively with the degree of sonority of a given language, at least in a small though diverse sample of about sixty languages [2], [3], [4]. According to such work, more sonorous phonological features (such as simple syllables with higher rates of vowel occurrence and greater mean amplitude) are more likely to occur in languages spoken in warmer climates, putatively because cultures in hotter places rely more heavily on communication at greater distances. Assuming this pattern of sonority holds for larger samples of the world’s languages, its geographic impetus is indirect since the true motivation is supposedly relative proximity of interlocutors during typical communicative events. The pattern is also claimed to relate to factors such as terrain type and flora density, as well as cultural variables such as degree of sexual expressiveness [2]. The direct influence of a geographic variable on a language’s sound system has yet to be demonstrated.
Here we offer evidence for a direct geographic effect on arguably the most basic facet of phonology, the inventory of phonemes in a given language. This evidence is based on the analysis of data from 567 languages, or approximately 8% of the world’s estimated total of 6,909 languages [5] .

Results

The locations of the languages in our sample are plotted in Figure 1. (The world’s major regions of high elevation are plotted in the inset of the figure.) For the sake of clarity, a large portion of the Pacific Ocean is omitted from the figure and, as a result, a handful of the 567 language locations are not depicted. The language locations are based on the latitude and longitude coordinates offered in [7], which were chosen in accordance with the location-finding criteria relied upon by the World Atlas of Language Structures (WALS), of which Maddieson’s rich survey represents one chapter [9]. Since languages are treated as individual data points through these criteria, the geographic distributions of some widespread languages are treated as singular locales that reflect in many cases the larger area in which they developed. For instance, English is represented via one location in England only. In the vast majority of cases, languages are in fact spoken in relatively constricted areas geographically. After all, the median number of speakers of a language is approximately seven thousand, all of whom tend to live in relatively confined locales [5]. The WALS locations were selected to represent well the geographic centers of such locales [9].

Figure 1. Plot of the locations of the languages in the sample. Dark circles represent languages with ejectives, clear circles represent those without ejectives. Clusters of languages with ejectives are highlighted with white rectangles. For illustrative purposes only. Inset: Lat-long plot of polygons exceeding 1500 m in elevation. Adapted from Figure 4 in [8]. The six major inhabitable areas of high elevation are highlighted via ellipses: (1) North American cordillera (2) Andes (3) Southern African plateau (4) East African rift (5) Caucasus and Javakheti plateau (6) Tibetan plateau and adjacent regions. https://doi.org/10.1371/journal.pone.0065275.g001

The languages are categorized into two groups for the purpose of the present study. The first group comprises those languages with ejective phonemic consonants (n = 92), and the other consists of all remaining languages (n = 475) in the data set. Our grouping of languages into these two categories is derived from Maddieson’s more detailed categorization, which includes the pertinent information on basic ejective status. Just over 16% of the languages in the sample contain ejectives.

At the coarsest level, Figure 1 reveals a visibly discernible correlation between the six aforementioned major regions of high altitude and the locations of languages with ejective consonants. We see as well that there are eight visual clusters of languages with ejectives, highlighted via white rectangles. Two of the largest of these are located within the North American cordillera. Another is located immediately to the east of the cordillera, on the associated Colorado plateau. A fourth cluster is located just southeast of the Mexican altiplano. A fifth cluster is located on the southern African plateau. The sixth and seventh clusters are located along the East African rift, on two areas of the plateau associated with this rift. The eighth cluster is located in the region of the Caucasus mountains and the Javakheti Plateau. In addition, a glance at South America reveals that a number of the languages with ejectives on that landmass are located in the Andean cordillera or on the Andean altiplano in Bolivia, as Maddieson has noted [7]. In Figure 1 a dashed rectangle highlights two proximate languages with ejectives spoken on the altiplano, to underscore this Andean bias. Remarkably, then, the clusters of languages with ejectives tend to be located on or very near five of the six major non-contiguous regions of high elevation on the earth’s inhabitable surface. The only major region of high elevation where languages with ejectives are absent is the large Tibetan plateau, along with adjacent regions of high altitude. It is not particularly surprising that one region should present such an exception, and in fact it strikes us as remarkable that only one region presents an exception.

So we can state that visible clusters of languages with ejectives are without exception located at or near one of the major regions of high elevation. Conversely, some of the richest areas of the world linguistically, in terms of languages and linguistic stocks, are largely devoid of languages with ejectives. The areas in question are Oceania (including New Guinea and Australia), Southeast Asia, West Africa, and Amazonia. Notably, all of these dense linguistic areas lack major regions of high elevation. In short, a visual analysis of the worldwide distribution of ejective languages suggests they are located at or near prominent areas of high elevation, and are markedly absent in large regions of low elevation, even though many of the latter regions are linguistically dense.

Such a coarse approximation is suggestive but inconclusive. To analyze the data in a more fine-grained manner, we ascertained the distances of all of the language locations from the nearest boundary of a major landmass exceeding 1500 m in elevation (regardless of the landmass in question). These values were derived via the distance and elevation measurement tools in Google Earth. A standardized approach to measurement was adopted, whereby an elevation map was consulted to find the closest regions of high elevation. The distance between a given language and these regions was then tested, and the shortest obtained distance was tabulated in the case of each language location. Approximately half the distance figures were tabulated by someone besides the author, a second data collector trained with this method. This second distance examiner was unfamiliar with the hypothesis being tested. No significant discrepancies were found in a contrast of the distances obtained by the author and the second data collector, when they analyzed the same set of sample data points. In short, the methods were found to be reliable across data collectors, yielding consistent distance measurements.
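The measurement procedure just described was carried out with the tools in Google Earth. For readers who wish to approximate it programmatically, the following is a minimal sketch, assuming that the boundary of each high elevation region is available as a list of sampled latitude and longitude points. The function names and the boundary representation are hypothetical and are not part of the method used here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_high_elevation(language, boundary_points):
    """Shortest distance from a (lat, lon) language location to any sampled
    boundary point of a region exceeding 1500 m (hypothetical input)."""
    lat, lon = language
    return min(haversine_km(lat, lon, blat, blon)
               for blat, blon in boundary_points)


def in_high_elevation_zone(distance_km, threshold_km=200.0):
    """A location counts as in a zone if it lies within 200 km of such a region."""
    return distance_km <= threshold_km
```

A location that falls inside a region above 1500 m would need to be assigned a distance of zero separately, since this sketch only measures distance to sampled boundary points.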

Remarkably, 57 of 92 (62%) languages with ejectives are located in high elevation ‘zones’, which are defined here as major regions greater than 1500 m in altitude, plus land within 200 km of such a region of high altitude. This finding is in itself surprising since, once again, only about 15% of the world’s inhabited surface area can be described as being at high elevation. In contrast, only 96 of 475 languages (20%) without ejectives are located in high altitude zones, i.e. in a major region greater than 1500 m in elevation or within 200 km of such a region. If, for the moment, we treat language locations as independent data points, we find that the disparity in the distribution of the two language groups is significant. This is apparent in Table 1.

Even more remarkably, 80 of 92 (87%) languages with ejectives are located within 500 km of a region exceeding 1500 m. In contrast, only 202 of the remaining 475 languages (43%) are so located. As we see in Table 2, this disparity too is highly significant. Another way to frame these results is to note that only 12 of 285 (4%) languages located further than 500 km from high elevation contain phonemic ejectives.
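The two comparisons above can be checked from the reported counts alone. The sketch below assumes a Pearson chi-square test of independence on a 2×2 table without continuity correction. The tables themselves are not reproduced in this section, so the test reported there may differ, and the function name `chi_square_2x2` is illustrative only.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], with the p-value for 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: languages with / without ejectives.
# Columns: inside / outside the distance threshold.
chi2_200, p_200 = chi_square_2x2(57, 92 - 57, 96, 475 - 96)    # within 200 km
chi2_500, p_500 = chi_square_2x2(80, 92 - 80, 202, 475 - 202)  # within 500 km

print(f"200 km: chi2 = {chi2_200:.1f}, p = {p_200:.2e}")
print(f"500 km: chi2 = {chi2_500:.1f}, p = {p_500:.2e}")
```

As in the text, this treats language locations as independent data points, an assumption that the cluster-based analysis is intended to relax.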

Clearly languages with ejectives evince a marked tendency to occur at or near areas of high elevation. One could object, however, that this tendency in the overall distribution may be due to the location of particular linguistic families or areas that happen to have ejectives. Such familial bias could lead to autocorrelation between data points (Galton’s problem). For instance, the fact that many languages of the Pacific Northwest have ejectives and are also located at or near high elevation could yield an overall impression of geographic influence that is merely epiphenomenal. In a similar vein, ejectives could just happen to be characteristic of certain language families that are coincidentally located in high elevation zones. Such objections would be difficult to maintain, however, if numerous language families were represented by the languages with ejectives in high elevation zones, and if such languages were clustered in many diverse geographic regions. To adopt the most conservative perspective towards the data, then, we carefully considered the locations of the eight clusters of languages with ejectives, highlighted in Figure 1, and found the mean geographic center of each of these clusters (i.e. the mean latitude and longitude of the languages in a given cluster). Crucially, the mean centers of seven of the eight clusters of languages with ejectives occur within high elevation zones. This distribution is also significant, as evident in Table 3. In the table, languages without ejectives are treated as clusters also, by assuming clusters contain ten languages each, in keeping with the approximate size of the clusters of languages with ejectives. In this way we treat the results as conservatively as possible vis-à-vis our hypothesis, by assuming that the overwhelming pattern in Table 1 is the by-product of the distribution of a much more modest number of regional clusters of languages that happen to share phonetic characteristics due to areal influence.
Even if this simplifying assumption is made, however, the distribution of clusters of languages with ejectives is striking. We should note as well that the lone exception, in which the mean geographic center of a cluster of languages with ejectives occurs further than 200 km from an area higher than 1500 m, is for a cluster on the southern African plateau that is only marginally further away from high elevation, at 380 km. Tellingly, the geographic center in question is itself located at a relatively high elevation of 1100 m.
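The mean geographic center referred to above is simply the mean latitude and mean longitude of the languages in a cluster. A minimal sketch follows. The coordinates are invented placeholders and not values from the database, and the function name is illustrative.

```python
def mean_center(points):
    """Mean latitude and longitude of a list of (lat, lon) pairs in degrees.

    Plain averaging is adequate for compact clusters; it would be
    misleading for a cluster that straddles the 180th meridian.
    """
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)


# Hypothetical cluster of three language locations (made-up coordinates).
cluster = [(41.0, 43.5), (42.5, 45.0), (43.0, 46.5)]
center_lat, center_lon = mean_center(cluster)
print(f"cluster center: {center_lat:.2f}, {center_lon:.2f}")
```

The resulting center can then be measured against the nearest region exceeding 1500 m in the same way as an individual language location.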

The distribution in Table 3 is particularly remarkable given that the eight clusters in question are all geographically non-contiguous. They are separated by thousands of kilometers and oceans in many cases. Yet they all occur at high elevation or immediately adjacent to high elevation zones. Clearly, then, the marked tendency for languages with ejectives to occur in high elevation zones is not merely due to the distribution of such languages in one or a few language areas. It is also not simply the result of the undue influence of any particular language family or subset of language families, since all of the clusters of languages with ejectives represent multiple language families. More generally, the languages with ejectives in high altitude zones represent myriad language stocks including Southern Khoisan, Central Khoisan, Caucasian, Athapaskan (Na-Dene), Semitic (Afro-Asiatic), Lezgic (Nakh-Daghestanian), Armenian, Aymaran, Hadza, Mayan, Salishan, Cahuapanan, Quechuan, Siouan, Cushitic (Afro-Asiatic), Nilo-Saharan, Oto-Manguean, and Eyak (Na-Dene).

On the North American landmass (including Central America), 27 of 38 (71%) languages with ejectives occur in high elevation zones. For languages without ejectives in that same landmass, the ratio is smaller at 26 of 47 (55%). The disparity is even more apparent on other continents. In the case of South America, 7 of 13 (54%) languages with ejectives are found in high elevation zones, in contrast to 21 of 63 (33%) languages without ejectives. In Africa, 12 of 21 (57%) languages with ejectives occur in high elevation zones, whereas only 5 of 106 (5%) languages without ejectives do. Finally, in Eurasia, 11 of 13 (85%) languages with ejectives occur in a high elevation zone, while only 44 of 133 (33%) of the remaining languages do. It is worth re-stressing as well that languages with ejectives falling outside high elevation zones tend to occur very close to such zones, since worldwide only 12 of 92 (13%) languages with ejectives are located further than 500 km from regions of high elevation. In contrast, 273 of 475 (57%) languages without ejectives are located further than 500 km from major regions exceeding 1500 m in elevation.
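The by-landmass figures above can be tabulated and cross-checked against the worldwide totals. The following sketch uses only the counts quoted in the text; the dictionary layout and names are illustrative.

```python
# (languages in high elevation zones, total languages), as quoted in the text.
COUNTS = {
    "North America": {"ejectives": (27, 38), "no ejectives": (26, 47)},
    "South America": {"ejectives": (7, 13), "no ejectives": (21, 63)},
    "Africa": {"ejectives": (12, 21), "no ejectives": (5, 106)},
    "Eurasia": {"ejectives": (11, 13), "no ejectives": (44, 133)},
}


def totals(group):
    """Return (in-zone, out-of-zone) counts summed over the four landmasses."""
    in_zone = sum(COUNTS[c][group][0] for c in COUNTS)
    overall = sum(COUNTS[c][group][1] for c in COUNTS)
    return in_zone, overall - in_zone


for continent, groups in COUNTS.items():
    for group, (in_zone, total) in groups.items():
        share = in_zone / total
        print(f"{continent:14} {group:13} {in_zone:3}/{total:<3} = {share:.0%}")

ej_in, ej_out = totals("ejectives")
other_in, other_out = totals("no ejectives")
print(f"with ejectives:    {ej_in} in zones, {ej_out} outside")
print(f"without ejectives: {other_in} in zones, {other_out} outside")
```

The in-zone sums come to 57 and 96, matching the worldwide figures reported earlier, and the out-of-zone sums come to 28 and 253, matching the sample sizes reported for the distance comparison on the four major landmasses.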

Clearly, there is a marked cross-group disparity in terms of how proximate languages are vis-à-vis major inhabitable regions at high elevation. To more clearly appreciate this disparity, we examined in greater detail the locations of the languages on the four major landmasses. In the case of such languages with ejectives outside high elevation zones, the mean distance to a high elevation region is 788 km (n = 28). In contrast, the mean distance for languages without ejectives outside high elevation zones is 1937 km (n = 253). This disparity is highly significant (t = 3.63, df = 279, p < .0001). As we see in Figure 2, this inter-group difference is not simply due to the distribution of languages on any one particular landmass. In the case of each landmass, the languages with ejectives outside high elevation zones represent multiple language families and disparate geographic areas. Despite this heterogeneity, languages with ejectives outside high elevation zones were generally closer to such zones when contrasted to languages without ejectives. We should note that, in adopting a conservative approach to the data, we considered the distance of language locations from any clear major inhabited region of high elevation, not just the six principal regions of high altitude outlined above. For instance, many of the distances for languages without ejectives in Europe were calculated with respect to the Alps or the Anatolian plateau. It is worth mentioning that, with respect to Africa, we did not consider the Atlas mountains to be a major region of high elevation, since the inhabitable area above 1500 m is comparatively minor when contrasted to the other African regions of high elevation mentioned above, and since the range is separated from the bulk of African languages by a major geographic barrier (the Sahara).
This decision had little impact on the overall African analysis, since the disparate geographic distribution of languages with ejectives and without is so overwhelming on that continent.
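The reported degrees of freedom (279 = 28 + 253 − 2) are consistent with a pooled-variance two-sample t-test. The sketch below assumes that form of the test. The distance values are invented for illustration, since the per-language distances are not reproduced in this section, and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Student's two-sample t statistic with pooled variance, and its df."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / se, df


# Made-up distances (km) to the nearest region above 1500 m.
with_ejectives_km = [250.0, 400.0, 620.0, 900.0, 1300.0]
without_ejectives_km = [300.0, 950.0, 1500.0, 2100.0, 2600.0, 3400.0]

t_stat, df = pooled_t(without_ejectives_km, with_ejectives_km)
print(f"t = {t_stat:.2f}, df = {df}")
```

Applied to the actual distance figures for the 28 and 253 languages outside high elevation zones, the same function would return df = 279.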

Figure 2. Distances (in km) of language locations from regions of high elevation. Dark lines represent means for only those languages outside high elevation zones. Numbers at the bottom of each column represent total languages within high elevation zones, i.e. within 200 km of a region higher than 1500 m. Findings for individual languages are available from the author. https://doi.org/10.1371/journal.pone.0065275.g002

The typical distance from a high elevation region was much greater for languages without ejectives, regardless of continent, even when only those languages outside high elevation zones were considered. Note that in the case of the Eurasian landmass, all but two of the languages with ejectives are located in the region of the Caucasus mountains. The two exceptions are quite distant from high altitude zones. Interestingly, one of them is Korean, a language in which the status of ejectives is actually dubious [7].

We do not consider data from other regions in this portion of our analysis since languages with ejectives are not typically found anywhere distant from the zones of high elevation on the four principal landmasses. Since the association in question surfaces for all four of these landmasses, the tendency for languages with ejectives to occur near high elevation zones is obviously not a simple by-product of the distribution of languages on any one continent. Table 4 presents the results of t-tests contrasting the distances from regions exceeding 1500 m in elevation, for all those languages outside high elevation zones. Eurasia is not included in the table since only two languages with ejectives on that continent occur outside high elevation zones. The results for South America approach statistical significance, despite the fact that only six languages with ejectives are located outside high elevation zones on that continent.

While we are interested in a worldwide association between ejectives and high altitude, the consideration of data from one landmass can be elucidative. We feel this is particularly true in the case of Africa, since there are several clusters of languages with ejectives on that continent, and since there are four principal areas of high elevation located on the continent’s two major plateaus. In contrast, consider that there is one principal region of high altitude in each of North America and South America, respectively, stretching primarily along a north-south axis, with languages deviating from this axis principally in terms of longitude. In the case of Eurasia, the regions of high altitude stretch primarily along an east-west axis. In a sense, then, Africa is the clearest test case for the claim that languages with ejectives tend to be located near regions of high altitude, since there are more ways in which such languages can deviate spatially from such regions given the size and placement of the high altitude zones on the continent. Despite this fact, however, it is clear from Figure 2 and Table 4 that languages with ejectives in Africa generally occur near regions of high elevation. In fact, the distribution of languages evident in Figure 2 suggests that the association between languages with ejectives and high elevation is most pronounced on the African continent. There are only two African languages in the sample (Hausa and Kotoko) that are located more than 1000 km from a region of high elevation. These two central African cases are visibly discernible in Figure 1. In that figure, it is readily apparent how they are isolated vis-à-vis the bulk of African languages with ejectives, which are generally clustered near high elevation regions. More specifically, they are clustered near high-elevation regions (3) and (4) in our list of major high elevation regions offered above. 
(The regions are evident in the plot of high elevation polygons offered in the inset of Figure 1.)

The clear correspondence between the locations of African languages with ejectives and major regions of elevation greater than 1500 m is particularly striking given that only a modest portion of Africa’s landmass is at such high altitude, and given that the regions of high elevation are comparatively scattered when contrasted to the major regions of high elevation on the other three major landmasses. Furthermore, each of the three African clusters of languages with ejectives represents multiple language families. In short, Africa offers a compelling illustration of the strong worldwide association between areas of high elevation and the usage of phonemic ejectives.

To this point, we have focused on the location of languages with respect to regions of high elevation. We also ascertained the actual elevation of each language point in the data set, including those outside the four major landmasses. We should note that these elevation figures are generally conservative with respect to our hypothesis, since many language locations are near regions of high elevation but are given low elevation scores. Most notably, in the Pacific Northwest many languages with ejectives were found to be at low elevation since their coordinates occur near the ocean. So not surprisingly the elevation figures for languages in this region, in which ejectives are a common feature, were often found to be low, despite the fact that the speakers of these languages also subsist in nearby mountainous areas, not just along the coast. Despite this conservative influence on our data, however, a significant difference was found between the elevations of languages with ejectives and those without. The mean elevation for all languages without ejectives was 631 m. (This does not imply that most people live at this elevation, since many of the world’s most widely spoken languages like English and Spanish have low elevation values but are only considered singular data points in such an analysis.) In the case of languages with ejectives, the mean elevation was 955 m, a full 51% higher. This difference was significant (t = 3.84, df = 565, p = .0001). In the case of languages without ejectives, the median elevation was 340 m. (This figure is nearly half that of the mean elevation for this group, in part since the mean elevation is influenced by outliers in the Himalayas.) In the case of the languages with ejectives, the median elevation was 668 m, a full 96% greater than the median of the remaining languages.
While elevations differed on a by-continent basis, the disparity exhibited by the two language groups was clearly not simply the result of their distribution in any one major region, as we see in Table 5.
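The relative differences in elevation cited above follow directly from the reported means and medians, as the following short check illustrates. The helper name is illustrative.

```python
def percent_higher(value, baseline):
    """Percentage by which value exceeds baseline."""
    return 100.0 * (value - baseline) / baseline


# Elevations in metres, as reported in the text.
mean_gain = percent_higher(955, 631)    # means: with vs. without ejectives
median_gain = percent_higher(668, 340)  # medians: with vs. without ejectives

print(f"mean elevation is {mean_gain:.0f}% higher")
print(f"median elevation is {median_gain:.0f}% higher")
```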

The only case in which there is not a significant disparity in the elevations of the two language groups is the North American landmass. As we observed in Figure 2 and Table 4, however, languages with ejectives on that landmass do occur closer to high elevation zones at a significantly greater rate, when contrasted to languages without ejectives. Given that the elevation of so many ejective languages in the Pacific Northwest is taken from points near sea level, it is not surprising that no noticeable disparity is observed between the two language groups on this landmass, in terms of absolute elevation. In addition, it is worth noting that there are a number of languages without ejectives at high elevation on the Mexican altiplano. Again, though, in terms of location vis-à-vis high elevation zones, we have already found a robust difference between the languages with and without ejectives on the North American landmass. This difference in locations was observed for all four major landmasses, and the elevation figures offered in Table 5 further corroborate the worldwide correlation uncovered. This worldwide correlation is also apparent in Figure 3, in which we have plotted the elevations of all the language locations in our database, according to the landmass on which they occur. Note that the plots of the data points in the column labeled ‘World’ represent all 567 languages, including those from Australia, New Guinea, Indonesia, Melanesia, Polynesia, and elsewhere.

Perhaps the most remarkable facet of the elevation data gleaned from our analysis is presented in Figure 4. As we see in the figure, as elevation increases so does the likelihood that a language found at that elevation utilizes phonemic ejectives. Figure 4 is also based on all of the 567 languages in the data sample. (See Data Set S1 for the elevation of each language with ejectives.)
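A relationship of this kind can be summarized by grouping language locations into elevation bands and computing the share of languages with ejectives in each band. The sketch below is illustrative only: the 500 m band width is an assumption and may not match the binning behind Figure 4, the records are invented, and the function name is hypothetical.

```python
from collections import defaultdict


def ejective_share_by_band(records, band_width_m=500):
    """Share of languages with ejectives per elevation band.

    records: iterable of (elevation_m, has_ejectives) pairs.
    Returns a dict mapping each band's lower bound (m) to a proportion.
    """
    totals = defaultdict(int)
    with_ejectives = defaultdict(int)
    for elevation_m, has_ejectives in records:
        band = int(elevation_m // band_width_m) * band_width_m
        totals[band] += 1
        if has_ejectives:
            with_ejectives[band] += 1
    return {band: with_ejectives[band] / totals[band] for band in sorted(totals)}


# Made-up records, not taken from the data set.
sample = [(40, False), (120, False), (310, True), (480, False),
          (900, False), (1250, True), (1700, True), (2300, True)]
for band, share in ejective_share_by_band(sample).items():
    print(f"{band:>5}-{band + 500:<5} m: {share:.0%} with ejectives")
```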

Given the robust nature of the relationship between the locations of languages with ejectives and high elevation zones, which is observed on a global scale, we are left to conclude that there is some underlying motivation for this correlation, which clearly cannot be explained in terms of the influence of particular linguistic families or in terms of the coincidental spread of ejectives across languages in particular regions. In the following section we offer two plausible motivations for the correlation, and then discuss the implications of our finding.