We demonstrate that a model with no prior knowledge of complex concerted or regular changes can nevertheless infer the historical timings and genealogical placements of events of concerted change from the signals left in contemporary data. Our model can be applied wherever discrete elements—such as genes, words, cultural trends, technologies, or morphological traits—can change in parallel within an organism or other evolving group.

Linguistic evolution, unlike the genetic substitutional process, is dominated by events of concerted evolutionary change. Our model identified more than 70 historical events of regular sound change that occurred throughout the evolution of the Turkic language family, while simultaneously inferring a dated phylogenetic tree. Including regular sound changes yielded an approximately 4-fold improvement in the characterization of linguistic change over a simpler model of sporadic change, improved phylogenetic inference, and returned more reliable and plausible dates for events on the phylogenies. The historical timings of the concerted changes closely follow a Poisson process model, and the sound transition networks derived from our model mirror linguistic expectations.

Concerted evolution is normally used to describe parallel changes at different sites in a genome, but it is also observed in languages where a specific phoneme changes to the same other phoneme in many words in the lexicon—a phenomenon known as regular sound change. We develop a general statistical model that can detect concerted changes in aligned sequence data and apply it to study regular sound changes in the Turkic language family.

The phonetically coded data for each language were then multiply aligned by identifying cognate sites within each word (analogous to homologous gene-sequence alignment). This yielded a 26 languages × 1,120 sites matrix, where a site represents an aligned column of speech sounds.

This general approach, when applied to linguistic data, allows us to trace the temporal patterns of phonemic change among a set of related languages. Here we fit the model to lexical data corresponding to 225 etymological classes in 26 Turkic languages that were phonetically coded following the North American Phonetic Alphabet for 62 phonetic symbols []. Ideally, the analysis would be carried out on phonemically coded data, but most available data sets only provide a standardized orthography that occasionally distinguishes allophones. In practice, this means that the results for a specific language could depend upon whether its transcription data were consistently subphonemic or phonemic relative to other languages in the data set. To the extent that such allophonic differences are regular, our analyses will not be affected.

The sporadic change matrix is estimated as a single homogeneous process that applies throughout the tree. For protein sequence data, the model must estimate 380 distinct transition rates ([20 × 20] − 20) in the sporadic change matrix; for a phonetically transcribed data set of 62 distinct speech sounds, this number rises to 3,782 ([62 × 62] − 62). We therefore adopt a reversible-jump MCMC procedure that we have described elsewhere [] to reduce the number of statistically distinct parameters. In contrast to the single sporadic matrix, concerted or regular changes are detected statistically on a branch-by-branch basis: for each regular sound change that it identifies, the model proposes a separate sound-change matrix and a position along the branch (Experimental Procedures).
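The parameter counts quoted above follow from counting the off-diagonal entries of a k × k rate matrix, since each diagonal entry is fixed by requiring its row to sum to zero. A minimal sketch, assuming an otherwise unconstrained matrix (no reversibility or grouping of rates):

```python
def n_offdiagonal_rates(k: int) -> int:
    """Number of free instantaneous rates in an unconstrained k x k rate matrix.

    Diagonal entries are not free: each is set so that its row sums to zero.
    """
    return k * k - k


for label, k in [("nucleotides", 4), ("amino acids", 20), ("speech sounds", 62)]:
    print(f"{label}: {n_offdiagonal_rates(k)} rates")
```

The quadratic growth in k is what motivates reducing the number of statistically distinct rates for phonetic data.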

We implement the model in a Bayesian Markov chain Monte Carlo (MCMC) approach (Experimental Procedures) that, when applied to a set of related sequences, simultaneously estimates posterior distributions describing the phylogenetic trees or genealogies, and the matrices that record the instantaneous rates of change from one phoneme (gene, amino acid) to another either at a single site (sporadic changes) or simultaneously at multiple sites (regular changes). The model places no constraints on the nature, rate, or temporal patterning of either sporadic or regular changes, starting instead with a set of uniform prior beliefs and then estimating all rates and patterns of change from the historical traces or imprints these changes have left in the contemporary data.

In contrast to genetic evolution, some historical linguists maintain that all sound changes are regular, with apparent irregularities arising from a number of processes working simultaneously, but others allow that sporadic effects also occur []. We classify as irregular or sporadic all changes for which there is no statistical evidence of concerted change. Some of these could be examples of rare regular changes, or of changes that occur in only a few phonetic contexts (Box 1).

In a linguistic context, sporadic changes refer to the replacement, over some arbitrary interval of time, of one phoneme in one place by another and are analogous to single nucleotide or amino acid substitutions in gene sequences. Concerted or regular changes describe the parallel change of one discrete element such as a nucleotide, phoneme, or amino acid to the same other discrete element at many different sites (Box 1).

We adopt a phylogenetic-statistical perspective that allows us to document events of concerted change that have occurred throughout the genealogical history of a linguistic or biological family, infer their historical patterning, and determine the rate and frequency with which they arise in nature []. The statistical model we develop implements a fully probabilistic description of both sporadic (irregular) and concerted (regular) changes, which together characterize the temporal patterns of substitution in strings of inherited information, such as DNA or sound sequences, as they evolve along the branches of the phylogenetic trees that record their evolutionary histories.

Usefully, the genetic and linguistic phenomena share fundamental properties relevant to their statistical characterization. Phonemes are the units of sound that make up words and distinguish one word from another, just as the four nucleotide bases (A, C, T, G) make up DNA gene sequences or the 20 amino acids make up protein sequences. The number of distinct sounds in a language varies greatly, but somewhere around 30–60 phonemes are commonly sufficient to describe the range of distinctive sounds in a language’s words []. Collections of words can therefore be thought of as providing phonemic “sequence information” that might be informative as to the history, rate, and patterns of concerted evolutionary change in language, and in a manner analogous to sequences of DNA.

Can events of concerted change be detected statistically in sequence data, and do they improve the characterization of evolution and the inference of evolutionary histories? Although previous researchers working in a linguistic setting have used the concept of regular changes to build algorithms for automatically inferring cognacy, to our knowledge the model we report here is the first probabilistic description of concerted change. This places concerted evolution in a statistical setting that allows for formal hypothesis testing about the nature and rates of concerted changes. For example, the question of how many parallel changes are required to be recognized as an instance of concerted change is naturally dealt with in our model: the statistical signature of concerted or regular change is that the multiple parallel events are more probable if treated as a single coordinated change than as a collection of independent changes (Box 1).

Currently, our model implements a general “context-free” description of concerted evolution applicable to a range of evolving systems, including genes and proteins. The theory can be extended to include context-dependent regularities (Discussion; Supplemental Experimental Procedures; []), but in this work we focus on the improvement that arises solely from unconditioned regularity of sound changes, and statistical methods for detecting such concerted evolution.

In some cases, a change such as q to x will depend upon its context, that is, on other sounds in the word. A hypothetical example of context would be if leading q sounds in Shor words remained as q sounds in Khakas words when the leading q was followed by an e, but changed from q to x if followed by a or u vowels as above.
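To make the distinction concrete, the hypothetical context-dependent rule just described can be written as a function that inspects the following sound, set beside a context-free rule that ignores neighbors. This is a deterministic sketch: words are treated as lists of segments, and the segment strings are illustrative placeholders, not forms from the data set.

```python
def apply_initial_q_rule(word: list[str]) -> list[str]:
    """Hypothetical context-dependent change: word-initial q > x before a or u.

    A word-initial q followed by e (or anything else) is left unchanged.
    """
    if len(word) >= 2 and word[0] == "q" and word[1] in {"a", "u"}:
        return ["x"] + word[1:]
    return list(word)


def apply_context_free_q_rule(word: list[str]) -> list[str]:
    """Context-free counterpart: every q becomes x, regardless of neighbors."""
    return ["x" if seg == "q" else seg for seg in word]


print(apply_initial_q_rule(["q", "a", "r"]))       # ['x', 'a', 'r']
print(apply_initial_q_rule(["q", "e", "r"]))       # ['q', 'e', 'r']
print(apply_context_free_q_rule(["q", "e", "q"]))  # ['x', 'e', 'x']
```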

By comparison, the model of concerted or regular change identifies these 34 events as a single instance of concerted change across the affected sites. If we denote the probability of a regular linguistic change from q to x by $P_r(q \to x)$, then as the number of events $n$ increases, there will be a point at which $P_r(q \to x) > P_s(q \to x)^n$, and it becomes statistically more probable to treat the $n$ events as a single instance of regular change. Not all instances of x and q will necessarily interchange between two languages, but if a sufficient number do, they are statistically more probable if treated as a single event of “regular” change.

A conventional sporadic change model would count the 34 transitions from q to x as 34 independent events. If the probability of a single sporadic change is denoted by $P_s(q \to x)$, then the probability of observing 34 independent q-to-x transitions is $P_s(q \to x)^{34}$.
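The comparison between one regular event and n independent sporadic events is easiest to carry out in log space, where $P_s(q \to x)^n$ becomes $n \log P_s(q \to x)$. The sketch below finds the smallest n at which a single regular event is the more probable explanation. The probabilities are hypothetical, and in the actual model these quantities are evaluated inside a full phylogenetic likelihood rather than in isolation.

```python
import math


def min_events_for_regular(p_sporadic: float, p_regular: float) -> int:
    """Smallest n with p_regular > p_sporadic**n (both strictly in (0, 1)).

    Uses logs to avoid underflow for large n:
    n * log(p_sporadic) < log(p_regular)  <=>  n > log(p_regular) / log(p_sporadic),
    where the inequality flips because log(p_sporadic) < 0.
    """
    if not (0 < p_sporadic < 1 and 0 < p_regular < 1):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    bound = math.log(p_regular) / math.log(p_sporadic)
    return math.floor(bound) + 1


# Hypothetical values, chosen only for illustration.
p_s, p_r = 0.05, 1e-6
print(min_events_for_regular(p_s, p_r))  # 5
# Log-probability of 34 independent sporadic q-to-x changes vs. one regular change.
print(34 * math.log(p_s), math.log(p_r))
```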

Given the corresponding sounds in all other Turkic languages, the ancestral sound for these two sister languages is most likely q. This means that these x’s in Khakas arose following a Shor-Khakas split.

Four pairs of words from the closely related Siberian languages Shor and Khakas (Figure 2) are shown below. In each case, the leading q in Shor corresponds to an x in Khakas (leading x and q shown in italics). In total, there are 35 aligned positions in our data where q appears in a Shor word, and in 34 of these, x occurs in the same position in Khakas. The one exception is the Khakas kirə- “to grow old,” which is qarɨ- in Shor.

Linguists have long recognized concerted change that affects copies of the same sound (or phoneme) appearing in different words as a central feature of linguistic evolution []. A well-known example is the p>f sound change in the Germanic languages, wherein an older Indo-European p sound was replaced by an f sound, as in pater>father or pes, ∗pedis>foot (linguistic convention is to use the “>” symbol to indicate a transition from one sound to another, and here the ∗ symbol denotes a reconstructed ancestral form). These multiple instances of one phoneme changing to the same other phoneme yield regular sound correspondences between pairs or groups of languages. Linguists have proposed several explanations for the regularity of changes grounded in a number of basic processes, including speech production, perception, and cognition [].

Concerted evolutionary change is widespread in genetic systems, being implicated in the genome-wide control of repetitive elements [], the evolution of gene families [], and homogenization of Y chromosome sequences [] and as a means by which asexual organisms might escape the debilitating consequences of Muller’s ratchet []. It might arise from several mechanisms, including homologous recombination, that allow certain favorable elements to spread or damaging elements to be neutralized.

The observed range of 14 (15–1) in the number of regular sound changes per language is, however, wide: a range this large arises in only about 0.68% of simulated outcomes (Figure 5D). The outgroup, Chuvash, with 15 regular sound changes, might be unusual in having four phonemes that are unique within this group of languages. These four phonemes account for five of the regular sound changes on the branch leading to Chuvash. Removing these five leaves Chuvash with ten events and yields a range (10–1) that falls well within the Poisson expectation.
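The range test can be sketched as a simulation: draw a Poisson number of regular changes on every branch, sum them along each language's root-to-tip path (so languages that share ancestral branches share those events), and record the max-min range across languages. The tree below is a hypothetical four-language toy, not the Turkic tree; only the rate of 0.0026 events per year is taken from the text.

```python
import random


def poisson_count(rng: random.Random, rate: float, length: float) -> int:
    """Number of events of a Poisson process with `rate` on an interval of `length`."""
    count, elapsed = 0, rng.expovariate(rate)
    while elapsed <= length:
        count += 1
        elapsed += rng.expovariate(rate)
    return count


def prob_range_at_least(branch_lengths, paths, rate, observed_range,
                        n_sims=10_000, seed=1):
    """Monte Carlo estimate of P(max - min of per-language counts >= observed_range).

    branch_lengths: dict branch_id -> length in years
    paths: dict language -> list of branch ids from the root to its tip
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # One draw per branch, shared by every language whose path uses that branch.
        events = {b: poisson_count(rng, rate, t) for b, t in branch_lengths.items()}
        totals = [sum(events[b] for b in path) for path in paths.values()]
        if max(totals) - min(totals) >= observed_range:
            hits += 1
    return hits / n_sims


# Hypothetical toy tree ((A,B),(C,D)); every root-to-tip path spans 2,000 years.
lengths = {"AB": 800, "CD": 600, "A": 1200, "B": 1200, "C": 1400, "D": 1400}
paths = {"A": ["AB", "A"], "B": ["AB", "B"], "C": ["CD", "C"], "D": ["CD", "D"]}
print(prob_range_at_least(lengths, paths, rate=0.0026, observed_range=8))
```

Simulating per branch rather than per language matters: shared branches make language totals correlated, which changes the expected spread.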

As expected, the cumulative density of the observed number of events per branch (including branches with no regular sound changes) closely fits the Poisson expectation (Figure 5B). The 21 branches in which no regular sound change occurred, along with those in which multiple events are inferred, can all be considered samples from the same underlying stochastic process. A further characteristic of the Poisson process is that waiting times between successive events follow an exponential distribution. The distribution of waiting times between successive events of regular sound change on the phylogeny fits this expectation strikingly well (Figure 5C).
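The link between a constant rate and exponential waiting times can be checked by simulating a homogeneous Poisson process and comparing the mean gap between events with 1/rate; for an exponential distribution, the sample mean is also the maximum-likelihood estimate of its mean. This sketch simulates one long lineage at the rate reported in the text and does not reproduce the tree-based waiting times of Figure 5C.

```python
import random
import statistics


def simulate_event_times(rate: float, total_time: float, seed: int = 0) -> list[float]:
    """Event times of a homogeneous Poisson process on [0, total_time]."""
    rng = random.Random(seed)
    times, t = [], rng.expovariate(rate)
    while t <= total_time:
        times.append(t)
        t += rng.expovariate(rate)
    return times


rate = 0.0026  # regular sound changes per year (from the text)
times = simulate_event_times(rate, total_time=10_000_000)
gaps = [b - a for a, b in zip(times, times[1:])]
# For exponential gaps the standard deviation equals the mean (coefficient of variation 1).
cv = statistics.stdev(gaps) / statistics.fmean(gaps)
print(round(statistics.fmean(gaps)), round(1 / rate), round(cv, 2))
```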

The number of regular sound changes in a language’s history ranges from a low of 1 in Karaim and Balkar to a high of 15 in Chuvash (Figure 2B; the low count for Karaim might reflect phonetic transcription practices). It is tempting to interpret these differences as indicating different intrinsic rates of sound change, or different external pressures on it, but large differences in the numbers of regular changes can arise among languages simply from random fluctuations and shared phylogenetic histories. Thus, if events of regular change occur randomly at a constant rate (as in Figure 5A), the number of such events per branch of the tree is expected to follow a Poisson distribution with mean 0.0026 × t, where t is the length of the branch in years.
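Under this constant-rate assumption, the distribution of events on any branch follows directly from the Poisson probability mass function. The sketch computes it for a branch of hypothetical length; summing the zero-event probabilities over a tree's branches gives the expected number of branches with no regular change, which can be set against an observed count such as the 21 reported in the text.

```python
import math

RATE = 0.0026  # regular sound changes per year (from the text)


def poisson_pmf(k: int, mean: float) -> float:
    """P(K = k) for K ~ Poisson(mean)."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def branch_event_distribution(length_years: float, k_max: int = 10) -> list[float]:
    """P(0..k_max regular changes) on a branch of the given length in years."""
    mean = RATE * length_years
    return [poisson_pmf(k, mean) for k in range(k_max + 1)]


def expected_empty_branches(branch_lengths: list[float]) -> float:
    """Expected number of branches with no regular change."""
    return sum(math.exp(-RATE * t) for t in branch_lengths)


dist = branch_event_distribution(1000)  # hypothetical 1,000-year branch (mean 2.6)
print(round(dist[0], 4))  # 0.0743
```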

Regular sound changes emerge from our analyses as occupying a central role in sound evolution, consistent with the expectations of historical linguists []. These regular sound changes accumulate approximately linearly in time, implying a constant rate of about 0.0026 regular sound changes per year (approximately one every 385 years) averaged over the tree (Figure 5A). The linear trend suggests that the model is not missing regular sound changes that occur deeper in the tree (i.e., older events) and supports a “uniformitarian” view: that this family of languages has been changing in the same ways throughout its history, an important assumption for statistical inference and ancestral reconstruction.
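Reading a constant rate off a cumulative plot amounts to fitting a straight line to cumulative event counts against elapsed time: the slope estimates the rate, and its reciprocal the mean interval between events. A sketch with ordinary least squares on synthetic, evenly spaced events (illustrative only, not the posterior-averaged trend of Figure 5A):

```python
import statistics


def least_squares_slope(x: list[float], y: list[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Synthetic elapsed times (years) of successive events, one every 385 years.
event_times = [385.0 * i for i in range(1, 21)]
cumulative_counts = list(range(1, len(event_times) + 1))

rate = least_squares_slope(event_times, cumulative_counts)  # events per year
print(f"rate = {rate:.5f} per year, one event every {1 / rate:.0f} years")
```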

(D) Expected range (max-min) of regular sound changes occurring in the histories of the 26 Turkic languages. Data were generated from 10,000 simulations of the Poisson expectation in each of the branches of the tree in Figure 2. Yellow triangle shows observed range (15–1); yellow square shows range adjusting for unique phonemes in Chuvash (see text).

(C) Cumulative waiting times until the next regular sound change event (purple) and best-fit exponential distribution (gray). Exponential mean = 303 years; 95% confidence interval includes 385 years or 1/0.00262. The exponential provided the best fit when compared against gamma, Weibull, and log-normal cumulative densities.

(B) Expected Poisson (gray) and observed (purple) number of regular sound changes per branch. Expected values generated from a Poisson distribution with mean 0.0026 × t were calculated for each branch of the tree, where t is the length of the branch in years (generalized linear model test of deviation from Poisson expectation not significant: χ² = 16.95, df = 14, p > 0.26).

(A) Approximately linear trend in the cumulative frequency of regular sound changes through time, indicating a constant rate of regular sound change of about 0.0026 events per branch per annum; trend is counts of regular change events per unit time in the tree, averaged across the posterior sample of trees. Purple line is the mean trend; yellow line is 1:1 trend.

Thus, among the 43 regular consonant changes, 79% (n = 34) involved only a single change in one of the following: (1) voicing, (2) place of articulation (based on four categories: labial, dental/alveolar, postalveolar/palatal, and uvular/velar/glottal), or (3) manner of articulation (e.g., affricate to fricative), against a null expectation of 29% (χ² = 50.9, p < 0.0001). Among the 30 vowel transitions, 70% (n = 21) involved only a single change in one of the following: (1) front-central-back, (2) open-mid-closed, or (3) rounding, against a null expectation of 45% among vowel pairs (χ² = 7.5, p < 0.01) (Table S1).
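The vowel comparison can be reproduced with a one-degree-of-freedom chi-square goodness-of-fit test of 21 single-feature changes out of 30 against a null proportion of 0.45. This sketch uses only the standard library (for one degree of freedom, the upper-tail probability is erfc(√(χ²/2))) and gives χ² ≈ 7.58, consistent with the reported value; the exact test used in the analysis is not specified here, so treat this as an assumption.

```python
import math


def chi_square_one_proportion(successes: int, n: int, p_null: float):
    """Chi-square goodness-of-fit (1 df) of an observed count against a null proportion."""
    expected = [n * p_null, n * (1 - p_null)]
    observed = [successes, n - successes]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p_value


stat, p = chi_square_one_proportion(21, 30, 0.45)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```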

The regular sound changes (red lines in Figure 4) form a subset of the larger sound transition network, and sporadic and regular changes seem to obey the same rules. Consonantal changes group into subsets of articulation categories defined by the place and manner of vocal articulation. Sounds closer in speech production change to one another more readily than those further apart, highlighting a gradual or stepwise process of language change following “shortest routes,” similar to the phenomenon observed in protein evolution wherein amino acids are frequently replaced by amino acids with similar biochemical properties [].
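The idea of “shortest routes” can be made concrete by counting how many articulatory features differ between two consonants. The sketch below uses a small, illustrative feature table built on the voicing, place, and manner categories named in the text; the table covers only a handful of sounds and is an assumption for demonstration, not the coding used in the analysis.

```python
# Illustrative (voicing, place, manner) coding; place uses four coarse classes.
FEATURES = {
    "p": ("voiceless", "labial", "stop"),
    "b": ("voiced", "labial", "stop"),
    "f": ("voiceless", "labial", "fricative"),
    "t": ("voiceless", "dental/alveolar", "stop"),
    "s": ("voiceless", "dental/alveolar", "fricative"),
    "q": ("voiceless", "uvular/velar/glottal", "stop"),
    "x": ("voiceless", "uvular/velar/glottal", "fricative"),
}


def feature_distance(a: str, b: str) -> int:
    """Number of features (voicing, place, manner) that differ between two consonants."""
    return sum(fa != fb for fa, fb in zip(FEATURES[a], FEATURES[b]))


def is_single_feature_change(a: str, b: str) -> bool:
    return feature_distance(a, b) == 1


print(feature_distance("q", "x"))  # 1 (manner only)
print(feature_distance("b", "s"))  # 3 (voicing, place, and manner)
```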

The transition rate matrices that characterize the sporadic and regular sound changes define a network of connected phonemic substitutions or transitions that arise over time as words evolve at the level of their sounds (Figure 4). The network identifies the two major recognized [] divisions of highly interconnected sound changes: among pairs of consonants (mean transition rate per 10³ years = 0.0061 ± 0.028) and among pairs of vowels (mean rate = 0.0091 ± 0.0373). Transitions between these two broad categories are rare, with a mean rate of 0.001 ± 0.003, corresponding to an approximately 0.2% chance of an ancestral consonant or vowel changing to the other category in 2,000 years. The network also captures the linguistically important bridge between consonantal and vowel changes through the high vowels (in particular through the semivowel or semiconsonant “w”).
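The 0.2% figure follows from treating the mean cross-category rate as a constant hazard: the probability of at least one such transition in time t is 1 − exp(−rate · t), which is close to rate · t when rate · t is small. A sketch, assuming the rate is expressed per 10³ years and ignoring transitions back to the original category:

```python
import math


def prob_at_least_one(rate_per_kyr: float, years: float) -> float:
    """P(at least one event) for a constant-rate process; rate is per 1,000 years."""
    return 1.0 - math.exp(-rate_per_kyr * years / 1000.0)


p = prob_at_least_one(0.001, 2000)
print(f"{p:.4%}")  # 0.1998%
```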

Transitions among consonants (circles) and among vowels (squares) are frequent and regular (many connections) but are rare between them, save for those mediated by the semivowel w. Transitions are more frequent among sounds with similar places of articulation: consonants are coded as bilabials-labiodentals (red), nasal (light green), uvular-velar-glottal (purple), postalveolar-palatals (blue), and dental-alveolars (green); vowels divide into high (gray) and higher-mid to low (white) subsets. Blue lines denote sporadic transitions, with thicker lines denoting faster underlying rates. Red lines denote regular changes; arrows indicate the direction of change.

These figures include instances in which the ancestral sound was partially retained, cases for which the regular model might not be expected to improve upon the sporadic model. For 179 of the proposals the ancestral sound is not retained, and for these, the model of regular change yields an approximately 4-fold geometric mean improvement (mean ratio 3.71 ± 5.14, range = 0.14 to 150.12) and is similar for vowels and consonants (vowels = 3.72 ± 5.40, consonants = 3.70 ± 4.90). A 4-fold improvement corresponds to the sporadic model assigning less than a 0.25 total probability to the proposed descendant sounds (mean = 0.16 ± 0.11).

Overall, the model of regular change approximately doubles the probability of correctly predicting the descendant sounds, as estimated using a geometric mean of the ratios to account for positive skew (geometric mean ratio = 1.87 ± 2.98, range = 0.14 to 150.12, n = 371 language × ancestral-sound combinations), performing somewhat better for vowels (mean ratio = 3.38 ± 5.38, range = 0.47 to 150.12, n = 97) than for consonants (mean ratio = 1.52 ± 2.46, range = 0.14 to 39.72, n = 274). This difference might simply arise because vowels change more readily (faster) than consonants and so are more likely to show a change from the ancestral state.
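Because the ratios are positively skewed, their central tendency is summarized with a geometric mean, the exponential of the mean log-ratio, which is less dominated by a few very large ratios than the arithmetic mean. A sketch with made-up ratios; the standard library's `statistics.geometric_mean` (Python 3.8+) gives the same result as the explicit formula.

```python
import math
import statistics

# Hypothetical regular/sporadic probability ratios, one per language x ancestral sound.
ratios = [0.5, 1.0, 2.0, 4.0, 40.0]

geo_mean = math.exp(statistics.fmean(math.log(r) for r in ratios))
arith_mean = statistics.fmean(ratios)
print(round(geo_mean, 3), round(arith_mean, 3))
```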

Red-tinted cells in Figure 3 denote instances where the regular change model improves on the sporadic model (ratio > 1 to 10) and generally correspond to cases in which the ancestral sound has been replaced by one or more different descendant sounds (Table S2). White cells correspond to ratios of approximately 1:1 and are typically cases in which ancestral sounds have been partially retained in the descendant languages. Blue-tinted cells record ratios < 1 where the regular model performs worse than the sporadic model.

For each of 634 proposed sound changes in the 23 languages (Figure 3; Table S2), we calculated the probabilities that the regular and sporadic change models assigned to the descendant sound, conditional upon the ancestral sound. Where more than one sound change is proposed to have occurred from the same ancestral sound, we summed the probabilities over all of the proposed descendant sounds, along with, in some cases, proposed partially retained ancestral sounds. We then calculated the ratio of the probability derived from the regular model to the probability of the sporadic change model as a measure of relative performance.
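The performance measure can be sketched as follows, assuming each model supplies a probability distribution over descendant sounds given the ancestral sound (for instance, one row of a transition-probability matrix for the relevant branch). The numbers are hypothetical; the second call shows why partial retention tends to push the ratio toward 1.

```python
def model_ratio(p_regular: dict[str, float], p_sporadic: dict[str, float],
                proposed: set[str]) -> float:
    """Ratio of the probabilities the two models assign to the proposed descendant sounds.

    Probabilities are summed over all proposed descendants, including any
    partially retained ancestral sound placed in `proposed`.
    """
    num = sum(p_regular.get(s, 0.0) for s in proposed)
    den = sum(p_sporadic.get(s, 0.0) for s in proposed)
    return num / den


# Hypothetical P(descendant | ancestral u) under each model.
regular = {"u": 0.20, "o": 0.75, "ə": 0.05}
sporadic = {"u": 0.80, "o": 0.15, "ə": 0.05}

print(model_ratio(regular, sporadic, {"o"}))       # proposed u > o
print(model_ratio(regular, sporadic, {"o", "u"}))  # u partially retained
```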

Colored (nongray) cells correspond to instances of regular sound change as proposed by linguists [] (see text and Table S2 ), ranging from ≥10× improvement by the regular change model (dark red) to cases in which the sporadic model outperformed the regular model (blue). Gray cells correspond to cases in which the ancestral phoneme has been retained (no phonological change has occurred). Geometric mean improvement across all colored cells (probability of regular model/probability of sporadic model) = 1.87 ± 2.98, range = 0.14 to 150.12; n = 371. Geometric mean improvement excluding cases of partial ancestral retention (white cells) = 3.71 ± 5.14, range = 0.14 to 150.12; n = 179. Leftmost columns: LA = ancestral phoneme derived from linguists’ proposals; MA = model-derived ancestral phoneme.

Linguists have proposed regular sound changes affecting consonants and vowels in the Turkic language family based on historical linguistic studies of 23 of the 26 languages we report in Figure 2 (see also Table S2). A proposal takes the form of a putative proto- or ancestral sound changing to a different sound or set of sounds in a descendant language. For example, the ancestral u sound is proposed [] to have changed to o in Bashkir and Tatar, and to əʷ in Chuvash, but to have been retained as u in the other languages. In agreement with these proposals, the model of regular change finds a regular u>o sound change in the branch of the Turkic phylogeny that is ancestral to Bashkir and Tatar, and finds a regular o>əʷ event in the Chuvash branch (Figure 2).

The same regular sound changes are frequently repeated in different parts of the tree, such that 21 changes involve unique pairs of consonants and 17 involve unique pairs of vowels (Table S1). Fewer than half (23 of 62) of all speech sounds produce a detectable regular sound change, and those that do tend to be more common (measured as a sound’s frequency of occurrence in the alignment; Spearman’s r = 0.54, p < 0.001), although this relationship might reflect the difficulty of inferring changes in rare sounds as regular. The median number of regular sound changes does not differ between vowels and consonants (U test, p > 0.10), and vowels and consonants are equally likely to produce at least one event of regular change per sound (binomial test, p > 0.10).

Typically, around 29 of the 50 branches of the phylogenetic trees in the posterior sample record at least one event of regular change, with an average of 1.49 ± 2.49 such events per branch, although this distribution is skewed (mode = 0, range = 0 to 15). Of the roughly 74 regular sound changes, 43.03 ± 0.17 involve changes between pairs of consonants, 31.22 ± 0.44 involve pairs of vowels, and 0.02 ± 0.14 occur between a vowel and a consonant (all means ± SD refer to the distribution over the posterior sample).

The model also estimates the ordering of sound changes within a branch, in some cases allowing inferences to be made about “chaining” of sound changes. For instance, in the branch leading to Yakut, h>s appears before ž>h, indicating that h sounds at the beginning of the branch are more likely to be s by the end and that ž sounds later in the time period represented by that branch are more likely to be h by the end.
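Why the order matters can be seen by applying the two changes to a word in sequence. This is a deterministic, context-free illustration (the model itself treats changes probabilistically), and the word form is hypothetical.

```python
def apply_changes(word: list[str], changes: list[tuple[str, str]]) -> list[str]:
    """Apply context-free sound changes (source, target) in the given order."""
    for src, dst in changes:
        word = [dst if seg == src else seg for seg in word]
    return word


word = ["h", "a", "ž", "a"]  # hypothetical form
print(apply_changes(word, [("h", "s"), ("ž", "h")]))  # ['s', 'a', 'h', 'a']
print(apply_changes(word, [("ž", "h"), ("h", "s")]))  # ['s', 'a', 's', 'a']
```

With h>s first, the h created later by ž>h survives to the end of the branch; in the reverse order, ž would be carried all the way to s.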

Most of these regular changes make a substantial contribution to the log-likelihood: the geometric mean improvement is 89.1 ± 72.1 log-units per event, measured as the improvement in log-likelihood when the effect is added conditional upon all the other regular sound changes being present. The three largest effects are the a>ɔ, ž>j, and q>k transitions, each of which contributes at least 275 log-units to the overall likelihood. Because these sounds are common in our data set, they make a large contribution to the likelihood when they are part of a regular sound change (Box 1).

The regular-change tree largely replicates the proposed major and minor divisions of the Turkic languages [], inferring a distinct Siberian branch, which also includes Yellow Uighur, now located in China. In contrast, the sporadic-sound-change model describes the Siberian languages as successively diverging from a Turkic trunk. The regular-sound-change tree estimates a mean divergence time between the outgroup Chuvash and other Turkic languages of 204 BCE, with a 95% credible interval of 605 BCE to 81 CE. This compares to proposals from glottochronological analyses that suggest dates of 30 BCE to 0 CE [] and 500 BCE to 50 CE from historical data []. The sporadic-sound-change model estimates the mean age of the tree to be more than two millennia older (2408 BCE, 95% CI = 3994–1279 BCE), because it wrongly assumes that the many occurrences of regular sound change along the outgroup Chuvash branch are multiple instances of independent phonological change.

Events of regular sound change can provide strong signals preferring some phylogenetic placements over others and can improve the estimation of divergence times over the sporadic-change-only model, which will routinely overestimate the amount of independent change by assuming that each phonemic substitution is independent (Box 1). Both effects can be seen in Figure 2, where the model including regular changes (shown along branches) produces a different and better-supported consensus dated tree than that derived from the sporadic model, and one that conforms more closely to linguistic scholarship [].

Consensus topologies for the model allowing only sporadic changes (A) and the model allowing regular sound changes (B). Regular sound changes are indicated along the top and bottom of branches of the topology: events in black show directional changes from the beginning to ending phoneme; events shown in purple indicate two phonemes that have replaced each other. The model additionally estimates the position of each regular sound change along the branch. Mean estimated age of root between Chuvash and other Turkic languages: sporadic model (A) = 2408 BCE, with 95% credible intervals of 3993–1279 BCE; regular model (B) = 204 BCE, with 95% credible intervals of 605 BCE–81 CE. The posterior date of the calibration node (red dot; []) is 1017 ± 20 CE.

The sporadic-mutation-only model returns a mean log-likelihood in the Bayesian posterior distribution of phylogenetic trees of −32,303.9 ± 14.9 (mean ± SD), compared to −29,196.2 ± 15.1 for the model including regular and sporadic sound changes (hereafter the “regular model”), an improvement for the regular model of 3,108 log-units (Figure 1). The regular model’s improvement derives from its discovering an average of 74.27 ± 0.47 regular sound changes that have occurred in the phylogenetic history of the Turkic languages (mean ± SD in the posterior sample of trees; see Supplemental Experimental Procedures and Table S1 available online). A deviance information criterion test overwhelmingly favors the model of regular changes as a description of these data (ΔDIC = 3,739).
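For readers wishing to run this kind of comparison, one common shortcut computes DIC from posterior samples of the log-likelihood alone, using the variance-based effective number of parameters p_V = var(D)/2, where D = −2 log L. Whether this is the variant used here is not stated in this section, so the sketch is an assumption, and the sample values are hypothetical.

```python
import statistics


def dic_from_loglik(loglik_samples: list[float]) -> float:
    """DIC = mean deviance + p_V, with deviance D = -2 log L and p_V = var(D) / 2."""
    deviance = [-2.0 * ll for ll in loglik_samples]
    mean_dev = statistics.fmean(deviance)
    p_v = statistics.variance(deviance) / 2.0
    return mean_dev + p_v


# Hypothetical posterior log-likelihood samples for two models.
sporadic = [-32290.0, -32305.0, -32318.0, -32301.0]
regular = [-29180.0, -29199.0, -29210.0, -29196.0]
delta_dic = dic_from_loglik(sporadic) - dic_from_loglik(regular)
print(round(delta_dic, 1))  # positive: the model with lower DIC (here "regular") is preferred
```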

Discussion

Our analysis has shown how a model of concerted evolution can discover the timings and phylogenetic placements of multiple events of regular sound change without prior knowledge of the forms those regular changes might take. The events we find conform closely to linguistic expectations, and the model produces a description of the sound transition networks among the 62 speech sounds that captures the well-known patterns of sound change. Including regular sound changes also improves the reconstruction of the phylogenetic tree describing the languages’ evolutionary histories and returns more plausible and less variable dates. This confirms the importance that historical linguists have long attached to including regular sound changes in attempts to reconstruct protolanguages, identify borrowings, and infer the genealogical history of a set of related languages, including their probable dates of origin and subsequent divergences.

The close conformity of the timings of regular linguistic sound changes to a Poisson process model over the approximately 28,000 language-years of evolution represented by the branches of the Turkic tree is striking in revealing an underappreciated regularity in this otherwise complex process. It also provides a parsimonious explanation for why some languages experience so few and others so many regular sound events in their histories: these differences can in principle be explained as expected outcomes of a homogeneous random process, and hence there is no need to seek factors either internal or external to the languages in question to explain the variation among them, at least until the statistical expectation is violated.

That such a complex phenomenon could conform so closely to a homogeneous random process over such long time periods is surprising, but it finds an interpretation in statistical theory: where the potential causes of a discrete phenomenon (such as a regular sound change) are many, independent, and rare, and each one is individually capable of causing a regular change, the waiting times between successive events can be shown [] to follow an exponential distribution (as in Figure 5C), and the number of events per unit time will follow a Poisson distribution. This interpretation, then, draws researchers’ attention to the “catalog” or list of potential cognitive, linguistic, and social causes of regular sound changes to explain their timings and frequencies throughout history. The excellent fit of the Poisson distribution indicates that this catalog has stayed roughly stable for at least the two millennia over which the Turkic family diverged.
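The argument invoked here is the law of rare events: the total count produced by many independent, individually improbable causes is close to Poisson. A sketch, assuming a hypothetical set of 200 causes each with a 0.2% chance of triggering a regular change in a given interval, computes the exact distribution of the total by dynamic programming and compares it with a Poisson distribution of the same mean. Le Cam's inequality bounds the summed absolute discrepancy by 2 Σ pᵢ².

```python
import math


def poisson_binomial_pmf(probs: list[float]) -> list[float]:
    """Exact distribution of the number of causes that fire, each independently with p_i."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this cause does not fire
            nxt[k + 1] += mass * p      # this cause fires
        pmf = nxt
    return pmf


def poisson_pmf(k: int, lam: float) -> float:
    # Computed in log space so that large k does not overflow.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def l1_to_poisson(probs: list[float]) -> float:
    """Sum over k of |P(S = k) - Poisson(k; sum of p_i)|."""
    lam = sum(probs)
    pb = poisson_binomial_pmf(probs)
    body = sum(abs(pb[k] - poisson_pmf(k, lam)) for k in range(len(pb)))
    # Poisson mass beyond the largest attainable count, where P(S = k) = 0.
    tail = max(0.0, 1.0 - sum(poisson_pmf(k, lam) for k in range(len(pb))))
    return body + tail


causes = [0.002] * 200  # hypothetical: 200 rare, independent causes
distance = l1_to_poisson(causes)
le_cam_bound = 2 * sum(p * p for p in causes)
print(f"L1 distance = {distance:.2e}, Le Cam bound = {le_cam_bound:.2e}")
```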


Regular sound changes by their very nature make a disproportionate contribution to linguistic diversity. Regular sound changes might also help groups of language speakers create and then maintain a distinct identity []. In this context, there are several reasons to believe that the 74 regular sound changes we have identified probably underestimate their true extent in these languages. For example, some regular changes might have decayed or been replaced by others over time, rare sound changes might not yet have been observed, and the relatively high rates of sporadic transition among vowels might also mean that some number of vowels affected by a regular change might have been masked by a later sporadic change.


In addition to these factors, in the form used here, our model provides a general “context-free” statistical description of concerted change that can be applied to any evolving hierarchical system of discrete elements. As a result, we might have missed some forms of regular sound change that depend upon multiphoneme combinations (Box 1). Many Turkic languages, for example, can exhibit a form of correlation of sounds within words known as vowel harmony, whereby vowels (and some consonants) in a word are homogenized into classes. In some Turkic languages, words can be harmonized according to whether the vowels and the uvular/velar consonants have “front” or “back” articulation []. For example, the plural suffix in Turkish can depend on the class of the word, such that the plural of horse is [at-lar] (using a back vowel) whereas the plural of cat is [kedi-ler] (using a front vowel).
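The backness-based choice of the Turkish plural suffix can be captured in a few lines. This is a simplified sketch: it inspects only the last vowel of a lowercase stem and ignores lexical exceptions (for example, some loanwords take -ler despite a final back vowel).

```python
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")


def turkish_plural(stem: str) -> str:
    """Attach -lar or -ler according to the backness of the stem's last vowel."""
    for ch in reversed(stem):
        if ch in BACK_VOWELS:
            return stem + "lar"
        if ch in FRONT_VOWELS:
            return stem + "ler"
    raise ValueError(f"no vowel found in {stem!r}")


print(turkish_plural("at"))    # atlar (horse -> horses)
print(turkish_plural("kedi"))  # kediler (cat -> cats)
```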

A second and more general factor common in human languages is context, in which sound changes are influenced by where the sound occurs in a word, or by its proximity to other sounds []. Sounds can be lost within words in a manner equivalent to nucleotide deletions. Occasional metathesis, or reordering of sounds, is also observed. Finally, entire classes of phonemes often shift because of the loss or gain of a phonemic feature such as voicing, or because a change to one sound or phonemic distinction can trigger a cascade of other changes in the sound system, as has been postulated for the “Great Vowel Shift” in English []. These factors might prove valuable in understanding differences in the propensity of a given phonemic site to be affected by a regular change. There are methods for extending our theory to context-dependent regularities [], and future work with our model will explore how they help to improve the statistical reconstruction of protowords.

Molecular biologists might recognize genetic analogs to the linguistic processes of context and harmony in some features of gene conversion. Thus, a recent study [] of the genome of the bdelloid rotifer Adineta vaga identified “abundant” evidence of gene conversion manifested in greater-than-expected similarity among alleles; in a sense, the presence of one allele “harmonizes” the other by making a particular form of the other more likely. Equally, concerted evolutionary changes can sweep through genomes, deactivating transposable elements []. Here, the presence of a particular string of nucleotides in the wider context of a transposable element appears to invite a deactivating change. A model such as the one we describe here could identify these instances of gene conversion statistically and on a genome-wide basis and, if applied to a group of related organisms, could provide a description of their extent and taxonomic distribution in nature. Identification of such events might also prove valuable for inferring and dating molecular trees.