We computed a relative entropy measure to quantify the degree of ordering in word sequences from languages belonging to several linguistic families. While a direct estimation of the overall entropy of language yielded values that varied across families, the relative entropy quantifying word ordering took an almost constant value for all of them.

The language faculty is probably the most distinctive feature of our species, and endows us with a unique ability to exchange highly structured information. In written language, information is encoded by the concatenation of basic symbols under grammatical and semantic constraints. As is also the case in other natural information carriers, the resulting symbolic sequences show a delicate balance between order and disorder. That balance is determined by the interplay between the diversity of symbols and their specific ordering in the sequences. Here we used entropy to quantify the contribution of different organizational levels to the overall statistical structure of language.

Funding: Marcelo A. Montemurro was supported by the UK Medical Research Council (MRC), the Royal Society, and UK Engineering and Physical Sciences Research Council (EPSRC) grant EP/C010841/1. Damián H. Zanette was supported by the Agencia Nacional de Promoción Científica y Tecnológica and Universidad Nacional de Cuyo of Argentina (SECTyP-UNCuyo). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

In our analysis, we considered individual words as the most elementary units of linguistic information. Therefore, the first organizational level in a linguistic sequence is given by the distribution of frequencies with which different words are used. Zipf's law [20] states that if the word frequencies of any sufficiently long text are arranged in decreasing order, there is a power-law relationship between the frequency and the corresponding ranking order of each word. Moreover, this relationship is roughly the same for all human languages. Zipf's frequency-rank distribution, however, does not bear any information about the way in which words are ordered in the linguistic sequence, and would be exactly the same for any random permutation of all the words of the sequence. A second organizational level is then determined by the particular way in which individual words are arranged. Discriminating between the contributions of those two levels of organization can provide relevant insights into statistical regularities across languages. The present paper focuses on assessing the specific impact of word ordering on the entropy of language. To that end, we estimated the entropy of languages belonging to different linguistic families. Our results show that the value of the total entropy depends on the particular language considered, being affected by the specific characteristics of the grammar and vocabulary of each language. However, when a measure of the relative entropy is used, which quantifies the impact of word patterns on the statistical structure of languages, a robust universal value emerges across linguistic families.
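The two-level decomposition just described can be sketched numerically: estimate the entropy rate of the original word sequence, then of a random shuffle of the same words (which preserves the frequency-rank distribution but destroys ordering), and take the difference. The following is a minimal illustration using a Lempel-Ziv-style match-length estimator; the function names and the toy text are ours, and the actual estimators used in the study may differ:

```python
import math
import random

def _occurs(hay, needle):
    """True if the sub-sequence `needle` appears anywhere in `hay`."""
    k = len(needle)
    return any(hay[j:j + k] == needle for j in range(len(hay) - k + 1))

def entropy_rate_lz(seq):
    """Lempel-Ziv match-length estimate of the entropy rate (bits per word):
    H ~ n * log2(n) / sum of shortest-novel-substring lengths."""
    n = len(seq)
    total = 0
    for i in range(n):
        l = 1
        # grow the substring starting at i until it has not been seen in the prefix
        while i + l <= n and _occurs(seq[:i], seq[i:i + l]):
            l += 1
        total += l
    return n * math.log2(n) / total

def ordering_entropy_drop(words):
    """Entropy-rate reduction attributable to word ordering:
    estimate on a shuffled copy minus estimate on the original sequence."""
    shuffled = list(words)
    random.shuffle(shuffled)
    return entropy_rate_lz(shuffled) - entropy_rate_lz(words)

random.seed(1)
toy = "the cat sat on the mat the dog sat on the rug".split() * 5
# a positive drop means ordering makes the text more predictable than its shuffle
print(ordering_entropy_drop(toy) > 0)
```

Because the shuffle leaves word frequencies untouched, the difference isolates the second organizational level; a single shuffle is noisy, so in practice one would average over many shuffles.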

A rigorous measure of the degree of order in any symbolic sequence is given by the entropy [15]. The problem of assigning a value to the entropy of language has inspired research since the seminal work by Claude Shannon [16], [17], [18], [19]. However, to comprehend the meaning of the entropy of language it is important to bear in mind that linguistic structures are present at various levels of organization, from inside individual words to long word sequences. The entropy of a linguistic sequence contains contributions from all those different organizational levels.
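As a concrete illustration of the lowest organizational level, the entropy of the word-frequency distribution alone (ignoring ordering entirely) can be computed directly from counts. This is a minimal sketch with a made-up toy text, not the estimator used in the study:

```python
import math
from collections import Counter

def unigram_entropy(words):
    """Shannon entropy (bits per word) of the empirical word-frequency
    distribution: H = -sum_w p(w) * log2 p(w)."""
    counts = Counter(words)
    n = len(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

toy = "the cat sat on the mat and the dog sat on the rug".split()
print(round(unigram_entropy(toy), 3))  # ≈ 2.777 bits per word
```

Note that this quantity is invariant under any permutation of the words, which is precisely why a second, ordering-sensitive measure is needed.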

Written human languages encode information in the form of word sequences, which are assembled under grammatical and semantic constraints that create organized patterns. At the same time, these constraints leave room for the structural versatility that is necessary for elaborate communication [11]. Word sequences thus bear the delicate balance between order and disorder that distinguishes any carrier of complex information, from the genetic code to music [12], [13], [14]. The particular degree of order versus disorder may either be a feature of each individual language, related to its specific linguistic rules, or it may reflect a universal property of the way humans communicate with each other.

The emergence of the human language faculty represented one of the major transitions in the evolution of life on Earth [1]. For the first time, it allowed the exchange of highly complex information between individuals [2]. Parallels between genetic and language evolution have been noticed since Charles Darwin [3] and, although there is still some debate, it is generally accepted that language has evolved and diversified through mechanisms similar to those of biological evolution [4]. There may even be evidence that all languages spoken in the world today originated from a common ancestor [5]. The extant languages amount to a total of some 7,000, and are currently divided into 19 linguistic families [6]. Within the Indo-European family, some of the languages differentiated from each other not long after the end of the last glacial age [7], which pushes cross-family divergences far into prehistoric times. The evolutionary processes that have acted since then have led to a degree of divergence that can make distantly related languages mutually unintelligible. Notwithstanding the broad differences between languages, it has been found that linguistic universals exist at the levels of both grammar and vocabulary [8], [9], [10].

Results