This essay was first published on the Scandinavian blog Snaphanen, but since some of my readers may not have seen it, I republish it here. It was inspired by the book Indo-European Linguistics: An Introduction, by James Clackson.

The discovery of the Indo-European language family was made by Sir William Jones, a gifted British classical scholar who had mastered French and Italian, as well as some Hebrew and Arabic, at an early age. He is said to have known thirteen languages well, and twenty-eight fairly well, by the time of his death. In 1786, Jones elaborated a theory of the common origin of most European languages and those of Iran and northern India. Here is Jones as quoted by Ibn Warraq in his excellent book Defending the West:

"The Sanskrit language, whatever be its antiquity, is of a wonderful structure; more perfect than the Greek, more copious than the Latin, and more exquisitely refined than either, yet bearing to both of them a stronger affinity, both in the roots of verbs and in the forms of grammar, than could possibly have been produced by accident; so strong indeed, that no philologer could examine them all three, without believing them to have sprung from some common source, which, perhaps, no longer exists: there is a similar reason, though not quite so forcible, for supposing that both the Gothic and the Celtic, though blended with a very different idiom, had the same origin with the Sanskrit; and the Old Persian might be added to the same family, if this were the place for discussing any question concerning the antiquities of Persia."

As Thomas Trautmann later observed, "The modernity of the formulation is remarkable: the grouping of Sanskrit, Greek, Latin, Gothic (Germanic), Celtic, and Old Persian; their mutual resemblance in lexicon and grammar; the conception of their relationship as co-descendants of a lost ancestral language – these are exactly the views historical linguists hold today."

Jones was obviously not the first person to notice that various languages showed signs of being related; other scholars had suspected as much before him. But he was the first to connect European languages to non-European ones in this way. According to Nicholas Ostler in Empires of the Word:

"This was the origin of historical comparative linguistics. Applying it to languages all over the world was one of the great intellectual adventures of the nineteenth and twentieth centuries; and as a direct result we now know much of the flow of human languages, and so of human history, well before the start of the written documents. To give just three examples, this is how we know that the Hungarians came from northern Siberia, that Madagascar was colonised from Borneo, and that the European Gypsies originated as far away as India. For all the self-generated excellence of Sanskrit's own tradition in linguistics, it could never have gone off in this new direction on its own: what was needed was confrontation with other languages, far beyond the Indian ken, but also the ability to view these languages as somehow on a par with Sanskrit, something else that the tradition would have found simply inconceivable."

Ostler notes that the Mughal rulers of northern India, largely of Turkish origin but influenced by Persian culture, did not make this connection: "The new Muslim masters, despite their independent knowledge of Arabic, Persian and Turkish, did not distinguish themselves for their linguistic scholarship."

If you believe Mr. Edward Said and his numerous supporters, Sir William Jones was actually a racist pig who invented comparative linguistics in order to establish his dominance over "the Other." It's strange, then, that Muslims didn't think of this when they ruled other peoples for centuries. After all, Persian, which they knew, is an Indo-European language, as are Sanskrit, Greek, Armenian and the tongues of many of their subjects. Muslim scholars also had access to a number of Semitic languages, from Arabic and Hebrew to Aramaic, in addition to languages of other Afro-Asiatic branches in North and East Africa. They were thus in a position to discover this linguistic tree, but they didn't. Did they simply lack curiosity?

Why was European civilization the only civilization on earth to invent comparative linguistics? It is interesting to ponder why nobody had made this connection before. The Indians hadn't done so, and neither had the Persians, despite their cultural sophistication. Muslims were generally uninterested in other cultures for reasons of religious bigotry and cultural supremacism, and rarely bothered to learn non-Muslim languages. The few translations that were made from non-Muslim cultures, for instance works of the ancient Greeks in the early centuries of Islamic rule, were mainly concerned with scientific matters, not with historical events or cultural ideas, and the translations were often made by non-Muslims.

Tracking the spread of the Austronesian languages from Taiwan and Southeast Asia to Madagascar in the west and to Pacific islands such as Hawaii and Easter Island in the east was not done independently by Thais, Malays, Indonesians or other Asians. The first serious analyses of the Chinese language were made during the translation of Buddhist scriptures, when a number of gifted Chinese scholars went to India to study, but this did not develop into comparative linguistics as a science. I've seen no indication that the Mayans, the Incas or other American civilizations did anything of this sort, either. Similarly, archaeology was established as a true scientific discipline by Europeans and people of European origin in North America and elsewhere during the nineteenth and twentieth centuries. In short, the tools we now use to uncover human prehistory throughout the world were developed by Europeans. Those who think this claim is "Eurocentric" can prove me wrong. As James Clackson says in Indo-European Linguistics:

"Indo-European (IE) is the best-studied language family in the world. For much of the past 200 years more scholars have worked on the comparative philology of IE than on all the other areas of linguistics put together. We know more about the history and relationships of the IE languages than about any other group of languages. For some branches of IE – Greek, Sanskrit and Indic, Latin and Romance, Germanic, Celtic – we are fortunate to have records extending over two or more millennia, and excellent scholarly resources such as grammars, dictionaries and text editions that surpass those available for nearly all non-IE languages. The reconstruction of Proto-Indo-European (PIE) and the historical developments of the IE languages have consequently provided the framework for much research on other language families and on historical linguistics in general."

One Indo-European language, Hittite, is attested from nearly 4,000 years ago: it was written on clay tablets in cuneiform script in central Anatolia from the early second millennium BC onward. We have extensive textual remains of three more IE languages from more than 2,000 years ago, namely Ancient Greek, Latin and Sanskrit, and the stock of recorded IE languages grows further as we move forward in time. According to Clackson:

"The majority of IE languages currently spoken belong to six large sub-groups of IE. Modern Irish and Old Irish are members of the Celtic sub-group, which also includes Welsh, Scots Gaelic, Breton, Cornish and Manx. Sinhala is part of the large Indic family, comprising most of the languages currently spoken in North India and Pakistan, Sanskrit and the Middle Indian Prakrits. English is a member of the Germanic branch; this includes Dutch, German and the Scandinavian languages among living languages, as well as earlier stages of these languages, such as Old English, Old High German and Old Norse, and other extinct varieties such as Gothic, once spoken in south-east Europe and southern Russia. The other large sub-groups are Romance and Slavic in Europe, and Iranian in Asia. All of these sub-groups of IE were themselves recognised as linguistic families before Jones' identification of the larger IE family cited above."

Two IE sub-groups no longer exist: Anatolian, which was once widespread in Anatolia (present-day Turkey) before the Christian era, and Tocharian, which was spoken in Central Asia until the eighth century AD. Lithuanian and Latvian are attested from the early modern period, and together with the now extinct Old Prussian they form the Baltic sub-group. A few Indo-European varieties still spoken are not allocated to sub-groups but constitute separate "branches," notably Greek, Albanian and Armenian. Greek has a long recorded history, whereas Armenian is first attested in the middle of the first millennium and Albanian in the second millennium of the Christian era. The Indic and Iranian branches are more closely related to each other than to other branches and together constitute the Indo-Iranian sub-group.

As James Clackson puts it: "The IE languages for which we have fairly extensive records from before 1000 AD – Latin, Greek, Germanic, Iranian and Indic – have been the carriers of cultures which have in time predominated over other indigenous groups, with resultant language shift. Populations which once spoke Messapic, Venetic and Lusitanian eventually shifted to speaking Latin, Phrygians adopted Greek and Thracian lost out to overlapping waves of Greek, Latin, Germanic (Gothic) and Slavic. In the Mediterranean area, the early adoption of literacy allows us to know of a range of IE varieties. In northern and eastern Europe, where the first written records appear considerably later, we do not know whether there was a similar diversity in the territories later occupied by speakers of Celtic, Germanic, Slavic and Baltic languages."

The parent language of the entire Indo-European family, generally labelled Proto-Indo-European (PIE), is lost in prehistory, but through comparative linguistics we can reconstruct quite a few of the words it must have contained. The geographical origin of PIE has been hotly debated ever since William Jones first proposed his thesis, and its homeland has been variously placed in Eastern Europe, Central Asia, Iran and northern India. My personal opinion is that the cradle of the Indo-European language family most likely lies somewhere close to the Black Sea coast of Russia and Ukraine, but I will discuss this in later essays.

The primacy of Sanskrit in the early days of research into PIE has left its mark on subsequent analyses. This does have some advantages, as Greek and Vedic Sanskrit are two of the oldest and most conservative attested IE languages, but the later decipherment of Hittite and a greater understanding of the Anatolian languages have modified our picture of PIE. Sanskrit has eight cases, three genders and three numbers. PIE is often assumed in textbooks to have had eight nominal cases and three numbers (singular, dual and plural), plus an array of nominal declensions partly corresponding to the three grammatical genders of masculine, feminine and neuter. Most later IE languages have fewer cases, or none at all.

According to James Clackson, "The dual is lost prehistorically in Germanic (in nouns), Latin, Albanian and Armenian, and although attested in Classical Greek, Old Irish and Old Church Slavonic, it only fully survives today in some Slavic languages. The three separate nominal genders found in Sanskrit, Greek and Latin have been merged in many different branches. Several languages have 'lost' one gender: in Romance, Modern Celtic and Modern Baltic, the neuter has been assimilated into the other two declensions; in Dutch and Scandinavian the distinction between masculine and feminine is lost, the surviving distinction being between common and neuter nouns. Some languages have lost the nominal category of gender completely: in Armenian, gender was lost from both nouns and pronouns before the language is attested in written form in the first millennium of the Christian era, and English retains gender only in pronouns (although vehicles such as boats, cars and motorbikes may still be referred to by feminine pronouns)."

Among the IE sub-families we find the Slavic (Slavonic) languages. East Slavic languages include Russian, Ukrainian and Belarusian; West Slavic ones include Czech, Slovak and Polish; and South Slavic ones include Serbian, Slovene, Croatian and Bosnian in addition to Bulgarian and Macedonian. The first written Slavic language, Old Church Slavonic, was developed by Byzantine missionaries in order to spread the Gospels among Slavic-speaking peoples. The Slavs in the western regions of the Slavic-speaking area eventually adopted the Western (Catholic) variety of Christianity, while those in the eastern regions retained the original link with the Orthodox Christianity of Byzantium.

In the Mediterranean region, a number of different languages are attested at the onset of literacy in the first millennium BC, but with the expansion of the Roman Empire these were replaced during the early centuries of the Christian era by Latin and its descendants. The Italic subfamily of IE includes a number of extinct languages, among them Latin, as well as the Romance group of modern languages descended from Vulgar Latin: Spanish (Castilian), Catalan, Galician and Portuguese in the Iberian Peninsula and the former Spanish and Portuguese colonies in Latin America, as well as Italian, Romanian, French and a number of smaller languages. The development of these languages is relatively easy to trace. Clackson again:

"In the Romance languages, for example, the words for 'bread' and 'water', for 'mother' and 'father', and many other lexemes are similar. In the case of the Romance languages, we have the bonus of having records of Classical Latin, which is close enough to the spoken variety from which the Romance group evolves to be considered the sub-group parent. We can see in Latin the word-forms which will eventually evolve to become the shared vocabulary of Romance: aqua 'water' can be considered the earlier form ancestral to Italian acqua and Spanish agua; pater 'father' develops into Italian padre and Spanish padre. For the Romance group, we can unearth the phonological changes which words have undergone in the centuries between Roman times and the present. We can identify which words are borrowings and which stem from Latin."

In some cases, as with the Romance languages or the Indic sub-group, we have written records of the sub-group parent or of a language very close to it (Latin and Sanskrit respectively). In other cases, however, we have no recorded sub-group parent. One example is the Germanic languages, which include German, Dutch, Frisian, Afrikaans and English in addition to Norwegian, Danish, Swedish, Icelandic and Faroese. As with the Slavic languages, the oldest extensive written text we know of in a Germanic language is a Christian one: the Gothic translation of the New Testament by Ulfilas in the fourth century AD. However, the Germanic languages had started splitting apart centuries before this, and thus differ from each other more than do the Romance or Slavic languages. The proposed Proto-Germanic language was probably spoken at some point during the first millennium BC. James Clackson explains:

"For the Germanic group, we have no attested sub-group parent, but we hypothesise that there must have been such a language. We can further hypothesise what the vocabulary of the sub-group parent must have been: from the English, Dutch and German words for 'bread', for example, we might guess that the original word was *brod or something like it, and *water the original word for 'water'. (The * before the word highlights the fact that the word is a hypothetical item, and not directly attested.) Yet our reconstructed items here are mere guesswork, worked out on a principle that the form which was found in two languages won out over a variant found in the other. Thus in reconstructing *brod for 'bread' we take the vowel from the Dutch and German words, and the final consonant from English. In Dutch, final consonants written voiced are standardly devoiced, but we can assume that the spelling with –d represents an earlier stage of the language where final consonants could be voiced. In reconstructing *water for 'water' we took the medial consonant from Dutch and English as against the German form."
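Clackson's "majority wins" rule of thumb for *brod and *water can be illustrated with a toy Python sketch. It assumes that the words have already been split into hand-aligned segments using simplified spellings (English bread as b-r-e-d, German Wasser as w-a-s-e-r, and so on); the function name reconstruct and these segmentations are hypothetical illustrations, not anything taken from Clackson. As he himself stresses, such a vote is mere guesswork: the comparative method proper rests on regular sound correspondences across many words, not on counting forms for a single word.

```python
from collections import Counter

# Hypothetical, hand-aligned segments with simplified spellings (an assumption
# for illustration; real work would compare sounds, not spellings).
bread = {
    "English": ["b", "r", "e", "d"],
    "Dutch":   ["b", "r", "o", "d"],
    "German":  ["b", "r", "o", "t"],
}
water = {
    "English": ["w", "a", "t", "e", "r"],
    "Dutch":   ["w", "a", "t", "e", "r"],
    "German":  ["w", "a", "s", "e", "r"],  # Wasser, with ss simplified to s
}


def reconstruct(aligned: dict[str, list[str]]) -> str:
    """Hypothetical helper: pick the majority segment at each aligned position.

    This implements only the 'two languages outvote the third' rule of thumb,
    not the comparative method proper. Raises ValueError on ties.
    """
    forms = list(aligned.values())
    length = len(forms[0])
    if any(len(form) != length for form in forms):
        raise ValueError("all forms must be aligned to the same length")
    segments = []
    for i in range(length):
        counts = Counter(form[i] for form in forms)
        (best, n), *rest = counts.most_common()
        # A tie means no majority exists at this position.
        if rest and rest[0][1] == n:
            raise ValueError(f"no majority at position {i}: {dict(counts)}")
        segments.append(best)
    # The asterisk marks the result as a hypothetical, unattested form.
    return "*" + "".join(segments)


print(reconstruct(bread))  # *brod
print(reconstruct(water))  # *water
```

Note how much depends on the simplified segmentation: with the actual spellings bread, brood and Brot, the vowels ea, oo and o would all differ and no majority would emerge at all.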

Prior to the advent of mass immigration, the vast majority of Europeans spoke an Indo-European language, but one other major linguistic family is traditionally native to Europe: the Uralic languages. This family has perhaps 20 million speakers, a modest number compared to the billions of people who now speak an Indo-European language. Its original homeland, or Urheimat, is believed to lie somewhere close to the Ural Mountains, hence its name. Uralic speakers include the Hungarians (Magyars) in Central and Eastern Europe; otherwise, these languages are concentrated mainly in the Nordic and Baltic Sea region. Finland is officially bilingual, as it was for centuries part of the kingdom of Sweden, and it still has a Swedish-speaking minority in the coastal regions of the southwest, closest to Sweden. However, the vast majority of the population speak Finnish. Estonian is closely related to Finnish. Finally, there are the languages of the Sami people, who inhabit parts of northern Sweden, Norway, Finland and the Kola Peninsula in Russia. This region is sometimes called Lapland because outsiders used to call the Sami "Lapps" or "Laplanders," but they generally consider this term derogatory.

The Uralic languages probably expanded at roughly the same time as the IE ones. We know very little about the languages that existed in Europe before the Indo-European expansion, but we can detect traces of them here and there. The name of the Alps, the mountain range in Central Europe, is for instance believed to be of very ancient origin. The Celtic dialects spoken in Britain during Roman times display peculiarities thought to be a grammatical echo of an earlier language. However, although a few words or names have survived, and although we may be able to detect a substratum of a different language in a few instances, the pre-Indo-European languages were wiped out by the IE expansion. The only exception on the entire European continent is believed to be Basque.

The Basque people inhabit the western Pyrenees in northern Spain and southwestern France. Their tongue is a so-called language isolate, with no known living or dead relatives. It contains words for knife, axe and other tools which carry the root meaning of "stone," and it is perhaps a direct descendant of the languages spoken in parts of Europe during the Paleolithic and Mesolithic (Old Stone Age and Middle Stone Age) periods.

Interestingly enough, in the age of DNA analysis, some of the earlier findings of comparative linguistics can now be confirmed (or contradicted) through genetics. In February 2008, Fox News reported that a Cornell University-led study had found that white (European) Americans are genetically weaker and less diverse than their black compatriots. This follows the first rule of Political Correctness, which says that there are no significant genetic differences between different groups of people, and that if there are, whites must always be inferior. I'm glad our weak genes didn't prevent Europeans from producing individuals such as Aristotle, Galileo, Copernicus, Newton, Beethoven and Pasteur.

The study showed that genetic diversity was greatest among Africans, while Native Americans had the least genetic diversity of all. This is consistent with the fact that North and South America were the last major landmasses to be settled by humans, since each successive migration carried with it only part of the genetic variation of the population it left behind. It also showed that the Basques are not closely related to anyone else, and neither are the inhabitants of Sardinia off the coast of Italy. Judging by a combination of linguistic and genetic evidence, the Basque people have a strong claim to being the oldest distinct nation in Europe.