Your language is not your own. The words you speak have been borrowed, modified, and molded by the forces of linguistic evolution. And the sentences they form are not so much "English" as they are a shapeshifting hodgepodge of different languages that have intersected with English over the years.


Not that many of us would ever know it. Sure, the etymologies and histories of many words may only be a dictionary-reference away, but few of us have the time or inclination to investigate where these words — let alone entire sentences — actually come from.

Unless, of course, you're Mike Kinde, who maintains the ridiculously enthralling data visualization blog Ideas Illustrated. Looking to better understand the role of foreign words in his day-to-day use of the English language, Kinde whipped up a program that would allow him to actually see precisely that:

Using Douglas Harper's online dictionary of etymology, I paired up words from various passages I found online with entries in the dictionary. For each word, I pulled out the first listed language of origin and then re-constructed the text with some additional HTML infrastructure. The HTML would allow me to associate each word (or word fragment) with a color, title, and hyperlink to a definition.
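For the curious, the approach Kinde describes can be sketched in a few lines of Python. This is only a rough illustration, not his actual program: the tiny `ORIGINS` lookup stands in for the etymology dictionary (its entries are illustrative placeholders, not copied from it), the hex values in `COLORS` are guesses at the colors, and the function name `annotate` is made up for this example. The hyperlink to a definition is left out.

```python
import re
from html import escape

# Hypothetical stand-in for the dictionary: word -> first listed language of origin.
# These entries are illustrative placeholders, not data taken from the real dictionary.
ORIGINS = {
    "the": "Old English",
    "house": "Old English",
    "water": "Old English",
    "court": "Old French",
}

# Assumed hex values; the article only names the colors.
COLORS = {
    "Old English": "#ffc0cb",  # pink
    "Old French": "#ffd39b",   # light orange
}


def annotate(text, origins=ORIGINS, colors=COLORS):
    """Rebuild the text as HTML, wrapping each known word in a colored span."""
    parts = []
    # Split into word runs and the runs between them, so spacing and
    # punctuation survive the round trip unchanged.
    for token in re.findall(r"[A-Za-z']+|[^A-Za-z']+", text):
        origin = origins.get(token.lower())
        if origin is None:
            # Unknown words and punctuation pass through, HTML-escaped.
            parts.append(escape(token))
        else:
            color = colors.get(origin, "#ffffff")
            parts.append(
                f'<span style="background:{color}" title="{escape(origin)}">'
                f"{escape(token)}</span>"
            )
    return "".join(parts)


if __name__ == "__main__":
    print(annotate("The house by the court."))
```

Words missing from the lookup are simply left uncolored here; how the real program handles them is not described.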


Kinde associated Old English with pink, Middle English with red, Anglo-French with orange, Old French with light orange, Middle French with pale orange, Classical & Medieval Latin with yellow, Gallo-Roman & Middle Low German with gray, and American with green. His system allowed him to analyze everything from simple, etymologically homogeneous-looking sentences:

To complex Monty Python quotes:

To passages from classic American literature, like this excerpt from Mark Twain's The Adventures of Tom Sawyer:


Things get even more interesting when Kinde starts creating pie charts that compare the word origins in work by American vs non-American authors, or in legal texts, medical publications, and sports articles. The two pie charts shown here, for example, illustrate the marked difference between word origins in the Tom Sawyer passage and a paragraph from a medical journal. With etymological underpinnings like these, it's no wonder people can find medical and scientific articles so impenetrable; only about half the words have origins in Old English. Compare that to something like a sports article, where Kinde finds that figure hovering around 80%.
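The numbers behind a pie chart like these come down to a simple tally. Here is a minimal sketch of that counting step, assuming a word-to-origin lookup is already in hand; the function name `origin_shares` and the sample lookup are hypothetical, and words missing from the lookup are ignored, which may not match how Kinde treats them.

```python
import re
from collections import Counter


def origin_shares(text, origins):
    """Return the fraction of recognized words that fall under each origin."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Count only the words the lookup knows about.
    counts = Counter(origins[word] for word in words if word in origins)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {language: count / total for language, count in counts.items()}


if __name__ == "__main__":
    # Illustrative placeholder entries, not real dictionary data.
    sample_origins = {
        "the": "Old English",
        "house": "Old English",
        "court": "Old French",
    }
    shares = origin_shares("The court, the house", sample_origins)
    for language, share in shares.items():
        print(f"{language}: {share:.0%}")
```

Running the same tally over a sports article and a medical paragraph would give the kind of side-by-side comparison described above.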


Kinde says a website where you can upload your own passages and have them analyzed and color-coded is in the works. In the meantime, however, you'll find many more word-origin visualizations and distribution breakdowns on his blog, Ideas Illustrated. (By the way: I wasn't kidding about it being enthralling; if you have the slightest interest in data science, design, or visualization, Kinde's blog entries will consume hours of your time. You've been warned. Proceed with caution.)

All figures via Ideas Illustrated.