In the past few years, a number of films have made a point of Getting the Science Right. Interstellar (2014) famously consulted with astrophysicist Kip Thorne, in order to achieve a realistic on-screen depiction of black holes (amongst other things). Just last year, The Martian was praised for basing much of its appeal around its scientific realism.

Now, in 2016, the sci-fi film Arrival is attracting similar accolades for its portrayal of linguistics, and of how scientists approach solving a problem. One standout piece hails from Science vs. Cinema – a YouTube channel devoted to examining how Hollywood fares on various science-related matters.

Since the movie only had so much time to cover exactly how Amy Adams’ character Dr. Louise Banks unraveled the aliens’ writing system, let’s do a deep dive and actually answer the question:

How do linguists do what they do?

So, the short answer is pattern recognition. Just like how a physicist might toss a ball up into the air a bunch of times to figure out which rules it follows coming down (or toss some baryons at each other to see what comes out the other side), a linguist will observe how the sounds and symbols of a language regularly connect up to meaning. With even a small amount of data, a linguist can begin the process of uncovering the relationships that hold between speech and the world it’s used to represent.

What does this look like? Well, we don’t have any real alien languages to work with (for now), but we’ve got plenty of human languages to choose from – between six and seven thousand. So, let’s pick one, look at a few examples, and see what we find!

To get as far away from English as we can, let’s go to Kashaya – an endangered Pomoan language spoken in California. Unlike English, which is analytic and tends to keep its morpheme-to-word ratio low (relying heavily on word order to keep track of who’s doing what to whom), Kashaya is polysynthetic; it tends to pack lots of information into each word, by way of affixation, and has relatively free word order.

Now, about half the world’s languages don’t have any developed writing system associated with them; of those that do, many lack a native script, relying instead on some version of the Latin alphabet. Kashaya is one such language, with its written form having been developed by Dr. Robert Oswalt in the 1960s. In the examples that follow, you can think of each word as being written in a kind of quasi-IPA; the words on the left represent the actual sounds of the language, as understood and transcribed by a linguist, while the meanings hang in the right-hand column.





bahcúw       ‘to jump’
bahcubedu    ‘always jumps’
coqocedu     ‘always shoots’
coqów        ‘to shoot’
kelci        ‘peek!’
kelcíw       ‘to peek’
coqo         ‘shoot!’





(At this point, I should admit I grabbed this data from a linguistics assignment. In the context of the film, the Heptapod language doesn’t come with an instruction manual, so Louise has to employ a monolingual demonstration – a technique used to elicit linguistic data when no common tongue exists between the parties involved. If you want to see this in action [you do], I highly recommend having a look here. In under 40 minutes, and without any assistance, linguist Dan Everett works with a speaker of a language he has no knowledge of and manages to uncover basic facts about its word order, sound inventory, and morphology. It’s really an impressive thing to behold, even for other linguists.)

Let’s start by choosing two words which share some meaning, to see if we can figure out what else they have in common. Beginning with the words for ‘to jump’ and ‘always jumps,’ we might tentatively conclude that what they share – “bahc” – means ‘jump,’ and that “úw” and “ubedu” mark the verb as being tenseless and habitual, respectively. But throwing a third word into the mix – “coqocedu” for ‘always shoots’ – forces us to either reject or revise our hypothesis; “ubedu” can’t mean ‘always,’ since the only common sounds between the second and third words are “edu.” What can we do?
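For the programmers in the audience, here is that first step as a rough Python sketch. It compares two transcribed words and reports what they share at the start or at the end. The function names are my own, made up for illustration, and the sketch treats each written character as one sound, which is a simplification.

```python
import os
import unicodedata


def nfc(word: str) -> str:
    """Normalize so an accented vowel like 'ú' is stored as one character."""
    return unicodedata.normalize("NFC", word)


def shared_prefix(a: str, b: str) -> str:
    """Longest run of characters that both words start with."""
    return os.path.commonprefix([nfc(a), nfc(b)])


def shared_suffix(a: str, b: str) -> str:
    """Longest run of characters that both words end with."""
    reversed_common = os.path.commonprefix([nfc(a)[::-1], nfc(b)[::-1]])
    return reversed_common[::-1]


# 'to jump' vs. 'always jumps': what do they share at the start?
print(shared_prefix("bahcúw", "bahcubedu"))    # bahc

# 'always jumps' vs. 'always shoots': what do they share at the end?
print(shared_suffix("bahcubedu", "coqocedu"))  # edu
```

Notice that the stressed “ú” and the plain “u” count as different characters here, which is exactly why the first guess at the root stops at “bahc.”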

Here, it helps to know a bit about what languages sometimes have up their sleeves. In this case, it’s important to notice that the first consonants – “b” and “c” – are repeated in “bahcubedu” and “coqocedu.” This looks like reduplication, which involves the repetition of some or all of a root word, sometimes with some modification. It’s not as exotic as it sounds, since English speakers use this strategy when treating something dismissively (e.g., “alien-shmalien”). So, Kashaya might instead express ‘always’ using partial reduplication, followed up with “edu.” We’d need more data to know for sure, but it’s a good start.
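That reduplication hunch can be turned into a quick check, too. The Python sketch below is only a test of our working hypothesis, not a description of how Kashaya actually works: it asks whether a word ends in “edu” and whether the sound right before that ending repeats the word’s first sound. The function name is hypothetical, and it assumes every word begins with a consonant, as all seven of ours do.

```python
def fits_reduplication(word: str, ending: str = "edu") -> bool:
    """True if the word ends in `ending` and the sound just before the
    ending repeats the word's first sound (our guess at partial
    reduplication). Assumes the word starts with a consonant."""
    # Need the ending plus at least two more characters to compare.
    if not word.endswith(ending) or len(word) <= len(ending) + 1:
        return False
    stem = word[: -len(ending)]
    return stem[-1] == stem[0]


for word in ["bahcubedu", "coqocedu", "bahcúw"]:
    print(word, fits_reduplication(word))
# bahcubedu True
# coqocedu True
# bahcúw False
```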

As for those shifting vowels – the “u” in “bahcubedu” vs. the “o” in “coqocedu”? Maybe they’re not part of the affixes, after all; maybe they’re part of the verb! In fact, the fourth word – “coqów” – seems to confirm this, since each vowel sticks with its own verb root across forms: “u” with ‘jump’ and “o” with ‘shoot.’ Which means we need to say something more about that tenseless marker: it’s really just a “w,” with some stress added onto the last syllable. And, again, it’s not so unusual for stress to play a role in grammar; “complex” can mean either “complicated” or “a bunch of buildings,” depending on where you put the main emphasis.

Looking at the rest of the list, we can add that the absence of an affix altogether apparently marks the verb as imperative, meaning it can be used to issue a command. So, all in all, just 7 words have told us something about the sounds used in Kashaya, whether or not it has stress and reduplication, how to mark verbs as tenseless, habitual, or imperative, and how to say “jump,” “shoot,” and “peek.” Not bad, eh?
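To see whether the whole analysis hangs together, we can write the three rules down and check them against all seven words. This Python sketch is my own illustration under the hypotheses above, nothing more. In particular, the root “bahcu” is inferred rather than attested (we never saw the bare form for ‘jump’), and the names RULES, ATTESTED, and mismatches are made up for the example.

```python
import unicodedata

# Plain vowel -> stressed vowel, as written in the transcription.
ACUTE = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú"}


def stress_last_vowel(root: str) -> str:
    """Put the acute accent on the last vowel of the root."""
    for i in range(len(root) - 1, -1, -1):
        if root[i] in ACUTE:
            return root[:i] + ACUTE[root[i]] + root[i + 1:]
    return root  # no vowel found; leave unchanged


# Our three hypothesized rules, one per verb form.
RULES = {
    "tenseless": lambda root: stress_last_vowel(root) + "w",
    "habitual": lambda root: root + root[0] + "edu",  # partial reduplication
    "imperative": lambda root: root,                  # no affix at all
}

# (root, form, attested word). The root "bahcu" is an inference.
ATTESTED = [
    ("bahcu", "tenseless", "bahcúw"),
    ("bahcu", "habitual", "bahcubedu"),
    ("coqo", "habitual", "coqocedu"),
    ("coqo", "tenseless", "coqów"),
    ("kelci", "imperative", "kelci"),
    ("kelci", "tenseless", "kelcíw"),
    ("coqo", "imperative", "coqo"),
]


def mismatches() -> list:
    """Return every attested word that the rules fail to reproduce."""
    failed = []
    for root, form, word in ATTESTED:
        predicted = unicodedata.normalize("NFC", RULES[form](root))
        if predicted != unicodedata.normalize("NFC", word):
            failed.append((root, form, word, predicted))
    return failed


print(mismatches())  # [] means the rules account for all seven words
```

An empty list only tells us the rules fit this tiny sample. Seven words are nowhere near enough to rule out other analyses, which is why a linguist would go back for more data.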

And this is just how Louise approaches Heptapod B, the film’s alien writing system. You can see for yourself, in this behind-the-scenes snapshot, that each logogram can be divided up into 12 segments. (Having an upper limit to sentences isn’t something you really see in human language, but of course this isn’t a human language!)

Looking closer, we can see the kinds of patterns we’d need to pay attention to, to begin cracking open the language. In the image below, the lowest righthand portion is shared across both logograms. (Actually, the bottom twelfth is the same across both of them, too. Maybe, like in Kashaya, this absence communicates something meaningful!)

And armed with this understanding, Louise devises a program to automatically analyze the logograms’ parts.

She even reverses the process by taking that database of segments and using it to construct her own logograms, in order to pose the all-important questions at the heart of the story.

Too cool!

It’s encouraging that the movie managed to capture the process so well (if only briefly), and it’s impressive that it’s resonated with audiences so much. I know I speak for more than just myself when I dare to hope this means we’ll be seeing more of a willingness on moviemakers’ parts to represent real science on the big screen. It might’ve sounded farfetched to say so just a few years ago, but a little linguistics in Hollywood seems not to be such an out-of-this-world idea, after all!

If you enjoyed this story, and want to learn more about the linguists who worked on Arrival, definitely have a look at this!

And if you happen to be a student at the University of Pennsylvania, and you’re interested in helping to document and preserve endangered languages like Kashaya, be sure to check out the work being done by Professor Eugene Buckley, here! No experience needed!