The epic poem Beowulf is the most famous surviving work of Old English literature. For decades, scholars have hotly debated both when the poem was composed and whether it was the work of a single anonymous author ("the Beowulf poet"). Lord of the Rings author J.R.R. Tolkien was among those who famously championed the single-author stance. Now researchers at Harvard University have conducted a statistical analysis and concluded that there was very likely just one author, further bolstering Tolkien's case. They published their findings in a recent paper in Nature Human Behaviour.

Set in Scandinavia, Beowulf recounts the adventures of its titular hero. The Danish King Hrothgar's mead hall is under attack from a monster called Grendel. Beowulf obligingly slays the beast, incurring the wrath of Grendel's equally monstrous mother. He slays her, too, and eventually becomes king of his people, the Geats. Some 50 years after those adventures, Beowulf slays a dragon, although he is killed in the process. Scholars believe many of the characters are based on historical figures in sixth-century Scandinavia.

The poem was composed sometime between the eighth and early 11th centuries; pinning down a more precise date is one of the most heated academic debates about Beowulf. The other debate centers on whether Beowulf is the work of a single person or of many different authors, stitched together from multiple sources. According to Madison Krieger, a postdoc in evolutionary dynamics at Harvard University and one of the new paper's authors, the questions about Beowulf's authorship began in earnest in 1815 with the publication of the first widely available edition of the poem.

"The way we read it now seems very disjointed," said Krieger. "From high school, everyone remembers the battle with Grendel and Grendel's mother and maybe the dragon, but if you go back and read the whole poem, there are weird sections about, for instance, how good Beowulf is at swimming and other sections that go back hundreds of years and talk about hero kings that have ostensibly nothing to do with the story."

Then there's the fact that the handwriting in the surviving manuscript changes partway through: at line 1,939, mid-sentence, a second scribe's hand takes over. Scholars agree that these were different scribes copying the poem, not two different poets. However, "It has helped contribute to a narrative according to which the writing of Beowulf, and maybe its original composition, was a long and collaborative effort," said Krieger.

That was the prevailing view until 1936. That's when Tolkien published his seminal literary analysis, "Beowulf: The Monsters and the Critics," in Proceedings of the British Academy, based on a lecture delivered that same year. Tolkien was a great admirer of Beowulf, which strongly influenced the world-building of The Hobbit and his Lord of the Rings trilogy. In the paper, Tolkien argued for an earlier eighth-century composition date, based on textual evidence of a strong influence of Anglo-Saxon paganism. As Krieger points out, every character in the poem is a pagan, although "it's overlaid throughout with a Christian perspective and infused with Christian language." Tolkien also defended the single-author viewpoint in the essay.

This new study brings the power of computational analysis to bear on the debate. "Arguments based on the poem's content or its author's supposed belief system are vital, of course, but equally important are arguments based on the nitty-gritty of stylistic details," said Krieger. "The latter also have the merit of being testable, measurable. This is the first step in taking an old debate and refreshing it with some new methodology."

The methodology in this case is called stylometry, which analyzes the statistical characteristics of a text's style: the poem's meter, for example, or how many times different words or letter combinations appear in a given text. Everybody uses language a bit differently: we favor different punctuation or turns of phrase, have a broader or narrower vocabulary, and so forth. Stylometry is intended to identify and quantify those individual tics. Collecting that data for an unknown text and comparing it to texts by known authors can, in theory, make a positive identification. A paper published last week in Information Sciences by Polish scientists found that it's possible to identify an author based on connections between just a dozen words in an English text and even fewer words for texts in Slavic languages.
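
To make the idea concrete, here is a minimal Python sketch of that kind of comparison. It is an illustration only, not the method used in either paper: the function names, the choice of cosine similarity, and the tiny vocabulary in the usage note are all assumptions made for the example.

```python
from collections import Counter
import math


def word_profile(text, vocabulary):
    """Relative frequency of each vocabulary word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[word] / total for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two frequency vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_author(unknown_text, known_texts, vocabulary):
    """Return the known author whose word profile is most similar, plus all scores."""
    unknown = word_profile(unknown_text, vocabulary)
    scores = {
        author: cosine_similarity(unknown, word_profile(text, vocabulary))
        for author, text in known_texts.items()
    }
    return max(scores, key=scores.get), scores
```

Calling closest_author("the the and to", {"A": "the the the and of", "B": "to to to of and"}, ["the", "and", "of", "to"]) picks author "A", whose heavy use of "the" matches the unknown sample. Real studies use far longer texts and many more features; a toy vocabulary like this one only shows the mechanics.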

For their analysis, Krieger and his co-authors examined four broad categories: meter, "sense pauses" between clauses and sentences in many of the lines (serving as a kind of punctuation), word choice, and letter combinations. The last of these provided the best markers for this kind of measurement. They found that, based on those metrics, the text of Beowulf is remarkably consistent throughout.

"Across many of the proposed breaks in the poem, we see that these measures are homogeneous," said Krieger. "So as far as the actual text of Beowulf is concerned, it doesn't act as though there is supposed to be a major stylistic change at these breaks. The absence of major stylistic shifts is an argument for unity."

He and his colleagues reasoned that if there is homogeneity in a single feature, such as how bigrams are distributed (ab, ac, ad, etc.), one can conclude that the text was written by a single author or by multiple authors seeking to trick the analysis into thinking it was written by a single author. Add additional features into the mix, like punctuation and meter, and the homogeneity still persists. This makes a single author more likely, since would-be tricksters would have to be even more clever in their subterfuge.
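
One way to picture a homogeneity check on a single feature is the Python sketch below. It is a generic permutation test on letter-pair frequencies, offered as an assumption-laden illustration rather than a reconstruction of the Harvard team's statistics: the function names, the total variation distance, and the shuffling scheme are all choices made for this example.

```python
import random
from collections import Counter


def bigram_counts(lines):
    """Count adjacent letter pairs within each word of the given lines."""
    counts = Counter()
    for line in lines:
        for word in line.lower().split():
            for i in range(len(word) - 1):
                counts[word[i:i + 2]] += 1
    return counts


def profile_distance(counts_a, counts_b):
    """Total variation distance between two bigram frequency distributions."""
    total_a = sum(counts_a.values()) or 1
    total_b = sum(counts_b.values()) or 1
    keys = set(counts_a) | set(counts_b)
    return 0.5 * sum(
        abs(counts_a[k] / total_a - counts_b[k] / total_b) for k in keys
    )


def break_p_value(lines, break_index, trials=1000, seed=0):
    """Compare the distance across a proposed break with distances from random splits.

    Returns the observed distance and the share of shuffled splits that look
    at least as different. A small share hints at a real stylistic shift.
    """
    rng = random.Random(seed)
    observed = profile_distance(
        bigram_counts(lines[:break_index]), bigram_counts(lines[break_index:])
    )
    shuffled = list(lines)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        distance = profile_distance(
            bigram_counts(shuffled[:break_index]),
            bigram_counts(shuffled[break_index:]),
        )
        if distance >= observed:
            at_least += 1
    return observed, (at_least + 1) / (trials + 1)
```

If the two sides of a proposed break differ no more than random halves of the same text do, the returned share is large and the feature counts as homogeneous across that break. Repeating the check with other features, such as meter or sense pauses, is what makes the combined case harder to explain away.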

"So as we keep adding more and more features, it becomes harder and harder to believe that a text isn't just the homogenous work of one person," said Krieger.

That said, this finding isn't likely to settle the authorship question once and for all. "Every tool in this field is statistical," said Krieger. "Even as they get more and more sophisticated, they are just tipping our needle of probabilistic reasoning in one direction or another."

Krieger et al. also analyzed a collection of four Old English poems believed to be written by an author known only as Cynewulf. Some scholars have noted certain similarities in style and theme in other anonymous Old English poems and sought to also attribute them to Cynewulf, although the consensus is that those were likely the product of a Cynewulfian school of poetry. Based on stylistic homogeneity, Krieger et al.'s analysis showed that Cynewulf was very likely the author of three (and possibly all four) of the signed poems, and also of an anonymous poem called Andreas, which tells the story of St. Andrew the Apostle.

"Authorship attribution is hard," said Simon DeDeo of Carnegie Mellon University, who was not involved with the study. DeDeo is a former physicist who now applies mathematical techniques to the study of historical and current cultural phenomena. Case in point: in 1995, a scholar named Donald Foster claimed, based on his computer analysis, that a poem entitled "Funeral Elegy for Master William Peter" had been written by William Shakespeare. But it wasn't: the poem was written by John Ford, a Shakespeare contemporary who was known for imitating the Bard and is best known for his play 'Tis Pity She's a Whore.

"In the end," said DeDeo, "Foster's attribution was disproved—as much as it could be—the old-fashioned way: by close reading." (Foster was a good enough scholar to admit he was wrong.)

"This group is working in that Foster style, building statistics of word usage and metrical patterns that might count as fingerprints to distinguish interlopers," said DeDeo. "I certainly believe their no-detection result for Beowulf: i.e., the claim that their metrics cannot find evidence for a change in authorship. That doesn't disprove the historical claim, of course—absence of evidence is not evidence of absence, and a critic could reply that their measures were too coarse for the task at hand."

Krieger recognizes the inherent challenges, and he is familiar with the Foster case. But he points out that he and his colleagues are posing a simpler question than attributing a text to a specific author. "We demonstrate a lot of homogeneity in Beowulf, tipping the scales toward unitary authorship, but we don't say who," he said. "If you gave me some choices, I would not be nearly as confident in the results of my tests. It's important to distinguish our work from this kind of higher-order problem of saying who wrote something."

As for the Cynewulf result, DeDeo thinks that claim is a little bit stronger, but he notes that the statistics here are also coarse. "In as much as these poems differ because of style, I could make a forgery that would pass their tests simply by substituting in synonyms," he said. "You're relying on the authors not trying to imitate a match."

"It's exciting to think that authorship could be settled the way crimes are on C.S.I., by technology alone, but I don't think it ever can be," said DeDeo. "One of the nice things this work can do, however, is challenge received wisdom and open up new hypotheses about where these texts did come from. Computers 'see' things in a different way than human readers do: projected onto these more abstract spaces, they can often reveal something unexpected that the conscious mind can miss."

Nature Human Behaviour, 2019. DOI: 10.1038/s41562-019-0570-1.