Amazon recently released its Good Omens mini-series, based on the book co-written by Neil Gaiman and Terry Pratchett. Around the time of its release, I happened to be attending a course at the Digital Humanities Summer Institute on Stylometry with R, where we were learning how to use statistics to analyze style and attribute authorship. For a mini-project, I found a way to combine my love of fantasy literature with my burgeoning skills in the programming language R: I decided to see if I could figure out which sections of Good Omens were written by Gaiman and which by Pratchett.

Gaiman has been asked this question before, and he describes nine weeks of feverish, glorious collaboration filled with writing, rewriting, swapping, and editing of sections. He concludes, “People still ask us who wrote what, and, mostly, we've forgotten.” Well, stylometry can help!

I trained the R package Stylo on a set of texts by Pratchett and Gaiman and then used it to analyze Good Omens (specifically, rolling NSC, or nearest shrunken centroids, classification with 50 features and 5000 words per slice). The figure below shows my results. The x axis follows the text of the novel from beginning to end. The pattern below the horizontal white line represents the signal from the author to whom the program attributed the majority of the authorship in that slice (Gaiman is in red and Pratchett is in green). The top, fainter pattern roughly shows how much signal there is from the other author. Together they add up to 100% in each section of the text.
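
For anyone who wants to try something similar, here is a minimal sketch of what such a run might look like with Stylo's `rolling.classify()` function. It is not the exact script I ran. The folder and file names are hypothetical, the slice overlap is an assumption (I only gave the slice size above), and I am assuming that "50 features" maps to the 50 most frequent words (`mfw = 50`).

```r
# install.packages("stylo")  # run once if the package is not installed
library(stylo)

# Assumed folder layout (hypothetical file names):
#   reference_set/Gaiman_SomeNovel.txt, reference_set/Pratchett_SomeNovel.txt, ...
#   test_set/GoodOmens.txt
# Stylo takes the class (author) label from the part of each
# file name before the first underscore.

results <- rolling.classify(
  gui = FALSE,                           # skip the graphical settings dialog
  training.corpus.dir = "reference_set", # texts of known authorship
  test.corpus.dir = "test_set",          # the text to be sliced and classified
  classification.method = "nsc",         # nearest shrunken centroids
  mfw = 50,                              # assumed meaning of "50 features"
  slice.size = 5000,                     # words per slice
  slice.overlap = 4500,                  # assumption: not stated in this post
  write.png.file = TRUE                  # save the plot as a PNG in the working directory
)
```

The slice overlap controls how far the window moves between classifications: with 5000-word slices and a 4500-word overlap, each new slice starts 500 words after the previous one, which gives a smoother picture than non-overlapping slices would.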