Where do witches come from, and what do those places have in common? While browsing a large collection of traditional Danish folktales, the folklorist Timothy Tangherlini and his colleague Peter Broadwell, both at the University of California, Los Angeles, decided to find out. Armed with a geographical index and some 30,000 stories, they developed WitchHunter, an interactive ‘geo-semantic’ map of Denmark that highlights the hotspots for witchcraft.

The system used artificial intelligence (AI) techniques to unearth a trove of surprising insights. For example, the researchers found that evil sorcery often took place close to Catholic monasteries. This made a certain amount of sense, since Catholic sites in Denmark were tarred with diabolical associations after the Protestant Reformation in the 16th century. By plotting the distance and direction of witchcraft relative to the storyteller’s location, WitchHunter also showed that enchantresses tend to be found within the local community, much closer to home than other kinds of threats. ‘Witches and robbers are human threats to the economic stability of the community,’ the researchers write. ‘Yet, while witches threaten from within, robbers are generally situated at a remove from the well-described village, often living in woods, forests, or the heath … it seems that no matter how far one goes, nor where one turns, one is in danger of encountering a witch.’
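To make ‘distance and direction’ concrete, here is a minimal sketch of how the offset between a storyteller’s home and a place named in a tale might be computed. It is not WitchHunter’s code: the function name and the coordinates are illustrative assumptions, and the maths is simply the standard haversine distance and initial compass bearing.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Return (great-circle distance in km, initial compass bearing in degrees).

    Point 1 is where the storyteller lives; point 2 is the place named in the tale.
    A bearing of 0 means due north, 90 means due east.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)

    # Haversine formula for the great-circle distance.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    distance = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

    # Initial bearing from point 1 towards point 2, normalised to [0, 360).
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360
    return distance, bearing


# Hypothetical coordinates, invented for illustration only.
storyteller = (56.00, 10.00)
tale_location = (56.05, 10.10)
km, degrees = distance_and_bearing(*storyteller, *tale_location)
print(f"{km:.1f} km away, bearing {degrees:.0f} degrees")
```

Collected over thousands of stories, pairs like these are the kind of data from which one could compare how near to home witches appear relative to robbers.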

Such ‘computational folkloristics’ raises a big question: what can algorithms tell us about the stories we love to read? Any proposed answer seems to point to as many uncertainties as it resolves, especially as AI technologies grow in power. Can literature really be sliced up into computable bits of ‘information’, or is there something about the experience of reading that is irreducible? Could AI enhance literary interpretation, or will it alter the field of literary criticism beyond recognition? And could algorithms ever derive meaning from books in the way humans do, or even produce literature themselves?

Computer science isn’t as far removed from the study of literature as you might think. Most contemporary applications of AI consist of sophisticated methods for learning patterns, often through the creation of labels for large, unwieldy data-sets based on structures that emerge from within the data itself. Similarly, not so long ago, examining the form and structure of a work was a central focus of literary scholarship. The ‘structuralist’ strand of literary theory tends to deploy close – sometimes microscopic – readings of a text to see how it functions, almost like a closed system. This is broadly known as a ‘formal’ mode of literary interpretation, in contrast to more historical or contextual ways of reading.

The so-called ‘cultural’ turn in literary studies since the 1970s, with its debt to postmodern understandings of the relationship between power and narrative, has pushed the field away from such systematic, semi-mechanistic ways of analysing texts. AI remains concerned with formal patterns, but can nonetheless illuminate key aspects of narrative, including time, space, characters and plot.

Consider the opening sentence of Gabriel García Márquez’s One Hundred Years of Solitude (1967): ‘Many years later, as he faced the firing squad, Colonel Aureliano Buendía was to remember that distant afternoon when his father took him to discover ice.’ The complex way in which Márquez represents the passage of time is a staple of modern fiction. The time corresponding to ‘Many years later’ includes the fateful time of ‘facing’ the firing squad, which in turn is simultaneous with that final ‘remember’-ing, which is years after ‘that distant afternoon’. In a single sentence, Márquez paints a picture of events in the fleeting present, memories of the past and visions for the future.

According to numerous psychological studies, when we read such stories, we construct timelines. We represent to ourselves whether events are mentioned before, after or simultaneous with each other, and how far apart they are in time. Likewise, AI systems have been able to learn timelines for a variety of narrative texts in different languages, including news, fables, short stories and clinical narratives.

In most cases, this analysis involves what’s known as ‘supervised’ machine learning, in which algorithms learn from collections of texts that a human has laboriously labelled. Timeframes in narratives can be represented with TimeML, a widely used annotation standard (which I helped to develop). Once a collection (or ‘corpus’) of texts is annotated and fed into an AI program, the system can learn rules that let it accurately identify the timeline in other new texts, including the passage from Márquez. TimeML annotations can also be used to measure the tempo or pace of the narrative, by analysing the relationship between events in the text and the time intervals between them.
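As a rough illustration of what ‘identifying the timeline’ involves, the sketch below orders the events of the Márquez sentence from a handful of hand-written temporal relations. The tuples are a simplified stand-in for TimeML-style ‘before’ and ‘simultaneous’ links, not real TimeML markup, and the function name is my own invention. It assumes Python 3.9 or later, for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter


def build_timeline(before, simultaneous=()):
    """Order events from (earlier, later) pairs; simultaneous events share a slot.

    Returns a list of slots, earliest first; each slot is a sorted list of events.
    Contradictory relations (a cycle) raise graphlib.CycleError.
    """
    # Merge simultaneous events into groups, each named by a representative.
    rep = {}

    def find(event):
        while rep.get(event, event) != event:
            event = rep[event]
        return event

    for a, b in simultaneous:
        rep[find(b)] = find(a)

    events = {e for pair in list(before) + list(simultaneous) for e in pair}
    members, graph = {}, {}
    for event in events:
        members.setdefault(find(event), []).append(event)
        graph.setdefault(find(event), set())

    # graph maps each group to the set of groups that must come before it.
    for earlier, later in before:
        graph[find(later)].add(find(earlier))

    order = TopologicalSorter(graph).static_order()
    return [sorted(members[group]) for group in order]


# Hand-labelled relations for the opening of One Hundred Years of Solitude.
timeline = build_timeline(
    before=[("discover ice", "face firing squad")],
    simultaneous=[("face firing squad", "remember")],
)
print(timeline)
```

A trained system has the harder job of predicting such relations from raw text; the ordering step shown here is the easy part that follows once they are in hand.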

The presence of narrative ‘zigzag’ movements in fiction is one of the intriguing insights to emerge from this kind of analysis. It’s evident in this passage from Marcel Proust’s posthumously published novel Jean Santeuil (1952), the precursor to his magnum opus In Search of Lost Time (1913-27):

Sometimes passing in front of the hotel he remembered the rainy days when he used to bring his nursemaid that far, on a pilgrimage. But he remembered them without the melancholy that he then thought he would surely some day savour on feeling that he no longer loved her.

The narrative here oscillates between two poles, as the French structuralist critic Gérard Genette observed in Narrative Discourse (1983): the ‘now’ of the recurring events of remembering while passing in front of the hotel, and the ‘once’ or ‘then’ of the thoughts remembered, involving those rainy days with his nursemaid.

Even though AI annotation schemes are versatile and expressive, they’re not foolproof. Longer, book-length texts are prohibitively expensive to annotate, so the power of the algorithms is restricted by the quantity of data available for training them. Even if this tagging were more economical, machine-learning systems tend to fare better on simpler narratives and on relating events that are mentioned closer together in the text. The algorithms can be foxed by scene-setting descriptive prose, as in this sentence from Honoré de Balzac’s novella Sarrasine (1831), in which the four states being described should (arguably) overlap with each other:

The trees, being partly covered with snow, were outlined indistinctly against the greyish background formed by a cloudy sky, barely whitened by the moon.

AI criticism is also limited by the accuracy of human labellers, who must carry out a close reading of the ‘training’ texts before the AI can kick in. Experiments show that readers tend to take longer to process events that are distant in time or separated by a time shift (such as ‘a day later’). Such processing creates room for error, although distributing standard annotation guidelines to users can reduce it. People also have a hard time imagining temporally complex situations, such as the mind-bending ones described in Alan Lightman’s novel Einstein’s Dreams (1992):

For in this world, time has three dimensions, like space. … Each future moves in a different direction of time. Each future is real. At every point of decision, whether to visit a woman in Fribourg or to buy a new coat, the world splits into three worlds, each with the same people, but different fates for those people. In time, there are an infinity of worlds.

Spotting temporal patterns might be fun and informative, but isn’t literature more than the sum of the information lurking in its patterns? Of course, there might be phenomenological aspects of storytelling that remain ineffable, including the totality of the work itself. Even so, literary interpretation is often an inferential process. It requires sifting through and comparing chunks of information about literature’s form and context – from the text itself, from its historical and cultural background, from authorial biographies, critiques and social-media reactions, and from the reader’s prior experience. All of this is data, and eminently minable.

I don’t think it’s too outlandish to suggest that an automaton might one day be able to simulate, for itself, the feelings we have when we read a story. At the moment, AI systems are notoriously bad at an important aspect of how humans make meaning from words: the ability to discern the context in which statements occur. But they’re getting better. Automatic sentiment and irony detectors are exposing some of the hidden associations lurking below the surface of texts. Meanwhile, social robots are also starting to improve their emotional intelligence.

Like many other AI practitioners, I’m a philosophical functionalist: I believe that a cognitive state, such as one derived from reading, should not be defined by what it is made of in terms of hardware or biology, but instead by how it functions, in relation to inputs, outputs and other cognitive states. (Opponents of functionalism include behaviourists – who insist that mental states are nothing other than dispositions to behave in certain ways – and mind-brain identity theorists – who argue that mental states are identical with particular neural states, and are tied to specific biological ‘hardware’.)

Machines, in the functionalist view, can therefore be said to ‘experience’ certain basic cognitive states. ‘Siri understood my request,’ in relation to the iPhone, means that Siri processed my request to achieve a desired functional outcome. Similarly, ‘The system understands temporal relations’, in relation to an algorithm for analysing text, simply means that it digested and produced a functional timeline that is similar to a human one. A functionalist stance also allows for a comparison of qualitative experiences or ‘qualia’. I have my own subjective experience of the translation of the last haiku written by Matsuo Bashō, a 17th-century Japanese poet:

Sick on a journey –

over parched fields

dreams wander on.

While my experience of reading these lines is private and different from anyone else’s, it can be compared with yours – or a computer’s – by experimentally testing how similar our reactions are.

This empirical kind of analysis might strike the sensitive reader of fiction or poetry as rather strange. Algorithms are still very far off being able to produce the full range of functional outputs that a human can upon digesting a text. But if it weren’t possible to compare the effects of different subjective experiences of reading, it would make no sense to talk of literature resonating among different people, either between the writer and the reader or among multiple readers. Yet that’s exactly what literature does. Whether we like it or not, slicing up a text into comparable bits is already an undeniable part of our critical repertoire. And, as research into machine intelligence progresses, such functional, computational analysis promises to become only more significant.

Algorithms might be poor at grasping context, but they excel at sifting through large amounts of data. This means they’re well-suited to what Franco Moretti at the Stanford Literary Lab calls ‘distant reading’ – a zoomed-out, macroscopic literary analysis of hundreds, sometimes thousands, of texts. By crunching through this ‘big data’, Moretti and his followers hope to discover aspects of literature that are invisible to scholars who go about merely reading books.

Conversation is one area where computational methodology has been shown to trump the claims of literary scholars – even scientifically inclined ones. In his Atlas of the European Novel (1999), Moretti suggested that the bustling urban setting of much 19th-century fiction tends to involve more characters but less dialogue, compared with narratives set within the confines of the family in the village or the countryside. A group of computational linguists and literary scholars at Columbia University decided to investigate this claim, using software that built a conversational social network from a corpus of 60 novels from the 19th century.

The software parsed each sentence in terms of its syntax, and then found references to people. It also flagged stretches of quoted speech and attributed the quotes to speakers. This allowed the system to discern who was talking to whom. Although Moretti’s theory predicted an inverse correlation between the amount of dialogue and the number of characters, these scholars found no such statistically significant effect. Instead, they discovered that narrative voice, such as first- or third-person narration, was more relevant than the setting in urban or rural environments.
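The last step of that pipeline, turning attributed quotes into a network, can be sketched in a few lines. This is a toy under stated assumptions, not the Columbia group’s software: it takes speaker and addressee labels as given (the hard part, which their system inferred from syntax), and the character names and quotes are invented.

```python
from collections import Counter


def conversation_network(attributed_quotes):
    """Count exchanges between each pair of characters.

    attributed_quotes: iterable of (speaker, addressee, quote) tuples.
    Returns a Counter keyed by an alphabetically sorted (name, name) pair,
    so that A speaking to B and B speaking to A strengthen the same edge.
    """
    edges = Counter()
    for speaker, addressee, _quote in attributed_quotes:
        edges[tuple(sorted((speaker, addressee)))] += 1
    return edges


def network_summary(edges):
    """Return the two quantities at issue: cast size and amount of dialogue."""
    characters = {name for pair in edges for name in pair}
    return {"characters": len(characters), "exchanges": sum(edges.values())}


# Invented example data, for illustration only.
quotes = [
    ("Anna", "Ben", "Will you walk to the village?"),
    ("Ben", "Anna", "Not in this rain."),
    ("Anna", "Clara", "He refuses to come."),
]
network = conversation_network(quotes)
print(network_summary(network))
```

Computing such a summary for each novel yields the paired measurements, number of characters against amount of dialogue, on which a claim of inverse correlation can be tested.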

Characters are another area ripe for empirical re-examination. Readers often have strong intuitions about fictional figures. We recognise the imprint of an individual author, seeing characters as, say, Dickensian or Kafkaesque. We are also aware that characters can fall into certain functional classes across different works. It’s clear that a villain such as Lord Voldemort resembles Count Dracula more than he does his antagonist, the hero Harry Potter.

The computational linguist David Bamman, now at the University of California, Berkeley, and colleagues mined a database of more than 15,000 novels to produce a Bayesian statistical model that could predict different character types. They used features such as the actions that a person participates in, the objects they possess, and their attributes. The system was able to identify cases where two characters by the same author happen to be more similar to each other than to a closely related character by a different author. So the system discovered that Wickham in Jane Austen’s Pride and Prejudice (1813) resembles Willoughby in her Sense and Sensibility (1811), more than either character resembles Mr Rochester in Charlotte Brontë’s Jane Eyre (1847).

The computer could also tell when protagonists by the same author are distinguished, for example, by being more thoughtful. Their system infers that Elizabeth Bennet in Pride and Prejudice, one of Austen’s most popular characters, resembles Elinor Dashwood in Sense and Sensibility more than either character resembles Elizabeth’s foolish, marriage-obsessed mother, Mrs Bennet. Having a human specify what underlies these scholarly intuitions is hard, but the computer has little difficulty spotting and testing them.

Algorithms are also becoming adept at unpicking the knotty entanglements of characters’ relationships. For example, the computer scientist Mohit Iyyer and colleagues at the University of Maryland have developed a system that discovers, from reading Bram Stoker’s Dracula (1897), the correct trajectory of the relationship between Arthur and Lucy, which starts with love and ends with murder. Their method can correctly discover numerous other trajectories from a database of more than 1,300 novels – inferences that would take literary scholars a huge amount of time to detect.

It’s not hard to imagine a near-term scenario where a character such as Robin Hood could be tracked through time across multiple texts. He starts out as a cut-throat, anti-clerical outlaw who robs the rich to help the poor; moves to his 19th-century incarnation as a regional hero battling the Norman nobles; and ends up as a fox in a Disney film. To a scholar attuned to the cultural turn in literary studies, the details of Robin Hood’s transformation through time could reveal facts about class conflict, the interactions of literature and power, and the constraints and pressures of mass entertainment.

In 1928, the Russian structuralist Vladimir Propp published an inventory of 31 narrative archetypes or ‘functions’ that underpin common Russian folktales. In the narrative function of ‘Villainy’, for example, a villain abducts someone, while in ‘Receipt of a Magical Agent’, a character can place himself at the disposal of the hero.

Could an algorithm today generate and improve upon Propp’s narrative functions? In his AI dissertation at MIT, the computer scientist Mark Finlayson built a system that drew on an annotated English translation of Propp’s Russian corpus. He discovered several new narrative plot structures – finding, for example, that kidnapping, seizing and tormenting are the hallmarks of Proppian villainy.

Until this sort of analysis came along, finding and examining the morphologies of folklore took years of careful reading and analysis. Though structuralism is no longer in fashion among literary scholars, computational embodiments of these insights have led to intriguing results. Using Propp’s narrative functions, a group of AI researchers at the Complutense University of Madrid have developed a system known as PropperWryter, which automatically generates Russian-style fairy tales. The results are still rudimentary, but intriguing all the same:

Once upon a time there was a princess. The princess said not to go outside. The princess went outside. The princess heard about the lioness. The lioness scared the princess. The lioness kidnapped the princess. The knight departured. The knight and the lioness fought. The knight won the fight. The knight solved the problem of the princess. The knight returned. A big treasure to the knight.

The team have since extended the tool to create plot lines for musical theatre – including Beyond the Fence, the first ever computer-generated musical, which ran for several weeks at the Arts Theatre in London this year.

Such experiments raise the tantalising possibility that AI systems could be literary creators themselves one day. Several years ago, Marc Cavazza and his colleagues at Teesside University in Middlesbrough built an immersive interactive storytelling system in virtual reality, using excerpts of Gustave Flaubert’s novel Madame Bovary (1857). Human users took on the role of a character and interacted with Emma Bovary to influence the plot outcomes. The developers created an inventory of character feelings based on Flaubert’s preliminary studies for the novel.

In one path through the system, by the time her affair has been going on for a while, Emma is comfortable with the risk of adultery, and also swayed by Rodolphe’s power over her. These states are preconditions for her expressing her feelings to Rodolphe, causing her to tell him: ‘There are times when I long to see you again!’ At this juncture, the user (in the role of Rodolphe) could reply: ‘I will leave you and never see you again.’ This response will make Emma angry and trigger a chain of events, including regret for falling for Rodolphe, and discovery of happiness in family life (an outcome that might have upset Flaubert). On other occasions, users ended up drastically curtailing the story by providing excessive ‘emotional input’ to an already overwrought Emma.
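The idea of feelings acting as preconditions can be sketched with operators in the style of classical planning, where an action is available only when certain facts hold and changes the facts once it is taken. I am assuming this general shape for illustration; the class, the state names and the effects below are hypothetical and are not taken from the Teesside system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A narrative action with preconditions and effects on the character's state."""
    name: str
    preconditions: frozenset
    adds: frozenset = frozenset()
    removes: frozenset = frozenset()


def applicable(action, state):
    """An action is available only if every precondition holds in the state."""
    return action.preconditions <= state


def apply_action(action, state):
    """Return the new state after the action; refuse if preconditions fail."""
    if not applicable(action, state):
        raise ValueError(f"preconditions not met for {action.name}")
    return (state - action.removes) | action.adds


# Hypothetical feeling labels, loosely following the path described above.
express_feelings = Action(
    name="express_feelings_to_rodolphe",
    preconditions=frozenset({"comfortable_with_risk", "swayed_by_rodolphe"}),
    adds=frozenset({"feelings_expressed"}),
)
rejection = Action(
    name="rodolphe_rejects_emma",
    preconditions=frozenset({"feelings_expressed"}),
    adds=frozenset({"angry", "regret"}),
    removes=frozenset({"swayed_by_rodolphe"}),
)

emma = frozenset({"comfortable_with_risk", "swayed_by_rodolphe"})
emma = apply_action(express_feelings, emma)
emma = apply_action(rejection, emma)
print(sorted(emma))
```

In a sketch like this, the user’s reply simply selects which action fires next, which is how a single line of dialogue can redirect the whole plot.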

More recently, these researchers have focused on generating animated medical soap operas involving virtual characters such as doctors, nurses and patients. Participants can specify certain social relations between characters, such as extreme antagonism between a pair. These choices produce unpredictable narrative actions, such as the spreading of malicious gossip, and result in the creation of an episode that users can watch.

Computational analysis and ‘traditional’ literary interpretation need not be a winner-takes-all scenario. Digital technology has already started to blur the line between creators and critics. In a similar way, literary critics should start combining their deep expertise with ingenuity in their use of AI tools, as Broadwell and Tangherlini did with WitchHunter. Without algorithmic assistance, researchers would be hard-pressed to make such supernaturally intriguing findings, especially as the quantity and diversity of writing proliferates online.

In the future, scholars who lean on digital helpmates are likely to dominate the rest, enriching our literary culture and changing the kinds of questions that can be explored. Those who resist the temptation to unleash the capabilities of machines will have to content themselves with the pleasures afforded by smaller-scale, and fewer, discoveries. While critics and book reviewers may continue to be an essential part of public cultural life, literary theorists who do not embrace AI will be at risk of becoming an exotic species – like the librarians who once used index cards to search for information.