“My prettiest contribution to the culture” was how the novelist Kurt Vonnegut described his old master’s thesis in anthropology, “which was rejected because it was so simple and looked like too much fun”. The thesis sank without a trace, but Vonnegut continued throughout his life to promote the big idea behind it, which was: “stories have shapes which can be drawn on graph paper”.

In a 1995 lecture, Vonnegut chalked out various story arcs on a blackboard, plotting how the protagonist’s fortunes change over the course of the narrative on an axis stretching from ‘good’ to ‘ill’. The arcs include ‘man in hole’, in which the main character gets into trouble then gets out again (“people love that story, they never get sick of it!”) and ‘boy gets girl’, in which the protagonist finds something wonderful, loses it, then gets it back again at the end. “There is no reason why the simple shapes of stories can’t be fed into computers”, he remarked. “They are beautiful shapes.”

Thanks to new text-mining techniques, this has now been done. Professor Matthew Jockers at Washington State University, and later researchers at the University of Vermont’s Computational Story Lab, analysed data from thousands of novels to reveal six basic story types – you could call them archetypes – that form the building blocks for more complex stories. The Vermont researchers describe the six story shapes behind more than 1700 English novels as:

1. Rags to riches – a steady rise from bad to good fortune

2. Riches to rags – a fall from good to bad, a tragedy

3. Icarus – a rise then a fall in fortune

4. Oedipus – a fall, a rise then a fall again

5. Cinderella – rise, fall, rise

6. Man in a hole – fall, rise

The researchers gathered the data using sentiment analysis, a statistical technique often used by marketeers to analyse social media posts. Each word is allocated a particular ‘sentiment score’, based on crowdsourced data. Depending on the lexicon chosen, a word can be categorised as positive (happy) or negative (sad), or it can be associated with one or more of eight more subtle emotions, including fear, joy, surprise and anticipation. For example, the word ‘happy’ is positive, and associated with joy, trust and anticipation. The word ‘abolish’ is negative and associated with anger.
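To make the idea of a word-level lexicon concrete, here is a minimal Python sketch. It is not the researchers' method or any real lexicon: the dictionary `TOY_LEXICON` and the function `score_text` are hypothetical names, and only the entries for ‘happy’ and ‘abolish’ follow the examples given above. The other entries are invented for illustration.

```python
import re

# Toy lexicon for illustration only. Real lexicons are crowdsourced and
# contain thousands of words. "happy" and "abolish" follow the examples
# in the text; "wonderful" and "trouble" are made-up entries.
TOY_LEXICON = {
    "happy": {"polarity": 1, "emotions": {"joy", "trust", "anticipation"}},
    "abolish": {"polarity": -1, "emotions": {"anger"}},
    "wonderful": {"polarity": 1, "emotions": {"joy"}},
    "trouble": {"polarity": -1, "emotions": {"fear"}},
}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_text(text, lexicon=TOY_LEXICON):
    """Return (net polarity, emotion counts) for the words found in the lexicon.

    Words are scored in isolation, so context such as negation is ignored.
    """
    polarity = 0
    emotions = {}
    for word in tokenize(text):
        entry = lexicon.get(word)
        if entry is None:
            continue  # words missing from the lexicon contribute nothing
        polarity += entry["polarity"]
        for emotion in entry["emotions"]:
            emotions[emotion] = emotions.get(emotion, 0) + 1
    return polarity, emotions


net, counts = score_text("Happy people abolish trouble")
print(net, counts)
```

Because each word is looked up on its own, a phrase such as ‘not happy’ would still count as positive in this sketch.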

Run sentiment analysis on all the words in a novel, poem or play and plot the results against time, and it’s possible to see how the mood changes over the course of the text, revealing a kind of emotional narrative. While not a perfect tool – it looks at words in isolation, ignoring context – it can be surprisingly insightful when applied to larger chunks of text, as this blog post on Jane Austen novels from data scientist Julia Silge shows. The tools to do sentiment analysis are freely available, and much out-of-copyright literature can be downloaded from online repository Project Gutenberg. We looked at some of the best-loved tales from BBC Culture’s 100 stories that shaped the world poll to try to find the six story types.
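As a rough illustration of plotting mood against narrative time, the Python sketch below splits a text into equal-sized chunks and averages the word scores in each one. It is a simplified stand-in, not the approach used by the researchers or in the blog post mentioned above: `TOY_POLARITY` is an invented six-word lexicon and `sentiment_arc` is a hypothetical helper name.

```python
import re

# Invented polarity scores for illustration; a real analysis would load
# a full crowdsourced lexicon instead.
TOY_POLARITY = {
    "happy": 1, "wonderful": 1, "love": 1,
    "trouble": -1, "lost": -1, "afraid": -1,
}


def sentiment_arc(text, n_chunks=3, lexicon=TOY_POLARITY):
    """Return the mean word score for each of n_chunks consecutive slices.

    The position of a chunk in the returned list stands in for narrative
    time; words missing from the lexicon score 0.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return []
    n_chunks = min(n_chunks, len(words))  # avoid empty chunks
    arc = []
    for i in range(n_chunks):
        start = i * len(words) // n_chunks
        end = (i + 1) * len(words) // n_chunks
        chunk = words[start:end]
        total = sum(lexicon.get(word, 0) for word in chunk)
        arc.append(total / len(chunk))
    return arc


story = "happy happy wonderful trouble lost afraid love happy wonderful"
print(sentiment_arc(story, n_chunks=3))  # [1.0, -1.0, 1.0]
```

The toy story starts high, dips and recovers, which is the fall-then-rise outline of ‘man in a hole’. With a real novel, far more chunks and some smoothing would likely be needed before any shape became visible.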

The Divine Comedy (Dante Alighieri, 1308-1320)

Translated by Henry Wadsworth Longfellow

Story type: Rags to riches