Over the weekend, Drudge linked to a sensational click-bait article from the Daily Mail.

Can all of storytelling be broken down into six and only six basic classifications? Are there secret story structures that will enhance the success and popularity of an author’s narrative? The answers are no and yes, respectively. If you’d like to understand why, then read on.

In his groundbreaking work, The Storytelling Animal, author Jonathan Gottschall offers a unified theory of storytelling. He argues that the human mind is wired to seek out the structure or pattern underlying everyday events and spin them together into a coherent narrative. Drawing on neuroscience, evolutionary biology, and psychology, Gottschall makes the case that storytelling is a key characteristic of human cognition. His insights are critical for anyone seeking to change the culture at its roots through better, more compelling narratives, rather than wasting time attempting to prune the political branches.

Thus, I was fascinated to see the claim that human storytelling may be neatly classified into six basic story shapes. Writing in “The emotional arcs of stories are dominated by six basic shapes,” [arXiv:1606.07772v1 [cs.CL] 24 Jun 2016], Andrew J. Reagan and his colleagues argue that stories may be categorized analytically:

“Advances in computing power, natural language processing, and digitization of text now make it possible to study a culture’s evolution through its texts using a “big data” lens. Our ability to communicate relies in part upon a shared emotional experience, with stories often following distinct emotional trajectories, forming patterns that are meaningful to us. Here, by classifying the emotional arcs for a filtered subset of 1,737 stories from Project Gutenberg’s fiction collection, we find a set of six core trajectories which form the building blocks of complex narratives. We strengthen our findings by separately applying optimization, linear decomposition, supervised learning, and unsupervised learning. For each of these six core emotional arcs, we examine the closest characteristic stories in publication today and find that particular emotional arcs enjoy greater success, as measured by downloads.”

Their paper reviews a wide variety of proposed story classifications. For instance, writing in The Seven Basic Plots: Why We Tell Stories, Christopher Booker proposed seven narrative structures:

Overcoming the monster (e.g., Beowulf).

Rags to riches (e.g., Cinderella).

The quest (e.g., King Solomon’s Mines).

Voyage and return (e.g., The Time Machine).

Comedy (e.g., A Midsummer Night’s Dream).

Tragedy (e.g., Anna Karenina).

Rebirth (e.g., Beauty and the Beast).

Other, more complicated story classification schemes include those in Twenty Master Plots and How to Build Them by Ronald Tobias, and Georges Polti’s The Thirty-Six Dramatic Situations. Reagan et al. further share a fascinating historical footnote: Kurt Vonnegut’s rejected master’s thesis defined stories on axes of “beginning-end” and “Ill-fortune – great-fortune.”

Following in that spirit, Reagan et al. devised a scheme to measure the emotional arc of a story. The axes they chose are “beginning-end” and “happy-sad.”

They employed Amazon’s Mechanical Turk to crowdsource emotional scores for 10,000 common words. On a scale from 1 to 9, individuals assigned high scores to emotionally positive words (happy, love, joy, etc.) and low scores to emotionally negative words (death, sad, hate, anger).

The researchers then slid a window across the text, averaging the emotional scores of the words inside it to see how the score varies from one section to the next. The result is a plot of the emotional arc of a story, yielding insights into pacing and structure.
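To make the sliding-window idea concrete, here is a minimal sketch in Python. It is not the researchers’ code: the function name `emotional_arc`, the window and step sizes, the neutral fallback of 5.0, and the tiny `TOY_SCORES` table are all assumptions made for illustration, and the scores are invented stand-ins for the roughly 10,000 crowdsourced ratings.

```python
import re

# Illustrative scores on the 1-to-9 scale. These values are made up for the
# example and are NOT the crowdsourced ratings used in the paper.
TOY_SCORES = {
    "happy": 8.0, "love": 8.5, "joy": 8.0,
    "death": 1.5, "sad": 2.5, "hate": 2.0, "anger": 2.5,
}

def emotional_arc(text, scores, window=6, step=3, neutral=5.0):
    """Average the scores of rated words inside a window slid across the text.

    Returns one average per window position, from beginning to end.
    """
    words = re.findall(r"[a-z']+", text.lower())
    arc = []
    last_start = max(len(words) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = words[start:start + window]
        rated = [scores[w] for w in chunk if w in scores]
        # Fall back to the scale midpoint when a window has no rated words.
        arc.append(sum(rated) / len(rated) if rated else neutral)
    return arc

sample = "love joy happy love joy happy death sad hate death sad hate"
for position, score in enumerate(emotional_arc(sample, TOY_SCORES)):
    print(position, round(score, 2))
```

On this toy sample the averages drop from one window to the next, which is the “fall” shape in miniature. A real novel would call for a much larger window (thousands of words) so that individual sentences do not dominate the curve.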

Here’s an example from the researchers’ paper, analyzing Harry Potter and the Deathly Hallows:

The emotional arc analysis ties in nicely with the obvious plot ups and downs. The researchers analyzed roughly 1,700 public domain works from Project Gutenberg and found that six story arcs account for about 75% of all stories:

“Rags to riches” (rise; SV1) 15.4%.

“Tragedy”, or “Riches to rags” (fall; -SV1) 25.4%.

“Man in a hole” (fall-rise; SV2) 12.7%.

“Icarus” (rise-fall; -SV2) 9.7%.

“Cinderella” (rise-fall-rise; SV3) 6.0%.

“Oedipus” (fall-rise-fall; -SV3) 6.3%.

“Count of Monte Cristo” (fall-rise-fall-rise; SV4) 6.2%.

I added a seventh category to the researchers’ six, since it makes no sense to cut off the SV4 category at 6.2% of all stories while including Cinderella stories at 6.0%. The next category, -SV4 or rise-fall-rise-fall, comes in at 2.7%, less than half the share of any of the others. These seven categories, then, account for about 80% of the works in the sample.

The chart captures these and additional details:

These seven story arcs account for about 80% of the Project Gutenberg sample, so clearly not all stories fit into six, or even seven, story types. The remaining 20% have more complicated trajectories.
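The coverage figures are easy to check by adding up the shares listed above. The short Python sketch below does just that; the helper name `cumulative_coverage` is my own, and the percentages are the ones quoted in this post.

```python
# Shares of the sample per arc, as listed in this post (percent).
SHARES = [
    ("Rags to riches (SV1)", 15.4),
    ("Tragedy (-SV1)", 25.4),
    ("Man in a hole (SV2)", 12.7),
    ("Icarus (-SV2)", 9.7),
    ("Cinderella (SV3)", 6.0),
    ("Oedipus (-SV3)", 6.3),
    ("Count of Monte Cristo (SV4)", 6.2),
    ("rise-fall-rise-fall (-SV4)", 2.7),
]

def cumulative_coverage(shares):
    """Return (name, running total) pairs, rounded to one decimal place."""
    total = 0.0
    running = []
    for name, pct in shares:
        total += pct
        running.append((name, round(total, 1)))
    return running

for name, covered in cumulative_coverage(SHARES):
    print(f"{covered:5.1f}%  after adding {name}")
```

The running total after the first six arcs is 75.5%, and adding SV4 brings it to 81.7%, which is where the “about 75%” and “about 80%” figures come from.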

In a way, these results are unremarkable. Stories have ups and downs. Some have more than others. And we can classify stories according to the number of their emotional ups and downs. Two key takeaways are particularly interesting.

First is the overall distribution of the works. More works have emotionally simpler arcs like rags-to-riches and tragedy. There are fewer of the more complicated arcs, like Cinderella or Oedipus. Writers prefer simpler arcs: there are more simple stories written than complex stories. But what do readers prefer?

The second conclusion is more interesting to me as an author. The researchers analyzed the download frequency from Project Gutenberg. The most popular works (those with the most relative downloads) tended to be more complex. Stories with a fall-rise-fall-rise structure (SV4) showed the greatest relative popularity, followed closely by the Oedipus and Tragedy arcs. So although writers may prefer simple arcs, readers tend to favor the more complicated ones.

The researchers’ story analysis engine is available online here, and users may select from a wide selection of public domain novels and adjust the analysis settings. I tried out the engine in default settings on a few of my favorite novels to see if I could draw any further conclusions. Let’s take a look at:

The links go to the free Kindle editions (except for The Count of Monte Cristo, which apparently has no Kindle edition for less than $0.99). I left out The Scarlet Pimpernel since, surprisingly, I could not find it in the researchers’ results.

Captain Blood fits into SV5: fall-rise-fall-rise-fall.

Scaramouche is harder to characterize. One might categorize it as fall-rise-fall (-SV3), ignoring the more subtle variations in the second half of the book, or perhaps fall-rise-fall-rise-fall-rise (-SV6).

King Solomon’s Mines is an interesting outlier that defies easy analysis. A quest story, it has more modest ups and downs, followed by a climactic battle at the two-thirds point and a return to the previous equilibrium. The best categorization is probably fall-rise (SV2).

The Three Musketeers is a real roller coaster, even more difficult to analyze. It might be considered something like -SV11.

The Count of Monte Cristo is a classic fall-rise-fall-rise (SV4), and since the researchers declined to name SV4, I’ve dubbed it the “Count of Monte Cristo” arc.

Les Miserables might loosely be categorized as fall-rise-fall-rise (SV4), but the many twists and turns of the plot defy easy categorization. Hugo’s work also shows the most extreme variation of any of these novels.
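The informal labels I used above (fall-rise-fall and so on) came from eyeballing the charts, but the same kind of label could be produced mechanically. The Python sketch below is one hypothetical way to do it: it walks along an arc and records a new “rise” or “fall” whenever the curve reverses by at least a chosen threshold. This is a simple turning-point count of my own, not the decomposition the researchers used, and the function name `arc_shape` and the `min_change` threshold are assumptions.

```python
def arc_shape(arc, min_change=0.5):
    """Label an arc as a sequence of rises and falls, e.g. 'fall-rise'.

    Reversals smaller than min_change are treated as noise and ignored.
    """
    if not arc:
        return ""
    labels = []
    extreme = arc[0]   # most extreme value seen in the current segment
    direction = 0      # +1 rising, -1 falling, 0 not yet decided
    for value in arc[1:]:
        if direction == 0:
            # Wait for the first move large enough to count.
            if abs(value - extreme) >= min_change:
                direction = 1 if value > extreme else -1
                labels.append("rise" if direction == 1 else "fall")
                extreme = value
        elif (value - extreme) * direction > 0:
            # Still moving the same way: extend the current segment.
            extreme = value
        elif (extreme - value) * direction >= min_change:
            # Reversed by enough to start a new segment.
            direction = -direction
            labels.append("rise" if direction == 1 else "fall")
            extreme = value
    return "-".join(labels)

# A made-up arc, not data from any of the novels discussed here.
toy_arc = [5, 4, 3, 4, 6, 7, 5, 3, 4, 6]
print(arc_shape(toy_arc))
```

The result depends heavily on the threshold: a small `min_change` turns every wobble into a new segment, while a large one flattens a roller coaster like The Three Musketeers into a handful of moves. That sensitivity is one reason the labels above should be read as rough judgments.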

Any writer would be thrilled to create works the equal of any of these, and we see that these great works follow a wide variety of emotional templates. They do tend to exhibit more complicated emotional arcs, consistent with the researchers’ observation that more complicated structures tend to be more popular. The fascinating conclusion: although seven arcs account for about 80% of the works in the sample, some of the best works in literature follow the more complicated arcs of the remaining 20%.

In conclusion, an analysis of the emotional arc of a story provides a valuable tool for understanding how and why the story works. A simplistic conclusion that all stories fit into six patterns is highly misleading. A more reasonable cutoff would be seven, but there are many excellent stories with far more complicated structures. Furthermore, more complicated structures tend to be more popular with those downloading from Project Gutenberg, and feature among my own favorite works.

The emotional analysis appears to be a useful analytic tool. The researchers promise that an interface is coming that would allow authors to analyze their own works, for instance. Certainly, however, there are additional modes of literary analysis that might prove similarly useful. Instead of plotting the “happy-sad” emotional arc, it would be interesting to analyze works on a “passive-active” scale to isolate description from action. And it would be interesting to see if the researchers’ conclusions – drawn from an analysis of older (pre-1924) public domain works – would be the same if they studied more modern works.

The researchers offer some other valuable tools, including a tool to analyze movie scripts, and a daily analysis and historical record of the mood of tweets. And the full article is well worth a read: “The emotional arcs of stories are dominated by six basic shapes,” [arXiv:1606.07772v1 [cs.CL] 24 Jun 2016].