This may not seem like anything special, Vonnegut says (his actual words are, “it certainly looks like trash”) until he notices another well-known story that shares this shape. “Those steps at the beginning look like the creation myth of virtually every society on earth. And then I saw that the stroke of midnight looked exactly like the unique creation myth in the Old Testament.” Cinderella’s curfew was, if you look at it on Vonnegut’s chart, a mirror-image downfall to Adam and Eve’s ejection from the Garden of Eden. “And then I saw the rise to bliss at the end was identical with the expectation of redemption as expressed in primitive Christianity. The tales were identical.”

Vonnegut, in his ever-charming way, was quite pleased with himself for making this connection. And 35 years later, his idea had resonated enough with a group of mathematicians and computer scientists that they decided to build an experiment around it. Vonnegut had mapped stories by hand, but in 2016, with sophisticated computing power, natural language processing, and reams of digitized text, it’s possible to map the narrative patterns in a huge corpus of literature. It’s also possible to ask a computer to identify the shapes of stories for you.

That’s what a group of researchers from the University of Vermont and the University of Adelaide set out to do. They collected computer-generated story arcs for nearly 2,000 works of fiction, classifying each into one of six core types of narratives (based on what happens to the protagonist):

1. Rags to Riches (rise)

2. Riches to Rags (fall)

3. Man in a Hole (fall then rise)

4. Icarus (rise then fall)

5. Cinderella (rise then fall then rise)

6. Oedipus (fall then rise then fall)

Their focus was on the emotional trajectory of a story, not merely its plot. They also analyzed which emotional structures writers used most, and how those contrasted with the ones readers liked best, then published their findings as a preprint on arXiv.org. More on that in a minute.

First, the researchers had to find a workable dataset. Using a collection of fiction from the digital library Project Gutenberg, they selected 1,737 English-language works of fiction between 10,000 and 200,000 words long.

Then they ran their dataset through sentiment analysis to generate an emotional arc for each work. “We’re not imposing a set of shapes,” said Andy Reagan, a Ph.D. candidate in mathematics at the University of Vermont and the lead author of the paper. “Rather: the math and machine learning have identified them.”

They did this by training the machine to take all the words of the book, section by section, and measure the average happiness of each bag of words based on how its individual words scored. The researchers assigned individual happiness scores to more than 10,000 frequently used words by crowdsourcing the effort on the website Mechanical Turk. This portion of the research is fascinating in and of itself: The 10 words that people ranked as happiest were laughter, happiness, love, happy, laughed, laugh, laughing, excellent, laughs, and joy. The 10 words that people ranked as least happy were terrorist, suicide, rape, terrorism, murder, death, cancer, killed, kill, and die. (You can see how all the words ranked by visiting this site.)
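
To make the idea concrete, here is a minimal Python sketch of that kind of scoring: split a text into sections, look up each word in a happiness lexicon, and average the scores section by section. It is an illustration under stated assumptions, not the researchers' code. The names `TOY_SCORES`, `tokenize`, `section_happiness`, and `emotional_arc` are hypothetical, the six scores are made-up placeholder values rather than the crowdsourced ratings, and fixed-size, non-overlapping sections are a simplification chosen for brevity.

```python
import re

# Hypothetical toy lexicon. These numbers are invented for illustration
# and are NOT the crowdsourced scores described in the article.
TOY_SCORES = {
    "laughter": 9.0,
    "love": 8.0,
    "joy": 8.0,
    "death": 2.0,
    "murder": 1.0,
    "kill": 1.0,
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def section_happiness(words, scores):
    """Average the scores of the words that appear in the lexicon.

    Words without a score are ignored. Returns None if no word in the
    section has a score.
    """
    rated = [scores[w] for w in words if w in scores]
    if not rated:
        return None
    return sum(rated) / len(rated)


def emotional_arc(text, scores, section_size):
    """Score consecutive sections of `section_size` words each.

    The last section may be shorter than the others.
    """
    words = tokenize(text)
    sections = [
        words[i:i + section_size]
        for i in range(0, len(words), section_size)
    ]
    return [section_happiness(section, scores) for section in sections]


if __name__ == "__main__":
    story = (
        "Love and laughter filled the house. "
        "Then death and murder came, but "
        "joy returned at last."
    )
    # One average happiness value per six-word section.
    print(emotional_arc(story, TOY_SCORES, section_size=6))
```

Plotting the returned values in order gives a rough rise-and-fall curve for the text. A real analysis would need a much larger lexicon and far longer sections than this toy example uses.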