Back in 1969, a couple of psychologists at the University of Illinois began studying the way people in different cultures use words. Their conclusion was that, whatever their culture, people tended to use positive words more often than negative ones.

This finding is now known as the Pollyanna hypothesis, after a 1913 novel by Eleanor Porter about a girl who tries to find something to be glad about in every situation.

But although widely known, this work involved a relatively small number of people, so its findings are generally regarded as suggestive rather than conclusive. Indeed, researchers have since conducted similar studies with contradictory results.

What’s needed, of course, is a study so large and comprehensive that it settles the question beyond doubt. And today we get one thanks to the work of Peter Dodds of the Computational Story Lab at the University of Vermont in Burlington and a few pals.

These guys have measured the frequency of positive and negative words in a corpus of 100,000 words drawn from 24 sources across 10 languages, representing different cultures around the world. And their happy conclusion is that the data backs up the Pollyanna hypothesis. “The words of natural human language possess a universal positivity bias,” they say.

They begin by collecting a corpus of words for each of 10 languages: English, Spanish, French, German, Brazilian Portuguese, Korean, Chinese, Russian, Indonesian and Arabic. For each language, they selected the 10,000 most frequently used words.

Next, the team paid native speakers to rate how they felt about each word on a scale ranging from most negative or sad to most positive or happy. Overall, they collected 50 ratings per word, resulting in an impressive database of around 5 million individual assessments. Finally, they plotted the distribution of perceived word happiness for each language.
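The scoring step is simple to sketch: each word's happiness score is just the average of its individual crowd ratings. A toy version in Python, with made-up words and ratings standing in for the study's data:

```python
from statistics import mean

# Hypothetical crowd ratings on the study's 1-9 happiness scale
# (the real dataset collected around 50 ratings per word).
ratings = {
    "laughter": [8, 9, 8, 9, 7],
    "food":     [7, 7, 8, 6, 7],
    "funeral":  [1, 2, 1, 2, 2],
}

# Average the individual assessments into one happiness score per word.
lexicon = {word: mean(scores) for word, scores in ratings.items()}

print(lexicon["laughter"])  # 8.2
```

Repeating this for 10,000 words per language gives the kind of lexicon the team then uses to chart each language's distribution of word happiness.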

The results bring plenty of glad tidings. All of the languages show a clear bias towards positive words, with Spanish topping the list, followed by Portuguese and then English. Chinese props up the rankings as the least happy. “Words, the atoms of human language, present an emotional spectrum with a universal positive bias,” they say.

This is just the beginning for Dodds and co, however. They go on to use these findings as a ‘lens’ through which to examine how emotional polarity changes within novels. So for a wide range of novels, they counted the frequency of positive and negative words in successive sections of text to determine each section’s emotional bias.
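That analysis boils down to sliding a window over the text and averaging the happiness scores of the words inside it. A minimal sketch of the idea, using a hypothetical three-word lexicon in place of the study's 10,000-word one (the real analysis uses far larger windows and whole novels):

```python
def window_happiness(words, lexicon, window=2):
    """Average happiness score of each sliding window of words,
    ignoring words that are not in the lexicon."""
    scores = []
    for i in range(len(words) - window + 1):
        vals = [lexicon[w] for w in words[i:i + window] if w in lexicon]
        scores.append(sum(vals) / len(vals) if vals else None)
    return scores

# Hypothetical scores on the 1-9 scale.
lexicon = {"joy": 8.0, "sea": 5.0, "grief": 2.0}

print(window_happiness(["joy", "sea", "grief", "grief"], lexicon))
# [6.5, 3.5, 2.0]
```

Plotting those window averages against position in the text produces the kind of emotional trajectory described below.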

This shows, for example, that both Moby Dick and Crime and Punishment end on low notes, while The Count of Monte Cristo culminates in a rise in positivity. That’s more or less how a human reader would describe these novels.

And so that anyone can sample their wares, the team has produced an online tool that lets anybody interrogate a wide range of major novels to see how the positivity and negativity of words changes throughout. It’s worth a look if you have 20 minutes to spare.

The same site also allows direct comparisons between the same words in different languages. This reveals some interesting contrasts. For example, on a scale of 1 to 9, with 9 being the happiest, German speakers rate the word “gift” at 3.54, slightly negative, presumably because “Gift” is the German word for poison. By contrast, English speakers rate “gift” as strongly positive at 7.72.

That’s an interesting study that reveals a universal bias towards positivity in human language. And it fits nicely into a broader body of research in psychology suggesting that positivity plays a more important role in most people’s existence than negativity. For example, we tend to remember pleasing information more accurately than unpleasant information.

The research raises a number of interesting questions. For example, what accounts for the differences in positivity between languages? Why is Chinese a less happy language than German or Portuguese or any other language in the study? And why is Spanish the happiest?

These are clearly questions for the future. But what Dodds and co have been able to show is the huge power that data mining brings to psychology and linguistics when coupled with crowd sourced research.

Of course, it’s not the first time that anyone has combined data mining and crowdsourcing in this way. But it should help to set the standard by which other studies can be judged. For example, sentiment analysis is fast becoming an important tool on Twitter for analysing everything from product reviews to political affiliation. But if there is a strong bias towards positive language in the first place, that is obviously an important factor to take into account.
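One hedged way a sentiment pipeline might account for that bias: treat the corpus's average happiness, rather than the scale midpoint, as the neutral point. A sketch with invented scores (this correction is an illustration, not the paper's own method):

```python
# Hypothetical word scores on the study's 1-9 scale.
lexicon = {"great": 7.8, "fine": 6.3, "okay": 5.4, "bad": 2.6}

# Language skews positive, so the effective neutral point sits above
# the scale midpoint of 5.0: use the average score as the baseline.
baseline = sum(lexicon.values()) / len(lexicon)  # 5.525 here

def polarity(word):
    """Classify a word against the positivity-shifted baseline."""
    return "positive" if lexicon[word] > baseline else "negative"

# "okay" scores above the raw midpoint (5.4 > 5.0) yet reads as
# mildly negative once the positivity bias is factored in.
print(polarity("okay"))  # negative
```

Without such a correction, a naive classifier that splits words at the scale midpoint would label almost everything positive.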

Clearly, there is a dramatic change underway in how psychologists, social scientists and anthropologists carry out their work. And we’ll be watching to see what else comes from this fascinating conjunction of computer science and social science.

Ref: arxiv.org/abs/1406.3855 : Human Language Reveals A Universal Positivity Bias