This week I noticed an article titled “ALGORITHMS FIND TOP 11 ADJECTIVES FOR MEN V. WOMEN IN 3.5M BOOKS”, discussing a scientific paper published for a conference in Copenhagen. It stated:

Machine learning analyzed 3.5 million books to find that adjectives ascribed to women tend to describe physical appearance, whereas words that refer to behavior go to men. “Beautiful” and “sexy” are two of the adjectives most frequently used to describe women. Commonly used descriptors for men include righteous, rational, and courageous. Researchers trawled through an enormous quantity of books in an effort to find out whether there is a difference between the types of words that describe men and women in literature. Using a new computer model, the researchers analyzed a dataset of 3.5 million books, all published in English between 1900 to 2008. The books include a mix of fiction and non-fiction literature.

I was thinking, this could have been a job for Writer’s Secret! Writer’s Secret uses the same 3.5 million book dataset that this report was based on, via Google’s Ngram Corpus.

Because Writer’s Secret shows which words are commonly associated with other words in this dataset, it can also be used to highlight biases or prejudices intrinsic to our society, or at least those demonstrated in the massive dataset of books available.

This could be interesting! Inspired by the study, let’s take a look at the results if we enter ‘man’ and ‘woman’ into Writer’s Secret:

Straight off the bat, you can see that age is important, as young and old are the most frequently used adjectives that come before both man and woman, but after that things do start to diverge.

In fact, you can see from how red young and old are that these have higher relative scores for woman.

Here are the words* with their scores graphed, for man:

*I removed non-descriptive adjectives, so it’s goodbye to one, in, on, first, last, same, and other.
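For anyone who wants to repeat that clean-up step on their own export, here is a minimal Python sketch. It assumes the results are available as a plain word-to-score dictionary, which is my own assumption about the shape of the data and not how Writer’s Secret stores it. The function name and the example scores are made up for illustration.

```python
# The words dropped as non-descriptive in the footnote above.
NON_DESCRIPTIVE = {"one", "in", "on", "first", "last", "same", "other"}


def drop_non_descriptive(scores):
    """Return a copy of a {word: score} dict without the non-descriptive words."""
    return {
        word: score
        for word, score in scores.items()
        if word.lower() not in NON_DESCRIPTIVE
    }


# Made-up scores, purely for illustration (not taken from the real data).
example = {"young": 10.0, "old": 8.0, "one": 7.5, "same": 3.0, "good": 2.5}
filtered = drop_non_descriptive(example)
print(filtered)
```

Only young, old, and good survive the filter in this toy example.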

And here are the words with their scores graphed, for woman:

To see the data more clearly, I grouped the words into categories such as age, marital state, beauty, etc.
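The grouping itself is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not the exact spreadsheet work behind the graphs: the word-to-category mapping, the function names, and the scores below are all hypothetical. Each word’s score is added to its category’s total, and an optional `exclude` argument leaves a category out before the totals are turned into shares.

```python
from collections import defaultdict

# Hypothetical word-to-category mapping; the real one was assigned by hand.
CATEGORY = {
    "young": "age",
    "old": "age",
    "married": "marital state",
    "beautiful": "beauty",
    "pretty": "beauty",
}


def category_totals(scores, exclude=()):
    """Sum word scores per category, skipping any category listed in exclude."""
    totals = defaultdict(float)
    for word, score in scores.items():
        category = CATEGORY.get(word, "uncategorised")
        if category in exclude:
            continue
        totals[category] += score
    return dict(totals)


def category_shares(totals):
    """Convert category totals into fractions of the overall total."""
    overall = sum(totals.values())
    if overall == 0:
        return {}
    return {category: value / overall for category, value in totals.items()}


# Made-up scores, purely for illustration (not taken from the real data).
example = {"young": 4.0, "old": 3.0, "married": 1.0, "beautiful": 1.5, "pretty": 0.5}

all_totals = category_totals(example)
without_age = category_totals(example, exclude=("age",))
shares = category_shares(without_age)
print(all_totals)
print(shares)
```

Working with shares rather than raw totals makes the man and woman results easier to put side by side, since the two words do not necessarily have the same overall score.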

Here’s what we get if we compare the categories for man and woman:

Age appears to be slightly more important for woman, but because it dominates overall, it makes the rest of the categories difficult to compare. Let’s zoom in on the graph without age:

So, there you go. Very interesting: marital state, beauty, and race/nationality* appear to be used much more often to describe woman, while morals, stature, respect, and social status are much more commonly used to describe man.

*Actually, other than white/black, race or nationality didn’t even come up for man, whereas woman was described as american, indian, and jewish. (jewish was one word that was difficult to place, as it could go under either race/nationality or religion.)

For anyone interested, you can check out the data I have used for this analysis here.

If you’d like to play around with Writer’s Secret yourself, download it here.

You can try Writer’s Secret out for free for 14 days, or activate the app for just US$0.99! (Or the equivalent in your local currency.)