Gender bias is an insidious problem throughout society. It arises most obviously through deliberate discrimination but also exists through widespread unconscious bias. This permeates our culture, our workplaces, and even our language, often in ways we are unaware of.

The first step in changing this is uncovering bias where it exists. And that’s where the emerging science of computational linguistics is turning out to be useful.

This relatively new discipline uses data mining and machine learning to study text. And it has begun to reveal biases in everything from Wikipedia articles to the language itself.

Adjectives associated with male and female terms in novels shortlisted for the Man Booker prizes.

Today, Nishtha Madaan at IBM Research India and colleagues go further. They say they have used the same technique to reveal a significant gender bias in books nominated for the Man Booker Prize, one of the world’s top literary prizes, awarded each year to the best original novel written in English.

Their approach is relatively straightforward. Madaan and her colleagues consider all books shortlisted for the prize between 1969 and 2017, some 275 novels in total. Instead of analyzing the text of the novels themselves, the team studied the descriptions of the books posted to Goodreads, a social cataloging site owned by Amazon that offers free access to descriptions, reviews, and ratings of more than 400 million books.

They then asked how men and women were portrayed in these descriptions. The answers make for uncomfortable reading. “This reveals the pervasiveness of gender bias and stereotype in the books on different features like occupation, introductions, and actions associated to the characters in the book,” say Madaan and co.

For a start, women are mentioned far less than men in these books—on average around 15 times versus 30 for men.
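A mention count like this can be sketched with simple word lists. The lists below are hypothetical stand-ins; the paper's actual lexicon of gendered terms is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical gendered term lists; the paper's actual lexicon may differ.
MALE_TERMS = {"he", "him", "his", "man", "men", "husband", "father", "son"}
FEMALE_TERMS = {"she", "her", "hers", "woman", "women", "wife", "mother", "daughter"}

def gender_mentions(text):
    """Count male- and female-associated tokens in a book description."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    male = sum(counts[t] for t in MALE_TERMS)
    female = sum(counts[t] for t in FEMALE_TERMS)
    return {"male": male, "female": female}

# Example description: three male-associated and three female-associated tokens.
print(gender_mentions("She met her husband. He was a doctor; his wife was a nurse."))
```

Averaging these per-description counts over all 275 shortlisted books would give the kind of 15-versus-30 figure the team reports.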

They are also described very differently. To show how, Madaan and co extracted adjectives associated with male and female terms in the text. They then created word clouds to show which terms appear more often for each sex.
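The extraction step can be approximated without a full parser. The sketch below uses two crude patterns, an adjective directly before a gendered noun and an adjective after "she/he was"; a real pipeline would use part-of-speech tagging or dependency parsing, and the word lists here are illustrative only.

```python
import re
from collections import Counter

# Hypothetical term lists; a real pipeline would use a POS tagger instead.
FEMALE_NOUNS = {"she", "woman", "girl", "wife", "mother"}
MALE_NOUNS = {"he", "man", "boy", "husband", "father"}
ADJECTIVES = {"beautiful", "strong", "brilliant", "quiet", "ambitious", "lovely"}

def adjectives_by_gender(text):
    """Collect adjectives appearing directly before a gendered noun,
    or immediately after 'she was' / 'he was' (a crude heuristic)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    male, female = Counter(), Counter()
    for i, tok in enumerate(tokens[:-1]):
        nxt = tokens[i + 1]
        if tok in ADJECTIVES:
            if nxt in FEMALE_NOUNS:
                female[tok] += 1
            elif nxt in MALE_NOUNS:
                male[tok] += 1
        if tok in {"she", "he"} and nxt == "was" and i + 2 < len(tokens):
            adj = tokens[i + 2]
            if adj in ADJECTIVES:
                (female if tok == "she" else male)[adj] += 1
    return male, female

# The two Counters can then be rendered as word clouds.
m, f = adjectives_by_gender("The beautiful woman spoke. He was ambitious, and the strong man agreed.")
```

The resulting frequency counts are exactly what a word-cloud renderer needs: word size proportional to count, one cloud per gender.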

These word clouds are shown here in the accompanying graphic—no prizes for guessing which is which.

The team also studies stereotypes by extracting the occupations of characters and then creating male and female word clouds. The top occupations for men are: doctor, physician, surgeon, psychologist, professor, scientist, business, director, and so on.

By contrast the top occupations for women are: teacher, lecturer, nurse, whore, hooker, child wife, child bride, and so on.

“We observed that while analyzing occupations for males and females, higher level roles are designated to males while lower level roles are designated to females,” say Madaan and co.
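One way to sketch this occupation attribution is to match a lexicon of job titles against the description and assign each hit to a gendered term found nearby. The lexicon and window size below are assumptions for illustration, not the paper's actual method, and male terms are checked before female terms when both appear.

```python
import re
from collections import Counter

# Hypothetical occupation lexicon; the paper's list is not reproduced here.
OCCUPATIONS = {"doctor", "surgeon", "professor", "scientist", "teacher", "nurse"}
MALE_TERMS = {"he", "him", "his", "man", "husband", "father"}
FEMALE_TERMS = {"she", "her", "woman", "wife", "mother"}

def occupations_by_gender(text, window=5):
    """Attribute each occupation word to a gendered term found within
    a fixed token window around it (male terms checked first)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    male, female = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok not in OCCUPATIONS:
            continue
        lo, hi = max(0, i - window), i + window + 1
        ctx = tokens[lo:i] + tokens[i + 1:hi]
        if any(t in MALE_TERMS for t in ctx):
            male[tok] += 1
        elif any(t in FEMALE_TERMS for t in ctx):
            female[tok] += 1
    return male, female
```

Running this over every description and summing the two Counters yields the male and female occupation lists that the word clouds visualize.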

There are some positive signs of change, however. The team says that in recent years, shortlisted books have begun to appear in which women play a central role in the text. These include Do Not Say We Have Nothing by Madeleine Thien, How to be Both by Ali Smith, We Are All Completely Beside Ourselves by Karen Joy Fowler, and others.

That’s interesting work, but it suffers from some shortcomings. Most significant is that the team does not clearly describe the data it has gathered: the size of the database, when the descriptions were written, or by whom. That makes the work hard to assess.

For example, it may be that the descriptions of the books are written not by the authors themselves but by contributors on Goodreads. So any bias may come from these contributors rather than reflect the books. Madaan and co do not appear to have explored this possibility.

And of course, the authors of the books might argue that their novels deliberately explore bias and its impact on society, and that to do so, the text must reflect that bias. It was never their intention, they might say, to produce a gender-neutral novel.

Nevertheless, this paper shows the potential to explore bias in culturally significant work. Indeed, the authors have already used this technique to explore bias in Bollywood movie scripts and have found significant gender stereotyping, particularly with respect to occupations.

The team is also developing a mechanism for removing bias. Just how useful this would be for novels shortlisted for the Man Booker Prize isn’t clear. But it certainly serves to highlight a problem that undoubtedly needs more attention.

Ref: arxiv.org/abs/1807.10615: Judging a Book by its Description: Analyzing Gender Stereotypes in the Man Bookers Prize Winning Fiction