Most characteristic words in pro- and anti-feminist tweets

Based on my analysis (which I’ll get to in a moment), here are word clouds of the 40 words most characteristic of anti-feminist and pro-feminist tweets, respectively.

anti-feminist pro-feminist

Word clouds may be only semi-quantitative, but they have other virtues, like recognizability and explorability. For the purists, there’s a bar chart below.

I’ll mostly talk about my results here; the full methodology is available on my other, nerdier blog, which links to all the code so you can reproduce this analysis yourself, if you so desire. (We call ourselves data scientists, and science is supposed to be reproducible, so I strongly believe I should empower you to reproduce my results if you want … or improve on them!) Please also read the caveats I’ve put at the bottom of this post.

Full disclosure: I call myself a feminist. But I believe my only agenda is to elucidate the differences in vocabulary that always arise around controversial topics. As CGP Grey explains brilliantly, members of ideologically polarized groups on social networks, like Republicans and Democrats or atheists and religious people, mostly interact within their own group, only rarely participating in a rapprochement or (more likely) a flame war with the other side. This is fertile ground for divergent vocabulary, especially in this case, where one group defines itself as opposed to the other (as if Democrats called themselves non-Republicans). I am not going into this project with a pro-feminist agenda, but of course I acknowledge I am biased. I worked hard to counter those biases, and I’ve made the code available for anyone to check my work. Feel free to disagree!

A brief (for me) description of the project: In January, I wrote a constantly running program that periodically searches the newest tweets for the terms ‘feminism’, ‘feminist’ or ‘feminists’ (at random intervals and to a random depth, potentially as often as 1500 tweets within 15 minutes), and it collected almost 1,000,000 tweets up to April 2015. Then, with five teammates (we won both the Data Science and the Natural Language Processing prizes at the Montreal Big Data Week Hackathon on April 19, 2015), we manually curated 1,000 tweets as anti-feminist, pro-feminist or neither (decidedly not an obvious process, read more about it here). We used machine learning to classify the other 390,000 tweets (what remained after we eliminated retweets and duplicates, anything that required only clicking instead of typing), then used the log-likelihood keyness method to find which words (or punctuation marks, etc.) were most overrepresented in each set.
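To make that last step concrete, here is a minimal Python sketch of a two-corpus log-likelihood keyness score. It is an illustration under assumptions, not the project's actual code (that lives in the linked repository): the function names, the toy tokens and the sign convention (positive means overrepresented in the first set, negative in the second) are all made up for this example.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood (G2) for a single token.

    a, b: how often the token occurs in corpus 1 and corpus 2
    c, d: total number of tokens in corpus 1 and corpus 2
    """
    # Expected counts if the token were equally frequent in both corpora
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:  # a zero count contributes nothing (0 * log 0 is taken as 0)
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keyness(tokens_1, tokens_2):
    """Signed keyness for every token seen in either (non-empty) corpus.

    Sign convention (an assumption for this sketch): positive when the
    token is relatively more frequent in corpus 1, negative otherwise.
    """
    counts_1, counts_2 = Counter(tokens_1), Counter(tokens_2)
    c, d = sum(counts_1.values()), sum(counts_2.values())
    scores = {}
    for token in set(counts_1) | set(counts_2):
        a, b = counts_1[token], counts_2[token]
        ll = log_likelihood(a, b, c, d)
        scores[token] = ll if a / c >= b / d else -ll
    return scores


# Toy, hand-written token lists (not real data from the project)
anti = ["feminists", "?", "they", "want", "?"]
pro = ["feminism", "is", "about", "equality", "we"]
scores = keyness(anti, pro)
ranked = sorted(scores, key=scores.get, reverse=True)  # most "anti" first
```

A token that is equally frequent in both sets scores zero, and the further its observed counts drift from the expected ones, the larger the absolute score. Sorting by the signed score puts the most characteristic tokens of one set at the top of the list and those of the other set at the bottom.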

And here are my observations:

1. Pro-feminists (PFs) tweet about feminism and feminist (adjective), anti-feminists (AFs) tweet about feminists, as a group.

Because these are the search terms, at least one of them appears in every tweet, so their absolute log-likelihood values are inflated and I left them out of the word clouds. However, the differences between them are valid and instructive (but see the caveats below). AFs seem to be more concerned with feminists as a collective noun (they tweet about the people they oppose, not the movement or ideology), while PFs tweet about feminism or feminist (usually as an adjective).

2. PFs use first- and second-person pronouns, AFs use third-person pronouns

Similarly to #1 above, and inevitably when one group defines itself as not belonging to the other, AFs tweet about feminists as a plural group of other people, while feminists tweet about and among themselves. Note that in NLP, pronouns are usually so common that they’re considered “stopwords” and are eliminated from the analysis. But with 140-character tweets, I figured every word was chosen with a certain amount of care.

3. The groups use different linking words to define feminism

PFs talk about what feminism is for or about, why we need feminism, what feminism is and isn’t, what feminists believe; AFs tweet about what feminists want, ask can someone explain why feminists engage in certain behaviors which they don’t get, say feminists are too <insert adjective>, and often use the construction With <this, then that>.

4. PFs link to external content, AFs link to local and self-created content.

PFs link more in general to http content via other websites; AFs use the #gamergate hashtag, reference @meninisttweet, and link to @youtube videos rather than traditional media (that term doesn’t appear in the word cloud, but it has a log-likelihood of 444 in favor of AFs). AFs also reference their platform, Twitter, a lot; feminists don’t, presumably because they’re also interacting in other ways.

5. AFs use more punctuation

Besides “feminists”, the number-one token for AFs was the question mark; they have a lot of questions for and about feminists, many of them rhetorical. The exclamation point wasn’t far behind, followed by the quotation mark, both to quote and to show irony. PFs start tweets with ‘+’ and ‘=’ (usually as ‘==>’) for emphasis. Rounding out the non-alphabetic characters, AFs use 2 as a shorter form of ‘to’ or ‘too’, while PFs link more often to listicles with 5 items.

6. AFs tweet more about feminist history.

Unsurprisingly, PFs tweet about their goals, equality and rights, and defend themselves against accusations of misandry. But it’s the AFs who tweet about modern and third-wave feminism, displaying knowledge about the history of the movement.

7. PFs use more gender-related terms

This one is all PF: they reference gender, genders, sexes, men and women more than AFs.

8. AFs use more pejorative terms

AFs use fuck, hate, annoying and, unfortunately, rape a lot; they also use derisive terms like lol, the “face with tears of joy” emoji and smh (shaking my head, not in the top 40 but still a high log-likelihood value of 484).

Caveats: