Using a large social media dataset and open-vocabulary methods from computational linguistics, we explored differences in language use across gender, affiliation, and assertiveness. In Study 1, we analyzed topics (groups of semantically similar words) across 10 million messages from over 52,000 Facebook users. Most language differed little across gender. However, topics most associated with self-identified female participants included friends, family, and social life, whereas topics most associated with self-identified male participants included swearing, anger, discussion of objects instead of people, and the use of argumentative language. In Study 2, we plotted male- and female-linked language topics along two interpersonal dimensions prevalent in gender research: affiliation and assertiveness. In a sample of over 15,000 Facebook users, we found substantial gender differences in the use of affiliative language and slight differences in assertive language. Language used more by self-identified females was interpersonally warmer, more compassionate, more polite, and (contrary to previous findings) slightly more assertive, whereas language used more by self-identified males was colder, more hostile, and more impersonal. Computational linguistic analysis, combined with methods to automatically label topics, offers a means of testing psychological theories unobtrusively at large scale.

Funding: This work was supported by the Templeton Religion Trust, 0048, https://www.templeton.org, to MEPS LHU HAS JCE GP. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability: All result files are available from the Open Science Framework database: https://osf.io/cv73k/?view_only=1c1bd198f906475b857277b8645b955e (database accession number(s) osf.io/2gd5v). The result files that we made available on the OSF database contain the data necessary to reproduce the Tables and Figures contained in the document. The authors are not authorized, however, to share the individual-level Facebook data, because doing so would be an IRB ethics violation: the privacy of participants would be compromised. Interested users with appropriate CITI certification and IRB approval can contact the MyPersonality Application (http://mypersonality.org/wiki/doku.php?id=database_use_guidelines) for permission to access the original dataset. These sensitive and private data are not available directly through the Facebook API.

Copyright: © 2016 Park et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Introduction

How do women and men use words differently? While language use typically differs minimally across self-reported gender, statistical models can classify an author’s self-reported gender with accuracies exceeding 90% [1], suggesting that some differences do indeed exist. Black box statistical models, however, provide little insight into the psychological meaning of these gender differences. In this study, we combine techniques from computational linguistics with established psychological theory. By analyzing the language of over 68,000 participants, we identified the linguistic features that most differentiate the language used by self-reported females from that used by self-reported males.

Gender-Linked Language

The study of gender differences in language has a long history that spans gender studies, psychology, linguistics, communication, and computational linguistics, among other fields. Investigating gender differences has been, at times, considered controversial [2, 3], although a consensus has emerged that gender remains an important variable worthy of scientific investigation (e.g., [4, 5, 6]). While language use varies only minimally across gender [7], algorithms capable of identifying female versus male authors with a high degree of accuracy (e.g., [8]) raise the question: what linguistic features account for these measurable gender differences? Individual studies and meta-analytic reviews have found evidence for gender-linked language features, such as words, phrases, and sentence length, that are used consistently more by one gender than the other (male-linked if used more by men; female-linked if used more by women). In most studies, researchers have identified gender-linked features by comparing text samples from self-identified females and males, counting the frequencies of theoretically interesting features in each text (e.g., use of the first-person singular), comparing average frequencies across gender, and then interpreting results in terms of psychological theory [9, 10, 11].

For example, a meta-analysis conducted by Newman et al. [12] compared the language of men and women across 14,000 samples of text from a broad range of sources. Individuals’ writings were processed into word categories using the Linguistic Inquiry and Word Count tool (LIWC; [13]). The authors reported gender differences in 35 word categories, although most effect sizes were small by conventional standards (|d| ≤ .20; [14]). Men used more articles (e.g., “a”, “an”, “the”), quantifiers (e.g., “few”, “many”, “much”), and spatial words (e.g., “above”, “over”), were more likely to swear, and were more likely to discuss money- and occupation-related topics. Women used more personal pronouns, intensive adverbs (e.g., “really”, “very”, “so”), and emotion words, and were more likely to discuss family and social life. The differences were interpreted as reflecting a male tendency towards objects and impersonal topics and a female tendency towards psychological and social processes.

Another line of research found similar gender-linked features [15, 16]. Across these empirical studies and literature reviews, male-linked features included directives (e.g., “do this.”), judgmental adjectives (e.g., “good”, “stupid”), and references to location and quantity, whereas female-linked features included hedging (e.g., “seems”, “maybe”, “kind of”), longer sentences, intensive adverbs (e.g., “so”, “really”), and references to emotions (e.g., “excited”, “happy”, “hurt”). Mulac et al. [17] compared the magnitude of gender differences to that between two cultures speaking the same language, suggesting that these features reflect a male culture that is direct, succinct, status-oriented, and object-focused, and a female culture that is indirect, elaborate, and person-focused.

These differences matter because they influence perceptions of an author’s interpersonal qualities. On the basis of language samples alone, judges blind to authors’ self-reported gender tended to rate females as nicer, more pleasant, and more intellectual, and rated males as stronger, louder, and more aggressive [18, 19].

Leaper and Ayres [20] summarized decades of research by organizing meta-analyses of gender-linked language around the interpersonal dimensions of affiliation and assertiveness. They defined assertive language as language used to influence, such as imperative statements, suggestions, criticisms, and disagreements. Affiliative language was defined as language affirming the speaker’s relationship with the listener, including statements of support, active understanding, agreement, and acknowledgment. The meta-analysis indicated that men used more assertive language and women used more affiliative language, but the size of these differences was moderated by methodological features of each study. For example, differences in assertiveness were most pronounced when participants were asked to discuss non-personal topics or to deliberate a specific issue.

The prevalence of affiliation/assertiveness in gender research has motivated inquiry into how these dimensions relate to the Big Five personality framework. Assertiveness was found to correlate with extraversion, particularly the activity and excitement-seeking facets, whereas affiliation was captured by empathy-related aspects of agreeableness [21, 22]. Affiliation and assertiveness are the main axes of the interpersonal circumplex, a visual representation of behavioral tendencies (Fig 1) [23, 24]. The interpersonal circumplex is described in detail in Study 2, in which we demonstrate a method of automatically labeling topics as affiliative or assertive, based on personality scores of the people who use the topics most frequently.
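As an illustration only, the feature-counting procedure that most prior studies in this section follow (count a theoretically chosen feature in each text, then compare average frequencies across two groups) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and is not the pipeline used by any of the cited studies or by LIWC: the tokenizer, the short first-person-singular word list, the function names, and the toy texts are all hypothetical, and the standardized mean difference is computed as Cohen's d with a pooled standard deviation.

```python
import re
from statistics import mean, stdev

# Hypothetical, deliberately short word list for one feature
# (first-person singular); real dictionaries are far larger.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}


def relative_frequency(text, feature_words):
    """Share of tokens in `text` that belong to `feature_words`."""
    # Simplistic tokenizer (assumption): lowercase runs of letters and
    # apostrophes, so contractions such as "I'm" stay one token and are
    # not counted as "i".
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in feature_words for token in tokens) / len(tokens)


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    if pooled_variance == 0:
        return 0.0
    return (mean(group_a) - mean(group_b)) / pooled_variance ** 0.5


if __name__ == "__main__":
    # Made-up texts for two arbitrary groups; they show the mechanics only
    # and carry no empirical meaning.
    group_a_texts = ["I love my family", "My friends and I went out", "So happy today"]
    group_b_texts = ["The game was great", "I fixed the car", "Prices are up again"]

    freqs_a = [relative_frequency(t, FIRST_PERSON_SINGULAR) for t in group_a_texts]
    freqs_b = [relative_frequency(t, FIRST_PERSON_SINGULAR) for t in group_b_texts]
    print("d =", cohens_d(freqs_a, freqs_b))
```

Under this convention, a positive d means the feature is relatively more frequent in the first group, and values of |d| at or below .20 correspond to the small effects described above.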

Fig 1. The Interpersonal Circumplex.
https://doi.org/10.1371/journal.pone.0155885.g001