Abstract Previous studies suggest a significant role of language in the courtroom, yet none has identified a definitive correlation between vocal characteristics and court outcomes. This paper demonstrates that voice-based snap judgments, based solely on the introductory sentence of lawyers arguing in front of the Supreme Court of the United States, predict outcomes in the Court. In this study, participants rated the opening statement of male advocates arguing before the Supreme Court between 1998 and 2012 in terms of masculinity, attractiveness, confidence, intelligence, trustworthiness, and aggressiveness. We found a significant correlation between vocal characteristics and court outcomes, and the correlation is specific to perceived masculinity, even when the judgment of masculinity is based on less than three seconds of exposure to a lawyer’s speech sample. Specifically, male advocates are more likely to win when they are perceived as less masculine. No other personality dimension predicts court outcomes. While this study does not aim to establish any causal connections, our findings suggest that vocal characteristics may be relevant in even as solemn a setting as the Supreme Court of the United States.

Citation: Chen D, Halberstam Y, Yu ACL (2016) Perceived Masculinity Predicts U.S. Supreme Court Outcomes. PLoS ONE 11(10): e0164324. https://doi.org/10.1371/journal.pone.0164324 Editor: Ian D. Stephen, Macquarie University, AUSTRALIA Received: March 9, 2016; Accepted: September 25, 2016; Published: October 13, 2016 Copyright: © 2016 Chen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: All data files are available at https://figshare.com/s/eede53edfedf12a75c01. Funding: This work was partially supported by the Social Sciences and Humanities Research Council of Canada, the European Research Council, Humanities Division of the University of Chicago, the Connaught Fund at the University of Toronto, Swiss National Science Foundation, and Agence Nationale de la Recherche. Competing interests: The authors have declared that no competing interests exist.

Introduction Voice-based first impressions can be formed rapidly with very brief exposure (less than half a second of speech [1–4]), and such impressions often are associated with the subsequent behavior of the perceiver [5–7]. For example, voice-based personality judgments are associated with mate selection [8], leader election [9, 10], housing options [11], consumer choices, and jury decisions [12]. Although researchers have demonstrated how vocal perception influences the communication process [13], it remains unclear whether such influences extend to a communicative setting like oral arguments at the Supreme Court of the United States (SCOTUS), where subtle biases have consequences for major policy outcomes. To be sure, previous studies suggest a significant role for linguistic cues in the courtroom [12, 14, 15], yet none has identified a definitive connection between voice perceptions and actual court outcomes. A priori, there are many reasons why inferences from voice should not play an important role in Supreme Court decisions. From a rational perspective, information about the advocate should override any first impression. From an ideological perspective, court outcomes are largely predetermined. From a judge’s legal perspective, decisions are justified not in terms of the advocate’s voice but in terms of the legal content of the argument. And from an economic perspective, correlations between malleable advocate characteristics and high-stakes outcomes in the United States Supreme Court should not persist, as law firms and advocates are likely to adjust their behavior to eliminate such correlations. From a behavioral perspective, however, it has been repeatedly shown that the way one speaks reveals much about one’s personality and level of confidence, as well as one’s ethnicity, socio-economic circumstances, geographic background, sexuality, and ideological stance [8, 16–18].
The identification of African American speakers can be made even on the basis of the single word “hello” [11]. The percept of gay male speech and/or feminine male speech is linked to vowel formant structure [19], pitch [20], and the length and quality of /s/ [21, 22]. The released variant of word-final /t/ may be used as a resource for constructing nerd identity among female nerds [23], learnedness among Orthodox Jewish men [24], gayness [25], and articulateness among US politicians [26]. To be sure, listeners’ interpretations of the meanings behind these linguistic cues might vary according to the listener’s level of experience with different speech varieties [27] and the identity of the speaker [26]. Nonetheless, even when visual cues are present, potential employers rely more on voice-based impressions of a job applicant’s competence and intellect in making hiring decisions [28]. In this study, we examine how people perceive the voice personalities/attributes of advocates arguing before the court and whether these perceptions can predict real outcomes. To this end, we utilize recordings of oral arguments of the Supreme Court of the United States, which offer a wealth of court decisions that have real-world impact. Specifically, we focus on the introductory statement of an oral argument. During an oral argument, counsels representing the competing parties of a case (i.e., the advocates for the petitioner and the respondent) each present their sides to the Justices. As the introductory statement of an advocate’s argument before the court is customarily “Mister Chief Justice, (and) may it please the court”, the corpus of introductory statements we have amassed provides a unique opportunity for examining the effect of speech and language on real-world outcomes, since the lexical content (the words) being evaluated is identical across speakers.
The listeners can therefore focus their judgments on how the words are pronounced, rather than on the word choice of the advocates. Our empirical strategy is focused on testing models of cognitive bias. To infer the bias, we need to measure perceptions, which are typically unobserved, and how they relate to outcomes. Here, we focus on six dimensions, selected based on previous research on listeners’ perceptual evaluations of linguistic variables [18, 29, 30]. These include masculinity, attractiveness, confidence, intelligence, trustworthiness, and aggressiveness. Masculine voices increase perceptions of dominance and fighting ability among men [31] and increase attractiveness to women. Vote choices have also been shown to be influenced by perceptions of masculinity and femininity in male faces [32], and judgments about faces have been shown to predict the outcomes of actual elections [32, 33]. Vocal attractiveness is often found to be linked to facial attractiveness [34–36]. Judgments of attractiveness are important in everyday interaction, as physically attractive people are found to be more persuasive [37] and judged to be more socially desirable and to get better jobs [38]. Confidence, trustworthiness, and aggressiveness are all important aspects of human communication, which can be processed upon one’s very first encounter with an individual [39, 40]. Trustworthiness may, at least partly, influence attribution of competence and might affect voting behavior [33]. It is also an important precursor in the development of cooperation [41] and a fundamental aspect of the legal system [42]. Expressions of confidence have been shown to affect persuasion [43]. Aggressiveness, which indexes a person’s assertiveness, also provides a means to counter the positive orientation of the other dimensions considered. A person’s intelligence cannot be observed directly and must be inferred from indirect cues such as voice.
Perceived intelligence has been found to affect an individual’s employability [28]. Listeners’ judgments along these dimensions are used as predictors of court outcomes. Given the exploratory nature of this study, it is worth emphasizing at the outset that it is not our goal to advance claims for any specific causal influence of voice on SCOTUS outcomes. Rather, we aim to test whether people’s subjective voice-based trait judgments are predictive of SCOTUS outcomes at all. To the extent that such correlations can be established, future studies will be needed to determine the causal mechanisms behind such relationships. This article begins by detailing the materials and methods used in this study in Section 2. The results are reviewed in Section 3, followed by robustness checks and extensions in Section 4. A discussion of the general findings is given in Section 5.

Materials and Methods Ethics Statement The study was approved by the Social and Behavioral Sciences Institutional Review Board at the University of Chicago, including a waiver of informed consent, as it was determined that the research presents no more than minimal risk to subjects and that a waiver of informed consent would not adversely affect the rights and welfare of subjects. Stimuli The stimuli for this study were drawn from oral arguments made in the Supreme Court of the United States between 1998 and 2012. A novel feature of our data is the use of an identical 2 to 3 seconds of content delivered at the outset of each argument: “Mr. Chief Justice, (and) may it please the Court”. Our data consist of 1634 oral arguments made by 916 distinct male advocates; about 80 percent of these advocates argued only once in the Supreme Court. Oral arguments at the Supreme Court have been recorded since the installation of a recording system in October 1955. The recordings and the associated transcripts were made available to the public in electronically downloadable format by the Oyez Project (http://www.oyez.org/), a multimedia archive at the Chicago-Kent College of Law devoted to the Supreme Court of the United States and its work. The audio archive contains more than 110 million words in more than 9000 hours of audio, synchronized to the sentence level based on the court transcripts. Oral arguments are, with rare exceptions, the first occasion in the processing of a case in which the Court meets face-to-face in consideration of the issues. Usually, counsels representing the competing parties of a case each have thirty minutes in which to present their side to the Justices.
The Justices may interrupt these presentations with comments and questions, leading to interactions between the Justices, the lawyers, and, in some cases, the amici curiae, who are not a party to a case but nonetheless offer unsolicited information that bears on the case to assist the Court. While oral arguments have been recorded since 1955, with the exception of those between 1998 and 2012, the bulk of the transcripts available on the OYEZ archive at the time this experiment was set up did not identify the speaking turns of individual Justices, referring to them all as “The Court”. The archive has since diarized all recordings. Participants Participants from Amazon Mechanical Turk (AMT) rated the voice clips of the Supreme Court advocates. About half (321) of the 634 distinct participants who completed our survey were female. Two thirds of the participants were between 20 and 35 years old, and one third were older than 35. One third indicated they had some college education, and another third reported having a bachelor’s degree. The median income of those who completed the survey was about 40,000 US dollars. The racial and geographical distribution of the participants broadly reflects that of the US population. The correlation between the share of participants from a given state and the state’s share of the US population is 0.9588. Further descriptive statistics of the AMT participants who participated in this research are presented in Table 1.


Table 1. Descriptive Statistics of Survey Participants (N = 634). This table presents descriptive statistics of survey participants who rated audio clips of Supreme Court oral arguments made by male advocates. The data are self-reported by participants before beginning the audio survey. https://doi.org/10.1371/journal.pone.0164324.t001 Procedure Participants were asked to rate the voice clips of Supreme Court advocates on a scale of 1 to 7 in terms of aggressiveness, attractiveness, confidence, intelligence, masculinity, and trustworthiness. As noted in the Introduction, these six dimensions were selected based on previous research on listeners’ perceptual evaluations of linguistic variables [18, 29, 30]. Each voice clip was played aloud once automatically, but participants were allowed to replay the clip as many times as they chose; in another survey variant, each clip was played only once and participants were unable to replay it. We discuss this and other survey designs below. The order and polarity of the attributes were randomized across survey participants. For example, masculine would vary vertically along the 6 attributes, and very masculine and not at all masculine would vary from left to right as bounds on a 7-point scale. The order and the polarity of attribute scales were held fixed for any particular participant to minimize cognitive fatigue. Participants were also asked to predict whether the lawyer would win the case and to rate the quality of the audio recordings. Each participant rated 66 voice recordings. Of these, 60 were randomly drawn from the audio clip sample pool, and 6 of these were repeated as recordings 61 to 66 to measure the consistency of participant ratings. The participants were asked to use headphones to listen to the recordings. Amici curiae were also rated among the advocates, but are excluded from this study.
No information regarding the identity of the speaker or the nature of the case was given to the participants. In Fig 1, we present a screenshot of the survey ratings page. (See S2, S3, and S4 Figs for screenshots of other sections of the task.)


Fig 1. Survey filled by AMT participants. This figure is a screenshot of the survey matrix used by AMT participants to record their impressions of the audio recordings of advocates. The order and polarity of attributes were randomized across participants. Participants were not able to proceed to the next recording without completing the survey matrix and questions. https://doi.org/10.1371/journal.pone.0164324.g001 Analysis This section lays out the general analytic framework we employed in this study. To operationalize our empirical analysis, we begin by constructing a measure of voice-based trait judgments. Let \(attribute_{itw}\) be participant w’s perception of a given attribute of advocate i in case t, where attribute refers to any one of the six traits. These untransformed scores (range = 1–7) give more weight to participants who provide more signal amid greater variance in their ratings. Thus, to be conservative, our preferred measure adjusts for cross-participant variability in the cardinality of ratings as well as spread. Formally, for each participant and voice attribute, the normalized rating is given by

\[ \widetilde{attribute}_{itw} = \frac{attribute_{itw} - \overline{attribute}_{w}}{\sigma(attribute)_{w}} \tag{1} \]

where \(\overline{attribute}_{w}\) is the average perception of a given attribute across participant w’s advocate ratings and \(\sigma(attribute)_{w}\) is the standard deviation of these ratings. As a result, for each participant w, \(\widetilde{attribute}_{itw}\) is a continuous measure with mean equal to zero and standard deviation equal to one. Using these measures, we estimate regressions of the following form:

\[ win_{it} = \alpha + \boldsymbol{\beta}'\,\widetilde{\mathbf{attribute}}_{itw} + \boldsymbol{\gamma}'\,\mathbf{X}_{iw} + \varepsilon_{itw} \tag{2} \]

where the dependent variable \(win_{it}\) is an indicator for whether advocate i actually won (= 1) or lost (= 0) case t, the key independent variables, denoted by the vector \(\widetilde{\mathbf{attribute}}_{itw}\), are continuous measures of the set of six attributes of the advocate in case t as perceived by participant w, as well as the (normalized) perceived likelihood of winning, and \(\mathbf{X}_{iw}\) collects advocate and participant covariates. Given the regression equation, \(\boldsymbol{\beta}\) represents the bias in actual wins associated with advocate traits.
The covariate vector (described in Table 1) comprises advocate and participant characteristics that we use to explore the influence of heterogeneous perceptions of survey participants on our findings. These covariates include age, gender, race, income, education, and state of residence. To address the correlation in ratings among survey participants, we adjust the standard errors of the regression estimates for clustering at the oral argument level. For comparison purposes and for robustness, we also show baseline results using the untransformed scores as well as a collapsed version of the data, whereby we match only one voice measure to each oral argument by taking the average rating across participants for a given oral argument. In these regressions, we lose variation in perceptions across participants. Broadly, these aggregated regressions mitigate the influence of classical measurement error that typically biases coefficient estimates toward zero. Additionally, using the collapsed data addresses any concern about mechanically increasing power by duplicating the number of oral arguments by the number of ratings per recording (even though we cluster at the recording level in all regressions). On the other hand, aggregated regressions can lose precision because we can no longer control for rater-specific correlations across perceptual ratings and participant characteristics. For these reasons, the aggregated regression is generally viewed as too conservative in terms of statistical precision [44]. For the sake of completeness, we provide baseline results using the collapsed data as well. We use the linear probability model (OLS) as our primary estimation method and show that our results are robust to the use of probit and logistic models. There are two main reasons for this choice.
The first is that our objective is to estimate the correlation coefficients between perceived attributes of advocates and case outcomes rather than to develop a forecasting model of case outcomes, and OLS is well suited for this estimation purpose. The second is that probit and logit are not well-suited to regressions with controls for fixed effects (e.g., dummies for lawyer, participant, year of case argued, etc.) because of the incidental parameters problem [45], and our analysis includes many regressions with such controls.
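The per-participant normalization in Eq (1) can be sketched in a few lines of Python. This is an illustrative implementation, not the authors' code, and the sample ratings are hypothetical:

```python
from statistics import mean, stdev

def normalize_ratings(ratings):
    """Eq (1): center a participant's ratings of one attribute on that
    participant's own mean and scale by the standard deviation of their
    ratings, so every rater contributes scores with mean 0 and sd 1."""
    m = mean(ratings)
    s = stdev(ratings)
    return [(r - m) / s for r in ratings]

# Hypothetical participant who rated five clips on masculinity (1-7 scale)
raw = [4, 6, 5, 3, 7]
z = normalize_ratings(raw)
print([round(v, 2) for v in z])
```

The normalized scores then enter Eq (2) as regressors; in practice, the regression with standard errors clustered at the oral-argument level would be estimated in a statistics package rather than by hand.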

Robustness and Extensions In this section, we expand our analysis in a number of directions, including robustness to sample, ratings, and model variations. Given our finding that the negative correlation between perceptions of masculinity and court outcomes persists even after removing cross-advocate variation, we examine more closely whether our results are driven by cases argued in a certain year or by advocates with a certain degree of experience arguing cases at the Supreme Court. To do this, we compare our baseline regression results for petitioners (column (1) in Table 5) to the regression results in Table 6. By including year fixed effects, column (1) in Table 6 addresses whether our findings are driven by a certain set of cases in our sample of oral arguments. Similarly, column (2) includes fixed effects for the number of oral arguments in our sample made by the same lawyer, which we take as a proxy for experience. In both specifications, the estimate on masculine remains significant but is slightly smaller in magnitude (1.7 versus 2 percentage points in the baseline regression). Given this, we can rule out that cohort or time effects are significantly influencing our findings.


Table 6. Robustness Checks: Male Petitioners. This table presents coefficient estimates from regressions using data on Supreme Court oral arguments made by male advocates for the petitioner. The dependent variable is an indicator for whether the advocate won the case or not. Independent variables are voice-based ratings of advocate attributes normalized by survey participant. Columns 1-2 report coefficient estimates using OLS with dummies for year of argument and number of cases argued by the lawyer where noted. Columns 3-4 report coefficient estimates using OLS where ratings whose Mahalanobis distance exceeds the critical value are omitted in column 3, and ratings by survey participants with scores in the top quintile on a measure of rating inconsistency are omitted in column 4 (see S1 Table). Columns 5-6 report baseline probit (logistic) regression results with marginal effects calculated at the means of the independent variables. Standard errors in parentheses are clustered by oral argument. https://doi.org/10.1371/journal.pone.0164324.t006 We next examine how our results change if we remove ratings that can be deemed outliers. The first method to identify such outliers is to compute the Mahalanobis distance (MD) for the ratings given by each participant for each audio clip. We then run the baseline regression excluding ratings that exceed the critical value associated with a 2.5 percent significance level, about 15 percent of our ratings. Column (3) in Table 6 shows the regression results excluding these ratings. The estimate on masculine is significant and slightly larger: a one standard deviation increase in masculinity is associated with a 2.2 percentage point decrease in winning. A second method we use to identify outliers is based on examining ratings on the set of 6 repeated audio clips.
For each participant, we computed a consistency score defined as the average absolute difference in attribute ratings on the set of identical audio clips. The mean (and median) consistency score across participants and attributes is approximately one (further details are available in S1 Table). In column (4), we present regression results excluding ratings by the fifth of participants with the worst consistency scores. As seen, the association between perceived masculinity and outcomes remains similar to the one in the baseline regression. We take these results to indicate that our findings would likely be stronger if we carefully screened out ratings by participants who may have misunderstood the task or exerted insufficient effort. In the final set of regressions, we show that our baseline estimates are robust to the estimation method. In columns (5) and (6) of Table 6, we report estimates of marginal effects derived from probit and logistic regressions, respectively. In both cases, the estimate on masculine is nearly identical to the one we obtained using OLS. To examine whether the ratings we gathered are specific to our procedure, we varied the survey design on a subsample of 60 voice clips. Instead of the basic design, in which the listener is presented with one voice sample and rates it on all attributes, participants were randomly assigned to rate only one attribute for each recording, thus obviating the potential for cross-attribute influence within a given voice clip and controlling for the possibility of within-voice modeling by participants. The key difference between this survey and our main survey depicted in Fig 1 is that only one attribute, selected at random for each voice recording, appeared in question 1.
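The consistency screen used in column (4) can be sketched as follows. This is an illustrative implementation, not the authors' code; the data layout and clip names are hypothetical, and the six repeated clips are reduced to two for brevity:

```python
def consistency_score(first_pass, second_pass):
    """Average absolute difference between a participant's ratings of the
    same clips on first and repeat presentation (lower = more consistent).
    Both inputs map clip id -> {attribute: rating}."""
    diffs = []
    for clip, attrs in first_pass.items():
        for attr, rating in attrs.items():
            diffs.append(abs(rating - second_pass[clip][attr]))
    return sum(diffs) / len(diffs)

# Hypothetical participant: two repeated clips rated on two attributes
first = {"clip_a": {"masculine": 5, "confident": 6},
         "clip_b": {"masculine": 3, "confident": 4}}
second = {"clip_a": {"masculine": 4, "confident": 6},
          "clip_b": {"masculine": 3, "confident": 6}}
score = consistency_score(first, second)  # (1 + 0 + 0 + 2) / 4 = 0.75
```

Participants whose score falls in the worst quintile would then have all of their ratings dropped before re-estimating the baseline regression.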
While there are slight differences in ratings across surveys, the results are very similar, suggesting further robustness of our key findings on the connection between voice-based trait judgments of advocates and Supreme Court outcomes. We illustrate the high degree of correlation in perceptions across surveys (S1 Fig) in the Supplemental Information (SI). Likewise, for this same subsample of 60 voice clips, we were able to collect detailed biographical information about the advocates. Specifically, this includes age, law school, whether the advocate was a member of the law review, had an additional graduate degree, was a Supreme Court clerk, and the total number of clerkships the advocate had. We found that including these covariates in a regression increased the precision of the estimate on masculine (see S2 Table). Overall, we acknowledge that we are unable to draw far-reaching conclusions from these regressions given the small sample size; however, if perceptions of masculinity were simply reflecting other important advocate covariates, then the coefficient estimates on masculine should be driven to zero. That this is not the case suggests that the channel through which trait judgments stemming from an extremely brief voice clip predict outcomes may not be as simple as one might expect. Likewise, our results are unlikely to be driven by any specific choice of number of ratings or survey framing. In sum, these findings are unlikely to be driven by spurious correlations or measurement error and lend further credence to the notion that snap judgments stemming from even 3-second voice samples can influence listeners’ beliefs about those they face and their subsequent actions. Finally, it is worth noting that only about 15 percent of the advocates who argued in the Supreme Court during the time period of our study were female.
The gender-specificity of our findings is a question that warrants further investigation, especially since studies of voice-based social biases have observed significant differences in how listeners react to voices of different perceived gender [47]. However, due to the lack of statistical power, we leave this question for future studies with an expanded female advocate dataset. Relatedly, we explored whether perceptions differ by gender of survey participant and whether such differences could affect how the perceived attributes of male advocates predict case outcomes. While we found some differences in ratings (most notably, female participants, more than male participants, perceived masculine advocates as more intelligent), we did not find these to play a role in our key finding on the relationship between voice-based perceptions of masculinity and outcomes in the Supreme Court.

Discussion To the best of our knowledge, this is the first study documenting an association between voice-based impressionistic judgments and judicial decisions. To benchmark our findings, the 2 percentage point difference in court outcomes attributed to a one standard deviation change in perceived masculinity is equivalent to more than half of the gender gap (i.e., in our sample, male lawyers are 3.7 percentage points more likely to win a court case than female lawyers). These associations are comparable to the effects of other external factors that have been shown to influence judicial behavior. For example, asylum judges are 2 percentage points more likely to deny asylum to refugees if their previous decision granted asylum [48]. Likewise, asylum judges are roughly 2 percentage points more likely to grant asylum on the day after a home-city Sunday football game win instead of a loss [49]. In a similar vein, U.S. District judges are 0.3 percentage points less likely to assign any prison time in criminal sentencing cases after a home-city football game win instead of a loss [49]. More generally, judges’ demographic background characteristics, such as gender, race, and, in particular, party of appointing president [50, 51], especially before elections [52], have all been shown to correlate with their decision-making over a range of legal issues. Our findings echo earlier research documenting associations between voice-based personality judgments and human behavior. For instance, previous studies have found vocal attractiveness to be an important social evaluation linked to mate selection and sexual behavior [35] and masculine voices to be linked to dominance [31] and men’s threat potential in forager and industrial societies [53]. This type of association extends beyond evolutionary implications and may have immediate real-world consequences. Perceived intelligence, for example, has been found to affect an individual’s employability [28].
Landlords are found to discriminate against prospective tenants on the basis of the sound of their voice during telephone conversations [11]. Perceived task-ability, dominance, and sociability are found to show the strongest correlation with perceived influence in simulated juries [12]. Thus, the association between voice-based personality judgments and court outcomes observed in this study underscores the importance of understanding how (and why) voice-based judgments influence human behavior. To be sure, what still needs further exploration is the specific nature of the association between voice judgments and court outcomes. That is, why are court outcomes correlated with perceived masculinity but not with other attributes? It is worth noting that the focus on language and gender in the courtroom is not new. However, previous studies have focused primarily on the gendered language performance of witnesses [54] or the discursive practices in the courtroom [55]. To the best of our knowledge, no studies have focused on the vocal characteristics of the lawyers per se. More specifically, given that the attributes are positively correlated with each other, the fact that only perceived masculinity is found to correlate with court outcomes suggests that masculinity captures particular variance that is not captured by the other ratings. In a similar study in which subjects were presented with faces of electoral candidates and asked to rate the candidates’ perceived attributes, such as competence, intelligence, leadership, honesty, trustworthiness, charisma, and likability [33], only perceptions of competence predicted election outcomes. Our findings are similar in that, while perceived masculinity correlated with judgments of other voice attributes, perceived masculinity is the only one that predicts court outcomes in a consistent and robust manner.
Concerning the nature of the perceived attribute itself, masculinity is a quality or set of practices that is stereotypically, though not exclusively, connected with men. Women may engage in masculine practices just as much, although such practices either go unnoticed or are censured [56]. The performative nature of “masculinity” makes possible the existence of non-masculine men and masculine women [56–59]. Different cultures may also construct different notions of masculinity. These differences are reflected in the stereotypical ways of talking and thinking about men and masculinities. In the US, there are four main cultural discourses of masculinity [56]: gender difference, which pertains to categorical differences in biology and behavior between men and women; heterosexism, which equates being masculine with sexually desiring women and not men; dominance, which links masculinity with notions of authority or power; and male solidarity, which assumes as given a bond among men. In the present context, the fact that court outcomes are negatively associated with masculinity points to a possible connection with the discourse of dominance. That is, lawyers who are perceived to be more masculine might be construed as being more dominant and authoritative. To what extent these constructs, as distinct from perceived confidence and perceived aggressiveness, play a role in the decision process as judges deliberate court decisions will have to be explored in future work. This work only establishes an association and does not attempt to advocate a particular causal relationship between these variables. To be sure, gendered differentiation of masculine and feminine language has been argued to have distinct evolutionary bases [60]. Males are seen as having been selected to be aggressive and dominant, but this selective pressure might be a double-edged sword, since aggressive and dominant behaviors can lead to lethal confrontation.
In the present context, the dominant and aggressive stance of masculine-sounding lawyers might have invited an adverse response from the Court. Given our research design, our findings do not allow us to conclude whether the Justices were engaging in some form of linguistic profiling in making their judicial decisions. Do lawyers change their voices across oral arguments in a manner predicted by case characteristics? Do law firms engage in some form of linguistic profiling when choosing their oral advocates? Further investigation should yield fruitful insights into the mechanisms underlying the association between voice-based masculinity and court outcomes. In sum, our results contribute to a growing literature on the relevance of extraneous factors in courtrooms. Although judicial behavior is widely assumed to be governed by legal doctrine [61], with judges strictly hewing to legal doctrine and court precedent in making their decisions, a judge’s decision can be affected by the judge’s policy preferences [62], self-interest [63], and, in the present case, potential voice-based snap judgments regarding lawyer personality. Future studies will hopefully elucidate the mechanisms behind these extraneous factors in the courtroom.

Supporting Information

S1 Fig. Correlation in Ratings across Survey Designs (collapsed). This figure plots the mean untransformed rating for each of the 60 audio clips selected from our sample for further robustness checks. The x-axis reflects mean ratings obtained from participants in our main survey, who were asked to rate each advocate on the full set of attributes, whereas the y-axis reflects mean ratings obtained from participants in an alternative survey, who were randomly assigned to rate each advocate on only one attribute at a time. https://doi.org/10.1371/journal.pone.0164324.s001 (TIFF)

S2 Fig. First Screenshot of Survey. https://doi.org/10.1371/journal.pone.0164324.s002 (TIF)

S3 Fig. Second Screenshot of Survey. https://doi.org/10.1371/journal.pone.0164324.s003 (TIF)

S4 Fig. Third Screenshot of Survey. https://doi.org/10.1371/journal.pone.0164324.s004 (TIF)

S1 Table. Participant Ratings Consistency (N = 748). This table presents descriptive statistics of a measure of consistency in participant ratings, using data on the random set of 6 audio clips that were duplicated for each participant. For each participant, the consistency measure is defined as the average absolute difference in ratings of a given attribute between the duplicate clips. https://doi.org/10.1371/journal.pone.0164324.s005 (PDF)

S2 Table. Robustness Checks on Sample of 60 Clips. This table presents coefficient estimates from OLS regressions using data on a select sample of Supreme Court oral arguments made by male advocates. The dependent variable is an indicator for whether the advocate won the case. Independent variables are voice-based ratings of advocate attributes, normalized by survey participant.
Column 1 reports baseline regression results; column 2 reports results from a specification that includes lawyer biographical controls: age, number of clerkships, and dummies for whether the advocate attended an elite law school, has a second graduate degree, served on law review, or served as a Supreme Court clerk. Columns 3–4 compare regression results using alternative survey designs to the baseline results presented in column 1. Column 3 presents results from a survey of approximately 200 participants rating the set of 60 audio clips, and column 4 presents results using ratings obtained from a survey that randomly assigned only one attribute to each audio clip. a ratings of educatedness were included instead of aggressiveness in columns 3–4; b ratings of age were included instead of intelligence in column 4; †, *, and ** indicate significance at the 10 percent, 5 percent, and 1 percent levels, respectively. https://doi.org/10.1371/journal.pone.0164324.s006 (PDF)
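As an illustration of the analyses described in S1 and S2 Tables, the steps can be sketched as follows. This is not the authors' code: the data here are randomly generated stand-ins, the duplicate-clip mechanics are simplified, and the per-participant normalization is assumed to be a within-rater z-score; the regression is an ordinary-least-squares linear probability model of the win indicator on a single mean normalized rating.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Hypothetical rating data: one row per (participant, clip) pair,
# with 7-point scale ratings of one attribute (e.g., masculinity).
n_clips, n_raters = 60, 200
df = pd.DataFrame({
    "clip": np.tile(np.arange(n_clips), n_raters),
    "rater": np.repeat(np.arange(n_raters), n_clips),
    "masculinity": rng.integers(1, 8, n_clips * n_raters).astype(float),
})

# S1 Table's consistency measure: for each participant, the average
# absolute difference in ratings between duplicated clips (here the
# first 6 clips stand in for each participant's duplicated set).
dup = df[df["clip"] < 6].copy()
dup["repeat"] = dup["masculinity"] + rng.integers(-1, 2, len(dup))  # 2nd exposure
consistency = (dup["masculinity"] - dup["repeat"]).abs().groupby(dup["rater"]).mean()

# Ratings "normalized by survey participant": z-score within each rater.
df["masc_z"] = df.groupby("rater")["masculinity"].transform(
    lambda r: (r - r.mean()) / r.std(ddof=0)
)

# Collapse to one mean normalized rating per advocate clip, then regress
# a (hypothetical) win indicator on it via OLS, i.e. a linear probability
# model in the spirit of S2 Table's baseline specification.
x = df.groupby("clip")["masc_z"].mean().to_numpy()
won = (rng.random(n_clips) < 0.5).astype(float)   # hypothetical outcomes
X = np.column_stack([np.ones(n_clips), x])        # intercept + rating
beta, *_ = np.linalg.lstsq(X, won, rcond=None)
print("intercept, slope:", beta)
```

In the paper's actual specification the slope on perceived masculinity is the quantity of interest (negative: more masculine-sounding advocates win less often), and further columns add biographical controls and alternative survey designs.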

Acknowledgments We thank Michael Boutros, Katie Franich, Dennis Luo, Betsy Pillion, and Jacob Phillips for invaluable research assistance. We thank participants at the 14th Conference on Laboratory Phonology, the University of Toronto, the annual meeting of the Linguistics Society of America, and the Canadian Economic Association. The authors are listed in alphabetical order.

Author Contributions Conceptualization: DC YH AY. Data curation: DC YH AY. Formal analysis: DC YH AY. Funding acquisition: DC YH AY. Investigation: DC YH AY. Methodology: DC YH AY. Project administration: DC YH AY. Resources: DC YH AY. Supervision: DC YH AY. Visualization: DC YH AY. Writing – original draft: DC YH AY. Writing – review & editing: DC YH AY.