Police officers speak significantly less respectfully to black than to white community members in everyday traffic stops, even after controlling for officer race, infraction severity, stop location, and stop outcome. This paper presents a systematic analysis of officer body-worn camera footage, using computational linguistic techniques to automatically measure the respect level that officers display to community members. This work demonstrates that body camera footage can be used as a rich source of data rather than merely archival evidence, and paves the way for developing powerful language-based tools for studying and potentially improving police–community relations.

Using footage from body-worn cameras, we analyze the respectfulness of police officer language toward white and black community members during routine traffic stops. We develop computational linguistic methods that extract levels of respect automatically from transcripts, informed by a thin-slicing study of participant ratings of officer utterances. We find that officers speak with consistently less respect toward black versus white community members, even after controlling for the race of the officer, the severity of the infraction, the location of the stop, and the outcome of the stop. Such disparities in common, everyday interactions between police and the communities they serve have important implications for procedural justice and the building of police–community trust.

Over the last several years, our nation has been rocked by an onslaught of incidents captured on video involving police officers’ use of force with black suspects. The images from these cases are disturbing, both exposing and igniting police–community conflict all over the country: in New York, Missouri, Ohio, South Carolina, Maryland, Illinois, Wisconsin, Louisiana, Oklahoma, and North Carolina. These images have renewed conversations about modern-day race relations and have led many to question how far we have come (1). In an effort to increase accountability and transparency, law enforcement agencies are adopting body-worn cameras at an extremely rapid pace (2, 3).

Despite the rapid proliferation of body-worn cameras, no law enforcement agency has systematically analyzed the massive amounts of footage these cameras produce. Instead, the public and agencies alike tend to focus on the fraction of videos involving high-profile incidents, using footage as evidence of innocence or guilt in individual encounters.

Left unexamined are the common, everyday interactions between the police and the communities they serve. By best estimates, more than one quarter of the public (ages 16 y and over) comes into contact with the police during the course of a year, most frequently as the result of a police-initiated traffic stop (4, 5). Here, we examine body-worn camera footage of routine traffic stops in the large, racially diverse city of Oakland, CA.

Routine traffic stops are not only common, they are consequential, each an opportunity to build or erode public trust in the police. Being treated with respect builds trust in the fairness of an officer’s behavior, whereas rude or disrespectful treatment can erode trust (6, 7). Moreover, a person’s experiences of respect or disrespect in personal interactions with police officers play a central role in their judgments of how procedurally fair the police are as an institution, as well as their willingness to support or cooperate with the police (8, 9).

Blacks report more negative experiences in their interactions with the police than other groups (10). Across numerous studies, for example, blacks report being treated less fairly and respectfully in their contacts with the police than whites (6, 11). Indeed, some have argued that racial disparities in perceived treatment during routine encounters help fuel the mistrust of police in the controversial officer-involved shootings that have received such great attention. However, do officers treat white community members with a greater degree of respect than they afford to blacks?

We address this question by analyzing officers’ language during vehicle stops of white and black community members. Although many factors may shape these interactions, an officer’s words are undoubtedly critical: Through them, the officer can communicate respect and understanding of a citizen’s perspective, or contempt and disregard for their voice. Furthermore, the language of those in positions of institutional power (police officers, judges, work superiors) has greater influence over the course of the interaction than the language used by those with less power (12–16). Measuring officer language thus provides a quantitative lens on one key aspect of the quality or tone of police–community interactions, and offers new opportunities for advancing police training.

Previous research on police–community interactions has relied on citizens’ recollection of past interactions (10) or researcher observation of officer behavior (17–20) to assess procedural fairness. Although these methods are invaluable, they offer an indirect view of officer behavior and are limited to a small number of interactions. Furthermore, the very presence of researchers may influence the police behavior those researchers seek to measure (21).

In study 1, human participants rated officer utterances on several overlapping dimensions of respect. With a high degree of agreement, participants inferred these dimensions from officer language. Even though they were not told the race of the stopped driver, participants judged officer language directed toward black motorists to be less respectful than language directed toward whites. In study 2, we build statistical models capable of predicting aspects of respect based on linguistic features derived from theories of politeness, power, and social distance. We discuss the linguistic features that contribute to each model, finding that particular forms of politeness are implicated in perceptions of respect. In study 3, we apply these models to all vehicle stop interactions between officers of the Oakland Police Department and black/white community members during the month of April 2014. We find strong evidence that utterances spoken to white community members are consistently more respectful, even after controlling for contextual factors such as the severity of the offense or the outcome of the stop.

Data

Our dataset consists of transcribed body camera footage from vehicle stops of white and black community members conducted by the Oakland Police Department during the month of April 2014. We examined 981 stops of black (N = 682) and white (N = 299) drivers, representing 68.1% of the 1,440 stops of black and white drivers conducted in this period. These 981 stops were conducted by 245 different officers (see SI Appendix, Data Sampling Process for inclusion criteria). Per Oakland Police Department policy, officers turn on their cameras before making contact with the driver and record for the duration of the stop. From the 183 h of footage in these interactions, we obtain 36,738 usable officer utterances for our analysis.

Study 1: Perceptions of Officer Treatment from Language. We first test whether human raters can reliably judge respect from officers’ language, and whether these judgments reveal differences in officer respect toward black versus white community members. Respect is a complex and gradient perception, incorporating elements of a number of correlated constructs like friendliness and formality. Therefore, in this study, we ask participants to rate transcribed utterances spoken by officers along five conceptually overlapping folk notions related to respect and officer treatment. We randomly sampled 414 unique officer utterances (1.1% of all usable utterances in the dataset) directed toward black (N = 312) or white (N = 102) community members. On each trial, participants viewed the text of an officer utterance, along with the driver’s utterance that immediately preceded it. All proper names and places were anonymized, and participants were not told the race or gender of the driver. Participants indicated on four-point Likert scales how respectful, polite, friendly, formal, and impartial the officer was in each exchange. Each utterance was rated by at least 10 participants.

Could participants reliably glean these qualities from such brief exchanges? Previous work has demonstrated that different perceivers can arrive at similar judgments from “thin slices” of behavior (22). In a similar vein, participants showed consistency in their perceptions of officer language, with reliability for each item ranging from moderate (Cronbach’s α = 0.73) to high (α = 0.91) agreement (see SI Appendix, Annotator Agreement). These results demonstrate that transcribed language provides a sufficient and consensual signal of officer communication, enough to gain a picture of the dynamics of an interaction at a given point in time.
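The per-item reliabilities reported above are Cronbach’s α values. A minimal sketch of that computation, on a hypothetical ratings matrix (the original annotations are not reproduced here):

```python
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """Cronbach's alpha for an (n_items, n_raters) matrix of scores.

    alpha = k/(k-1) * (1 - sum(rater variances) / variance of item totals),
    treating each rater as one "item" of the scale.
    """
    k = ratings.shape[1]                      # number of raters
    item_totals = ratings.sum(axis=1)         # summed score per utterance
    rater_vars = ratings.var(axis=0, ddof=1)  # variance of each rater's scores
    total_var = item_totals.var(ddof=1)       # variance of the summed scores
    return k / (k - 1) * (1 - rater_vars.sum() / total_var)

# Hypothetical example: 6 utterances rated by 4 raters on a 4-point scale.
ratings = np.array([
    [4, 4, 3, 4],
    [2, 2, 2, 3],
    [3, 3, 3, 3],
    [1, 2, 1, 1],
    [4, 3, 4, 4],
    [2, 1, 2, 2],
])
print(round(cronbach_alpha(ratings), 2))
```

Raters who broadly agree on which utterances are more respectful, as in this toy matrix, yield an α near 1; uncorrelated raters drive α toward 0.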
To test whether participant ratings uncovered racial group differences, we averaged scores across raters to calculate a single rating on each dimension for each utterance, then built a linear mixed-effects regression model to estimate the fixed effect of community member race across interactions, controlling for variance of a random effect at the interaction level. Officer utterances directed toward black drivers were perceived as less respectful [b = −0.23, 95% confidence interval (−0.34, −0.11)], polite [b = −0.23 (−0.35, −0.12)], friendly [b = −0.24 (−0.36, −0.12)], formal [b = −0.16 (−0.30, −0.03)], and impartial [b = −0.26 (−0.39, −0.12)] than language directed toward white drivers (Fig. 1). These differences persisted even when controlling for the age and sex of the driver (see SI Appendix, Model Outputs for Each Rated Dimension).

Fig. 1. (Left) Differences in raw participant ratings between interactions with black and white community members. (Right) When collapsed to two uncorrelated components, Respect and Formality, we find a significant difference for Respect but none for Formality. Error bars represent 95% confidence intervals. PC, principal component.

Given the expected conceptual overlap in the five perceptual categories we presented to the participants, we used principal component analysis to decompose the ratings into their underlying components. Two principal components explained 93.2% of the variance in the data (see SI Appendix, Principal Component Analysis (PCA) Loadings). The first component, explaining 71.3% of the variance and composed of positive loadings on the impartial, respectful, friendly, and polite dimensions with some loading on the formal dimension, we characterize as Respect, broadly construed. The second, explaining 21.9% of the variance and composed primarily of a very high positive loading on the formal dimension and a weak negative loading on the friendly dimension, we characterize as Formality.
This component captures formality as distinct from respect more generally, and is likely related to social distance. Standardizing these factor scores as outcome variables in mixed-effects models, we find that officers were equal in Formality with white and black drivers [β = −0.01 (−0.19, 0.16)] but higher in Respect with white drivers [β = 0.17 (0.00, 0.33)] (Fig. 1).

Study 1 demonstrates that key features of police treatment can be reliably gleaned from officer speech. Participant ratings from thin slices of police–community interactions reveal racial disparities in how respectful, impartial, polite, friendly, and formal officers’ language to community members was perceived to be. These differences were driven by the Respect officers communicated toward drivers rather than the Formality with which officers addressed them.
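The decomposition of the five correlated rating dimensions into uncorrelated components can be sketched with a standard PCA. The ratings below are synthetic, constructed only to mimic five dimensions driven by two underlying signals; the loadings and variance shares will not match the study’s figures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for per-utterance ratings on five dimensions
# (respectful, polite, friendly, formal, impartial).
n = 400
respect = rng.normal(size=n)      # shared "respect"-like signal
formality = rng.normal(size=n)    # distinct "formality"-like signal

def noisy(x):
    return x + 0.3 * rng.normal(size=n)

X = np.column_stack([
    noisy(respect),                     # respectful
    noisy(respect),                     # polite
    noisy(respect - 0.2 * formality),   # friendly
    noisy(formality + 0.2 * respect),   # formal
    noisy(respect),                     # impartial
])

# PCA via SVD of the mean-centered data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / (s**2).sum()   # fraction of variance per component
loadings = Vt                     # rows: components; columns: dimensions

print("variance explained:", np.round(explained, 2))
```

With two underlying signals and modest noise, the first two components absorb most of the variance, analogous to the Respect and Formality components described above.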

Study 2: Linguistic Correlates of Respect. The methods of study 1 (human coding of 414 individual utterances), although effective at discovering racial disparities in officer respect toward community members in our dataset, cannot offer a general solution to the analysis of body camera data. One problem is scale: Each year, on the order of 26 million vehicle stops are made (5). Furthermore, using only a small sample of individual utterances makes it impossible to study how police treatment varies across officers, or how an interaction progresses over time within each stop. In this study, we therefore develop computational linguistic models of respect and formality and tune them on the 414 individual utterances; in study 3, we apply these models to our full dataset of 36,738 utterances. Our method is based on linguistic theories of respect that model how speakers use respectful language (apologizing, giving agency, softening of commands, etc.) to mitigate “face-threatening acts.” We use computational linguistic methods (e.g., refs. 23–26) to extract features of the language of each officer utterance. The log-transformed counts of these features are then used as independent variables in two linear regression models predicting the perceptual ratings of Respect and Formality from study 1.

Our model-assigned ratings agree with the average human rater from study 1 about as well as humans agree with each other. Our model for Respect obtains an adjusted R2 of 0.258 on the perceptual ratings obtained in study 1, and a root-mean-square error (RMSE) of 0.840, compared with an RMSE of 0.842 for the average rater relative to other raters. Our model for Formality obtains an adjusted R2 of 0.190, and an RMSE of 0.882 compared with 0.764 for the average rater (see SI Appendix, Model Comparison to Annotators for details on how these values were calculated).
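The regression on log-transformed feature counts, along with the adjusted R2 and RMSE used to evaluate it, can be sketched as follows. The counts, coefficients, and simulated ratings are all synthetic stand-ins; the study’s actual feature set is much larger:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic counts of a few politeness-style features per utterance.
n, k = 414, 4                    # utterances, features
counts = rng.poisson(1.0, size=(n, k))
X = np.log1p(counts)             # log-transformed counts, as in the study
true_w = np.array([0.8, 0.5, -0.6, -0.3])
y = X @ true_w + 0.4 * rng.normal(size=n)   # simulated Respect ratings

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ w

rss = ((y - pred) ** 2).sum()
tss = ((y - y.mean()) ** 2).sum()
r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # penalize model size
rmse = np.sqrt(rss / n)
print(f"adjusted R^2 = {adj_r2:.3f}, RMSE = {rmse:.3f}")
```

Comparing the model’s RMSE against the average rater’s RMSE relative to other raters, as the study does, puts model error on the same scale as human disagreement.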
These results indicate that, despite the sophisticated social and psychological cues participants are likely drawing upon in rating officers’ utterances, a constrained set of objectively measurable linguistic features can explain a meaningful portion of the variance in these ratings. Fig. 2 lists the linguistic features that received significant weights in our model of Respect (arranged by their model coefficients). For example, apologizing, gratitude, and expressions of concern for citizen safety are all associated with respect. The bars on the right show the log-odds of the relative proportion of interactions in our dataset taken up by each feature, where negative numbers mean that a feature comprised a larger proportion of officers’ speech in interactions with black community members and positive numbers mean the same for interactions with white community members. Example utterances containing instances of the highest-weighted features for the Respect model are shown in Fig. 3. See SI Appendix, Study 2 for full regression outputs and more detailed discussion of particular linguistic findings.

Fig. 2. (Left) Respect weights assigned by the final model to linguistic features and (Right) the corresponding log-odds of those features occurring in officer speech directed toward black versus white community members, calculated using Fisher’s exact test. †P < 0.1; ∗P < 0.05; ∗∗P < 0.01; ∗∗∗P < 0.001.

Fig. 3. Sample sentences with automatically generated Respect scores. Features in blue have positive coefficients in the model and connote respect, such as offering reassurance (“no problem”) or mentioning community member well-being (“drive safe”). Features in red have negative coefficients in the model and connote disrespect, such as informal titles (“my man”) or disfluencies (“that- that’s”).
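The log-odds comparison in Fig. 2 can be sketched with a 2×2 contingency table and Fisher’s exact test. The feature and counts below are purely illustrative, not the study’s data:

```python
from math import log
from scipy.stats import fisher_exact

# Hypothetical 2x2 table: utterances containing an "apologizing" feature
# versus not, split by driver race (all counts are invented).
#              has feature   lacks feature
table = [[120, 2500],    # toward white drivers
         [180, 6200]]    # toward black drivers

odds_ratio, p_value = fisher_exact(table)
log_odds = log(odds_ratio)   # > 0: feature relatively more common in
                             # speech toward white drivers
print(f"log-odds = {log_odds:.2f}, p = {p_value:.3g}")
```

A positive log-odds with a small p value indicates the feature occupies a reliably larger share of officer speech in one group of stops than the other.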

Study 3: Racial Disparities in Respect. Having demonstrated that people can reliably infer features of procedural justice from officer speech (study 1), and that these ratings can be reliably predicted from statistical models of linguistic features (study 2), we are now able to address our central question: Controlling for contextual factors of the interaction, is officers’ language more respectful when speaking to white as opposed to black community members? We apply our models from study 2 to the entire corpus of transcribed interactions to generate predicted scores for Respect and Formality for each of the 36,738 utterances in our dataset. We then build linear mixed-effects models for Respect and Formality over these utterances. We include, as covariates in our primary model, community member race, age, and gender; officer race; whether a search was conducted; and the result of the stop (warning, citation, or arrest). We include random intercepts for interactions nested within officers. Controlling for these contextual factors, utterances spoken by officers to white community members score higher in Respect [β = 0.05 (0.03, 0.08)]. Officer utterances were also higher in Respect when spoken to older community members [β = 0.07 (0.05, 0.09)] and when a citation was issued [β = 0.04 (0.02, 0.06)]; Respect was lower in stops where a search was conducted [β = −0.08 (−0.11, −0.05)]. Officer race did not contribute a significant effect. Furthermore, in an additional model on 965 stops for which geographic information was available, neither the crime rate nor the density of businesses in the area of the stop was a significant predictor of Respect, although a higher crime rate was associated with increased Formality [β = 0.03 (0.01, 0.05)]. One might consider the hypothesis that officers were less respectful when pulling over community members for more severe offenses.
We tested this by running another model on a subset of 869 interactions for which we obtained ratings of offense severity on a four-point Likert scale from Oakland Police Department officers, including these ratings as a covariate in addition to those mentioned above. We found that offense severity was not predictive of officer respect levels and did not substantially change the results described above. To consider whether this disparity persists in the most “everyday” interactions, we also reran our analyses on the subset of interactions that did not involve arrests or searches (N = 781), and found that the results from our earlier models were fundamentally unchanged. Full regression tables for all models described above are given in SI Appendix, Study 3.

Another hypothesis is that the racial disparities might have been caused by officers being more formal with white community members and more informal or colloquial with black community members. However, we found that race was not associated with the formality of officers’ utterances. Instead, utterances were higher in Formality in interactions with older [β = 0.05 (0.03, 0.07)] and female [β = 0.02 (0.00, 0.04)] community members.

Are the racial disparities in the respectfulness of officer speech driven by a small number of officers? We calculated the officer-level difference in Respect between white and black stops for every officer (N = 90) in the dataset who had interactions with both blacks and whites (Fig. 4). We find a roughly normal distribution of these deltas for officers of all races. This contrasts with the case of stop-and-frisk, where individual outlier officers account for a substantial proportion of racial disparities (27); the disparities we observe here cannot be explained by a small number of extreme officers.

Fig. 4.
Kernel density estimate of individual officer-level differences in Respect when talking to white as opposed to black community members, for the 90 officers in our dataset who have interactions with both blacks and whites. More positive numbers on the x axis represent a greater positive shift in Respect toward white community members.

Because our model is able to generate scores across all utterances in our dataset, we can also consider aspects of the trajectory of interactions beyond the mean level of respect (Fig. 5). Growth-curve analyses revealed that officers spoke with greater Respect [b = 0.35 (0.29, 0.40)] and reduced Formality [b = −0.57 (−0.62, −0.53)] as interactions progressed. However, these trajectories varied by community member race: Although stops of white and black drivers converged in the Formality expressed during the interaction [b = −0.09 (−0.13, −0.05)], the gap in Respect increased over time [b = 0.10 (0.05, 0.15)]. That is, officer Respect increased more quickly in interactions with white drivers [b = 0.45 (0.38, 0.54)] than in interactions with black drivers [b = 0.24 (0.19, 0.29)].

Fig. 5. Loess-smoothed estimates of the (Left) Respect and (Right) Formality of officers’ utterances relative to the point in an interaction at which they occur. Respect tends to start low and increase over an interaction, whereas the opposite is true for Formality. The race discrepancy in Respect is consistent throughout the interactions in our dataset.
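The aggregation behind the officer-level delta analysis (Fig. 4) can be sketched as follows. The records are synthetic stand-ins for model-scored utterances, with a small built-in mean gap; only the grouping and differencing logic mirrors the study:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)

# Synthetic records: (officer_id, driver_race, respect_score).
records = []
for officer in range(90):
    baseline = rng.normal(0, 0.3)               # officer-level intercept
    for race, shift in (("white", 0.05), ("black", 0.0)):
        for _ in range(20):                     # 20 utterances per group
            records.append((officer, race,
                            baseline + shift + rng.normal(0, 0.5)))

# Mean Respect per (officer, race), then the white-minus-black delta.
sums = defaultdict(lambda: [0.0, 0])
for officer, race, score in records:
    sums[(officer, race)][0] += score
    sums[(officer, race)][1] += 1

deltas = []
for officer in range(90):
    mean = {race: sums[(officer, race)][0] / sums[(officer, race)][1]
            for race in ("white", "black")}
    deltas.append(mean["white"] - mean["black"])

deltas = np.array(deltas)
print(f"mean delta = {deltas.mean():.3f}, sd = {deltas.std():.3f}")
```

Plotting a kernel density estimate of `deltas` would then reproduce the shape of Fig. 4: a disparity concentrated around a modest positive mean rather than driven by a few extreme officers.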