Significance Claims about how reported emotional experiences are geometrically organized within a semantic space have shaped the study of emotion. Using statistical methods to analyze reports of emotional states elicited by 2,185 emotionally evocative short videos with richly varying situational content, we uncovered 27 varieties of reported emotional experience. Reported experience is better captured by categories such as “amusement” than by ratings of widely measured affective dimensions such as valence and arousal. Although categories are found to organize dimensional appraisals in a coherent and powerful fashion, many categories are linked by smooth gradients, contrary to discrete theories. Our results comprise an approximation of a geometric structure of reported emotional experience.

Abstract Emotions are centered in subjective experiences that people represent, in part, with hundreds, if not thousands, of semantic terms. Claims about the distribution of reported emotional states and the boundaries between emotion categories—that is, the geometric organization of the semantic space of emotion—have sparked intense debate. Here we introduce a conceptual framework to analyze reported emotional states elicited by 2,185 short videos, examining the richest array of reported emotional experiences studied to date and the extent to which reported experiences of emotion are structured by discrete and dimensional geometries. Across self-report methods, we find that the videos reliably elicit 27 distinct varieties of reported emotional experience. Further analyses revealed that categorical labels such as amusement better capture reports of subjective experience than commonly measured affective dimensions (e.g., valence and arousal). Although reported emotional experiences are represented within a semantic space best captured by categorical labels, the boundaries between categories of emotion are fuzzy rather than discrete. By analyzing the distribution of reported emotional states we uncover gradients of emotion—from anxiety to fear to horror to disgust, calmness to aesthetic appreciation to awe, and others—that correspond to smooth variation in affective dimensions such as valence and dominance. Reported emotional states occupy a complex, high-dimensional categorical space. In addition, our library of videos and an interactive map of the emotional states they elicit (https://s3-us-west-1.amazonaws.com/emogifs/map.html) are made available to advance the science of emotion.

Central to the science of emotion is the principle that emotions are centered in subjective experiences that people represent with language (1–10). People represent their transient experiences within a semantic space that includes hundreds, if not thousands, of semantic terms that refer to a rich variety of emotional states (11–13) most readily characterized by the types of situations in which they occur (14, 15). Given that experience is often considered the sine qua non of emotion (1–10), the understanding of the semantic space of reported emotional experiences is crucial to progress in characterizing emotion-related cognition, signaling, and physiology (16), as well as individual differences in emotion (17).

One line of theorizing has documented the underlying dimensions of the semantic space of reported emotional experience, focusing on the core affective states that make certain experiences feel emotional (18, 19). Efforts to identify a finite set of axes central to reported experiences of emotion have most consistently yielded two affective dimensions, valence and arousal, that are posited to lie at the core of all affective experiences, from more diffuse moods to specific emotions. These dimensions are thought to describe raw, disconnected feelings as opposed to emotions felt toward specific objects or situations (14, 18, 20, 21). To account for the occurrence of specific emotions, a related line of inquiry has documented how other, more context-directed affective dimensions such as dominance, certainty, agency, effort, and attention differentiate reports of emotional experiences of similar valence and arousal, such as anger and fear, or hope and pride (1, 14, 19, 22–24). Varying combinations of such dimensions have been the focus of hundreds of studies linking reported emotional experience to behavior, physiology, and brain activity (25–36).

A second approach to emotional experience details how specific emotion categories, such as awe, fear, and envy, describe discrete clusters of states within a presupposed semantic space. More precisely, basic emotion theories posit that a limited number of clusters, ranging in theoretical accounts from 6 to 15, describe the distribution of all emotional states (16, 37, 38). A cluster, or emotion family, may go by a prototypical label, such as “anger,” and contain closely related states such as irritation, frustration, and rage (39) that occur in similar situations (14). As with affective dimensions, such emotion families, discretely partitioned into categories, have been the focus of hundreds of empirical studies (16, 25, 27–29, 32, 35, 40–49). Clearly, claims that specific affective dimensions and emotion categories capture how people report on their emotional experience (and, by implication, other emotion-related processes) have shaped the study of emotion.

Despite the pervasive influence of these theoretical approaches, empirical progress in understanding how reported emotional experiences are organized within a semantic space has been modest. Statistical approaches to testing these theoretical claims have been unable to openly explore how reported emotional experiences are organized within a more general topological space that could simultaneously involve both distinct clusters and gradients of relatedness in response to varied situations. As a result, little is known about both the boundaries between emotion categories and their arrangement within a larger semantic space of emotion, despite these issues’ being controversial areas of contrasting claims (2, 3, 20, 37, 50). Further, the states that have been studied have been limited in scope, often encompassing only 6 to 12 categories (46) (although see ref. 24), in large part for methodological reasons. Widely used self-report measures, such as the Positive and Negative Affect Schedule, capture 10 to 12 emotions (51–53). The same is true of widely used affect-eliciting stimuli, such as the International Affective Picture System (54), and the Gross and Levenson films (55). As a result, the array of emotional states captured in past studies is too narrow to generalize, a priori, to the rich variety of emotional experiences that people deem distinct, including the expanding array of emotion categories discovered to correspond to distinct behaviors (16, 56). Also, many claims are founded upon studies that apply multivariate techniques such as factor analysis to self-reports of contemporaneous or recalled emotional experiences (24, 53, 57). Such studies have accounted for correlations between items (e.g., between valence and awe) but not the degree to which people agree on individual items that may be independent (e.g., reliable judgments of awe that do not simply reflect positive valence). Further, past multivariate approaches have relied on heuristic methods that do not generate P values, confidence intervals, or posterior probabilities in estimating how many dimensions are needed to account for reported emotional experiences (58).

This investigation introduces a mathematically based conceptual framework and empirical approach to characterizing the varieties of experience captured by emotional self-report, the most widely used measure of experience (1–10). By examining emotional experiences reported in response to the widest array of psychologically significant situations (powerfully evocative video clips) ever studied, we provide answers to the following questions. How many distinct varieties of emotion do people reliably report experiencing across distinct situations? Is reported emotional experience better understood in terms of categories, such as amusement and awe, or in terms of widely measured affective dimensions, such as valence and effort? Do boundaries between emotion categories such as amusement and awe correspond to discrete jumps or smooth transitions in how emotions are reported to be experienced (14, 37, 59)?

We address these conceptual issues by focusing on self-reports of emotion terms (e.g., “anger” or “love”), given that self-reports are currently the most accessible measure of subjective experience (1, 9). Self-reports carry information about a person’s internal state (1, 9) and correlate with other psychological processes that are thought to be emotion-related, such as expressive behavior (60–63). Nonetheless, it is important to note that the meaning of emotional self-report is subject to ambiguities, including basic indeterminacies of linguistic reference (64), difficulties in capturing certain subjective phenomena in words [e.g., moods, memories, inchoate physical sensations, and automatic appraisals (65)], differences in the granularity of emotional awareness and expression (66), and influences on language use of culture, gender, and social class. The subjective experience of emotion is shaped by many complex processes that self-report measures only partially capture; self-report is not a direct readout of experience.

Heedful of these considerations, in our theorizing we begin from the broad assumption that self-reported emotional experiences correspond to points within a semantic space. Such a space is characterized by its dimensionality—the number of independent directions in the space—and the distribution of all emotional experiences that people can report along these dimensions. Conceptually speaking, each dimension of this semantic space corresponds to a distinct variety of reported emotional experience. These varieties of reported experience can be combined in ways that account for both individual emotion terms and the collections of terms that comprise reported emotional experiences. In other words, every term and reported experience corresponds, mathematically, to a single point within the semantic space, determined by a linear combination of the semantic dimensions that define the space. For example, a semantic space could have a semantic dimension directly corresponding to the term “joy,” with other terms such as “excitement” and “elation” also positioned at various points along this dimension. Alternatively, the terms joy, excitement, and elation could all correspond to points obtained by applying suitable weights to other semantic dimensions, perhaps including “awe” and love. Because this analysis entails that all reported emotional experiences are linear combinations of the dimensions of a semantic space, it follows that a semantic space can be derived by applying linear dimensionality reduction techniques to self-report judgments of emotional experiences. However, accurately deriving a semantic space of reported emotional experience may require larger, more diverse samples of experiences than have been typical of past factor analytic studies of the dimensions of reported emotional experience (67).
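The idea that each reported experience is a linear combination of semantic dimensions can be illustrated with a minimal sketch of linear dimensionality reduction. The example below applies principal components analysis (computed with a singular value decomposition) to a toy stimulus-by-term rating matrix. The function name `pca_semantic_space`, the toy data, and the choice of two latent dimensions are assumptions made for illustration only; they are not the analysis pipeline used in this study.

```python
import numpy as np

def pca_semantic_space(ratings, n_dims):
    """PCA via SVD on a (n_stimuli, n_terms) matrix of mean ratings.

    Returns the position of each stimulus along each dimension (scores),
    the weights of each dimension on the terms (loadings), and the
    proportion of variance carried by each retained dimension.
    """
    centered = ratings - ratings.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_dims]                 # one row per semantic dimension
    scores = centered @ loadings.T         # one row per stimulus
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return scores, loadings, explained[:n_dims]

# Hypothetical toy data: 200 stimuli rated on 6 terms, generated from
# 2 latent dimensions plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
toy_ratings = latent @ mixing + 0.01 * rng.normal(size=(200, 6))

scores, loadings, explained = pca_semantic_space(toy_ratings, n_dims=2)

# Each stimulus is (approximately) a linear combination of the two dimensions.
reconstruction = scores @ loadings + toy_ratings.mean(axis=0)
```

In this toy case the two retained dimensions reconstruct the ratings almost exactly, because the data were built from two latent dimensions. With real self-report data the number of dimensions is unknown and has to be estimated.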

Guided by this theorizing, we set out to use modern large-scale statistical inference methods and a large, diverse dataset to interrogate the semantic space of reported emotional experience elicited by dynamic visual stimuli. We first gathered the widest array of emotionally evocative stimuli ever studied: 2,185 short video clips depicting a range of emotional situations. The videos were gathered by querying search engines and content aggregation websites with contextual phrases targeting 34 emotion categories, such as “close call” (targeting relief) and “mushroom cloud” (targeting awe). The 34 emotion categories were derived from emotion taxonomies of prominent theorists (for a summary see ref. 16); recent studies of positive emotions such as awe, joy, love, desire, and excitement (68–70); Darwin’s observations of emotional states such as admiration, adoration, and sympathy (71, 72); findings of states found to reliably occur in daily interactions, such as confusion, awkwardness, and calmness (73); and conceptualizations of nuanced differences between states such as fear, anxiety, and horror (74) (see Table S1 for a list of states and their theoretical origins). The videos on average lasted about 5 s and portrayed an exceptionally wide range of psychologically significant situations, including births and babies, weddings and proposals, suffering and death, spiders and snakes, endearing animals, whales and elephants, art and architecture, natural beauty and wonders, natural disasters, explosions and warfare, feces and vomit, physical pratfalls, sexual acts, respected and hated celebrities, nostalgic films, awkward handshakes, delicious food, dance, sports, accidents and close calls, surgeries, risky stunts, soldiers returning home, and many others.

Table S1. Category information

We presented the emotionally evocative videos from this library to participants on Amazon Mechanical Turk to obtain repeated (9–17) judgments of emotional states elicited by each video. More specifically, participants provided one of three kinds of reports of emotional experience in response to a random sampling of videos. One group offered free response interpretations of their emotional response to each of 30 videos, thus allowing us to ascertain which categories of emotion spontaneously arise in people’s relatively unconstrained reports of subjective experience (75). A second group of participants rated each of 30 videos in terms of the degree to which it made them feel the 34 emotion categories of interest, allowing us to interrogate in finer detail the structure of the reported emotional states corresponding to this set of categories. A final group of participants rated each of 12 videos they viewed in terms of its placement along 14 widely measured scales of affective dimensions, which, in varying combinations, are frequently used to measure self-reported emotional experience (24–36, 76, 77). All told, these procedures yielded a total of 324,066 individual judgments (27,660 multiple-choice categorical judgments, 19,710 free-response judgments, and 276,696 nine-point dimensional judgments; see Materials and Methods and Tables S1 and S2 for more information on the ratings gathered).

Table S2. Affective dimension information

Results

Emotion Elicitation. Critical to our conceptual endeavor is a preliminary question: Did distinct videos elicit reports of distinct emotional experiences? To test whether the videos reliably elicited reports of distinct emotional experiences, we assessed how many videos elicited significant concordance in judgment rates of each of the 34 emotion categories. By concordance, we mean multiple raters judging a given video as eliciting the same category of emotion among the 34 choices. We found that 75% of the videos elicited significant concordance for at least one category of emotion across raters [false discovery rate (FDR) <0.05], with concordance averaging 54% (chance level being 27%, obtained from simulated raters choosing randomly with the same base rates of category judgment observed in our data). Importantly, all 34 emotion categories were found to be reported at significantly above-chance rates in response to at least one video (Fig. S1A). These results show that all 34 categories of emotion are meaningful in that they are reliably reported as fitting descriptions for the experience of emotion. However, these findings also leave open the possibility that some categories are synonyms, or, more generally, that not all are linearly independent. The latter could also be the case, for example, if one category, such as joy, was equivalent to a conceptual grouping of others, such as adoration and triumph. This concern over linear dependence can be resolved by deriving principal components from the ratings: dimensions that are linearly uncorrelated but have continuous loadings for each category (78). Determining how many of these distinct dimensions were reliably rated by different observers would reveal the number of distinct emotional experiences that can be reported using the 34 categories that guided this investigation.

Fig. S1. Category judgment concordance levels, dimensional judgment frequencies, and free response term use frequencies. (A) Interrater concordance levels for each video, for each category of elicited emotion. Dots represent the proportions of times the category was chosen for each video. Only videos for which the category was chosen at least once are shown. Seventy-five percent of the videos elicited significant concordance in emotional response (FDR <0.05), with every category being elicited with significant concordance by at least one video. (B) Judgment frequency for each affective dimension across all videos. Shaded plots for each affective dimension are kernel histograms of the distribution of average ratings for that affective dimension across videos. Histograms are normalized to an arbitrary height. Elicited emotional states were more widely distributed in terms of some affective dimensions, such as approach, than others, such as dominance. (C) Frequency of term use in free response judgments. The area occupied by each word is proportionate to the number of times the word was chosen. Participants used a wide variety of nuanced terms to describe the emotions elicited by each video.

Evidence for 27 Distinct Varieties of Reported Emotional Experience. To examine how many semantically distinct categories structured participants’ reports of emotional experience, we devised a method called split-half canonical correlations analysis (SH-CCA). SH-CCA is a generalization of split-half reliability analysis, in which the averages obtained from half of the ratings of each video clip for a single item (e.g., awe) are correlated with the averages obtained from the other half of the ratings, across stimuli. In SH-CCA, the averages obtained from half of the ratings of all items simultaneously are compared using CCA to the averages obtained from the other half of the ratings, yielding an estimate of the number of independent, reliable dimensions of variance in category judgments across raters (see Supporting Information and Fig. S2 for details of the method and its validation in 2,312 simulated studies). In other words, SH-CCA accounts for shared variance (correlations) between items, such as awe and “aesthetic appreciation,” without discarding the reliable variance in ratings of each individual item, such as the extent to which awe may differentially be evoked by some stimuli (e.g., explosions) while aesthetic appreciation may differentially be evoked by others (e.g., pastoral scenes of nature). Using SH-CCA we found that between 24 (P < 0.05) and 26 (P < 0.1) statistically significant semantic dimensions of reported emotional experience (i.e., 24–26 linear combinations of the categories) were required to explain the reliability of participants’ reports of emotional experience in response to the 2,185 videos. So far, this would suggest that the categorical ratings capture at least 24–26 semantically distinct varieties of reported emotional experience. (In fact, SH-CCA tends to produce overly conservative estimates of dimensionality; see Fig. S2.)

Fig. S2. Simulation studies verifying the use of split-half CCA for categorical ratings. Simulations (n = 2,312) were conducted, generating datasets of known underlying dimensionality that could be compared with the SH-CCA estimate. General specifications: For each simulation, underlying multinomial probabilities of each category for each of 2,185 hypothetical stimuli were generated by randomly sampling stimuluswise loadings on “explainable” and individual ratingwise loadings on “unexplainable” dimensions from a uniform distribution, with each stimulus loading exclusively on one to two random explainable and two random unexplainable dimensions. Each explainable dimension in turn loaded on one to two categories at random. All but one unexplainable dimension always loaded on one category each, with the final unexplainable dimension always loading on all remaining categories. The total number of explainable and unexplainable dimensions was systematically varied from 1 to 34 for each simulation study, resulting in 34*34*2 = 2,312 simulations. Each 34-category rating of each stimulus was drawn from a multinomial distribution with probability equal to the stimulus-specific explainable dimension loadings times the explainable dimension coefficients, plus the rating-specific unexplainable loadings times the unexplainable dimension coefficients, normalized to sum to 1. Twelve ratings were sampled for each stimulus. Each rating could comprise multiple selections, with the probability of each number of selections equal to what we observe in our actual categorical judgment data (56% one category, 27% two categories, 11% three categories, 4% five categories, and 1% or less for the remainder). Low vs. high signal-to-noise ratio (SNR) simulations: For low-SNR simulations (plotted in blue), both unexplainable dimension coefficients and explainable dimension coefficients were always 1. For high-SNR simulations (plotted in black), unexplainable dimension coefficients were always 0.05 and explainable dimension coefficients were always 1. (A) The dimensionality estimated by SH-CCA is plotted against the known dimensionality of the data. The estimates are highly accurate, typically underestimating the true dimensionality by at most 1. (B) Median SNR per category is plotted for each simulation. The median SNR we observe is consistent with SNRs observed in our low-SNR simulations with around 25 systematic dimensions. (C) Controlled vs. actual FWER across low- and high-SNR simulations. Estimates were, in general, overly conservative. Further research might examine whether the incorporation of nonlinear CCA methods more targeted to multinomial data could increase power to estimate dimensionality relative to the SH-CCA method developed here.
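The core of the split-half procedure described above can be sketched in a few lines. The example below splits the raters of each stimulus into two random halves, averages each half, and computes canonical correlations between the two sets of averages. Several things here are assumptions made for illustration: the function names, the continuous toy ratings with a fixed number of raters per stimulus, and the 0.5 cutoff used to count reliable dimensions. The significance testing actually used for SH-CCA is described in the Supporting Information and is not reproduced here.

```python
import numpy as np

def canonical_correlations(x, y):
    """Canonical correlations between the columns of x and y (rows = stimuli)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Orthonormal bases for each column space; the singular values of their
    # cross-product are the canonical correlations (sorted, largest first).
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    return np.linalg.svd(qx.T @ qy, compute_uv=False)

def split_half_cca(ratings, rng):
    """ratings: array of shape (n_stimuli, n_raters, n_items)."""
    n_stimuli, n_raters, n_items = ratings.shape
    half = n_raters // 2
    first = np.empty((n_stimuli, n_items))
    second = np.empty((n_stimuli, n_items))
    for i in range(n_stimuli):
        order = rng.permutation(n_raters)      # random split of raters per stimulus
        first[i] = ratings[i, order[:half]].mean(axis=0)
        second[i] = ratings[i, order[half:]].mean(axis=0)
    return canonical_correlations(first, second)

# Hypothetical toy data: 8 items driven by 3 reliable dimensions plus rater noise.
rng = np.random.default_rng(0)
n_stimuli, n_raters, n_items = 500, 12, 8
mixing = np.zeros((3, n_items))
mixing[0, :3] = 1.0
mixing[1, 3:6] = 1.0
mixing[2, 6:] = 1.0
signal = rng.normal(size=(n_stimuli, 3)) @ mixing
ratings = signal[:, None, :] + rng.normal(size=(n_stimuli, n_raters, n_items))

correlations = split_half_cca(ratings, rng)
n_reliable = int((correlations > 0.5).sum())   # illustrative cutoff, not a significance test
```

In this toy setting the first three canonical correlations are high and the remaining five are close to zero, which mirrors the logic of counting how many dimensions of variance replicate across independent halves of the raters.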
To address concerns that forced choice methods may inflate the apparent specificity of emotion self-reports (75), we also assessed how many dimensions of variance were reliably shared between the emotion category ratings and the free response labels participants used in reporting on their experience in viewing the videos (see Fig. S1C for representation of frequency of use of free response terms). In other words, we determined how many distinct varieties of emotion captured by the categorical ratings (e.g., fear vs. horror) were also reliably associated with distinct terms in the free response task (e.g., “suspense” vs. “shock”). We did so using CCA, which finds linear combinations within each of two sets of variables that maximally correlate with each other. In this analysis, we found 27 significant linearly independent patterns of shared variance between the categorical and free response reports of emotional experience (P < 0.01), meaning people’s multiple-choice and free-response interpretations identified 27 of the same distinct varieties of emotional experience. The near convergence in the number of significant linearly independent patterns across two methods and datasets (SH-CCA within the categorical judgment ratings and CCA between the categorical and free response judgment ratings) serves as convergent validity for up to 27 semantically distinct varieties of reported emotional experience.

How do the 27 distinct semantic dimensions we have documented correspond to reported emotional experiences? To extract the meaning of the 27 dimensions within the category judgments, we first used PCA to extract the 27 dimensions explaining the most variance, then applied factor rotation to their loadings on the 34 categories, as shown in Fig. 1. Factor rotation yields a set of semantic dimensions that span the same space as the principal components but are more easily interpretable in that they will each tend to load on a small number of categories.
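To make the effect of factor rotation concrete, the following is a minimal sketch of varimax rotation using Kaiser's classical pairwise planar rotations. The function name `varimax`, its parameters, and the two-dimensional toy loadings are assumptions made for illustration; the sketch is not claimed to match the implementation used for Fig. 1.

```python
import numpy as np

def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Rotate a (n_items, n_dims) loading matrix toward simple structure.

    Each pair of columns is rotated by the angle that maximizes the raw
    varimax criterion for that pair; sweeps repeat until angles vanish.
    """
    rotated = np.array(loadings, dtype=float)
    n_items, n_dims = rotated.shape
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for i in range(n_dims - 1):
            for j in range(i + 1, n_dims):
                x = rotated[:, i].copy()
                y = rotated[:, j].copy()
                u = x ** 2 - y ** 2
                v = 2.0 * x * y
                a, b = u.sum(), v.sum()
                c = (u ** 2 - v ** 2).sum()
                d = 2.0 * (u * v).sum()
                angle = 0.25 * np.arctan2(d - 2.0 * a * b / n_items,
                                          c - (a ** 2 - b ** 2) / n_items)
                rotated[:, i] = x * np.cos(angle) + y * np.sin(angle)
                rotated[:, j] = -x * np.sin(angle) + y * np.cos(angle)
                largest_angle = max(largest_angle, abs(angle))
        if largest_angle < tol:
            break
    return rotated

# Hypothetical toy loadings: six categories, each belonging to one of two
# dimensions, deliberately mixed by a 30-degree rotation.
simple = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)
theta = np.deg2rad(30.0)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
mixed = simple @ mix

rotated = varimax(mixed)
```

The rotated loadings span the same space as `mixed` (row sums of squared loadings are unchanged), but each toy category now loads on a single dimension, which is the sense in which rotated dimensions are easier to interpret.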
After factor rotation, many of the semantic dimensions have loadings on single categories, such as awe. In fact, there were only seven emotion categories not mapped to distinct dimensions: pride and triumph, which coloaded on experiences of admiration; contempt and disappointment, which coloaded on experiences of anger; sympathy, which coloaded on experiences of both empathic pain and sadness; and guilt and envy, which had only negligible loadings on any semantic dimensions. Essentially, these findings show that approximately 27 categories of emotion had distinct meaning in describing the reported experiences elicited by the 2,185 videos, given that each semantic dimension loaded maximally on a distinct category. Where different categories coload on the same semantic dimension they were used in an approximately linearly dependent manner, perhaps as synonyms; where categories do not have strong loadings on any semantic dimensions (e.g., envy) they were used insufficiently or not consistently enough to contribute much reliable variance. However, those loading on separate semantic dimensions (27 in total) were reliably separable in meaning with respect to the emotional states elicited by the videos. (In Fig. S3 we also repeat this analysis with 24, 25, and 26 dimensions to understand how the dimensions may have differed under stricter criteria for significance.) The 27 dimensions we derive from emotion self-report in response to short videos demonstrate a semantic space of emotions far richer in distinct varieties of reported experience than anticipated by emotion theories to date (for a summary see ref. 16). Not only do we find evidence for traditionally understudied varieties of positive emotion, such as excitement (68–70), but also for differences between nuanced states relevant to more specific theoretical claims, such as the distinctions between romantic love and sexual desire (79), interest and surprise (80), horror and fear, and aesthetic appreciation or beauty and feelings of awe (81).

Fig. 1. Factor analysis loadings on 27 dimensions of variance within the categorical responses. Statistical analyses revealed that categorical judgments reliably captured up to 27 separable dimensions of variance, each corresponding to a semantically distinct variety of reported emotional experience. Here, the first 27 principal components of variance within the categorical judgments, extracted using principal components analysis (PCA), have been rotated into more interpretable dimensions using varimax rotation, which finds dimensions that load on relatively few categories. Categories without maximal loadings on any dimensions (contempt, disappointment, envy, guilt, relief, sympathy, and triumph) were either not judged reliably or were taken as roughly linearly dependent with other more frequently used categories during dimensionality reduction. Categories loading on separate dimensions were reliably separable in meaning with respect to the emotional states elicited by the videos. The dimensions we derive from emotion self-report in response to short videos demonstrate a complexity of emotion structure beyond what has been proposed in most emotion theories to date, reliably differentiating emotional states as nuanced as aesthetic appreciation (i.e., feelings of beauty and awe).

Fig. S3. Rotated factor weights when including 24, 25, or 26 dimensions of variance in the categorical judgments. Dropping from 27 components (Fig. 1) to 26 (Right) eliminates the separate categorical judgment dimension corresponding to the category “boredom,” which instead loads negatively on most of the 26 other categorical judgment dimensions. Dropping to 25 components (Middle) eliminates the categorical judgment dimension corresponding to “anger,” “contempt,” and “disappointment,” which instead load positively on the “sadness” dimension. Dropping to 24 components (Left) eliminates the categorical judgment dimension corresponding to “satisfaction,” which instead loads on the “admiration” dimension along with “pride” and “triumph.” These analyses indicate that reports of experiences of boredom, anger, and satisfaction were less reliably differentiated in responses to particular videos than experiences of other categorical judgment dimensions, such as aesthetic appreciation, awe, fear, and horror.

The Distribution of Reported Emotional Experience: “Discrete” Categories Are Bridged by Continuous Gradients of Meaning. Understanding the semantic space of emotion requires examining not only the semantic dimensions of reported emotional experience (that is, what distinct varieties of emotional experience do people report?) but also the distribution of states along these dimensions. Such a line of inquiry is germane to an enduring question: How can distinct varieties of emotional experience be combined or blended together? Discrete emotion theorists predict the shape of the distribution to approximate a number of distinct clusters. Another possibility suggests that emotional states are more evenly distributed along affective dimensions such as valence and arousal (82). While it is difficult to visualize a 27-dimensional point cloud, we can use modern data visualization techniques to interrogate how emotional responses to the 2,185 videos are distributed along the 27 semantic dimensions of emotional experience documented in the previous analysis. In Fig. 2A we use a method called t-SNE (83).
This method projects high-dimensional data (the 27 dimensions of reported emotional experience we have uncovered) onto two nonlinear axes, such that the local distances between data points are accurately preserved while more distinct data points are separated by longer, more approximate, distances.

Fig. 2. The structure of reported emotional experience: Smooth gradients among 27 semantically distinct categorical judgment dimensions. (A) A chromatic map of average emotional responses to 2,185 videos within a 27-dimensional categorical space of reported emotional experience. t-distributed stochastic neighbor embedding (t-SNE), a data visualization method that accurately preserves local distances between data points while separating more distinct data points by longer, more approximate, distances, was applied to the loadings of the 2,185 videos on the 27 categorical judgment dimensions, generating loadings of each video on two axes. The individual videos are plotted along these axes as letters that correspond to their highest loading categorical judgment dimension (with ties broken alphabetically) and are colored using a weighted interpolation of the unique colors corresponding to each of the categorical judgment dimensions on which they loaded positively. The resulting map reveals gradients among distinct varieties of reported emotional experiences, such as the gradients from anxiety to fear to horror to disgust (also see the interactive map at https://s3-us-west-1.amazonaws.com/emogifs/map.html). (B) Number of significant coloadings of each video on each categorical judgment dimension. The significance of individual loadings of each video on each categorical judgment dimension was determined via simulation of a null distribution (Supporting Information). We then counted the number of instances in which videos loaded significantly (FDR <0.05) on pairs of categorical judgment dimensions. These results validate the emotion gradients observed in A. For example, anxiety and fear (F and Q) were elicited by many of the same videos (75 times in total), as were fear and horror (Q and R; 55 times), yet anxiety and horror were seldom elicited by the same videos (just eight times). (C) Top free response terms associated with each categorical judgment dimension. The free response judgments were regressed onto the categorical judgment dimensions, across videos. For 22/27 dimensions, the highest loading category is among the three (out of 600) top-weighted free response terms, strongly validating the categorical ratings as measures of subjective experience.

In Fig. 2A we apply t-SNE to map the 2,185 videos along all 27 semantically distinct varieties of emotional experience, resulting in a 2D space in which each video is surrounded by other videos that evoked similar reported emotional experiences. To plot the 27 varieties of emotion elicited by the videos within this 2D space we use a chromatic map, in which each video is colored uniquely according to the specific varieties of reported emotional experience that it elicited. Specifically, the letters corresponding to each video are colored using a weighted interpolation of the colors corresponding to each of the semantic dimensions on which they loaded positively. Thus, smooth gradients between these semantically distinct varieties of reported experience correspond to smooth transitions in color. This analysis reveals a complex distribution of reported emotional experiences that is neither simply clustered nor simply uniform. Inspection of Fig. 2A does reveal certain clusters of emotional experience, for example, those of craving (desire), sexual desire and romantic love, and nostalgia. At the same time, many categories of emotional experience share smooth gradients with other semantically distinct categories, forming smooth transitions between particular varieties of reported emotional experience.
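The chromatic coloring described above can be sketched as a weighted average of per-dimension colors. The exact weighting scheme used for Fig. 2A is not specified in this section, so the sketch below makes a simple assumption: negative loadings are ignored and positive loadings are normalized to sum to 1. The function name `blend_colors`, the three-color RGB palette, and the toy loadings are hypothetical.

```python
import numpy as np

def blend_colors(loadings, palette):
    """Blend one RGB color per stimulus from its positive loadings.

    loadings: (n_stimuli, n_dims) array; palette: (n_dims, 3) RGB values in [0, 1].
    Assumption: weights are positive loadings normalized to sum to 1 per stimulus.
    """
    weights = np.clip(np.asarray(loadings, dtype=float), 0.0, None)
    totals = weights.sum(axis=1, keepdims=True)
    totals[totals == 0.0] = 1.0            # stimuli with no positive loading stay black
    return (weights / totals) @ np.asarray(palette, dtype=float)

# Hypothetical palette: one color per semantic dimension.
palette = np.array([[1.0, 0.0, 0.0],      # dimension 0: red
                    [0.0, 0.0, 1.0],      # dimension 1: blue
                    [0.0, 1.0, 0.0]])     # dimension 2: green
loadings = np.array([[2.0, 0.0, -1.0],    # loads only on dimension 0
                     [1.0, 1.0, 0.0],     # loads equally on dimensions 0 and 1
                     [0.0, 0.0, 0.0]])    # no positive loadings
colors = blend_colors(loadings, palette)
```

Under this scheme a stimulus that loads on a single dimension takes that dimension's color, while a stimulus that loads on two dimensions takes an intermediate color, so a gradient between two varieties of experience appears as a gradual color transition.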
For example, the videos are distributed along smooth gradients from anxiety to fear to horror to disgust, calmness to aesthetic appreciation to awe, and adoration to amusement to awkwardness, among others. Adjacent semantic dimensions along these gradients, such as anxiety and fear, were elicited by an overlapping set of videos, corroborating the shape of the distribution revealed by Fig. 2A. These results reveal that the boundaries between many distinct emotion categories are fuzzy rather than discrete. In more fine-grained analyses, we find that these fuzzy boundaries are highly specific to particular pairs of distinct categories. As shown in Fig. 2B, anxiety and fear (F and Q) were elicited by many of the same videos (75 times in total), as were fear and horror (Q and R; 55 times), yet anxiety and horror were elicited by few of the same videos (just eight times). Further inspection of Fig. 2B reveals that emotion categories mapped to distant locations within the t-SNE space, such as awe and disgust, were seldom elicited by any of the same videos. These findings converge with doubts that emotion categories “cut nature at its joints” (20), but fail to support the opposite view that reported emotional experiences are defined by entirely independent dimensions (82). Based on the distribution of emotional states elicited by thousands of videos along 27 semantic dimensions, we can infer that the majority of categories of emotion share fuzzy boundaries with one or two other distinct categories, forming conceptually related chains of reported experiences, such as that from calmness to aesthetic appreciation to awe. To illustrate these findings and their conceptual implications, we provide a fully interactive version of Fig. 2A (https://s3-us-west-1.amazonaws.com/emogifs/map.html) in which each video is displayed when its position in the map is hovered over with the cursor. 
Inspection of this map confirms qualitatively that within the 27-dimensional semantic space of reported emotional experiences, most states occupy continuous gradients as opposed to discrete clusters. Categorical Labels Explain More Variance in Reported Emotional Experience than Proposed Affective Dimensions. We next compared the categorical structure of reported emotional experience to the information carried by the 14 affective dimension judgments: approach, arousal, attention, certainty, commitment, control, dominance, effort, fairness, identity, obstruction, safety, upswing (improvement of conditions), and valence. Some theorists have suggested that categorical representations of emotion are explained largely by position within a space formed by particular combinations of these affective dimensions (18, 24, 77, 82, 84). This claim has led to many studies measuring emotional experience using some combination of affective dimensions (25–35). To test whether the categorical judgments of emotional states were a function of the affective dimension judgments, we examined whether the affective dimension judgments could better explain each of the categorical judgment dimensions, or vice versa. That is, do affective dimension judgments of the degree to which each video elicits displeasure, arousal, submissiveness, and so on, better explain a person’s reports of fear, or does labeling an experience as fear better explain the person’s reports of these well-validated affective dimensions? We compared these possibilities using cross-validated regression techniques. With both linear and nonlinear regression (Fig. 3) we found that the affective dimension judgments explained at most 61% of the explainable variance in the categorical judgment dimensions, whereas the categorical judgment dimensions consistently explained 78% of the explainable variance in the affective dimension judgments (Supporting Information). 
That the categorical judgment dimensions explained the affective dimension judgments substantially better than the reverse (P < 10^-6, bootstrap test, both linear and nonlinear regression) suggests that the categories have more semantic value than the affective dimensions in explicating people’s reports of emotional experience elicited by short videos: they capture most of what the 14 affective dimensions capture, and more, with respect to the emotions people reliably report experiencing in response to evocative videos. Fig. 3. Variance explained by the categorical judgments in the affective dimension judgments, and vice versa. The categorical judgment dimensions explain significantly more variance in the affective dimension judgments than vice versa (P < 10^-6, bootstrap test). These findings hold when using both linear regression with ordinary least squares (Left) and nonlinear regression with k-nearest neighbors (Middle). This suggests that the categories have the most value in explicating reported emotional experiences elicited by short videos. (Explained variance was calculated using leave-one-out cross-validation and then divided by the estimated explainable variance. For this analysis, we used nine ratings per video. For k-nearest neighbors, we tested k from 1 to 50 and show the results from choosing the optimal k, i.e., the one that resulted in the greatest average explained variance for each prediction. See also Supporting Information.) Smooth Category Gradients Correspond to Smooth Differences in Reported Emotional Experience. The above analyses indicate that judgments of the affective dimensions were largely explained by the categories. How, then, do the affective dimensions vary as a function of the categorical judgments? For instance, do the affective dimensions vary smoothly across category gradients, or are there sharp jumps in affective dimensions at points where the most frequently reported category of emotion changes? 
The presence of sharp jumps in associated affective dimensions between neighboring categories would support basic emotion theories (85), which predict that basic emotions are associated with prototypical patterned responses (e.g., subjective, behavioral, and physiological) that are similar across all instances of a given category but different, in discrete fashion, from the patterned responses of other emotions (22, 38, 39). In light of these conceptual claims, we tested whether gradients between distinct reported emotional experiences reflected smooth or abrupt changes in associated affective dimensions. To do so, we regressed from the categorical ratings to the affective dimensions across videos that elicited each of two different emotions to a significant extent. In these analyses, we found that the smooth boundaries documented between particular categories reflected smooth variations in affective dimensions such as arousal and commitment to an individual (Fig. S4). That the affective dimensions vary smoothly across gradients between categories calls into question the notion from basic emotion theories that prototypical patterned responses are similar across all instances of a given category. While each category is associated with a specific pattern of affective dimension ratings, these ratings do not shift abruptly across categories; rather, they vary smoothly along the gradients associated with each emotion category. Fig. S4. Gradients in the categorical judgment space correspond to smooth differences in affective meaning. We further analyzed gradients among all 11 pairs of categorical judgment dimensions sharing significant coloadings on 20 or more videos (Fig. 2B). 
For each categorical judgment dimension pair, the affective dimension ratings were regressed onto the difference in loading between the two categorical judgment dimensions using principal components regression with four components (we opted to avoid using linear regression with all 14 dimensions on only 20–75 observations). To avoid overfitting, we first trained the regression on all videos for which only one of the two categorical judgment dimensions had a significant loading and then tested it on the videos for which both categorical judgment dimensions had significant loadings. The latter videos are plotted as data points above, with the actual differences in loading for each pair of categorical judgment dimensions on the x axis and the predicted differences based on a linear combination of affective dimensions on the y axis. y-axis labels indicate the most strongly weighted affective dimension. Dashed white lines are regression lines. Data points are colored using a weighted average of the colors corresponding to all categorical judgment dimensions on which they load positively, as in Fig. 2A. Seven out of 11 categorical gradients have significant correlations with affective dimension (FDR control at the 0.05 level is achieved when P < 0.033). The error bars represent SE in the space of affective dimensions. The homogeneity of the error bars indicates that the affective dimensions are homoscedastic, demonstrating that their smooth relationships with the categorical gradient cannot be explained by disagreement across raters at the boundary between categories. Unifying Factors of the Meaning of Reported Emotional Experiences. A central claim across emotion theories is that emotional experiences are defined by factors that reflect the coalescence of affective dimensions and categorical labels of current experiences (18, 22, 24, 86, 87). 
To examine the relationship between the affective dimensions and categories, we ascertained how affective dimensions covary with categorical judgment dimensions of elicited emotion (24, 84). We did so by applying CCA between the 14 affective dimensions and 27 categorical judgment dimensions associated with each video (Fig. 4). Here, CCA extracted linear combinations of affective dimension judgments that correlated maximally with linear combinations of categorical judgment dimensions. This analysis yielded 13 significant canonical variates (P < 0.01), shared dimensions of variance between participants’ self-reports of affective dimensions and categorical judgments of their reported emotional experiences (Fig. S5). Each canonical variate might be thought of as a unifying factor, or central component, of the meaning of reported emotional experiences, with both a categorical and affective dimension loading. Fig. 4. Canonical correlation analysis between the categorical judgment dimensions and the affective dimensions. (A) The first 13 canonical correlations between the categorical judgment dimensions and the affective dimensions were found to be significant (P < 0.01). We assigned labels to each canonical variate by interpreting its coefficients on the affective dimensions (see Fig. S5 for the coefficients). (B and C) Categorical variates (B) and dimensional variates (C) for the first three canonical correlations, projected as red, green, and blue color channels onto the t-SNE map from Fig. 2A. Color legends are given in the titles for each map. Similarity in colors between B and C illustrates the degree of shared information between the categorical and dimensional judgments for these three dimensions of emotional experience. Labels on each map reflect the combination of loadings on the three dimensions that give rise to each color. (D and E) Like B and C, but for the fourth through sixth canonical correlations. 
Altogether, B–E illustrate that the categorical gradients correspond to smooth differences in affective dimension (see also Fig. S4 for analysis of gradient smoothness). Fig. S5. Coefficients of the 13 significant category and affective dimension variates from canonical correlation analysis. See Fig. 4 for corresponding canonical correlations and loadings of the first six canonical variates on the videos. Close inspection of Fig. S5 reveals how affective dimensions relate to categories of emotional experience (for similar analysis see ref. 24). For instance, the first canonical variate corresponds almost exclusively to valence, the same initial dimension uncovered in nearly all factor analytic studies of reported emotional experience (18), and differentiates experiences of highly positive states (calmness and joy) from the most negative states (horror and empathic pain). More social affective dimensions such as feelings of dominance and approach motivation likewise drive reported emotional experience. For example, judgments of approach and dominance, variates 4 and 6 in Fig. S5, account for variation in reports of anger (76, 88, 89). Judgments of commitment account for variation in reports of emotional experiences such as sadness and adoration that are closely related to attachment processes (90). To visualize each canonical variate, we project both its categorical and affective dimension loadings onto our t-SNE map. In Fig. 4 B–E, loadings of the first six canonical variates, both categorical (B and D) and affective dimension (C and E), are projected as colors onto the t-SNE map. Each color channel (red, green, and blue) in Fig. 4 B–E corresponds to a canonical variate. Hence, the extent of similarity in color between B and C as well as between D and E indicates the extent to which linear combinations of category judgments (B and D) are similar in meaning to particular linear combinations of affective dimensions (C and E). 
The maps allow us to further interpret some of the gradients observed along the 27 varieties of reported emotional experiences represented in Figs. 1 and 2A. For example, the gradient from disgust to sadness is associated with a relative increase in commitment to an individual.
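
The canonical correlation analysis described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes complete, full-column-rank rating matrices, uses the standard QR-plus-SVD route to the canonical correlations, and runs on synthetic data in which two hypothetical shared factors drive both a 14-column "affective" matrix and a 27-column "categorical" matrix. The function name and all variable names are illustrative.

```python
import numpy as np


def canonical_correlations(x, y):
    """Canonical correlations between the columns of x and y.

    x has shape (n_videos, p) and y has shape (n_videos, q); rows are paired.
    Assumes both matrices have full column rank after centering.
    """
    # Center each column so correlations are taken about the mean.
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    # Orthonormal bases for the two column spaces.
    qx, _ = np.linalg.qr(xc)
    qy, _ = np.linalg.qr(yc)
    # Singular values of qx^T qy are the cosines of the principal angles
    # between the two subspaces, i.e. the canonical correlations,
    # returned from largest to smallest.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


# Synthetic stand-in data (an assumption, not the study's ratings).
rng = np.random.default_rng(0)
n_videos = 500
latent = rng.normal(size=(n_videos, 2))        # two shared factors
affective = rng.normal(size=(n_videos, 14))    # 14 affective dimensions
categorical = rng.normal(size=(n_videos, 27))  # 27 categorical dimensions
affective[:, :2] += 3.0 * latent
categorical[:, :2] += 3.0 * latent

rho = canonical_correlations(affective, categorical)
print(np.round(rho[:4], 2))
```

In this toy setup only the first two canonical correlations are large, because only two factors are shared. The remaining values are not zero: in-sample canonical correlations are inflated by chance, which is one reason a significance test is needed before interpreting any variate.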

Discussion Central to the science of emotion are claims about the semantic space of reported emotional experience: How many distinct varieties of emotion do people report experiencing? What are the boundaries between distinct varieties of reported emotional experience? To what extent is reported emotional experience rooted in the categorical labeling of the state, vs. judgments of proposed dimensions of affect such as valence and arousal? What are the unifying factors that relate proposed affective dimensions to emotion categorization? The findings from the present investigation emerge from a mathematical framework positing that reports of emotional experience can be characterized as points within a semantic space, distributed along semantic dimensions corresponding to distinct varieties of reported experience. To interrogate the semantic space of reported emotional experience, we had participants report on their emotional responses, using three different methods, to 2,185 emotionally evocative short videos, heedful of the ambiguities and indeterminacies of self-report. Our first finding concerns the richness of the semantic space of reported emotional experience. Using statistical methods to determine the dimensionality of reported categories of experience, we obtain evidence for up to 27 varieties of experience from the categories of emotion reliably reported in response to over 2,000 emotionally evocative short videos. Our finding that 27 distinct varieties of reported experience are reliably associated with distinct situations converges with recent developments in the emotion signaling literature suggesting that upwards of 20 states have distinct nonverbal signals (56). The space of distinct reported emotional experiences in English involves a richer variety of states than considered earlier in the field (37). 
We do not claim that this is the definitive taxonomy of emotional states; establishing one will require studies of other types of stimuli, other approaches to self-report, other modalities of emotional response, and other cultures. Nevertheless, the present investigation reveals the rich varieties of reliably reported emotional experience that may shape human behavior. Our next finding concerns how reported emotional experiences are distributed in relation to one another, another matter central to theoretical debate regarding the structure of emotion. Past theorists have suggested that the distribution of emotional states is shaped in one of two ways: either that emotional states occupy a limited number of distinct clusters or emotion families (16, 37–39) or that they are more evenly distributed across more independent dimensions (82). Our approach, which interrogates the distribution of elicited emotional states within a dimensional space using an open-ended statistical framework, can identify both discrete clusters and continuous gradients. While our findings suggest that there may be constraints on which varieties of emotional experiences can be reported simultaneously in response to a single stimulus, most categories of emotion share continuous gradients with at least one other category. These correspond to smooth gradients in affective meaning, as one can see in Fig. 2A, where we observe gradients linking experiences of admiration, awe, and aesthetic appreciation; anxiety, fear, horror, and disgust; and a number of other emotion categories. These findings suggest a far more complex distribution of emotional states than the clustered or more uniform distributions hinted at in discrete and dimensional theories (16, 37–39, 82). These findings also raise intriguing questions warranting further research. 
For example, the smooth gradients of affective meaning we document may account for how people transition from one experience to the next (e.g., from admiration to awe; see ref. 91), and for mixed emotional experiences (92, 93). Finally, our findings speak to the question of how people conceptualize their emotional experiences in semantic terms. When participants were asked to judge their emotional state by choosing from a list of 34 categories or by placing their experiences along 14 different dimensional scales of affective appraisal and motivation, the categorical judgments more powerfully explained variance in the affective dimension judgments than vice versa (Fig. 3). Categorical labels organize affective dimensions in a coherent and powerful fashion. It is important to recognize that the most current constructivist and appraisal theories seldom propose that specific dimensions offer an exhaustive description of reported emotional experience (94). Nevertheless, hundreds of studies of emotion-related behavioral, cognitive, physiological, and neural effects have focused on measurements of valence, arousal, and other specific affective dimensions (24–36, 76, 77). The present findings suggest that reported emotional experience is more precisely conceptualized in terms of categories more often put forward by discrete emotion theories (16, 37–39), although, contrary to discrete theories, we find that the boundaries between emotion categories are fuzzy rather than discrete in nature. Our findings dovetail with recent findings related to the neural representation of emotional experience. Most notably, Kragel and LaBar (95) decoded emotional experiences from concomitant fMRI of the human brain, documenting that distributed patterns of brain activity distinguish among discrete emotions. 
Specifically, experiences of amusement, anger, contentment, fear, sadness, and surprise could be discriminated with above-chance accuracy based on patterns of brain activation, even when classifiers were trained and tested on emotional states elicited by separate modalities of stimuli (film and music). In light of these findings, the results of the present investigation raise the intriguing possibility that distinct patterns of neural activation might distinguish among a much broader array of states than have been investigated so far, such as the many positive states we have documented here (e.g., aesthetic appreciation and awe) and may reflect continuous gradients rather than discrete categories. New fMRI modeling approaches could fruitfully be combined with the emotion elicitation and multidimensional reliability estimation techniques introduced here to determine the number of distinct varieties of emotion that are represented by distinct patterns of neural activity. The present findings regarding the structure of self-reported emotional experiences may also inform recent theoretical efforts to explicate how such experiences are generated. For example, in the higher-order theory of emotional consciousness recently put forward by LeDoux and Brown (15), it is posited that emotional experiences are introspective states defined by schematic representations of psychologically significant situations, and emotion terms are symbolic representations of these states. Cast within this theorizing, the reliability that we observed across participants in their reported emotional responses to particular situations—viewing short videos—emerges from commonalities in the symbolic structures participants relied upon to label their emotional reactions to the evocative stimuli. 
The higher-order theory of emotional consciousness also predicts that emotional states are represented in parts of the brain responsible for higher-order cognition (15), consistent with findings by Kragel and LaBar (96) that representations of subjective emotional experiences are found in high-level brain regions (e.g., orbitofrontal cortex). Our findings raise intriguing questions about how such brain regions might encode the dozens of distinct varieties of emotion that we have uncovered. It is worth noting important limitations of the present investigation. As in so many studies of emotional experience, the conclusions we might draw depend on the degree of correspondence between self-reports and subjective experiences. As noted earlier, some aspects of emotional experience may elude self-report, and there are other potential determinants of self-report besides emotional experience. On this latter point, we note that reported emotional experience may reflect a combination of three conceptually distinct phenomena: (i) emotional experience itself; (ii) cognitive and perceptual experiences that may not in their own right be considered emotional, but may nevertheless color how an emotional experience is labeled (18); or (iii) perception of affective quality, that is, the emotional experience that a situation could potentially cause or should cause according to cultural norms of emotional experience (18, 97). Future research will need to systematically examine such processes, to the extent possible, to further characterize the semantic space of emotional experience. It is also important to mention that we have focused on commonalities in the emotions people report experiencing in each situation; individuals also differ, often in striking fashion, in their reports of emotional responses to a given situation (98, 99). 
However, the results of an additional analysis we performed suggest that differences such as gender, age, social class, and personality factors explained at most a small proportion of the variance in reported emotional experience compared with commonalities across participants (Fig. S6). Nevertheless, it will be crucial for future studies to examine how such culture-related sources of variation in emotion self-report shape the structure of semantic spaces of emotional experience. Fig. S6. Individual differences in demographics and personality explain a relatively small proportion of the variance in reported categories of emotional experience. Demographic and personality information was collected from each rater in a separate survey submitted before each rating survey. The demographics and personality survey included self-reported years of age, gender, marital status, 11 levels of education ranging from “None” to “Doctorate,” fiscal and social conservatism (1-to-7 scales), religiousness (1-to-7 scale), the MacArthur Scale of Subjective Social Status, the Short Dark Triad of Personality, the Ten Item Personality Measure (TIPI), two questions on trait anxiety from the State-Trait Anxiety Inventory, and two questions on subjective wellbeing from the Satisfaction with Life Scale. Using each of 19 composite items from this survey (e.g., the final measures of each of the Big Five personality traits from the TIPI), we performed a median split, assigning each individual response to the categorical judgment survey to one of two separate datasets. We then correlated the mean responses to each video across datasets, resulting in a “median-split correlation.” If the median-split correlation is low, then we can infer that people low vs. high in the splitting variable, for example extroversion, respond differently to the videos. Otherwise, we can infer that they responded similarly. 
To test whether the median-split correlation for each variable was significantly lower than would be expected by chance, we performed a separate permutation test for each splitting variable, randomly assigning equivalent numbers of raters of each video to one of two datasets. We repeated this permutation test 1,000 times for each variable. The resulting median-split correlations are displayed as black dots on the bar graph. For the most part, the split-half correlations are close to chance levels—the correlations obtained by splitting participants at random, plotted as the black dots—indicating that people of different genders, education levels, political views, and personalities responded very similarly to the videos. Interestingly, the only variable for which a median split explained a significant amount of variance in the categorical judgment was self-reported religiousness (FDR <0.01). However, even religiousness explained a relatively small proportion of the variance in the categorical judgments, with a median-split correlation only 5.5% lower than would be expected by chance (i.e., 5.5% of the variance explained by the videos presented). (The lower chance level of variance explained when religiousness was used as a splitting variable reflects the imbalance between the two resulting samples. A relatively small proportion of ratings, 31% overall, were submitted by raters who chose anything greater than the median of 1, or “Not at all religious.”) Granting these limitations and caveats, our results reveal how emotion concepts are reliably structured in their association with distinct situations. These findings have generative implications for studies that relate reported experience to behavior, physiology, and individual differences (14). 
With respect to neurophysiology, for example, hundreds of studies of the brain regions activated during reported emotional experiences have focused on valence and arousal or the six basic emotions (25, 26, 28, 29, 34, 35, 42, 44, 47–49), leaving out the many other varieties of reported emotional experience that we find reliably occur in distinct situations and that could potentially be represented in distinct brain activity patterns. Questions about the structure of reported emotional experiences are foundational to the science of emotion. Answers to such questions bear upon the most central theoretical claims in the field. Our conceptualization of how emotional self-reports are situated within a semantic space and our geometric analytic techniques have yielded more nuanced, complex answers than is typical in the theorizing that has sparked such intense debate. Reported emotional experiences inhabit at least 27 dimensions associated with reliably distinct situations and are distributed along continuous gradients between particular emotion categories within this space. With analytic methods, and by studying the widest array of emotions and elicitors to date, we have uncovered an approximation of a geometric structure of reported emotional experience. It will be important to extend these methods and findings to studies of other emotion elicitors, such as music, daily activities, and social interactions. It will also be critical to ascertain geometric structures of reported emotional experience within other cultures and their languages, given that here we have only studied emotional experience reported by US participants using English emotion concepts (100). The methods developed here could be fruitfully applied to studies of emotion-related peripheral physiological response, central nervous system response, and nonverbal expression, once again to shift toward an understanding of how emotional states are arranged within a geometric space.

Materials and Methods Emotion judgments of the videos were obtained using Amazon Mechanical Turk. A total of 853 English-speaking US participants took part in the study (403 females, mean age = 36 y). The experimental procedures were approved by the Institutional Review Board at the University of California, Berkeley. All participants gave their informed consent. See Supporting Information for details. The 2,185 videos and their mean ratings can be requested here: https://goo.gl/forms/XErJw9sBeyuOyp5Q2. Please exercise discretion in viewing the videos, many of which contain highly graphic violence, nudity, and/or sexual content. Videos with highly graphic content are blurred in the chromatic map linked elsewhere in the paper (https://s3-us-west-1.amazonaws.com/emogifs/map.html). However, an uncensored chromatic map is also available to readers of age 18+ by replacing the word “map” with the word “uncensored” in the previous URL (although please exercise careful discretion in viewing the uncensored map, which, again, contains extremely graphic content). In both maps, floating over the number corresponding to each video for an extended period will reveal the video’s unique numeric tag, which, followed by “.mp4,” also serves as its filename within our database. Note that videos within the map can be clicked and dragged.

The Categorical, Free Response, and Affective Dimension Judgment Surveys Three separate surveys were used to obtain emotion judgments: one for the categorical judgments, one for the free response judgments, and one for the affective dimension judgments. The categorical judgment survey was used to obtain multiple-select judgments of the emotions elicited by each video. Each of the 2,185 videos was judged by 9 to 17 observers in terms of the 34 categories (listed in Table S1). Observers were required to select at least one category but could select as many as desired. The individual survey each observer provided data for contained 30 videos, ordered randomly, which played only when the corresponding section of the survey was hovered over with the mouse. Observers were allowed to complete as many versions of the survey as desired, with different videos presented in each. Payment for each survey was 72 cents. The free response judgment survey was used to collect free response judgments of the emotions elicited by each video. A separate sample of observers, nine each for the different videos, rated each video with 600 free response terms (listed in Dataset S1). Observers responded to each video by typing into a blank box. As observers typed, a drop-down menu appeared displaying all emotion terms containing the currently typed substring. For example, typing the substring “lov-” caused the following terms to be displayed: love, “brotherly love,” “feeling loved,” “loving sympathy,” “maternal love,” “romantic love,” and “self-love.” Observers could select as many emotion terms as desired. The individual survey each observer provided data for contained 30 videos, ordered randomly, which played only when the corresponding section of the survey was hovered over with the mouse. Payment for each survey was 90 cents. The affective dimension judgment survey was used to obtain rating scale judgments of the emotions elicited by each video. 
A separate sample of observers, nine each for the different videos, rated each video along the 14 affective dimensions. The ratings were each obtained on a nine-point Likert scale with the number 5 anchored at neutral. See Table S2 for the questions corresponding to each affective dimension. For these ratings, because they were more numerous, observers provided data for 12 videos, ordered randomly. Payment for each survey was 80 cents.

SH-CCA Method The SH-CCA method is a means of determining the underlying dimensionality of a dataset in which a set of items is rated in terms of some number of characteristics, and there are repeated measures (in this study, multiple raters) for each rating. In brief, SH-CCA is a generalization of split-half reliability analysis, in which the averages obtained from half of the ratings of each stimulus for a single item (e.g., awe) are correlated with the averages obtained from the other half of the ratings, across stimuli. In SH-CCA, the averages obtained from half of the ratings of all items simultaneously are compared, using canonical correlation analysis (CCA), with the averages obtained from the other half of the ratings, yielding an estimate of the number of dimensions of reliable variance. Our implementation of SH-CCA involves the following steps. (i) Half of the ratings for each stimulus are randomly assigned to each of two sets. (ii) In a leave-one-out cross-validation procedure, one stimulus is held out at a time and CCA is performed on the remainder of the stimuli between the averaged ratings from the two sets. The resulting pairs of canonical variate loadings are averaged across each pair (taking advantage of the prior knowledge that both sets came from the same underlying population), then multiplied by the ratings of the held-out stimulus minus the mean ratings from the left-in stimuli, resulting in loadings for each held-out stimulus on each canonical variate. (iii) Loadings on each canonical variate for held-out stimuli are concatenated for each set and correlated across the two sets, resulting in canonical correlation coefficients. Correlations are controlled for the loadings of stimuli in both sets on all previous canonical variates, to adjust for nonorthogonality of held-out variate loadings. P values for each canonical correlation are computed by transforming each correlation coefficient using Student’s t-distribution. (iv) Steps i to iii are repeated 20 times and the resulting 20 P values are averaged for each canonical correlation to obtain final P values.
We use leave-one-out cross-validation instead of a standard test statistic for the significance of each canonical correlation, such as Bartlett’s chi-squared statistic, because this and other standard test statistics assume the data are approximately multivariate normally distributed. Our nonmutually exclusive, multiple-choice ratings may violate assumptions of approximate normality. We verified that our implementation of SH-CCA with leave-one-out cross-validation can be applied successfully to multinomial ratings under varying conditions of underlying noise and systematic dimensionality using 2,312 simulation studies (Fig. S2). For these simulations, we estimated the number of significant dimensions by stopping at the first canonical correlation for which P > 0.05. In practice this resulted in statistically conservative estimates. When applied to the 34-category judgments, SH-CCA yielded evidence for between 24 (P < 0.05) and 26 (P < 0.1) significant dimensions of variance. Given our simulation results, in which applying a significance level of 0.1 actually resulted in a familywise error rate (FWER) of 0.0035, we expect these estimates to be conservative. However, we show the results of extracting 24, 25, or 26 dimensions in Fig. S3.
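Steps i to iii can be illustrated with a minimal NumPy sketch on simulated ratings. This is not the authors' implementation. The function names, the small ridge term added to the covariance matrices, the sign convention for the averaged weights, and the toy data (six items driven by two latent dimensions) are all assumptions made for illustration. The sketch performs a single random split and stops at the t statistics; converting them to P values with Student's t-distribution and averaging over 20 splits (step iv) are omitted.

```python
import numpy as np

def cca_weights(X, Y, ridge=1e-6):
    """Canonical weight matrices (items x variates) for stimuli-by-items X and Y."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n, p = Xc.shape
    Sxx = Xc.T @ Xc / (n - 1) + ridge * np.eye(p)  # ridge is an assumed stabilizer
    Syy = Yc.T @ Yc / (n - 1) + ridge * np.eye(p)
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        vals, vecs = np.linalg.eigh(S)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, _, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return Kx @ U, Ky @ Vt.T

def partial_corr(x, y, controls):
    """Correlation of x and y after regressing out an intercept and the controls."""
    D = np.column_stack([np.ones(len(x)), controls])
    rx = x - D @ np.linalg.lstsq(D, x, rcond=None)[0]
    ry = y - D @ np.linalg.lstsq(D, y, rcond=None)[0]
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

def split_half_means(ratings, rng):
    """Step i: randomly split each stimulus's raters in two and average each half."""
    n, n_raters, p = ratings.shape
    A, B = np.empty((n, p)), np.empty((n, p))
    for i in range(n):
        perm = rng.permutation(n_raters)
        A[i] = ratings[i, perm[: n_raters // 2]].mean(axis=0)
        B[i] = ratings[i, perm[n_raters // 2:]].mean(axis=0)
    return A, B

def sh_cca(A, B):
    """Steps ii and iii: held-out variate scores, then partial correlations."""
    n, p = A.shape
    SA, SB = np.zeros((n, p)), np.zeros((n, p))
    for i in range(n):
        keep = np.arange(n) != i
        Wa, Wb = cca_weights(A[keep], B[keep])
        W = (Wa + Wb) / 2                          # average each pair of weight vectors
        top = np.argmax(np.abs(W), axis=0)
        W = W * np.sign(W[top, np.arange(p)])      # assumed sign convention across folds
        SA[i] = (A[i] - A[keep].mean(0)) @ W
        SB[i] = (B[i] - B[keep].mean(0)) @ W
    r = np.array([partial_corr(SA[:, k], SB[:, k],
                               np.column_stack([SA[:, :k], SB[:, :k]]))
                  for k in range(p)])
    df = n - 2 - 2 * np.arange(p)                  # two control columns per earlier variate
    t = r * np.sqrt(df / (1 - r ** 2))
    return r, t

# Toy data: 120 stimuli, 10 raters, 6 items, 2 reliable dimensions.
rng = np.random.default_rng(0)
loadings = np.array([[2., 2., 2., 0., 0., 0.],
                     [0., 0., 0., 1., 1., 1.]])
latent = rng.normal(size=(120, 2))
ratings = (latent @ loadings)[:, None, :] + rng.normal(size=(120, 10, 6))
A, B = split_half_means(ratings, rng)
r, t = sh_cca(A, B)
print(np.round(r, 2))
```

In this toy setting the first two held-out correlations should be large and the remaining ones close to zero, which is the pattern the stopping rule described above relies on.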

Explainable Variance Calculation To calculate explainable variance, we note that the variance of a given rating across stimuli is equal to the explainable variance plus the unexplainable variance. The unexplainable variance can be estimated as the mean of the squared standard errors across stimuli. Hence, the proportion of explainable variance can be estimated by simply dividing the mean of the squared standard errors by the total variance and subtracting this quantity from 1.
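As a worked illustration of this calculation, the following sketch computes the proportion of explainable variance for a single rating. The function name and toy ratings are hypothetical, and the use of sample (n - 1) variances is an assumption, since the text does not specify the estimator. Plain lists are used so that stimuli with different numbers of raters can be handled directly.

```python
from statistics import mean, variance

def explainable_variance(ratings_per_stimulus):
    """ratings_per_stimulus: one list of individual ratings per stimulus."""
    stimulus_means = [mean(r) for r in ratings_per_stimulus]
    # Total variance of the mean rating across stimuli.
    total_var = variance(stimulus_means)
    # Squared standard error of each stimulus mean: sample variance / number of ratings.
    squared_se = [variance(r) / len(r) for r in ratings_per_stimulus]
    # Unexplainable variance is the mean squared standard error.
    return 1 - mean(squared_se) / total_var

# Toy example: three stimuli, three raters each.
toy_ratings = [[1, 2, 3], [7, 8, 9], [4, 5, 6]]
ev = explainable_variance(toy_ratings)
print(ev)
```

Here the stimulus means (2, 8, 5) have variance 9 and each squared standard error is 1/3, so the estimate is 1 - (1/3)/9 = 26/27.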

Estimating Significance of Categorical Judgment Ratings and Categorical Dimension Loadings on Individual Videos To estimate the significance of the categorical judgment proportions and dimension loadings of each video, we first constructed a null distribution of categorical judgments. To do so, we simulated random 34-category judgments of 20,000 videos from a multinomial distribution with probabilities for each category set to the actual proportion of times the category was selected for our 2,185 videos. Each rating could consist of multiple category selections, with the number of categories selected given by a separate multinomial distribution with probabilities set to the proportion of times each number of selections was made for our 2,185 videos (56% one category, 27% two categories, 11% three categories, 4% four categories, and 1% or less for the remainder). We multiplied the resulting judgment proportions by the categorical dimension coefficients to obtain a null distribution of loadings on each categorical dimension. Finally, we calculated the P value for each proportion/loading as the proportion of times that a greater proportion/loading for that category/dimension appeared within the null distribution. We controlled the FDR using the Benjamini–Hochberg procedure.
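The procedure can be sketched as follows for judgment proportions. All names and numbers here are illustrative assumptions: the toy example uses 5 categories rather than 34, 2,000 simulated videos rather than 20,000, and 10 simulated raters per video (the text does not state the number of raters per simulated video). Categories within a single rating are drawn without replacement, which is also an assumption.

```python
import numpy as np

def simulate_null_proportions(cat_probs, count_probs, n_videos, n_raters, rng):
    """Null judgment proportions, shape (n_videos, n_categories)."""
    n_cat = len(cat_probs)
    props = np.zeros((n_videos, n_cat))
    for v in range(n_videos):
        for _ in range(n_raters):
            # Number of categories selected in this rating (1, 2, 3, ...).
            k = rng.choice(len(count_probs), p=count_probs) + 1
            # Assumed: a rating cannot select the same category twice.
            picked = rng.choice(n_cat, size=k, replace=False, p=cat_probs)
            props[v, picked] += 1
    return props / n_raters

def null_p_values(observed, null):
    """Share of null values that exceed each observed value, per category."""
    return (null[None, :, :] > observed[:, None, :]).mean(axis=1)

def benjamini_hochberg(p, q=0.05):
    """Boolean mask of discoveries under the Benjamini-Hochberg step-up rule."""
    flat = np.ravel(p)
    order = np.argsort(flat)
    m = flat.size
    passed = flat[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        cutoff = np.max(np.nonzero(passed)[0])
        reject[order[:cutoff + 1]] = True
    return reject.reshape(np.shape(p))

rng = np.random.default_rng(1)
cat_probs = np.array([0.4, 0.3, 0.15, 0.1, 0.05])  # toy selection rates, not the study's
count_probs = np.array([0.6, 0.3, 0.1])            # toy rates of 1, 2, or 3 selections
null = simulate_null_proportions(cat_probs, count_probs, 2000, 10, rng)

# Two hypothetical videos: the first is unanimously assigned the rarest category.
observed = np.array([[0.4, 0.3, 0.1, 0.1, 1.0],
                     [0.5, 0.2, 0.2, 0.1, 0.0]])
p = null_p_values(observed, null)
significant = benjamini_hochberg(p)
print(significant)
```

Dimension loadings would be handled the same way after multiplying both `observed` and `null` by a categories-by-dimensions coefficient matrix.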

Acknowledgments This research was supported in part by a grant from the John Templeton Foundation.