Behavioral experiment

In the present study, we took advantage of a high-quality publicly available dataset, part of the studyforrest project40 (http://studyforrest.org), to demonstrate the existence of a gradient-like organization in brain regions coding emotion ratings. Specifically, we used moment-by-moment scores of the perceived intensity of six basic emotions elicited by an emotionally charged movie (Forrest Gump; R. Zemeckis, Paramount Pictures, 1994) as predictors of fMRI activity in an independent sample. We then tested the correspondence between the fit of the emotion rating model in TPJ voxels and their relative spatial arrangement, to reveal the existence of orthogonal, spatially overlapping gradients.

To obtain moment-by-moment emotion ratings during the Forrest Gump movie, we enrolled 12 healthy Italian native speakers (5F; mean age 26.6 years, range 24–34). None of them reported having watched the movie in the year prior to the experiment. All subjects provided written informed consent to participate in the study, retained the right to withdraw at any time, and received a small monetary compensation for their participation. The study was conducted in accordance with the Declaration of Helsinki and was approved by the local IRB (CEAVNO: Comitato Etico Area Vasta Nord Ovest; Protocol No. 1485/2017).

We started from the Italian dubbed version of the movie, edited following the exact description reported in the studyforrest project (eight movie segments ranging in duration from 11 to 18 min). The movie was presented in a distraction-free setting using a 24″ monitor with a resolution of 1920 × 1080 pixels connected to a MacBook™ Pro running Psychtoolbox64 v3.0.14. Participants wore headphones in a noiseless environment (Sennheiser™ HD201; 21–18,000 Hz; maximum SPL 108 dB) and were instructed to continuously rate the subjectively perceived intensity (on a scale ranging from 0 to 100) of six basic emotions5 throughout the entire movie: happiness, surprise, fear, sadness, anger and disgust. Dedicated buttons mapped increases and decreases in the intensity of each emotion, and subjects were instructed to represent their inner experience by freely adjusting or maintaining the level of intensity. Participants were allowed to report more than one emotion at the same time, and ratings were continuously recorded at a 10-Hz sampling rate. Subjects were presented with the same eight movie segments employed in the fMRI study, one after the other, for an overall duration of 120 min. Before starting the actual emotion rating experiment, all participants performed a 20-min training session to familiarize themselves with the experimental procedure. Specifically, they had to reproduce various levels of intensity for random combinations of emotions that appeared on the screen every 10 s.

For each subject, we recorded six timeseries representing the moment-by-moment perceived intensity of the basic emotions. First, we downsampled the timeseries to match the fMRI temporal resolution (2 s) and then introduced a lag of 2 s to account for the delay in hemodynamic activity. The resulting timeseries were temporally smoothed using a moving average procedure (10 s window). This further accounted for the uncertainty in the temporal relationship between the actual onset of an emotion and the time required to report the emotional state.
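As a concrete illustration, the following MATLAB sketch reproduces these three steps under stated assumptions: ratings is a hypothetical T × 6 matrix of one subject's scores sampled at 10 Hz, and averaging within each TR is our choice of downsampling method, which the text does not specify.

```matlab
% Minimal sketch of the rating preprocessing. "ratings" is a hypothetical
% T x 6 matrix of one subject's scores sampled at 10 Hz; within-TR
% averaging is our assumed downsampling method.
fs   = 10;          % behavioral sampling rate (Hz)
tr   = 2;           % fMRI repetition time (s)
step = fs * tr;     % 20 behavioral samples per fMRI volume

% Downsample to the fMRI temporal resolution by averaging within each TR
nVol = floor(size(ratings, 1) / step);
down = squeeze(mean(reshape(ratings(1:nVol*step, :), step, nVol, 6), 1));

% Shift by one volume (2 s) to account for the hemodynamic delay
lagged = [zeros(1, 6); down(1:end-1, :)];

% Temporal smoothing with a 10-s (5-volume) moving average window
smoothed = movmean(lagged, 5, 1);
```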

To verify the consistency in the occurrence of affective states while watching the Forrest Gump movie, we computed the Spearman’s ρ correlation coefficient across subjects for each of the six ratings (Fig. 1b). Statistical significance of the agreement was assessed by generating a null distribution of random ratings using the IAAFT procedure (Iterative Amplitude Adjusted Fourier Transform65; Chaotic System Toolbox), which provided surrogate data with the same spectral density and temporal autocorrelation as the averaged ratings across subjects (1000 surrogates).
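The IAAFT algorithm itself can be sketched as follows; the study relied on the Chaotic System Toolbox implementation, so this minimal version is only illustrative of the core iteration: alternately imposing the target power spectrum and restoring the original amplitude distribution.

```matlab
% Minimal IAAFT sketch (the study used the Chaotic System Toolbox
% implementation). "x" is one rating timeseries (column vector) and
% "nIter" the number of refinement iterations.
function s = iaaft_sketch(x, nIter)
    targetAmp = abs(fft(x));            % power spectrum to preserve
    sortedX   = sort(x);                % amplitude distribution to preserve
    s = x(randperm(numel(x)));          % start from a random shuffle
    for k = 1:nIter
        % Impose the target spectrum while keeping the current phases
        ph = angle(fft(s));
        s  = real(ifft(targetAmp .* exp(1i * ph)));
        % Restore the original amplitude distribution by rank mapping
        [~, idx] = sort(s);
        s(idx) = sortedX;
    end
end
```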

Preprocessed and temporally smoothed single-subject emotion ratings were averaged to obtain six group-level timeseries representing the basic emotion model. After measuring the Spearman’s ρ between pairings of basic emotions (Fig. 1b), we performed PC analysis and identified six orthogonal components, which constituted the emotion dimension model (Fig. 1c).

To verify the consistency of the PCs across subjects, we computed the agreement of the six components by means of a leave-one-subject-out cross-validation procedure (Fig. 1d). Specifically, at each iteration we performed PC analysis on the left-out subject’s behavioral ratings and on the averaged ratings of all the other participants. The six components obtained from each left-out subject were rotated (Procrustes analysis, reflection and orthogonal rotation only) to match those derived from all the other participants. This procedure generated six components for each iteration (i.e., for each left-out subject), which were then compared across individuals using Spearman’s ρ, as was done for the six basic emotions. To assess statistical significance, we created a null distribution of PCs from surrogate behavioral ratings generated as described above (1000 surrogates).
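A minimal sketch of this cross-validation loop, assuming the preprocessed ratings are stored in a hypothetical T × 6 × 12 array, could read as below; note that MATLAB's procrustes with scaling disabled applies a translation plus an orthogonal rotation/reflection, which approximates the transformation described in the text.

```matlab
% Sketch of the leave-one-subject-out consistency check. "allRatings" is
% a hypothetical T x 6 x 12 array of preprocessed single-subject ratings.
nSubj = size(allRatings, 3);
rho   = nan(nSubj, 6);
for s = 1:nSubj
    loo    = allRatings(:, :, s);
    others = mean(allRatings(:, :, setdiff(1:nSubj, s)), 3);
    [~, scoreLoo]    = pca(loo);       % PCs of the left-out subject
    [~, scoreOthers] = pca(others);    % PCs of the remaining participants
    % Procrustes rotation: reflection and orthogonal rotation, no scaling
    [~, rotated] = procrustes(scoreOthers, scoreLoo, 'Scaling', false);
    rho(s, :) = diag(corr(rotated, scoreOthers, 'Type', 'Spearman'))';
end
```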

Although subjects were asked to report their inner experience using six emotion categories, their ratings were not limited to binary choices. Indeed, at each timepoint raters could specify the perceived intensity of more than one emotion, allowing for more complex affective states than the basic ones. To further highlight this aspect, we performed dimensionality reduction and clustering analyses on the emotion timeseries. Starting from emotion ratings averaged across participants, we selected the timepoints characterized by the highest intensity (i.e., those whose summed intensity across the six basic emotions exceeded the 50th percentile) and applied Barnes–Hut t-distributed stochastic neighbor embedding56,66 (t-SNE; perplexity = 30; theta = 0.05). The algorithm expresses the distances between timepoints in the six-dimensional space defined by the basic emotions as joint probabilities according to a Gaussian distribution. These distances are projected onto a two-dimensional embedding space using a Student’s t probability distribution, minimizing the Kullback–Leibler divergence. To further describe the variety of affective states elicited by the movie, we then applied k-means clustering to the projection of timepoints in the t-SNE manifold, determining the number of clusters with the silhouette criterion67.
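A sketch of the embedding and clustering steps follows, with the caveat that MATLAB's built-in tsne (introduced in R2017a) is used here as a stand-in for the external Barnes–Hut implementation cited above; avgRatings is a hypothetical T × 6 group-averaged rating matrix.

```matlab
% Sketch of the affective-state embedding and clustering. "avgRatings" is
% a hypothetical T x 6 group-averaged rating matrix; the built-in tsne
% (R2017a+) stands in for the external Barnes-Hut implementation.
intensity = sum(avgRatings, 2);
sel = avgRatings(intensity > prctile(intensity, 50), :);  % most intense timepoints

% Two-dimensional Barnes-Hut t-SNE embedding
Y = tsne(sel, 'Algorithm', 'barneshut', 'Perplexity', 30, 'Theta', 0.05);

% Number of k-means clusters chosen with the silhouette criterion
eva    = evalclusters(Y, 'kmeans', 'silhouette', 'KList', 2:10);
labels = kmeans(Y, eva.OptimalK);
```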

fMRI experiment

We selected data from phase II of the studyforrest project, in which 15 native German speakers watched an edited version of the Forrest Gump movie during fMRI acquisition. Participants underwent two 1-h sessions of fMRI scanning (3T, TR 2 s, TE 30 ms, FA 90°, 3 mm isotropic voxels, FoV 240 mm, 3599 timepoints), for an overall experiment duration of 2 h across eight runs. Subjects were instructed to inhibit any movement and simply enjoy the movie (see ref. 40 for further details). We included all participants who underwent the fMRI acquisition and had complete recordings of the physiological parameters (i.e., cardiac trace) throughout the scanning time (14 subjects; 6F; mean age 29.4 years, range 20–40 years). For the fMRI pre-processing pipeline, please refer to Supplementary Information.

Encoding analysis

Voxel-wise encoding68,69 was performed using a multiple linear regression approach to measure the association between brain activity and the emotion ratings, represented by the six PCs. Of note, performing a least-squares linear regression using either the six PCs or the six basic emotion ratings yields the same overall fit (i.e., full-model R2), even though the coefficients of individual predictors differ between the two sets.
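This equivalence follows from the fact that the PCs are an invertible linear recombination of the (centered) ratings, so both predictor sets span the same column space. The following toy MATLAB snippet, using synthetic data, verifies the point:

```matlab
% Toy demonstration that the full-model R2 is identical whether predictors
% are the raw ratings or their PCs (same column space, with intercept).
rng(1);
X = randn(200, 6);                   % stand-in for six basic emotion ratings
y = X * randn(6, 1) + randn(200, 1); % synthetic voxel timeseries
[~, pcs] = pca(X);                   % six orthogonal components

mdlRaw = fitlm(X, y);
mdlPCs = fitlm(pcs, y);
fprintf('R2 raw: %.6f | R2 PCs: %.6f\n', ...
        mdlRaw.Rsquared.Ordinary, mdlPCs.Rsquared.Ordinary);
```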

To reduce the computational effort, we limited the regression procedure to gray matter voxels only (~44 k at an isotropic voxel resolution of 3 mm). We assessed the statistical significance of the model R2 for each voxel using a permutation approach, generating 10,000 null encoding models. Null models were obtained by measuring the association between brain activity and surrogate data with the same spectral density and temporal autocorrelation as the original six PCs. This procedure provided a null distribution of R2 coefficients, against which the actual association was tested. The resulting p-values were corrected for multiple comparisons using the FDR70 method (q < 0.01; Fig. 3a, Supplementary Fig. 1 and Supplementary Table 1). The standard error of R2 was calculated through a bootstrapping procedure (1000 iterations). Moreover, we conducted a noise-ceiling analysis on right TPJ data, following the approach of Ejaz and colleagues71 (please see Supplementary Methods).
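Schematically, and assuming hypothetical variables bold (T × V gray matter timeseries) and pcs (T × 6 components), the permutation scheme could be sketched as follows, reusing the IAAFT sketch shown earlier:

```matlab
% Sketch of the voxel-wise permutation test. "bold" is a hypothetical
% T x V matrix of gray matter timeseries, "pcs" the T x 6 components, and
% iaaft_sketch the surrogate generator sketched above.
nPerm  = 10000;
r2obs  = fitR2(pcs, bold);                  % observed full-model R2 (1 x V)
nullR2 = zeros(nPerm, size(bold, 2));
for p = 1:nPerm
    surrogate = zeros(size(pcs));
    for c = 1:size(pcs, 2)
        surrogate(:, c) = iaaft_sketch(pcs(:, c), 100);
    end
    nullR2(p, :) = fitR2(surrogate, bold);  % null association
end
pvals = (sum(nullR2 >= r2obs, 1) + 1) / (nPerm + 1);  % one-tailed p-values
% Benjamini-Hochberg FDR correction (q < 0.01) is then applied to pvals

function r2 = fitR2(X, Y)
% Full-model R2 of a least-squares fit with intercept, per column of Y
Xd  = [ones(size(X, 1), 1), X];
res = Y - Xd * (Xd \ Y);
r2  = 1 - sum(res.^2, 1) ./ sum((Y - mean(Y, 1)).^2, 1);
end
```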

Emotion gradients in right TPJ

We tested the existence of emotion gradients by measuring the topographic arrangement of the multiple regression coefficients72 in regions lying close to the peak of fit of the encoding procedure (i.e., right pSTS/TPJ). To avoid any circularity in the analysis73, we first delineated an ROI in the right pSTS/TPJ territories using an unbiased procedure based on the NeuroSynth74 database v0.6 (i.e., the reverse inference probability associated with the term TPJ). Specifically, starting from the peak of the TPJ NeuroSynth reverse inference meta-analytic map, we drew a series of cortical ROIs with radii ranging from 9 to 27 mm. Afterward, to identify the radius showing the highest significant association, we tested the relationship between anatomical and functional distance75 for each spherical ROI (Supplementary Table 2). This procedure was performed using multiple regression coefficients obtained either from the three emotion dimensions or from the four basic emotions stable across all subjects. As depicted in Supplementary Fig. 2, for each radius we built two dissimilarity matrices: one using the Euclidean distance between voxel coordinates, and the other using the Euclidean distance between the fitting coefficients (i.e., β values) of either the three emotion dimensions or the four basic emotions. The rationale is that, under a gradient-like organization, voxels with similar functional behavior (i.e., lower functional distance) should also be spatially arranged close to each other on the cortex75 (i.e., lower physical distance). The functional and anatomical dissimilarity matrices were compared using Spearman’s ρ. To properly assess the significance of this anatomo-functional association, we devised an ad hoc procedure that maintained the spatial autocorrelation structure of TPJ in the null distribution. Specifically, we generated 1000 IAAFT-based null models for the emotion dimension and basic emotion data, respectively. These null models served as predictors in a multiple regression analysis, yielding a set of null β coefficients. From these coefficients we built a set of functional dissimilarity matrices, which were then correlated with the anatomical distance, providing 1000 null Spearman’s ρ coefficients against which the actual anatomo-functional relationship was tested. Confidence intervals (CI; 2.5 and 97.5 percentile) for the obtained correlation values were calculated with a bootstrap procedure (1000 iterations). We also tested the existence of gradients in other brain regions encoding emotion ratings using a data-driven searchlight analysis. Results and details of this procedure are reported in Supplementary Information.
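For a single ROI, the core of this test reduces to comparing two condensed distance vectors; a sketch, with hypothetical variables xyz (voxel coordinates) and betas (regression coefficients), could read:

```matlab
% Sketch of the anatomo-functional association for one ROI. "xyz" is a
% hypothetical V x 3 matrix of voxel coordinates (mm) and "betas" a V x k
% matrix of regression coefficients (k = 3 dimensions or 4 basic emotions).
anatDist = pdist(xyz);     % pairwise Euclidean distance between voxels
funcDist = pdist(betas);   % pairwise Euclidean distance between beta profiles

% A gradient-like organization predicts a positive rank correlation
rhoObs = corr(anatDist', funcDist', 'Type', 'Spearman');
% rhoObs is then tested against 1000 null values obtained by recomputing
% funcDist from IAAFT-based null regression coefficients
```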

To estimate the significance of right TPJ gradients, we used null models built on emotion ratings, leaving untouched the spatial and temporal structure of brain activity. However, as spatial smoothness may still affect the estimation of gradients, we tested right TPJ topography using the group-average unfiltered data. In brief, all the steps described in the fMRI data pre-processing section (see Supplementary Methods) were applied, with the only exception of spatial filtering. Following this procedure, the estimated smoothness of the right TPJ region was 4.5 × 4.2 × 3.6 mm (3dFWHMx tool). Using these data and the same procedure described above, we measured the significance of emotion gradients. Results are detailed in Supplementary Table 6 and Supplementary Fig. 7.

The Euclidean metric does not take cortical folding into account. Because the TPJ includes a substantial portion of the STS sulcal walls, the estimation of emotion gradients benefits from a metric that respects cortical topology. For this reason, we ran the FreeSurfer recon-all pipeline76 on the standard space template77 used as reference for the nonlinear alignment of single-subject data. We then converted the obtained files into AFNI-compatible format (@SUMA_Make_Spec_FS). This procedure provided a reconstruction of the cortical ribbon (i.e., the space between the pial surface and the gray-to-white matter boundary), which was used to measure anatomical distance. Specifically, we employed the Dijkstra algorithm, a computationally efficient method to estimate cortical distance based on folding78,79. The single-subject unsmoothed timeseries were then transformed into standard space, averaged across individuals and projected onto the cortical surface (AFNI 3dVol2Surf, map function: average, 15 steps). Afterward, we performed a multiple linear regression analysis using the PCs derived from emotion ratings as predictors of the unsmoothed functional data. This analysis was carried out within a cortical patch approximating the size of the 3D sphere used in the original volumetric pipeline, centered at the cortical point closest to the NeuroSynth TPJ peak. Thus, for each regressor of interest, we obtained unsmoothed β values projected onto the cortical mantle. We then tested the existence of a gradient-like organization for each predictor, using the Dijkstra algorithm and the same procedure described above. Results are detailed in Supplementary Table 6 and Fig. 4.
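As an illustration of the distance computation only (not of the AFNI/SUMA pipeline itself), geodesic distances on a triangulated surface can be approximated with Dijkstra's algorithm by treating mesh edges as a weighted graph; verts, faces and seed are hypothetical inputs:

```matlab
% Sketch of geodesic (folding-respecting) distance on the cortical mesh.
% "verts" (N x 3 coordinates) and "faces" (M x 3 triangle indices) are
% assumed to come from the FreeSurfer/SUMA surface; "seed" is a vertex
% index. MATLAB's graph/distances runs Dijkstra on weighted graphs.
edges = [faces(:, [1 2]); faces(:, [2 3]); faces(:, [1 3])];
w = sqrt(sum((verts(edges(:, 1), :) - verts(edges(:, 2), :)).^2, 2));
G = simplify(graph(edges(:, 1), edges(:, 2), w), 'min');  % drop duplicate edges
geo = distances(G, seed);   % geodesic distance from the seed to every vertex
```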

Right temporo-parietal gradients and portrayed emotions

We tested whether the gradient-like organization of right TPJ reflects portrayed emotions. To this end, we took advantage of publicly available emotion tagging data for the same movie, provided by an independent group41. Unlike our behavioral task, raters were asked to indicate the portrayed emotion of each character (e.g., Forrest Gump, Jenny) in 205 movie segments (average duration ~35 s), presented in random order and labeled over the course of ~3 weeks. As the authors suggest41, this procedure minimizes carry-over effects and helps observers focus exclusively on indicators of portrayed emotions. Importantly, the authors’ editing respected the narrative of movie scenes (e.g., Forrest in front of Jenny’s grave is a single cut with a duration of ~131 s), so that raters could clearly understand what was shown on the screen. In addition, the possibility of tagging emotions independently in each movie segment and of watching each scene more than once allowed subjects to choose from a larger number of emotion categories80 (N = 22) than our set of emotions. Moreover, each observer was instructed to report with a binary label whether the portrayed emotion was directed toward the character itself (self-directed; e.g., Forrest feeling sad) or toward another character (other-directed; e.g., Forrest feeling happy for Jenny). These two descriptions served as third-person emotion attribution models and underwent the exact same processing steps (i.e., 2 s lagging and temporal smoothing) applied to our subjective emotion rating model. As the two third-person emotion attribution models included the four basic emotions found to be consistent across observers in our experiment (i.e., happiness, fear, sadness and anger), we could directly assess the correlation between these ratings using Spearman’s ρ.

We then measured the extent to which the two third-person emotion attribution models explained brain activity in right TPJ, following the method described in the “Encoding analysis” section. As these two descriptions are higher in dimensionality than our subjective emotion rating model, we assessed the significance of the fit using three different procedures: (A) matching the dimensionality across models by selecting the first six PCs only; (B) matching the emotion categories in the ratings, by performing PCA on the four basic emotions shared across models (i.e., happiness, fear, sadness and anger); (C) using the full model regardless of dimensionality (i.e., six components for our subjective emotion rating model and 22 for each of the two emotion attribution models). In addition, to allow a direct and unbiased comparison between R2 values obtained from different models, we performed cross-validation using a half-run split method (Supplementary Fig. 8).
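For one voxel, the half-run split amounts to fitting on one half of the runs and evaluating R2 on the held-out half; a sketch under hypothetical variable names:

```matlab
% Sketch of the half-run split cross-validation for one voxel. "X" (T x k
% predictors) and "y" (T x 1 timeseries) concatenate all runs; "half1" and
% "half2" are hypothetical logical indices splitting the runs in two.
b    = [ones(sum(half1), 1), X(half1, :)] \ y(half1);  % fit on one half
pred = [ones(sum(half2), 1), X(half2, :)] * b;         % predict the other half
r2cv = 1 - sum((y(half2) - pred).^2) / sum((y(half2) - mean(y(half2))).^2);
% Swapping the two halves and averaging the resulting values is one common
% symmetrization choice
```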

Lastly, we tested whether right TPJ gradients encode the emotion attribution models. Specifically, we evaluated two different scenarios: (1) the existence of right TPJ gradients encoding the 22 components of each emotion attribution model; (2) the possibility of identifying emotion gradients following the multidimensional alignment81 (i.e., canonical correlation analysis) of the 22-dimensional emotion attribution space to the six-dimensional space defined by subjective ratings. These alternative procedures address two different questions: (1) whether the process of emotion attribution is associated with emotion gradients in right TPJ, and (2) whether, starting from a third-person complex description of portrayed emotions, one can reconstruct the subjective report of our raters. Results for these two procedures are detailed in Supplementary Table 5.
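The alignment in scenario (2) can be sketched with MATLAB's canoncorr, assuming hypothetical matrices for the two rating spaces:

```matlab
% Sketch of the multidimensional alignment via canonical correlation
% analysis. "attribution" is a hypothetical T x 22 portrayed-emotion
% matrix and "subjective" the T x 6 matrix of first-person components.
[A, B, r, U, V] = canoncorr(attribution, subjective);
% U (T x 6) is the attribution space projected onto the canonical variates
% maximally correlated with the subjective components; U can then serve as
% the predictor set in the gradient analysis described above.
```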

Characterization of emotion gradients in right TPJ

Once the optimal ROI radius was identified, we tested the gradient-like organization of right TPJ for each individual emotion dimension and basic emotion (Supplementary Table 3), using the same procedure described above. We calculated the numerical gradient of each voxel using β values. This numerical gradient estimates the partial derivatives in each spatial dimension (x, y, z) and voxel, and can be interpreted as a vector field pointing in the physical direction of increasing β values. Afterward, to characterize the main direction of each gradient, rather than calculating its divergence (i.e., Laplacian of the original data82,83), we computed the sum of all vectors in the field (Fig. 4 and Supplementary Fig. 2). This procedure is particularly useful to reveal the principal direction of linear gradients and provides the opportunity to represent this direction as the orientation of the symmetry axis of the selected ROI. The above-mentioned procedure was also adopted to assess the reliability of the emotion gradients in each subject. Results and details of this procedure are reported in Supplementary Information. Furthermore, since gradients built on β coefficients could reflect positive or negative changes in hemodynamic signal depending on the sign of the predictor, we represented the average TPJ activity during movie scenes characterized by specific affective states (Fig. 5).
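A sketch of the direction estimate, assuming a hypothetical 3-D volume betaVol of β values restricted to the ROI:

```matlab
% Sketch of the principal gradient direction for one predictor. "betaVol"
% is a hypothetical 3-D volume of beta values within the right TPJ ROI
% (NaN outside the ROI).
[gx, gy, gz] = gradient(betaVol);   % partial derivatives along x, y and z
mainDir = [sum(gx(:), 'omitnan'), sum(gy(:), 'omitnan'), sum(gz(:), 'omitnan')];
mainDir = mainDir / norm(mainDir);  % unit vector pointing toward increasing betas
```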

We investigated whether distinct populations of voxels are selective for specific affective states. To this aim, we employed the population receptive field method43 (pRF) and estimated the tuning curve of right TPJ voxels for each predictor found to be topographically encoded within this region. We modeled the tuning curve of each voxel as a Gaussian distribution, in which μ represented the preferred score of the predictor and σ the width of the response. The optimal combination of tuning parameters was selected among ~5k plausible values of μ (from the 5th to the 95th percentile of the scores of each predictor, in steps of 0.5) and σ (ranging from 1 to 12, in steps of 0.25), sampled on a regular grid. Each emotion timeseries was then filtered using these ~5k Gaussian distributions and fitted to brain activity through a linear regression approach. This produced t-values (i.e., β/SE β) expressing the goodness of fit of each μ and σ combination, for each right TPJ voxel. The principal tuning of each voxel was then obtained by selecting the combination with the highest t-value across the ~5k samples (Fig. 6).
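For a single voxel and predictor, the grid search can be sketched as follows; implementing the Gaussian filtering as an unnormalized tuning curve is our assumption, and variable names are hypothetical:

```matlab
% Sketch of the tuning-curve search for one voxel. "stim" is one emotion
% timeseries (T x 1) and "y" the voxel timeseries; grid bounds follow the
% text.
mus    = prctile(stim, 5):0.5:prctile(stim, 95);
sigmas = 1:0.25:12;
bestT  = -inf;
for mu = mus
    for sg = sigmas
        reg = exp(-(stim - mu).^2 / (2 * sg^2));  % Gaussian tuning filter
        mdl = fitlm(reg, y);
        t   = mdl.Coefficients.tStat(2);          % beta / SE of the slope
        if t > bestT
            bestT = t; bestMu = mu; bestSigma = sg;
        end
    end
end
```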

To estimate the similarity between tunings (i.e., μ parameters) obtained from the pRF approach and our original results (i.e., β coefficients of the gradient estimation), we computed Spearman’s ρ across right TPJ voxels. The significance of such an association was tested against a null distribution of β coefficients obtained through the IAAFT procedure (N = 1000).

Lastly, we further characterized the prototypical responses of populations of voxels as a function of affective states. To do so, we used non-negative matrix factorization84 and decomposed the multivariate pRF data (i.e., voxel t-values for each μ and σ combination) into an approximated matrix of lower rank (i.e., rank 10, retaining at least 90% of the total variance). This method allows for parts-based representations, as the tuning of right TPJ voxels is computed as a linear summation of non-negative basis responses. The results of this procedure are summarized in Supplementary Fig. 9.
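A sketch using MATLAB's nnmf, where clipping negative t-values to zero is our assumption to satisfy the non-negativity requirement:

```matlab
% Sketch of the parts-based decomposition. "prf" is a hypothetical
% voxels x (mu, sigma combinations) matrix of pRF t-values; negative
% entries are clipped at zero, as nnmf requires non-negative input.
nBasis = 10;
[W, H] = nnmf(max(prf, 0), nBasis);  % prf ~ W * H, all entries non-negative
% Rows of H are basis tuning profiles; W expresses each voxel's tuning as
% a non-negative combination of these profiles.
```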

All the analyses were performed using MATLAB R2016b (MathWorks Inc., Natick, MA, USA).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.