We present a high-dimensional model of the representational space in human ventral temporal (VT) cortex in which dimensions are response-tuning functions that are common across individuals and patterns of response are modeled as weighted sums of basis patterns associated with these response tunings. We map response-pattern vectors, measured with fMRI, from individual subjects' voxel spaces into this common model space using a new method, “hyperalignment.” Hyperalignment parameters based on responses during one experiment—movie viewing—identified 35 common response-tuning functions that captured fine-grained distinctions among a wide range of stimuli in the movie and in two category perception experiments. Between-subject classification (BSC, multivariate pattern classification based on other subjects' data) of response-pattern vectors in common model space greatly exceeded BSC of anatomically aligned responses and matched within-subject classification. Results indicate that population codes for complex visual stimuli in VT cortex are based on response-tuning functions that are common across individuals.

We tested the validity of the common model by performing between-subject MVP classification of responses to a wide range of visual stimuli—time segments from the movie, still images of seven categories of faces and objects, and still images of six animal species. For between-subject classification (BSC), the response vectors for one subject were classified using a classifier model based on other subjects' response vectors. We compared BSC performance for response vectors that had been transformed into the common model space to BSC for data that were aligned across subjects based on anatomy and to within-subject classification (WSC), in which the response vectors for a subject were classified using an individually tailored classifier model based on response vectors from the same subject. Results showed that BSC accuracies for response-pattern vectors in common model space were markedly higher than BSC accuracies for anatomically aligned response-pattern vectors and equivalent to WSC accuracies. More than 20 dimensions were needed to achieve this level of accuracy. Here we present a common model space with 35 dimensions. Thus, the representational space in VT cortex can be modeled with response-tuning functions that are common across subjects. These response-tuning functions are associated with cortical topographies that serve as basis patterns for modeling patterns of response to stimuli and can be examined in each individual's VT cortex. The general validity of the model across the varied stimulus sets that we tested could be achieved only when hyperalignment was based on responses to the movie. Common models based on responses to smaller, more controlled stimulus sets—still images of a limited number of categories—were valid only for restricted stimulus domains, indicating that these models captured only a subspace of the substantially larger representational space in VT cortex.

Here we present a high-dimensional model of the representational space in VT cortex that is based on response-tuning functions that are common across brains and is valid across a wide range of complex visual stimuli. To construct this model, we developed a method, hyperalignment, which aligns patterns of neural response across subjects into a common, high-dimensional space. We estimated the hyperalignment parameters that transform an individual's VT voxel space into this common space based on responses obtained with fMRI while subjects watched a full-length action movie, Raiders of the Lost Ark. We reasoned that estimation of hyperalignment parameters that are valid for a large domain of complex visual stimuli would require sampling responses to a wide range of stimuli. Viewing a natural movie evokes local brain responses that show synchrony across subjects in a large expanse of cortex, including visual areas in the occipital, ventral temporal, and lateral temporal cortices. In contrast to earlier univariate analyses of local synchrony, we took a multivariate approach to analyze the time-varying patterns of response evoked by this rich, dynamic stimulus. We reasoned that in the brains of two individuals viewing the same dynamic visual stimulus, such as a full-length action movie, the trajectories of VT response-pattern vectors over time reflect similar visual information, but the coordinate systems for their respective representational spaces, in which each dimension is one voxel, are poorly aligned. Hyperalignment uses Procrustean transformations iteratively over pairs of subjects to derive a group coordinate system in which subjects' vector trajectories are in optimal alignment. The Procrustean transformation is an orthogonal transformation (rotations and reflections) that minimizes the Euclidean distance between two sets of paired vectors.
After hyperalignment, we reduced the dimensionality of the common space by performing a principal components analysis (PCA) and determined the subspace that is sufficient to capture the full range of response-pattern distinctions.
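The Procrustean fit described above has a closed-form solution via singular value decomposition. As a minimal sketch (not the authors' code; toy dimensions and noise-free data are assumed), aligning one subject's response trajectory to another's in numpy:

```python
import numpy as np

def procrustes_rotation(source, target):
    """Orthogonal matrix R (rotations and reflections) minimizing the
    Euclidean distance ||source @ R - target|| for paired row vectors."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

# toy data: time points x voxels, one trajectory a pure rotation of the other
rng = np.random.default_rng(0)
target = rng.standard_normal((200, 10))
true_r = np.linalg.qr(rng.standard_normal((10, 10)))[0]  # random orthogonal
source = target @ true_r.T
r = procrustes_rotation(source, target)
print(np.allclose(source @ r, target))  # the rotation is recovered exactly
```

In the full procedure this pairwise fit is applied iteratively across subjects, and the resulting orthogonal matrices serve as each subject's hyperalignment parameters.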

Representational distinctions among complex visual stimuli are embedded in topographies in VT cortex that have coarse-to-fine spatial scales. Large-scale topographic features that are fairly consistent across individuals reflect coarser categorical distinctions, such as animate versus inanimate categories in lateral to medial VT cortex, faces versus objects and body parts versus objects (the fusiform face and body-parts areas, FFAs and FBAs, respectively), and places versus objects (the parahippocampal place area, PPA). Finer distinctions among animate categories, among mammalian faces, among buildings, and among objects appear to be carried by smaller-scale topographic features, and an arrangement of these features that is consistent across brains has not been reported. MVP analysis can detect the features that underlie these representational distinctions at both the coarse and fine spatial scales, whereas conventional univariate analyses are sensitive only to the coarse spatial scale topographies. Current models of the functional organization of VT cortex that are based on response-tuning functions defined by simple contrasts, such as faces versus objects or scenes versus objects, and on relatively large category-selective regions, such as the FFA and PPA, fail to capture the fine-grained distinctions among responses to a wide range of stimuli and the fine spatial scale of the response patterns that carry those distinctions.

Representations of complex visual stimuli in human ventral temporal (VT) cortex are encoded in population responses that can be decoded with multivariate pattern (MVP) classification. Population responses are patterns of neural activity. For MVP analysis, patterns of activity are analyzed as vectors in a high-dimensional space in which each dimension is a local feature in the distributed pattern. We refer to this response-pattern vector space as a representational space. Features can be single-neuron recordings, local field potentials, or imaging measures of aggregate local neural activity, such as voxels in functional magnetic resonance imaging (fMRI). MVP analysis exploits variability in response-tuning profiles across these features to classify and characterize the distinctions among responses to different stimuli. Because establishing feature correspondence across brains is difficult, a new classifier model generally is built for each brain. Consequently, no general model of the representational space in VT cortex exists that uses a common set of response-tuning functions and can account for the fine-grained distinctions among neural representations in VT cortex for a wide range of visual stimuli.

We then asked whether the category-selective FFA and PPA could be identified reliably in the common model space. For each subject, we projected all other subjects' face and object data in the 35-dimensional common model space into that subject's 1,000-voxel native space. We then identified face-selective and house-selective regions in that subject's VT cortex based solely on other subjects' data. Figure 6B shows group-defined FFAs and PPAs in two representative subjects. The outlines of the individually defined FFA and PPA are superimposed on the group-defined regions to illustrate the close correspondence. Thus, the common model also captures the individual-specific anatomical locations of category-selective regions within the VT cortex.

The topographies for the PCs in the common model that best capture the variance in responses to the movie, a complex natural stimulus, did not correspond well with the category-selective regions, the FFA and PPA, that are identified based on responses to still images of a limited variety of stimuli. We next asked whether the category selectivity that defines these regions is preserved in the 35-dimensional representational space of our model. First, we defined a dimension in the model space based on a linear discriminant that contrasts the mean response vector to faces and the mean response vector to houses and objects. The mean response vectors were based on group data in the face and object perception experiment. We then plotted the voxel weights for this dimension in the native anatomical spaces for individual subjects (Figure 6A; Figure S1F). Unlike the topographies for principal components, the voxel weights for this faces-versus-objects dimension have a topography that corresponds well with the boundaries of individually defined FFAs. Thus, when the response-tuning profiles are modeled with this single dimension, the face selectivity of FFA voxels is evident, but this dimension does not capture the fine-scale topography in the FFA that is the basis for decoding finer distinctions among faces or among nonface objects. By contrast, the dimensions in the common model do capture these distinctions. MVP analysis restricted to the FFA affords significant classification of human faces versus animal faces, dog faces versus monkey faces, and even shoes versus chairs. Moreover, the topography within the FFA that enables decoding these distinctions can be captured with common basis functions when hyperalignment is restricted to individually defined FFA voxels (Figure S2E).
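A faces-versus-objects dimension of this kind amounts to a contrast vector in model space whose voxel-space image is obtained through a subject's hyperalignment parameters. A schematic illustration with random stand-in arrays (the data, counts, and parameter matrix here are hypothetical placeholders, not the experimental data):

```python
import numpy as np

rng = np.random.default_rng(1)
n_dims, n_voxels = 35, 1000

face_vecs = rng.standard_normal((40, n_dims))     # model-space face responses
object_vecs = rng.standard_normal((120, n_dims))  # houses, chairs, and shoes

# linear discriminant dimension: contrast of the mean response vectors
contrast = face_vecs.mean(axis=0) - object_vecs.mean(axis=0)
contrast /= np.linalg.norm(contrast)

# a subject's hyperalignment parameters (first 35 columns of an orthogonal
# matrix, stand-in here) map the dimension to native-space voxel weights
params = np.linalg.qr(rng.standard_normal((n_voxels, n_voxels)))[0][:, :n_dims]
voxel_weights = params @ contrast
print(voxel_weights.shape)  # one weight per voxel: the topography to plot
```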

(B) FFA and PPA regions defined by contrasts in group data projected into the native voxel spaces of two subjects. For each subject, that subject's own data were excluded from the calculation of face selectivity and house selectivity, yielding category-selective regions that were based exclusively on other subjects' data. Each subject's individually defined FFAs and PPAs are shown as outlines to illustrate the tight correspondence with model-defined category-selective regions.

(A) The topography associated with the contrast between the mean response to faces and the mean response to nonface objects (houses, chairs, and shoes). Note the tight correspondence of the regions with positive weights and the outlines of individually defined FFAs.

Overall, these results show that the PCA-defined dimensions capture a functional topography in VT cortex that has more complexity and a finer spatial scale than that defined by large category-selective regions such as the FFA and PPA.

(B) The cortical topographies for the same PCs projected into the native voxel spaces of two subjects, i.e., the voxel weights for each PC in the matrix of hyperalignment parameters for each subject. The outlines of individually defined face-selective (FFA) and house-selective (PPA) regions are shown for reference. See also Figure S5.

(A) Category response-tuning profiles for the first, second, third, and fifth PCs in the common model space. These PCs were derived to account for variance of responses to the movie, but they also are associated with differential responses to the categories in the other two experiments. The scale for response-tuning profiles is centered on zero, corresponding to the mean response to the movie, and scaled so that the maximum deviation from zero (positive or negative) is set to one.

Individual VT voxel spaces can be transformed into the common model space with a single parameter matrix (the first 35 columns of an orthogonal matrix; Figure 1; Figure S1A). Each common model space dimension is associated with a time-series response for each experiment. A response-tuning profile for an individual voxel is modeled as a weighted sum of these 35 response-tuning functions (Figure S1E). Each dimension is also associated with a topographic pattern in each individual subject's VT voxel space (Figure S1C), and the response pattern for a stimulus is modeled as a weighted sum of these 35 patterns (Figure S1D).
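Under this formulation the transform and the two "weighted sum" readings are all matrix products. A schematic numpy sketch with random placeholder data (the parameter matrix here is an arbitrary stand-in, not an estimated one):

```python
import numpy as np

rng = np.random.default_rng(2)
n_time, n_voxels, n_dims = 2205, 1000, 35

data = rng.standard_normal((n_time, n_voxels))   # one subject's VT time series
# parameter matrix: first 35 columns of an orthogonal matrix (placeholder)
w = np.linalg.qr(rng.standard_normal((n_voxels, n_voxels)))[0][:, :n_dims]

model_ts = data @ w          # time series for each of the 35 model dimensions
recon = model_ts @ w.T       # each voxel's time series modeled as a weighted
                             # sum of the 35 common response-tuning functions
print(model_ts.shape, recon.shape)
```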

The dimensions that define the common model space are selected as those that most efficiently account for variance in patterns of response to the movie. The model space has a dimensionality that is much lower than that of the original voxel space but much higher than the handful of binary contrasts for face, place, and body-part selectivity that have dominated most investigations into the functional organization of VT cortex. Each dimension is associated with a response-tuning function that is common across brains and with individual-specific cortical topographies. The dimensions have meaning in aggregate as a computational framework that captures the distinctions among VT representations for a diverse set of complex visual stimuli, but their meaning in isolation is less clear. The coordinate axes for this space, however, can be rotated to search for dimensions that have clearer meaning, in terms of response-tuning function, and the cortical topographies for dimensions in a rotated model space can be examined. Here we probe the meaning of the common model space. First we examine the response-tuning functions and cortical topographies for four of the top five PCs. In the next section, we illustrate how to derive a dimension based on a simple stimulus contrast—faces versus objects—and examine the associated cortical topographies. We show that the cortical topographies associated with well-known category selectivities are preserved in the 35-dimensional common model space.

We also tested whether the general validity of the model space reflects responses to stimuli that are in both the movie and the category perception experiments or reflects stimulus properties that are not specific to these stimuli. We recomputed the common model after removing all movie time points in which a monkey, a dog, an insect, or a bird appeared. We also removed time points for the 30 s that followed such episodes to factor out effects of delayed hemodynamic responses. BSC of the face and object and animal species categories, including distinctions among monkeys, dogs, insects, and birds, was not affected by removing these time points from the movie data (65.0% ± 1.9% versus 64.8% ± 2.3% for faces and objects; 67.1% ± 3.0% versus 67.6% ± 3.1% for animal species; Figure S4B). This result suggests that the movie-based hyperalignment parameters that afford generalization to these stimuli are not stimulus specific but, rather, reflect stimulus properties that are more abstract and of more general utility for object representations.

We conducted further analyses to investigate the properties of responses to the movie that afford general validity across a wide range of stimuli. We tested BSC of single time points in the movie and in the face and object perception experiment, in which we carefully matched the probability of correct classifications for the two experiments. Single TRs in the movie experiment could be classified with accuracies that were more than twice those for single TRs in the category perception experiment (74.5% ± 2.5% versus 32.5% ± 1.8%; chance = 14%; Figure S4A). This result suggests that VT responses evoked by the cluttered, complex, and dynamic images in the movie are more distinctive than are responses evoked by still images of single faces or objects.

We next asked whether hyperalignment based on these simpler stimulus sets was sufficient to derive a common space with general validity across a wider array of complex stimuli. We applied the hyperalignment parameters derived from the face and object data to the movie data in the ten Princeton subjects and the hyperalignment parameters derived from the animal species data to the movie data in the 11 Dartmouth subjects. BSC of 18 s movie time segments after hyperalignment based on category perception experiment data was markedly worse than BSC after hyperalignment based on movie data (17.6% ± 1.3% versus 65.8% ± 2.7% for Princeton subjects; 28.3% ± 2.8% versus 74.9% ± 4.1% for Dartmouth subjects; p < 0.001 in both cases; Figure 4). Thus, hyperalignment of data using a set of stimuli that is less diverse than the movie is effective, but the resultant common space has validity that is limited to a small subspace of the representational space in VT cortex.

BSC of face and object categories after hyperalignment based on data from that experiment was equivalent to BSC after movie-based hyperalignment (62.9% ± 2.9% versus 63.9% ± 2.2%, respectively; Figure 4). Surprisingly, BSC of the animal species after hyperalignment based on data from that experiment was significantly better than BSC after movie-based hyperalignment (76.2% ± 3.7% versus 68.0% ± 2.8%, respectively; p < 0.05; Figure 4). This result suggests that the validity for a model of a specific subspace may be enhanced by designing a stimulus paradigm that samples the brain states in that subspace more extensively.

We compared BSC accuracies (means ± SE) for data in the common model space based on movie viewing relative to common model spaces based on responses to the images in the category perception experiments. Note that common models based on responses to the category images afford good BSC for those experiments but do not generalize to BSC of responses to movie time segments. Only the common model based on movie viewing generalizes to high levels of BSC for stimuli from all three experiments. Dashed lines indicate chance performance. See also Figure S4.

We next asked whether a complex, natural stimulus, such as the movie, is necessary to derive hyperalignment parameters that generate a common space with general validity across a wide range of complex visual stimuli. In principle, a common space and hyperalignment parameters can be derived from any fMRI time series. We investigated whether hyperalignment of the face and object data and hyperalignment of the animal species data would afford high levels of BSC accuracy using only the data from those experiments. In each experiment, we derived a common space based on all runs but one. We transformed the data from all runs, including the left-out run, into this common space. We trained the classifier on those runs used for hyperalignment in all subjects but one and tested the classifier on the data from the left-out run in the left-out subject. Thus, the test data for determining classifier accuracy played no role either in hyperalignment or in classifier training.

We next asked whether the information necessary for classification of stimuli in the two category perception experiments could be captured in smaller subspaces and whether these subspaces were similar. For each experiment, we performed PCAs on mean category vectors for each run, averaged across subjects, in the common model space derived from the movie data, then used these PCs for BSC. Data folding, i.e., division of data into training and testing sets, ensured that generalization testing was done on data that were not used for hyperalignment or classifier training. BSC of the face and object categories reached a maximal level with the top 12 PCs from the PCA of the face and object data (67.7% ± 2.1%). BSC of the animal species reached a maximal level with the top nine PCs from the PCA of the animal species data (73.9% ± 3.0%). The top PCs from the face and object data, however, did not afford good classification of the animal species (55.0% ± 3.4%) or the movie time segments (50.1% ± 2.7%), nor did the top PCs from the animal species data afford good classification of the face and object categories (54.2% ± 2.6%) or the movie time segments (49.5% ± 2.6%; Figure 3B). Thus, the lower-dimensional representational spaces for the limited number of stimulus categories in the face and object experiment and in the animal species experiment are different from each other and are of less general validity than the higher-dimensional movie-based common model space.

We next asked how many dimensions are necessary to capture the information that enables these high levels of BSC accuracy (Figure 1). We performed a principal components analysis (PCA) of the mean responses to each movie time point in common model space, averaging across subjects, then performed BSC of the movie, face and object, and animal species data with varying numbers of top principal components (PCs). The results show that BSC accuracies for all three data sets continue to increase with more than 20 PCs (Figure 3A). We present results for a common model space with 35 dimensions, which affords BSC classification accuracies that are equivalent to BSC accuracies using all 1,000 original dimensions (68.3% ± 2.6% versus 70.6% ± 2.6% for movie time segments; 64.8% ± 2.3% versus 63.9% ± 2.2% for faces and objects; 67.6% ± 3.1% versus 68.0% ± 2.8% for animal species; Figure 2A). The effect of number of PCs on BSC was similar for models that were based only on Princeton (n = 10) or Dartmouth (n = 11) data, suggesting that this estimate of dimensionality is robust across differences in scanning hardware and scanning parameters (see Figure S3D).
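One way to carry out this kind of reduction (a sketch with random placeholder arrays, not the original responses) is to take the top PCs of the group-mean common-space time series and project any subject's common-space data onto them:

```python
import numpy as np

rng = np.random.default_rng(3)
n_time, n_dims, n_pcs = 2205, 1000, 35

mean_ts = rng.standard_normal((n_time, n_dims))  # group-mean movie time series
centered = mean_ts - mean_ts.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
top_pcs = vt[:n_pcs]                             # 35 PCs x 1,000 dimensions

subject_ts = rng.standard_normal((n_time, n_dims))  # any common-space data
reduced = subject_ts @ top_pcs.T
print(reduced.shape)  # (2205, 35): data in the 35-dimensional model space
```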

(B) BSC for 35 PCs that were calculated based on responses during movie viewing, for ten PCs that were calculated based on responses to the face and object images, and for ten PCs that were calculated based on responses to the animal images. Note that only the 35 PCs based on responses to the movie afforded high levels of BSC for stimuli from all three experiments. Dashed lines indicate chance performance. See also Figure S3.

After hyperalignment using parameters derived from the movie data, BSC identified the six animal species with 68.0% accuracy (SE = 2.8%, chance = 16.7%; Figure 2A). The confusion matrix shows that the classifier could identify each individual species and that confusions were most often made within class, i.e., between insects, between birds, or between primates. WSC accuracy (68.9% ± 2.8%) was equivalent to BSC of hyperaligned data with a similar pattern of confusions. BSC of anatomically aligned animal species data (37.4% ± 1.5%) showed an even larger decrement relative to BSC of hyperaligned data than that found for the face and object perception data (p < 0.001).

We used a linear support vector machine (SVM) for BSC of both category perception experiments. After hyperalignment using parameters derived from the movie data, BSC identified the seven face and object categories with 63.9% accuracy (SE = 2.2%, chance = 14.3%; Figure 2A). The confusion matrix (Figure 2B) shows that the classifier distinguished human faces from nonhuman animal faces and monkey faces from dog faces but could not distinguish human female from male faces. The classifier also could distinguish chairs, shoes, and houses. Confusions between face and object categories were rare. WSC accuracy (63.2% ± 2.1%) was equivalent to BSC of hyperaligned data with a similar pattern of confusions, but BSC of anatomically aligned data (44.6% ± 1.4%) was significantly worse (p < 0.001; Figure 2).
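A leave-one-subject-out loop of this kind might look as follows (a sketch using scikit-learn's LinearSVC on random stand-in data; the sample counts and labels are invented for illustration and are not the experimental design):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
n_subjects, n_cats, n_reps, n_dims = 10, 7, 8, 35

# hypothetical model-space response vectors: subjects x samples x dimensions
X = rng.standard_normal((n_subjects, n_cats * n_reps, n_dims))
y = np.tile(np.arange(n_cats), n_reps)           # category label per sample

accs = []
for test_subj in range(n_subjects):              # leave one subject out
    train_X = np.concatenate([X[s] for s in range(n_subjects) if s != test_subj])
    train_y = np.tile(y, n_subjects - 1)
    clf = LinearSVC().fit(train_X, train_y)
    accs.append(clf.score(X[test_subj], y))      # BSC accuracy for this subject
print(np.mean(accs))  # near chance (1/7) here, since the data are random
```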

We used a one-nearest neighbor classifier based on vector correlations for BSC of 18 s segments of the movie (six time points, TR = 3 s). An individual's response vector to a specific time segment was correctly classified if the correlation of that response vector with the group mean response vector (excluding that individual) for the same time segment was higher than all correlations of that vector with group mean response vectors for more than 1,000 other time segments of equal length. Other time segments were selected using a sliding time window, and those that overlapped with the target time segment were excluded from comparison. After hyperalignment, BSC identified these segments correctly with 70.6% accuracy (SE = 2.6%, chance < 1%; Figure 2). After anatomical alignment, the same time segments could be classified with 32.0% accuracy (SE = 2.5%), a level of accuracy that was better than chance but far lower than after hyperalignment (p < 0.001).
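The classifier's logic can be sketched as follows (synthetic data with an invented noise level stand in for the fMRI responses; the sliding-window overlap exclusion is omitted for brevity):

```python
import numpy as np

def classify_segment(subject_seg, group_segs, target_idx):
    """1-nearest-neighbor by correlation: correct if the subject's segment
    correlates best with the group mean for that same time segment."""
    flat = subject_seg.ravel()
    corrs = [np.corrcoef(flat, g.ravel())[0, 1] for g in group_segs]
    return int(np.argmax(corrs) == target_idx)

rng = np.random.default_rng(5)
n_segments, seg_len, n_dims = 50, 6, 35   # six TRs (TR = 3 s) per 18 s segment

group = rng.standard_normal((n_segments, seg_len, n_dims))   # group means
subject = group + 0.5 * rng.standard_normal(group.shape)     # shared + noise

hits = [classify_segment(subject[i], group, i) for i in range(n_segments)]
print(np.mean(hits))  # well above chance (1/50), since signal is shared
```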

(A) Classification accuracies (means ± SE) for BSC of data that have been mapped into the 1,000-dimensional common space with hyperalignment, into the 35-dimensional common model space, and into Talairach atlas space (anatomically aligned), as well as for WSC of the category perception experiments. Dashed lines indicate chance performance.

We calculated a common space for all 21 subjects based on responses to the movie (Figure 1, middle). We performed BSC of response patterns from all three data sets to test the validity of this space as a common model for the high-dimensional representational space in VT cortex. With BSC, we tested whether a given subject's response patterns could be classified using an MVP classifier trained on other subjects' patterns. For BSC of the movie data, we used hyperalignment parameters derived from responses to one half of the movie to transform each subject's VT responses to the other half of the movie into the common space. We then tested whether BSC could identify sequences of evoked patterns from short time segments in the other half of the movie, as compared to other possible time segments of the same length. The data used for BSC of time segments in one half of the movie were not used for voxel selection or derivation of hyperalignment parameters. For the category perception experiments, we used the hyperalignment parameters derived from the entire movie data to transform each subject's VT responses to the category images into the common space and tested whether BSC could identify the stimulus category being viewed. As a basis for comparison, we also performed BSC on data that had been aligned based on anatomy, using normalization to the Talairach atlas. For the category perception experiments, we also compared BSC to within-subject classification (WSC), in which individually tailored classifiers were built for each subject. Because each movie time segment was unique, WSC of movie time segments was not possible. Voxel sets were selected based on between-subject correlations of movie time series (see Supplemental Experimental Procedures). BSC accuracies were relatively stable across a wide range of voxel set sizes. We present results for analyses of 1,000 voxels (500 per hemisphere). See Figures S3A and S3B for results using other voxel set sizes.

Hyperalignment uses the Procrustean transformation to align individual subjects' VT voxel spaces into a common space (Figure 1). Individual voxel spaces and the common space are high dimensional, unlike the three-dimensional anatomical spaces. The Procrustean transformation finds the optimal orthogonal matrix for a rigid rotation with reflections that minimizes Euclidean distances between two sets of labeled vectors. For hyperalignment, labeled vectors are patterns of response for time points in an fMRI experiment, and the Procrustean transformation rotates (with reflections) the high-dimensional coordinate axes for each subject to align pattern vectors for matching time points. After rotation, coordinate axes, or dimensions, in the common space are no longer single voxels with discrete cortical locations but, rather, are distributed patterns across VT cortex (weighted sums of voxels). Minimizing the distance between subjects' time-point response-pattern vectors also makes time-series responses for each common space dimension maximally similar across subjects (see Figure S2A available online). First, the voxel spaces for two subjects were brought into optimal alignment. We then brought a third subject's voxel space into optimal alignment with the mean trajectory for the first two subjects and proceeded by successively bringing each remaining subject's voxel space into alignment with the mean trajectory of response vectors from previous subjects. In a second iteration, we brought each individual subject's voxel space into alignment with the group mean trajectory from the first iteration and recalculated the group mean vector trajectory. In the third and final step, we recalculated the orthogonal matrix that brought each subject's VT voxel space into optimal alignment with the final group mean vector trajectory.
The orthogonal matrix for each subject was then treated as that subject's “hyperalignment parameters,” which we used to transform data from independent experiments into the common space.
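The three-level procedure described above can be sketched in numpy (a simplified reading of the text, run here on noise-free toy trajectories rather than fMRI data):

```python
import numpy as np

def procrustes(source, target):
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt   # orthogonal matrix: source @ R best matches target

def hyperalign(subjects):
    """Iteratively align a list of (time x voxel) arrays into a common space."""
    # level 1: fold each subject into a running mean trajectory
    running_sum = subjects[0].copy()
    for i, data in enumerate(subjects[1:], start=1):
        running_sum += data @ procrustes(data, running_sum / i)
    # level 2: realign every subject to the level 1 group mean
    mean1 = running_sum / len(subjects)
    aligned = [data @ procrustes(data, mean1) for data in subjects]
    # level 3: final hyperalignment parameters target the level 2 group mean
    mean2 = np.mean(aligned, axis=0)
    return [procrustes(data, mean2) for data in subjects]

rng = np.random.default_rng(6)
truth = rng.standard_normal((100, 20))   # a shared response trajectory
subs = [truth @ np.linalg.qr(rng.standard_normal((20, 20)))[0] for _ in range(5)]
params = hyperalign(subs)                # one orthogonal matrix per subject
common = [s @ r for s, r in zip(subs, params)]
print(np.allclose(common[0], common[1]))  # trajectories coincide in the toy case
```

In this noise-free toy case every subject is an exact rotation of the same trajectory, so the recovered transforms bring all trajectories into perfect register; with real data the alignment is a least-squares fit.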

The upper box shows the input data before any transformations—separate matrices of 500 voxels in the VT cortex of each hemisphere with time-series data for each of 21 subjects. The middle box represents the data structures after hyperalignment. For each subject there is a matrix of time-series data that has been rotated (with reflections) into the common, 500-dimensional space for the VT cortex of each hemisphere with an orthogonal matrix—the hyperalignment parameters—that specifies that transformation. The mean time-series data in the common spaces—2 matrices with 500 dimensions × 2,205 time points—are the targets for hyperalignment. The lower box represents the data structures after dimensionality reduction. PCA was performed on the mean time-series data from all 1,000 dimensions (right and left VT cortices), and the top 35 PCs were found to afford BSC that was equivalent to BSC of the 1,000-dimensional hyperaligned data and to WSC. For each subject, there is a matrix of time series for each PC (35 PCs × 2,205 time points) and part of an orthogonal matrix (35 PCs × 1,000 voxel weights) that can be used to transform any data from the same 1,000 VT voxels into the common model space. See also Figure S1.

In our first experiment, we collected functional brain images while 21 subjects watched a full-length action movie, Raiders of the Lost Ark. In a second experiment, we measured brain activity while ten of these subjects, at Princeton University, looked at still images of seven categories of faces and objects—male faces, female faces, monkey faces, dog faces, shoes, chairs, and houses. In a third experiment, we measured brain activity while the other 11 subjects, at Dartmouth College, looked at still images of six animal species—ladybugs, luna moths, yellow-throated warblers, mallards, ring-tailed lemurs, and squirrel monkeys.

Discussion

The objective of this research project was to develop a model of the representational space in VT cortex that (1) is based on response-tuning functions that are common across brains and (2) captures the fine-scale distinctions among representations of complex stimuli that, heretofore, have only been captured by within-subject analyses using MVP classification. To meet this objective, we developed a method, hyperalignment, which maps data from individual subjects' native voxel spaces into a common, high-dimensional space. The dimensions in this common space are basis functions that are distinct response-tuning functions defined by their commonality across brains. Model dimensions also are associated with topographic patterns in each subject's native voxel space. Our results show that transformation of response vectors into common space coordinates affords between-subject MVP classification of subtle distinctions among complex visual stimuli at levels of accuracy that far exceed BSC based on anatomically aligned voxels and are equivalent to, and can even exceed, WSC. Hyperalignment thus makes it possible to build a high-dimensional model of the representational space in VT cortex that is valid across brains.

We also investigated whether we could build a single model that was not only valid across brains, but also valid across a wide range of complex visual stimuli. To this end, we used a complex and dynamic natural stimulus—a full-length action movie—to sample a diverse variety of representational states. The results show that hyperalignment based on responses to this stimulus affords a single model of VT cortex with general validity across a broad range of stimuli, whereas hyperalignment based on responses to still images in more controlled, conventional experiments does not. Thus, by virtue of the rich diversity of a complex, natural stimulus, our model of the representational space in VT cortex also has general validity across stimuli.

Initially, the common space produced with hyperalignment has the same number of dimensions as the number of voxels in each individual's native space. We asked how many distinct common response-tuning functions are needed to contain the information that affords the full range of fine-grained distinctions among complex visual stimuli. We tested the sufficiency of lower-dimensional subspaces and found that BSC accuracies continued to increase until more than 20 common response-tuning functions were included. We present a 35-dimensional common model space that afforded BSC for all three experiments at levels of accuracy equivalent to BSC with all 1,000 hyperaligned dimensions or to WSC with 1,000 voxels. Ten dimensions were sufficient within the limited stimulus domain of each category perception experiment, but these sets of ten dimensions did not afford high levels of BSC for the other experiment or for the movie. Thus, these lower-dimensional models are subspaces of the full model and are valid only for more limited stimulus domains.
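The mechanics of carving a lower-dimensional subspace out of the full common space can be sketched with PCA via SVD. This is illustrative only: the matrix below is a random stand-in for the paper's 2,205 × 1,000 mean time series, and note that the paper selects the subspace size by BSC accuracy across experiments, not by variance explained.

```python
import numpy as np

# Random stand-in for the common-space mean time series
# (time points x common dimensions).
rng = np.random.default_rng(2)
mean_ts = rng.standard_normal((300, 100))

# PCA via SVD of the column-centered matrix
centered = mean_ts - mean_ts.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

def to_model_space(X, n_pcs):
    """Project voxel-space data onto the top n_pcs principal components.

    Vt[:n_pcs] is an (n_pcs x voxels) slice of an orthogonal matrix,
    analogous to the 35 x 1,000 projection described in the paper; it can
    be applied to any new data from the same voxels.
    """
    return (X - X.mean(axis=0)) @ Vt[:n_pcs].T

reduced = to_model_space(mean_ts, 35)            # time points x 35 PCs
var_explained = np.cumsum(S**2) / np.sum(S**2)   # cumulative variance per k
```

In the paper's analysis, the quantity swept over `n_pcs` is BSC accuracy for each experiment, which plateaus once the subspace spans the common response-tuning functions that carry the relevant distinctions.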

Complex, Natural Stimuli for Sampling Representational State Spaces

The second goal of this project was to develop a single model that was valid across stimuli that evoke distinct patterns of response in VT cortex. To this end, we collected three data sets for deriving transformations into a common space and for testing general validity. All data sets could be used to derive the parameters for hyperalignment, and all data sets allowed BSC of responses to different stimuli (Hasson et al., 2004; Bartels and Zeki, 2004; Sabuncu et al., 2010). The central challenge was to estimate parameters in each subject for a high-dimensional transformation that captures the full variety of response patterns in VT cortex. We reasoned that achieving such general validity would require sampling a wide range of stimuli that reflect the statistics of normal visual experience. The use of a limited number of stimuli—eight, 12, or even 20 categories—constrains the number of dimensions that may be derived. We chose the full-length action movie as a varied, natural, and dynamic stimulus that can be viewed during an fMRI experiment. Parameter estimates derived from responses to this stimulus produced a common model space that afforded highly accurate MVP classification for all three experiments. Supplemental analysis of the effect of the number of movie time points used for model derivation indicates that maximal BSC required most of the movie (1,700 time points, or 85 min; Figure S2D). This space has a dimensionality that cannot logically be derived from a more limited stimulus set.

By contrast, the responses evoked by the stimuli in the category perception experiments did not have these properties. We also derived common models based on responses to the face and object categories in ten subjects and on responses to the pictures of animals in 11 subjects. These alternative common models afforded high levels of accuracy for BSC of the stimulus categories used to derive the common space but did not generalize to BSC of the movie time segments. Thus, models based on hyperalignment of responses to a limited number of stimulus categories align only a small subspace within the representational space in VT cortex and are, therefore, inadequate as general models of that space. On the positive side, these results also show that hyperalignment can be used for BSC of an fMRI experiment without data from movie viewing.
Further analyses revealed other desirable properties of the movie as a stimulus for model derivation. The movie evoked responses in VT cortex that were more distinctive than were responses to the still images in the category perception experiments. Moreover, the general validity of the model based on the responses to the movie is not dependent on responses to stimuli that are in both the movie and the category perception experiments but, rather, appears to rest on stimulus properties that are more abstract and of more general utility.