Story understanding involves many perceptual and cognitive subprocesses, from perceiving individual words, to parsing sentences, to understanding the relationships among the story characters. We present an integrated computational model of reading that incorporates these and additional subprocesses, simultaneously discovering their fMRI signatures. Our model predicts the fMRI activity associated with reading arbitrary text passages, well enough to distinguish which of two story segments is being read with 74% accuracy. This approach is the first to simultaneously track diverse reading subprocesses during complex story processing and to predict the detailed neural representation of diverse story features, ranging from visual word properties to the mention of different story characters and the different actions they perform. We construct brain representation maps that replicate many results from a wide range of classical studies that each focus on one aspect of language processing, and that offer new insights into which types of information are processed by the different areas involved in language processing. Additionally, this approach is promising for studying individual differences: it can be used to create single-subject maps that may potentially be used to measure reading comprehension and diagnose reading disorders.

Data Availability: The authors confirm that all data underlying the findings are fully available without restriction. All data are included in Appendix A. Some of the preprocessing steps listed in Appendix A were not applied, since some researchers might want to try different steps on the data; the omitted steps are indicated by strikethrough in the PDF provided at http://www.cs.cmu.edu/afs/cs/project/theo-73/www/plosone/files/Appendix_A.pdf.

Introduction

Story understanding is a highly complex cognitive process that combines the low-level perception of individual words, the representation of their meanings and parts of speech, the comprehension of sentence grammar and meaning, and the tying of these sentence meanings together into a coherent understanding of the story plot and of the evolving beliefs, desires, emotions, and actions of the story characters. Story understanding, along with word and sentence processing, has long been a central topic of study across diverse fields including linguistics, computer science [1], cognitive science [2], literature, and philosophy [3].

Due to this complexity, most experimental brain imaging studies of language processing have focused on just one aspect of language at a time, via carefully controlled experiments. For example, researchers have searched for brain regions where neural activity increases or decreases when the input stimulus is a word, in contrast to a non-word letter string [4], or a sentence with simple versus complex syntax [5], or a sentence with expected versus unexpected meaning [6]. These experiments require carefully controlled, hand-tailored textual stimuli that vary along only one dimension of interest, raising the question of how well these findings reflect language processing in complex, everyday use.

One of the main questions in the study of language processing in the brain concerns the roles of the multiple regions that are activated in response to reading. A network of multiple brain regions has been implicated in language [5], [7], and while the field began with a simplistic dissociation between the roles of Broca's area and Wernicke's area, current theories of language comprehension are more complex, and most involve different streams of information passing through multiple regions (including Broca's and Wernicke's areas). Because of the complexity of language, the variety of experimental setups, and the different hypotheses tested, different models have emerged, leading to little agreement in the field, even on fundamental questions such as: are language regions specific to language? [7]. There has also been disagreement about the roles of the different "language" regions and the differentiation between regions processing syntax and regions processing semantics: [8] found no regions to be responsive exclusively to syntactic or semantic information, while [9] found regions in the Inferior Frontal Gyrus (IFG) that exclusively process syntax or semantics. Different models of meaning integration have also been proposed that disagree on the order in which semantic and syntactic information is accessed as a word is encountered, as well as on the order in which this information is integrated [10], [11].

We present in this paper a novel approach that can be used to address these questions, along with initial results showing that the different language processes are represented by different distributions of brain areas. Our data-driven approach studies the type of information represented in different parts of the brain during a naturalistic task in which subjects read a chapter from Harry Potter and the Sorcerer's Stone [12]. We extract from the words of the chapter a very diverse set of features and properties (such as semantic, syntactic, visual, and discourse-level features) and then examine which brain areas have activity that is modulated by the different types of features, allowing us to distinguish between brain areas on the basis of the type of information they represent.

Our approach differs in multiple key respects from typical language studies. First, the subjects in our study read a non-artificial chapter, exposing them to the rich lexical and syntactic variety of an authentic text that evokes a natural distribution of the many neural processes involved in diverse, real-world language processing. Second, our analysis method differs significantly from studies that search for brain regions where the magnitude of neural activity increases along one stimulus dimension. Instead, our approach is to train a comprehensive generative model that simultaneously incorporates the effects of many different aspects of language processing. Given a text passage as input, this trained computational model outputs a time series of fMRI activity that it predicts will be observed when the subject reads that passage. The text passage input to the model is annotated with a set of 195 detailed features for each word, representing a wide range of language features: from the number of letters in the individual word, to its part of speech, to its role in the parse of its sentence, to a summary of the emotions and events involving different story characters. The model makes predictions of the fMRI activation for an arbitrary text passage by capturing how this diverse set of information contributes to the neural activity, then combining these diverse neural encodings into a single prediction of brain-wide fMRI activity over time.
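To illustrate the general encoding-model idea (this is a minimal sketch with hypothetical dimensions and synthetic data, using simple ridge regression rather than the paper's exact estimation procedure), one can regress each voxel's fMRI time series on a per-time-point matrix of annotated story features, then use the learned weights to predict activity for new passages:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: T fMRI time points, F story features, V voxels
# (the real feature set has 195 features per word).
T, F, V = 200, 10, 50

# X[t, f]: value of feature f for the words read at time point t.
X = rng.standard_normal((T, F))

# Simulate fMRI data as a linear combination of features plus noise.
true_W = rng.standard_normal((F, V))
Y = X @ true_W + 0.5 * rng.standard_normal((T, V))

def fit_ridge(X, Y, lam=1.0):
    """Closed-form ridge regression: one weight vector per voxel."""
    F = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(F), X.T @ Y)

W = fit_ridge(X, Y)

# Predict the fMRI time series for a new passage's feature matrix.
X_new = rng.standard_normal((20, F))
Y_pred = X_new @ W
print(Y_pred.shape)  # (20, 50): predicted activity per time point per voxel
```

The learned weight matrix `W` is what gives the approach its interpretive power: each row is the spatial encoding of one story feature across voxels.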

Our model not only accounts for the different levels of processing involved in story comprehension; it goes further by explicitly searching for the brain activity encodings of individual stimuli such as the mention of a specific story character, the use of a specific syntactic part-of-speech, or the occurrence of a given semantic feature. The resulting trained model extrapolates from the training data to make testable predictions of the brain activity associated with novel text passages, which may vary arbitrarily in their content. In training this generative model we make minimal prior assumptions about the form of the hemodynamic response that relates neural activity to observed fMRI activity, instead allowing the training procedure to estimate the hemodynamic response separately for each distinct story feature at each distinct voxel; it has been shown that the hemodynamic response varies across different regions of the brain [13]. We also employ a novel approach for combining fMRI data from multiple human subjects that is robust to small local anatomical variabilities among their brains. This approach allows us to produce more accurate population-wide brain representation maps by using data from multiple subjects, while avoiding the major problem associated with averaging voxel-level data across multiple subjects: the bias in favor of regions where subjects share the same smooth representation.
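One standard way to estimate the hemodynamic response separately for each feature and voxel, rather than assuming a canonical response shape, is a finite impulse response design: stack time-lagged copies of each feature, so that the regression learns one weight per delay per feature per voxel. The helper below is a hypothetical sketch of that idea, not the paper's exact construction:

```python
import numpy as np

def add_fir_delays(X, delays=(1, 2, 3, 4)):
    """Stack time-lagged copies of the feature matrix so that a linear
    model can learn a separate response shape (one weight per delay)
    for every feature at every voxel, instead of assuming a canonical
    hemodynamic response function."""
    T, F = X.shape
    lagged = []
    for d in delays:
        Xd = np.zeros((T, F))
        Xd[d:] = X[:-d]  # feature values from d time points in the past
        lagged.append(Xd)
    return np.hstack(lagged)  # shape (T, F * len(delays))

# Toy feature matrix: 6 time points, 2 features.
X = np.arange(12, dtype=float).reshape(6, 2)
X_fir = add_fir_delays(X, delays=(1, 2))
print(X_fir.shape)  # (6, 4): each feature now appears at two delays
```

Fitting the expanded design matrix with the same regression machinery then yields, for each feature and voxel, a small set of delay weights that together trace out that feature's estimated hemodynamic response.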

To validate this modeling technique, we show below that the predictions of our trained model are sufficiently accurate to distinguish which of two previously unseen short text passages is being read, given only the observed fMRI brain activity, with an accuracy of 74%. This accuracy is significantly higher than chance accuracy (50%). While the exact numerical value of the accuracy might not be particularly revealing, the fact that we can obtain such a statistically significant result is, to our knowledge, a novel discovery. It had not previously been shown that one could model in detail the rapidly varying dynamics of brain activity measured with fMRI while subjects read at close to normal speed. This finding is important for the future study of reading and language processing, particularly given the recent trend in cognitive neuroscience away from experiments with artificial, controlled stimuli and toward natural stimuli that mimic real-life conditions [14], in order to obtain more generalizable conclusions.
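The two-passage test can be sketched as follows: predict the fMRI time series for both candidate passages, then attribute the observed data to whichever prediction it matches more closely. Using correlation as the match score is an assumption of this sketch, as is the synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)

def classify_passage(observed, pred_a, pred_b):
    """Decide which of two passages the observed fMRI corresponds to,
    by comparing the correlation of the observation with each
    predicted time series."""
    corr_a = np.corrcoef(observed.ravel(), pred_a.ravel())[0, 1]
    corr_b = np.corrcoef(observed.ravel(), pred_b.ravel())[0, 1]
    return "A" if corr_a > corr_b else "B"

# Toy example: the observation is a noisy version of passage A's prediction.
pred_a = rng.standard_normal((10, 30))  # 10 time points x 30 voxels
pred_b = rng.standard_normal((10, 30))
observed = pred_a + 0.3 * rng.standard_normal((10, 30))
print(classify_passage(observed, pred_a, pred_b))  # A
```

Repeating this binary decision over many held-out passage pairs and counting correct attributions yields an accuracy figure of the kind reported above.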

Reporting the accuracy of the trained model's predictions is, however, not the main contribution of this paper. We also use the brain activity encodings of different story features learned by the trained model – including perceptual, syntactic, semantic, and discourse features – to provide new insights into where and how these different types of information are encoded by brain activity. We align and contrast these results with several previously published studies of syntax, semantics, and the modeling of others' mental states and social interactions. In this paper, we use the term "semantic features" to refer to the lexical semantic properties of the stimulus words, and "discourse features" to refer to the discourse semantics of the story.

The experiments in this paper use a particular set of 195 features, and provide a solid proof of concept of the approach. However, this approach is flexible and capable of capturing additional alternative hypotheses by changing the time series of features used to describe the sequence of words in the story. We plan to use this method in the future to test and contrast competing theories of reading and story understanding. As long as different theories can be characterized in terms of different time series of annotated story features, our approach can compare them by training on these alternative feature sets, then testing experimentally which theory offers a better prediction of brain data beyond the training set.

Our approach is analogous to earlier work that trained a computational model to predict fMRI neural representations of single noun meanings [15]. However, here we extend that approach from single nouns and single fMRI images, to passages of text in a story, and the corresponding time series of brain activity. This work is also analogous to recent work analyzing fMRI from human subjects watching a series of short videos where a large set of objects were identified and annotated with semantic features that were then mapped to brain locations [16], though that work was restricted to semantic features and did not include language stimuli. Our approach is the first to provide a generative, predictive model of the fMRI neural activity associated with language processing involved in comprehending written stories.