Predictive processing as a conceptual framework for understanding the brain has a long tradition in the fields of computational and cognitive neuroscience and has been elegantly summarized elsewhere (). The principle of a comparison between predicted and actual feedback is also often used to model the function of the cerebellum () and the dopaminergic reward system (). Surprisingly, however, predictive processing in the neocortex has received little attention at the physiological level. This is in part due to the difficulty of designing experiments that can effectively disambiguate between the neuronal activity associated with bottom-up representation and that associated with predictive processing hypotheses, and in part because we have poor experimental access to internal models and little control over the associated predictions. In this perspective, we argue that existing data about neural activity and neural circuit organization of the (sensory) cortex can be understood in the context of a predictive processing framework, and we highlight recent direct evidence in support. We then discuss how computations required for predictive processing might be implemented at the circuit level and propose experiments that would provide a mechanistic corroboration.

The mapping of the motor command onto the sensory consequences of the movement serves to simulate the environment and thus constitutes an internal model of the world. The idea that the brain uses an internal model to predict sensory input based on movements and past sensory experience has been formalized in several different variants: predictive coding, hierarchical temporal memory, and Bayesian inference (). All of these are based on the idea of a generative model of the world used to predict sensory input. Following Andy Clark (), we will refer to this family of theories as the predictive processing framework. We do not wish to diminish the importance of the discrepancies between the different theories we are grouping here (see, e.g., () for a review of different variants of predictive coding), but will focus on their common premise. Specifically, we will focus on aspects of predictive processing that are based on a comparison of sensory input with a generative model of the environment. Our aim is to discuss the physiological evidence that has convinced us that the predictive processing framework is more consistent with the data than the representational framework (see, e.g., () and () for discussions of the representational framework).

In parallel, the ideas of Helmholtz would resurface in the work of Erich von Holst, Horst Mittelstaedt (), and Roger Sperry (). They were unsatisfied with an account of perception driven bottom-up by sensory input because it failed to explain how animals distinguish self-generated sensory feedback from externally generated input. One prominent example they used to illustrate that the brain must be able to make this distinction is the fact that the optokinetic reflex does not prevent self-generated movements of the eye. During passive viewing, full-field visual flow results in a movement of the eye that stabilizes the image on the retina; this is called the optokinetic reflex. If the animal could not distinguish between self-generated and externally generated visual input, then the optokinetic reflex would prevent any active movement of the eye. The argument is that the visual flow resulting from an eye movement would trigger the optokinetic reflex just as visual flow during passive viewing does, and thus would result in a reflexive eye movement that counteracts the original eye movement. They concluded that one simple strategy for distinguishing self-generated sensory feedback from externally generated input would be to cancel the predictable sensory consequences of self-generated movement using an efference copy of the motor command. This requires that the brain have a mechanism to transform the efference copy of the motor command into the sensory coordinate system to cancel the reafferent sensory feedback. This transformed version of the efference copy is often referred to as a corollary discharge. Conceptually, such transformations, or internal models, are equivalent to a simulation of the external world and function to make predictions of sensory input.
Kenneth Craik formulated this idea in the early 1940s as: “My hypothesis then is that thought models, or parallels, reality—that its essential feature is not ‘the mind’, ‘the self’, ‘sense-data’, nor propositions but symbolism, and that this symbolism is largely of the same kind as that which is familiar to us in mechanical devices which aid thought and calculation” ().
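The cancellation scheme of von Holst, Mittelstaedt, and Sperry can be sketched as a few lines of arithmetic. This is a minimal illustration under stated assumptions, not a model from the literature: we assume a one-dimensional world, velocities in deg/s with positive meaning rightward, and a hypothetical forward model that simply inverts the sign of the eye-movement command. All function names are illustrative.

```python
# Minimal sketch of reafference cancellation, assuming a 1D world in which
# velocities are in deg/s and positive means rightward.

def forward_model(motor_command):
    # Hypothetical forward model: an eye-rotation command of +v predicts
    # retinal image motion of -v (the image slips opposite to the eye).
    return -motor_command

def retinal_slip(actual_eye_velocity, world_velocity):
    # Image motion on the retina: external (world) motion minus eye motion.
    return world_velocity - actual_eye_velocity

def perceived_motion(motor_command, actual_eye_velocity, world_velocity,
                     use_efference_copy=True):
    # Subtract the corollary discharge (the transformed efference copy)
    # from the retinal input; what remains is attributed to the world.
    corollary_discharge = forward_model(motor_command) if use_efference_copy else 0.0
    return retinal_slip(actual_eye_velocity, world_velocity) - corollary_discharge

def optokinetic_drive(residual_motion, gain=1.0):
    # Reflexive eye velocity that follows any unexplained image motion.
    return gain * residual_motion

# Active eye movement in a static world, with and without cancellation.
print(perceived_motion(10, 10, 0))                                               # 0
print(optokinetic_drive(perceived_motion(10, 10, 0, use_efference_copy=False)))  # -10.0
```

Without the efference copy, the reflex command (-10) exactly opposes the intended movement (+10), which is the problem the argument above identifies. The same sketch also reproduces Helmholtz's observations: `perceived_motion(10, 0, 0)`, a command issued to an eye that does not move, returns 10 (apparent world motion in the direction of the intended movement), and `perceived_motion(0, 10, 0)`, a passive displacement of the eye as when pushing on it, returns -10.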

How does the brain distinguish between self-generated and externally generated sensory input? This was the basis of a disagreement between Hermann von Helmholtz and Charles Sherrington over a century ago. The echoes of this exchange enrich our pursuit of understanding the function of the neocortex to this day. Hermann von Helmholtz speculated that the absence of motion perception during eye movements is the result of an efference copy signal that cancels the visual feedback arising from self-generated eye movements (). He argued that when pushing gently on one’s eye, this cancellation does not occur, and we perceive a moving world. Less well known perhaps is the case of a patient with a unilateral traumatic lesion of the lateral rectus muscle, the muscle that moves the eye temporally. When the patient closed the unaffected eye and attempted to move the affected eye temporally, he reported seeing the world rapidly moving in the direction of the intended eye movement (). Thus, the motor command to move the eye could drive perception in the absence of any change in visual input. Based on these observations, Helmholtz speculated that the brain must have an internal model of the sensory consequences of self-generated movements. He called this the “sense of innervation.” Four decades later, Charles Sherrington revisited these ideas and argued that we have a sensory system in the musculature, the “muscular sense” (proprioception), that provides direct sensory evidence of the position of our muscles. Based on this, he concluded that a sense of innervation would be an unnecessary assumption (). Sherrington’s reliance on bottom-up-driven sensory computations would extend to one of his most influential concepts, the receptive field, and pave the way for a view of the brain that is driven to move by its sensorium. Sherrington’s views of a nervous system built upon sensory-driven receptive fields would flourish over the next several decades.
Describing the responses of ganglion cells in the frog’s retina to small black spots, Horace Barlow argued that it is hard to avoid the conclusion that these neurons function as fly detectors (). Thus was born the concept of the feature detector: the postulate that the activity of neurons in sensory pathways is driven primarily by feed-forward sensory input and represents the presence of a feature or an object in the environment. The effects of this revolutionary idea are still apparent in much of our thinking about brain function. With the discovery of simple cells in cat primary visual cortex (), the feature detector rapidly became the dominant narrative for our thinking about cortical function (). This concept has been a guiding principle for scientific inquiry; it is apparent not only in the concept of receptive fields of neurons in visual cortex, but also in place cells (), grid cells (), face cells (), and concept cells (). In this view, once sensory systems of the brain have extracted an invariant representation from the sensory input, a separate part of the brain is then tasked with deciding and acting upon that representation. Following David Marr, we will call this the representational framework for describing the function of neocortex ().

For simplicity, we have ignored multiplicative gains of response magnitude, which can be incorporated in both frameworks. The two response types are hard to distinguish because experimentalists have some control over the bottom-up input (at least in sensory areas of the brain) but only poor control over the top-down input or predictions generated on a moment-by-moment basis. If experiments are performed by averaging data over many trials, for each of which the top-down input may vary, or under conditions in which top-down input is altered or gated off (e.g., by anesthesia), P reduces to a constant and Equation (2) can be written in the form of Equation (1). With the prediction error driven only by the stimulus, the internal representation will be updated by bottom-up input and will look like the one postulated by the representational framework (Figure 2). Under these conditions, both internal representation neurons and positive prediction-error neurons will have responses identical to the ones predicted by the representational framework. Thus, to design experiments that could distinguish between the two frameworks, experimentalists must be able to control or measure the prediction. Typically, this is not possible, and instead a proxy is used for the animal’s predictions. In the context of sensory processing, self-generated motion is one possible proxy for a prediction of the resulting visual feedback (e.g., optic flow). This assumes that animals learn with experience how sensory feedback couples to movement. To a first approximation, the representational framework predicts that neuronal responses will not differ between conditions in which the stimulus is externally generated and those in which it is the consequence of self-motion. The predictive processing framework instead postulates that responses in a subset of neurons, the prediction-error neurons, signal a deviation, or a mismatch, between predicted and actual sensory input.
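The averaging argument can be checked numerically. The sketch below assumes a hypothetical linear tuning function `V`, unrectified prediction-error responses, and made-up trial-to-trial prediction values that vary independently of the stimulus. Under these assumptions the trial-averaged prediction-error tuning curve is the representational tuning curve shifted by the mean prediction, and the two differ only once the prediction is controlled through a proxy.

```python
from statistics import mean

def V(s):
    # Hypothetical bottom-up tuning: response grows linearly with stimulus strength.
    return 2.0 * s

def prediction_error(s, p):
    # Unrectified prediction-error response, V(s) - P (gains ignored).
    return V(s) - p

def trial_averaged_tuning(stimuli, trial_predictions):
    # Average over trials in which the top-down prediction varies
    # independently of the stimulus: the result is V(s) - mean(P).
    return [mean(prediction_error(s, p) for p in trial_predictions) for s in stimuli]

stimuli = [0.0, 1.0, 2.0, 3.0]
trial_predictions = [0.0, 0.5, 1.0, 2.5]  # hypothetical values; mean is 1.0

averaged = trial_averaged_tuning(stimuli, trial_predictions)
representational = [V(s) for s in stimuli]
print(averaged)          # [-1.0, 1.0, 3.0, 5.0]: same tuning, offset by mean(P)
print(representational)  # [0.0, 2.0, 4.0, 6.0]

# With a proxy for the prediction (e.g. self-motion), the same stimulus can be
# compared with and without a matching prediction.
print(prediction_error(2.0, 0.0), prediction_error(2.0, V(2.0)))  # 4.0 0.0
```

The constant offset is exactly what makes the two frameworks indistinguishable after averaging; only the last comparison, which requires knowing P on each trial, separates them.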
Based on this argument, much of the experimental effort to test the hypothesis of predictive processing in cortex has focused on prediction-error responses. In the next section, we will summarize the evidence for cortical responses that are consistent with predictive processing.

This function can be arbitrarily complex and, in the case of visual or auditory receptive fields, takes the form of a convolution with a receptive field. The predictive processing framework differs from this in that, in addition to internal representation neurons, it postulates the existence of prediction-error neurons. The response of prediction-error neurons is the difference between a function V that depends on the bottom-up input and a function P that depends on the top-down input, or, more specifically, on a prediction of the bottom-up input:
$$R = V(\text{bottom-up input}) - P(\text{top-down input}) \tag{2}$$
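The two response models can be written out directly. This sketch assumes a hypothetical one-dimensional center-surround receptive field as V and a prediction that arrives already expressed in the same coordinates as V; neither choice comes from the text, which leaves V and P unspecified.

```python
def convolve_valid(signal, kernel):
    # 1D convolution without padding ("valid" mode); the kernel is flipped.
    k = len(kernel)
    flipped = kernel[::-1]
    return [sum(signal[i + j] * flipped[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

RECEPTIVE_FIELD = [-1.0, 2.0, -1.0]  # hypothetical center-surround profile

def representational_response(stimulus):
    # R = V(bottom-up input), with V a convolution with the receptive field.
    return convolve_valid(stimulus, RECEPTIVE_FIELD)

def prediction_error_response(stimulus, prediction):
    # R = V(bottom-up input) - P(top-down input); here the prediction is assumed
    # to arrive already expressed in the same coordinates as V.
    return [v - p for v, p in zip(representational_response(stimulus), prediction)]

stimulus = [0.0, 0.0, 1.0, 0.0, 0.0]
print(representational_response(stimulus))                    # [-1.0, 2.0, -1.0]
print(prediction_error_response(stimulus, [-1.0, 2.0, -1.0]))  # [0.0, 0.0, 0.0]
print(prediction_error_response(stimulus, [0.0, 0.0, 0.0]))    # [-1.0, 2.0, -1.0]
```

A perfectly predicted stimulus silences the prediction-error response, while a zero prediction makes it identical to the representational response, which is why the predictive processing description contains the representational one as a special case.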

When evaluating evidence that may distinguish the two alternative descriptions of cortical function, it is worth noting that the predictive processing framework is an extension of the representational framework. To illustrate this, we will make a few simplifying assumptions. In the representational framework, the response R of a neuron can be modeled as a function V of the bottom-up input:
$$R = V(\text{bottom-up input}) \tag{1}$$

Cerebral cortex is a network of interconnected areas that are distinguishable by their connections to the sensory input and motor output streams and by their connections to each other. We refer to the part of cortex that is the principal target of the afferents from primary sensory thalamus as primary sensory cortex. By virtue of its connectivity to the periphery, each cortical area has a unique basis for the representation of body and environment. We refer to this basis of representation as the area’s coordinate system. The coordinate system of visual cortex, for example, appears to be built on Gabor filters of the visual input (such as the receptive fields of simple cells), and that of auditory cortex is built on spectro-temporal filters of the auditory input. In motor cortex, the coordinate system is built on motor commands, and in inferotemporal cortex, possibly on objects or concepts (). Each coordinate system spans only part of the total space of all sensory input and motor output. The transformation from one coordinate system to another is referred to as an internal model. For instance, given a current motor state and visual input, an efference copy of a motor command can be transformed to a prediction of the corresponding consequences in visual input. The motor command for an eye movement to the left can be transformed to the corresponding shift of the visual image to the right. The transformation from a motor coordinate system to a sensory coordinate system is referred to as a forward model, while a transformation from a sensory coordinate system to a motor coordinate system is referred to as an inverse model () (Figure 1C). More generally, any communication between two cortical areas will require a transformation that describes how activity in the source area relates to activity in the target area. If such a transformation between two areas exists, activity in one area can serve as a prediction of bottom-up input in the other area.
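The eye-movement example of a forward and an inverse model can be sketched as a pair of coordinate transformations. We assume, purely for illustration, a linear mapping with a gain of -1 between a horizontal eye-rotation command and the resulting horizontal image shift (both in degrees, positive meaning rightward); real internal models are of course learned and nonlinear.

```python
# Hypothetical linear internal models between a motor coordinate system
# (horizontal eye-rotation command, deg) and a visual coordinate system
# (horizontal image shift, deg). Positive values mean rightward.
GAIN = -1.0  # assumption: the image shifts by the same amount, opposite to the eye

def forward_model(motor_command_deg):
    # Motor -> sensory: predict the image shift caused by an eye movement.
    return GAIN * motor_command_deg

def inverse_model(desired_image_shift_deg):
    # Sensory -> motor: the eye movement that would produce a given image shift.
    return desired_image_shift_deg / GAIN

def prediction_error(motor_command_deg, observed_image_shift_deg):
    # Compare the observed shift with the forward-model prediction.
    return observed_image_shift_deg - forward_model(motor_command_deg)

print(forward_model(-5.0))                 # 5.0: leftward eye movement, rightward image shift
print(inverse_model(forward_model(-5.0)))  # -5.0
print(prediction_error(-5.0, 5.0))         # 0.0: feedback matches the prediction
```

Composing the inverse model with the forward model recovers the original command, which is the sense in which the two are complementary transformations between the same pair of coordinate systems.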
Although cortical processing can be hierarchical, especially in the vicinity of primary sensory areas, cortex as a whole is likely not arranged as a hierarchy (). Given that there are systematic correlations between auditory and visual inputs, for example, activity in an auditory area could serve as a prediction of bottom-up input in a visual area and vice versa. Thus, predictive processing does not have to follow a strict hierarchical arrangement of inter-areal connections (Figure 1D). Interestingly, a model that has been proposed recently as an alternative to hierarchical predictive processing is a variant of a predictive processing architecture in which the flow of signals is reversed: predictions are sent up the hierarchy, and errors are sent down the hierarchy (). In the absence of a strict hierarchy, the communication between areas would always entail the exchange of predictions and errors in both directions.
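A non-hierarchical exchange between two areas can be illustrated with a symmetric pair of predictions. The sketch assumes a hypothetical learned linear association between a scalar auditory feature and a scalar visual feature; the coupling constant `W_AV` and all names are illustrative only. Each area predicts the other and receives an error, so neither area plays the role of the "higher" level.

```python
# Minimal sketch, assuming a learned linear association between a scalar
# auditory feature a and a scalar visual feature v (v is approximately W_AV * a).
W_AV = 0.8  # hypothetical coupling, auditory -> visual coordinates

def predict_visual_from_auditory(a):
    return W_AV * a

def predict_auditory_from_visual(v):
    return v / W_AV

def exchange(a, v):
    # Each area sends a prediction to the other and receives back an error.
    visual_error = v - predict_visual_from_auditory(a)
    auditory_error = a - predict_auditory_from_visual(v)
    return visual_error, auditory_error

print(exchange(1.0, 0.8))  # (0.0, 0.0): the two inputs are consistent
print(exchange(1.0, 0.0))  # (-0.8, 1.0): sound without the associated visual input
```

When the sound occurs without its usual visual counterpart, the visual area reports less input than predicted and the auditory area reports more input than predicted, so errors of both signs flow in both directions.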

A common assumption is that predictive processing is advantageous because it is efficient: fewer spikes are necessary because only prediction errors are transmitted up the hierarchy. While prediction-error signals are sparser when input is predictable, for every bottom-up spike cancelled there needs to be a spike in a top-down prediction. To a first approximation, this means that the total number of spikes (bottom-up and top-down) remains unchanged. Hence, although there are circumstances under which predictive processing can be more efficient, in cortex this is likely not the case if efficiency is measured as the number of spikes per bit of information transmitted. We propose that the main advantage of predictive processing is that the internal representation is updated by a combination of bottom-up and top-down input and can thus be modified in the absence of bottom-up input. This would provide a framework to simulate and predict the environment.
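The spike-count argument reduces to simple bookkeeping. This toy count assumes that each top-down prediction spike cancels at most one bottom-up spike and ignores negative prediction errors; it is meant only to make the "first approximation" concrete.

```python
def spike_budget(bottom_up_spikes, predicted_spikes):
    # Toy bookkeeping, assuming each top-down prediction spike cancels at most
    # one bottom-up spike and ignoring negative prediction errors.
    cancelled = min(bottom_up_spikes, predicted_spikes)
    error_spikes = bottom_up_spikes - cancelled      # sent up the hierarchy
    total_spikes = error_spikes + predicted_spikes   # bottom-up plus top-down traffic
    return error_spikes, total_spikes

for predicted in (0, 6, 10):
    print(predicted, spike_budget(10, predicted))
# 0 (10, 10)
# 6 (4, 10)
# 10 (0, 10)
```

The error channel becomes sparse as the prediction improves, but the total traffic stays at 10 spikes, and an overprediction (for example, `spike_budget(10, 12)`) would even increase it.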

In the predictive processing framework, predictions that arrive in a target area are based on an internal representation in the source area. To illustrate this, assume two hypothetical visual areas: one coding for geometric shapes and the other for edges. If the internal representation of a triangle is active in the geometric shape area, it will send a prediction of three edges to the edge area. Prediction-error neurons will be activated only if the bottom-up input does not match the top-down prediction. In the absence of prediction errors, the internal representation for edges in the edge area and the internal representation for the triangle in the geometric shape area will remain active. These internal representations (of the triangle in the geometric shape area and edges in the edge area) are equivalent to those postulated by the representational framework. The key difference lies in how the internal representations are updated: in the representational framework, through feature detectors and bottom-up drive; in predictive processing, through a comparison between bottom-up input and top-down predictions based on an internal representation.
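The triangle example can be sketched with sets of edge orientations. The shape-to-edge table and the "switch to the least-mismatching shape" rule are hypothetical stand-ins chosen for brevity; the text does not specify how a new internal representation is selected.

```python
# Hypothetical two-area example: a "shape" area predicts which edges the
# "edge" area should see. Edges are labelled by orientation (deg).
SHAPE_TO_EDGES = {
    "triangle": {0, 60, 120},
    "square": {0, 90},
}

def edge_prediction_errors(active_shape, observed_edges):
    predicted = SHAPE_TO_EDGES[active_shape]   # top-down prediction
    positive = observed_edges - predicted      # present but not predicted
    negative = predicted - observed_edges      # predicted but absent
    return positive, negative

def update_shape(active_shape, observed_edges):
    positive, negative = edge_prediction_errors(active_shape, observed_edges)
    if not positive and not negative:
        return active_shape  # prediction confirmed: representation stays active
    # Hypothetical update rule: switch to the shape whose prediction mismatches least.
    return min(SHAPE_TO_EDGES, key=lambda s: len(SHAPE_TO_EDGES[s] ^ observed_edges))

print(update_shape("triangle", {0, 60, 120}))  # triangle
print(update_shape("triangle", {0, 90}))       # square
pos, neg = edge_prediction_errors("triangle", {0, 60})
print(sorted(pos), sorted(neg))                # [] [120]
```

As long as the predicted edges are present, no error is generated and the triangle representation persists; a missing edge produces a negative error, and unexpected edges produce positive errors that drive a change of representation.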

(B) Schematic responses of positive and negative prediction-error neurons and internal representation neurons. Assuming the bottom-up input (S) to the circuit increases unexpectedly, positive prediction-error neurons will fire, activating both the internal representation neurons and the top-down prediction (P) from a higher area. This in turn will inhibit the positive prediction-error neuron. If the bottom-up input decreases again, negative prediction-error neurons will be activated and inhibit both internal representation neurons and top-down predictions. Responses of all three neuron types should be influenced by separate gating signals that modulate response amplitude.

(A) Positive prediction errors are computed in Type 1 neurons (Sensory Input − Prediction [S − P]), while negative prediction errors are computed by Type 2 neurons (Prediction − Sensory Input [P − S]). Triangles represent excitatory neurons, while circles represent inhibitory neurons. Note that this schematic assumes hierarchical processing. In the case of non-hierarchical communication between two areas, both areas will send and receive both top-down-like and bottom-up-like signals.

Prediction errors may come in two flavors. The bottom-up input can be stronger than predicted (for example, when an unpredicted stimulus appears) or it can be weaker than predicted (for example, when the expected stimulus does not appear or a stimulus disappears). In theory, a bidirectional change could be signaled by one neuron that has a sufficiently high basal firing rate. Increases in activity could signal more input than predicted, while a decrease could signal less input than predicted. Such a bidirectional modulation of a prediction-error signal has been observed in the dopaminergic system (). In the neocortex, and particularly in layer 2/3, the baseline firing rates of principal neurons are much lower () and bidirectional modulation of activity is less plausible. In agreement with previous suggestions (), we think it is more likely that the error computation is carried out by two separate prediction-error circuits: one to signal more and one to signal less input than predicted (Figure 2). We will refer to these two types of prediction error as positive prediction error and negative prediction error.
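Why a low baseline rate argues for two channels can be seen in a few lines. The sketch assumes firing rates are rectified at zero and that a single neuron would signal S − P as a deviation from its baseline; the baseline values are arbitrary illustrations, not measured rates.

```python
def relu(x):
    # Firing rates cannot go below zero.
    return max(x, 0.0)

def single_neuron_error(s, p, baseline):
    # One neuron signals S - P as a deviation from its baseline rate.
    return relu(baseline + (s - p))

def split_errors(s, p):
    # Two rectified channels: positive [S - P]+ and negative [P - S]+.
    return relu(s - p), relu(p - s)

s, p = 2.0, 5.0  # less input than predicted; the true error is -3
for baseline in (10.0, 0.5):
    decoded = single_neuron_error(s, p, baseline) - baseline
    print(baseline, decoded)  # 10.0 -3.0, then 0.5 -0.5 (clipped)
pos, neg = split_errors(s, p)
print(pos - neg)              # -3.0
```

With a high baseline, as in the dopaminergic system, one neuron can carry both signs; with a layer 2/3-like low baseline, most of the negative error is lost to rectification, whereas the positive and negative channels together preserve it.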

(D) Predictive processing does not need to follow a strict hierarchy. In the communication between two areas, both predictions and prediction errors can be sent in both directions.

(C) To predict the sensory consequences of self-generated movement, motor areas provide an efference copy of the motor command to sensory areas. The transformation from the motor coordinate system to the sensory coordinate system is referred to as a forward model (e.g., what do I hear when I speak). The transformed efference copy that can be directly compared to sensory signals is referred to as a corollary discharge. The transformation from the sensory coordinates to motor coordinates is referred to as an inverse model (e.g., what are the muscles I need to activate to reproduce a sound I just heard).

(B) In a hierarchical predictive processing framework, internal representations are updated based on a comparison of a top-down prediction and bottom-up input. Prediction errors are sent forward in the hierarchy, while predictions are sent backward. The coordinate transformations between the different areas are the internal models (M).

At the core of all predictive processing theories is the idea that the brain develops a generative model of the world that it uses to predict sensory input (). The comparison of predicted and actual sensory input then updates an internal representation of the world. This process is often described as a processing hierarchy. A brain area at a higher level of the hierarchy sends a top-down signal to an area at a lower level in the form of a prediction of the bottom-up input to that area. Predictions are compared to bottom-up input to compute the difference between the two (Figures 1A and 1B). This requires at least two functional classes of neurons: an internal representation neuron and a comparator or prediction-error neuron. Internal representation neurons project downward in the neural hierarchy and encode predictions about the bottom-up input. Prediction-error neurons project upward in the hierarchy and encode a difference between prediction and bottom-up input. Thus, at the lowest level of the hierarchy the bottom-up input is the sensory input, while at higher levels of the hierarchy it is the prediction errors from lower levels. When the bottom-up information matches the information carried by internal representation neurons, the responses in prediction-error neurons decrease. In sensory cortex, both internal representation neurons and prediction-error neurons are expected to be selective for specific stimulus features.
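The loop between internal representation neurons and the two prediction-error populations can be sketched as a discrete-time rate model for a single stimulus feature. The update rule, the learning rate, and the unit prediction weight are assumptions made for illustration, not parameters from the text; the sketch simply shows the transient positive error when input increases unexpectedly and the transient negative error when it decreases.

```python
def relu(x):
    # Firing rates cannot be negative.
    return max(x, 0.0)

def simulate(inputs, rate=0.5, w=1.0):
    # Toy two-level loop (hypothetical parameters): the internal representation r
    # sends a top-down prediction P = w * r; positive errors [S - P]+ excite r
    # and negative errors [P - S]+ inhibit it.
    r = 0.0
    trace = []
    for s in inputs:
        p = w * r
        e_pos, e_neg = relu(s - p), relu(p - s)
        r += rate * (e_pos - e_neg)
        trace.append({"S": s, "P": p, "PE+": e_pos, "PE-": e_neg, "R": r})
    return trace

# Bottom-up input steps up unexpectedly and later steps back down.
inputs = [0.0] * 2 + [1.0] * 6 + [0.0] * 6
for t, step in enumerate(simulate(inputs)):
    print(t, {k: round(v, 3) for k, v in step.items()})
```

At the onset of the input, PE+ jumps to 1 and then halves on each step as the prediction catches up; at the offset, PE− becomes active and drives the representation back toward zero, mirroring the schematic responses of positive and negative prediction-error neurons.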

The source of such a modulating or gating signal is not always clear. Attentional gain modulation in visual cortex has been speculated to be driven by long-range cortical input (), input from higher-order thalamus (), or neuromodulatory inputs (). Neuromodulatory input can not only gate plasticity (), but also change the balance of top-down versus bottom-up influence (). Specifically, the neuromodulatory tone may shift the relative contribution of bottom-up and top-down signals such that the influence of prediction errors can be modulated according to the internal state of the animal, which would determine the extent to which bottom-up inputs are used to update the internal model. It remains to be seen how different modulatory signals are combined to alter the sensitivity with which cortical circuits prioritize and respond to sensory information or report prediction errors. A more complete understanding of these modulation mechanisms requires further exploration and may be key to understanding cortical dysfunction.

In sensory cortex, responses can be modulated and prioritized depending on the context in which the stimulus is perceived. This implies that predictions and prediction errors may be modulated in a context-dependent manner. Conceptually, a dynamic modulation of the influence of top-down and bottom-up input is consistent with an attentional modulation of sensory input (). Direct evidence for a modulation of prediction and prediction-error signals comes from a variety of experiments. Experience-dependent predictive responses in visual cortex, for example, are apparent only during quiet wakefulness and not when the animal is active (). Similarly, adaptation of sensory responses depends on context and the task-relevance of sensory input (). In sensorimotor learning, prediction errors during movement are thought to correct the motor program. Here, the requirement for a context-dependent gating of prediction errors stems from the fact that prediction errors that occur during passive observation should not interfere with the motor program. This requires an error signal that can gate plasticity and whose magnitude can be adjusted in a context-dependent manner.
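The need for a context-dependent gate on error-driven plasticity can be illustrated with a toy sensorimotor learning loop. Everything here is an assumption made for the sketch: a scalar "motor gain" stands in for the motor program, feedback on passive trials is modeled as unrelated input (outcome 0), and the gate is a binary signal that is open only during self-generated movement.

```python
def run_trials(contexts, gated=True, target=1.0, motor_gain=0.2, learning_rate=0.5):
    # Hypothetical sensorimotor learning loop. On "active" trials the feedback
    # reflects the animal's own movement (outcome = motor_gain * target); on
    # "passive" trials the input is unrelated to the motor program (outcome = 0).
    history = []
    for context in contexts:
        outcome = motor_gain * target if context == "active" else 0.0
        error = target - outcome
        # Context-dependent gate on plasticity: open only during self-generated
        # movement (or always open if gating is disabled).
        gate = 1.0 if (context == "active" or not gated) else 0.0
        motor_gain += learning_rate * gate * error
        history.append(round(motor_gain, 6))
    return history

print(run_trials(["active", "passive", "active"]))  # [0.6, 0.6, 0.8]
print(run_trials(["active", "passive", "active"], gated=False))
```

With the gate, the passive trial leaves the motor gain untouched and learning converges toward the target; without it, the error from passive observation pushes the gain past the target, which is the interference the paragraph argues must be avoided.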

We assume that perception is linked to the internal representation of the world and that we only perceive a stimulus if the internal representation for that stimulus is active. This internal representation is what predictions are based on. During a given percept, internal representation neurons, likely distributed across several associated areas, are active. If neurons maintaining the internal representation are the basis for predictions of bottom-up input impinging on the same or other cortical areas, they should exhibit a set of functional and connectional features. First, an internal representation requires a circuit mechanism that maintains the activity in a population of neurons for the time a stimulus is perceived. A number of plausible mechanisms have been proposed for the persistence of neural activity, including strong and selective recurrent excitation between coactive neuronal assemblies within the cortex () or via thalamocortical loops (). Moreover, internal representations may not require stable patterns of activity, but could be maintained using dynamic attractors. In either case, neurons maintaining the internal representation are expected to exhibit more sustained and dense activity than neurons that function as comparators. Second, internal representation neurons should make connections within the area in which they reside as well as provide top-down input to lower areas within the same sensory modality and/or project to associated cortical areas dedicated to other modalities. Finally, as internal representations need to be updated by prediction errors, the neurons encoding the internal representation should be densely connected with the comparator circuit encoding the same feature. Interestingly, these functional and anatomical characteristics are hallmarks of a subset of cortical neurons prevalent in deeper layers ().
However, how internal representations are maintained in the cortical circuit and how they may be used to generate top-down predictions is still unclear.
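Persistence through recurrent excitation, one of the mechanisms mentioned above, can be sketched with a single rate unit. We assume a leaky unit integrated with the Euler method and a recurrent weight that exactly balances the leak (a fine-tuned line attractor chosen for simplicity); this is one illustrative mechanism among those listed, not a claim about how cortex maintains internal representations.

```python
def simulate_unit(inputs, recurrent_weight, dt=0.1, tau=1.0):
    # Euler integration of tau * dr/dt = -r + w * r + I(t), rectified at zero.
    # Hypothetical single rate unit; w = 1 exactly balances the leak.
    r = 0.0
    trace = []
    for i in inputs:
        r = max(0.0, r + (dt / tau) * (-r + recurrent_weight * r + i))
        trace.append(r)
    return trace

pulse = [1.0] * 10 + [0.0] * 30  # stimulus on for 10 steps, then off
persistent = simulate_unit(pulse, recurrent_weight=1.0)  # representation-like unit
transient = simulate_unit(pulse, recurrent_weight=0.0)   # comparator-like unit
print(round(persistent[-1], 3), round(transient[-1], 3))
```

The recurrently connected unit keeps its activity long after the input ends, while the unit without recurrence decays back toward zero, matching the expectation that representation neurons show more sustained activity than comparators.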

A key assumption of the predictive processing framework is that internal models are learned and that experience shapes the circuits required for generating predictions and computing prediction errors. While evolution has generated a template of reproducible long-range projections linking cortical areas, often reciprocally, it is the interaction with the world that refines these connections to generate internal models. Sensory experience sculpts the connectivity between neurons in an activity-dependent manner, such that nearby cortical neurons with similar responses (i.e., those that fire together) can preferentially link up into synaptically connected subnetworks with strong recurrent excitation (). We suggest that a similar principle may apply to the establishment of long-range networks across cortical areas, whereby a history of correlated firing determines which neurons become associated. In the context of predictive processing, this would apply equally to sculpting the bottom-up and top-down connectivity between internal representation neurons encoding components of the same object as well as between prediction-error neurons and internal representation neurons within and across areas. In visual cortex of rodents, predictive responses emerge in an experience-dependent way (). Through passive sensory experience, visual cortex responses become predictive of upcoming visual stimuli (). Through experience of visuomotor coupling, predictions of visual flow are learned (), and through experience in a spatial environment, responses emerge that are predictive of the visual input at a given spatial location (). These predictive responses in sensory areas may thus be driven by long-range inputs whose influence is shaped by experience.
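The suggestion that a history of correlated firing determines which neurons become associated corresponds to a Hebbian rule. The sketch below uses a plain, unnormalized Hebbian update on made-up binary activity; real plasticity models need some form of normalization or competition to keep weights bounded.

```python
def hebbian_weights(activity, learning_rate=0.1):
    # activity: one list of firing rates per time step (hypothetical binary data).
    # Simple Hebbian rule: w[i][j] grows with the co-activation of neurons i and j.
    n = len(activity[0])
    w = [[0.0] * n for _ in range(n)]
    for rates in activity:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += learning_rate * rates[i] * rates[j]
    return w

# Neurons 0 and 1 (e.g. a movement-related neuron in one area and a visual-flow
# neuron in another) fire together; neuron 2 fires mostly on its own.
activity = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
]
w = hebbian_weights(activity)
print(round(w[0][1], 2), round(w[0][2], 2))  # 0.3 0.1
```

The pair with a history of correlated firing ends up with three times the connection strength of the pair that only occasionally fired together.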

Locomotion is just one possible proxy for a prediction of visual input, and other signals, like spatial location, could serve a similar function. Consistent with this, neurons in layer 2/3 of V1 respond robustly to the omission of a stimulus the mouse expects to see at a certain location in a virtual environment (). In principle, any signal that explains some of the variance in the visual input can serve as a prediction of visual feedback. Vestibular or eye movement signals could serve as predictions of full-field visual flow. In a learned coupling between two sensory stimuli (e.g., a sound and a visual input), one can serve as a prediction of the other. It is therefore plausible that long-range cortical communication conveys specific predictions of input to the target areas that are associated by experience with signals in the source area (). Consistent with this view, specific signals related to self-generated movement, head direction, the animal’s spatial location, and stimulus timing have been observed across several sensory areas (). These diverse sources of contextual input may thus provide predictions required for computation of prediction errors and for updating internal representations based on information from a given sensory modality.

With the discovery of strong motor-related signals in primary visual cortex in the complete absence of visual input () came further evidence that a representational framework could explain only a fraction of the responses, even in primary sensory areas. Modulation of visual responses by locomotion or arousal () is thought to be the consequence of neuromodulatory inputs (), which exert context-dependent influence on responses in visual cortex (). However, modulatory inputs alone cannot account for motor-related signals in visual cortex in the absence of visual input. A driving, motor-related prediction of visual input could account for these non-visual signals. We have recently argued that in visual cortex, one source of the prediction of visual input given movement is the anterior cingulate cortex (). Activity in axons of anterior cingulate neurons in visual cortex conveys an experience-dependent prediction of visual flow (rather than copies of motor commands) as a function of the turning of the mouse in a virtual environment. Importantly, we found that this motor-related input is shaped by the coupling between movement and visual feedback the mouse has experienced previously.
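How a motor-related prediction could come to reflect the visuomotor coupling an animal has experienced can be sketched with an error-driven (delta-rule) update of a single coupling gain. The learning rule, the speeds, and the gains are illustrative assumptions; the finding described above concerns recorded axonal activity, and this sketch does not model that measurement or the mechanism behind it.

```python
def learn_coupling(running_speeds, visual_flows, learning_rate=0.1, epochs=50):
    # Hypothetical delta rule: learn a gain w so that predicted flow = w * speed,
    # driven by the visuomotor prediction error (flow - w * speed).
    w = 0.0
    for _ in range(epochs):
        for speed, flow in zip(running_speeds, visual_flows):
            error = flow - w * speed
            w += learning_rate * error * speed
    return w

speeds = [0.5, 1.0, 1.5]
w_normal = learn_coupling(speeds, [1.0 * s for s in speeds])   # experienced gain 1.0
w_reduced = learn_coupling(speeds, [0.5 * s for s in speeds])  # experienced gain 0.5
# The same running speed now yields different predictions of visual flow.
print(round(w_normal * 1.0, 3), round(w_reduced * 1.0, 3))  # 1.0 0.5
```

Two animals running at the same speed would, under this sketch, send different predictions of visual flow to visual cortex, each matching the coupling it experienced during learning.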

Finally, we suggest that negative and positive prediction-error neurons exert opposite effects on their targets. Negative prediction errors should act mainly by engaging bottom-up inhibition in their target areas, thus suppressing the current internal representation. Conversely, positive prediction-error neurons provide bottom-up excitation to target areas, thus activating a new cohort of neurons. The combined effect of positive and negative prediction-error neurons is to update the internal representation that best approximates, or predicts, the current environment.

Observing prediction-error signals in the neocortex does not prove they are computed therein. However, if cortical circuits do implement predictive processing, this requires at least three components: a comparator circuit that computes the prediction error between bottom-up input and predictions, a circuit to maintain an internal representation that gives rise to predictions, and a modulating or gating signal that sets the precision or weight of the prediction error. The circuit elements required to generate prediction errors are present in each module of the neocortex. Cortical areas receive bottom-up input from the thalamus or other cortical areas as well as extensive top-down inputs from many nearby and distal cortical areas and higher-order thalamic nuclei (), consistent with predictions from multiple modalities. The top-down inputs can be very dense—as, for instance, the top-down input from anterior cingulate cortex to V1 ()—and target both excitatory and inhibitory neurons in layer 2/3 monosynaptically (). The comparator circuits that generate negative and positive prediction errors require differential wiring of bottom-up and top-down inputs onto subsets of excitatory and inhibitory neurons. Negative prediction-error neurons will respond when top-down excitation exceeds bottom-up inhibition (whereby increasing strength or saliency in predictions should result in increasing strength of mismatch). It follows that subsets of inhibitory neurons are mainly bottom-up driven, either directly or via local excitatory relays, and that these provide input preferentially to negative prediction-error neurons. In layer 2/3 of visual cortex, a subset of somatostatin-expressing interneurons are thought to provide visually driven inhibition to negative prediction-error neurons (). Conversely, positive prediction-error neurons will respond when bottom-up excitation exceeds top-down inhibition. 
Accordingly, a different set of interneurons is expected to be driven more strongly by top-down input and provide inhibition to positive prediction-error neurons. This form of top-down inhibition is a frequent circuit motif in cortex ().
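The proposed differential wiring can be written as a steady-state rate sketch with two interneuron populations. The unit weights and the identity relay through each interneuron are assumptions; the assignment of the bottom-up-driven interneuron to a subset of SST-expressing cells follows the suggestion in the text but is not established by this code.

```python
def relu(x):
    return max(x, 0.0)

def comparator_circuit(bottom_up, top_down, w_exc=1.0, w_inh=1.0):
    # Steady-state rate sketch of the proposed wiring (hypothetical weights).
    # Interneuron driven mainly by bottom-up input (the role suggested for a subset
    # of SST interneurons in layer 2/3 of V1); it inhibits the negative PE neuron.
    bottom_up_interneuron = relu(bottom_up)
    # A different interneuron, driven mainly by top-down input, inhibits the
    # positive PE neuron.
    top_down_interneuron = relu(top_down)
    negative_pe = relu(w_exc * top_down - w_inh * bottom_up_interneuron)
    positive_pe = relu(w_exc * bottom_up - w_inh * top_down_interneuron)
    return positive_pe, negative_pe

for bu, td in [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0), (0.0, 2.0)]:
    print(bu, td, comparator_circuit(bu, td))
```

Matched input and prediction silence both neurons, unpredicted input activates only the positive prediction-error neuron, and an unmet prediction activates only the negative one, with a stronger prediction producing a stronger mismatch.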

In the predictive processing framework, prediction-error signals in sensory cortices are expected to be feature-specific and not simply the result of a surprise response. That is, they should signal the type of deviation from prediction and not simply the fact that there was a deviation. Accordingly, responses of mismatch neurons in layer 2/3 of mouse V1 were found to signal deviations between predicted and actual visual flow in spatially confined areas of the visual field (). These mismatch signals parallel visual signals in magnitude, spatial resolution and retinotopic organization, suggesting that mismatch signals are computed based on local visual cues and that visual and mismatch signals are separate aspects of the same computation.
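The difference between a feature-specific prediction error and a generic surprise signal can be illustrated with a short sketch. The four retinotopic patches, the uniform predicted flow, and both function names are hypothetical.

```python
def local_mismatch(predicted_flow: list[float], actual_flow: list[float]) -> list[float]:
    """Negative prediction error computed independently at each retinotopic patch."""
    return [max(0.0, p - a) for p, a in zip(predicted_flow, actual_flow)]


def global_surprise(predicted_flow: list[float], actual_flow: list[float]) -> bool:
    """A non-specific surprise signal: it only reports that something deviated."""
    return any(p != a for p, a in zip(predicted_flow, actual_flow))


# Running predicts uniform visual flow of 1.0; flow is halted only in patch 2.
predicted = [1.0, 1.0, 1.0, 1.0]
actual = [1.0, 1.0, 0.0, 1.0]
print(local_mismatch(predicted, actual))   # [0.0, 0.0, 1.0, 0.0]
print(global_surprise(predicted, actual))  # True
```

Only the local map says where the deviation occurred, which is the property the spatially confined mismatch responses in layer 2/3 appear to share with visual responses.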

If increasing stimulus predictability results in a response reduction, a violation of a strong prediction should trigger a response increase. Evidence in support comes from the discovery of prediction-error signals in primary sensory areas of cortex, in experiments that quantified responses to unexpected changes in the coupling between self-generated movements and sensory feedback. Using manipulations of visual feedback from hand movements, work in humans found selective activation of primary visual cortex in response to incongruences between hand movements and visual feedback that could not be explained by the visual input alone (). Manipulating auditory feedback of self-generated vocalizations in marmosets revealed responses in primary auditory cortex that were selective for deviations between expected and actual auditory feedback (). Similar observations were made in the primary auditory pallium of the songbird (). These responses could not be explained by the change in sensory input, as they were apparent only during manipulations of self-generated feedback and not when the animal was passively observing or hearing the same stimulus. However, in all of these experiments, the responses were triggered by an unexpected change to sensory feedback in the form of an additional stimulus that differed from the one expected. The key signal that is more difficult to explain in a representational framework is a response to the absence of a predicted sensory input, that is, a negative prediction error. Such signals have been found in layer 2/3 of mouse primary visual cortex (V1), where a subset of neurons responds selectively to the absence of expected visual flow () or the absence of an expected visual stimulus (). We have referred to this type of negative prediction error as a mismatch response. Although mismatch responses also exist in layer 5 neurons, they are likely more prevalent in layer 2/3 ().
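A minimal sketch of the visuomotor mismatch paradigm shows why a response to the absence of input is diagnostic. It assumes that predicted visual flow equals running speed times a hypothetical visuomotor `gain` of 1.0; the time steps and values are arbitrary.

```python
def mismatch_trace(running_speed: list[float], visual_flow: list[float],
                   gain: float = 1.0) -> list[float]:
    """Negative prediction error (mismatch) at each time step.

    The predicted visual flow is the animal's own running speed scaled by a
    learned visuomotor gain (assumed here to be 1.0 for closed-loop coupling).
    """
    return [max(0.0, gain * s - f) for s, f in zip(running_speed, visual_flow)]


flow = [1.0, 1.0, 0.0, 0.0, 1.0]          # visual flow with a brief halt
running = [1.0, 1.0, 1.0, 1.0, 1.0]       # animal runs throughout
stationary = [0.0, 0.0, 0.0, 0.0, 0.0]    # animal passively views the same flow

print(mismatch_trace(running, flow))     # [0.0, 0.0, 1.0, 1.0, 0.0]
print(mismatch_trace(stationary, flow))  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

The halt evokes a response only when the animal's own movement predicted visual flow; the identical visual stimulus viewed while stationary evokes none, so the response cannot be attributed to the visual input alone.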

Similarly, there may be top-down inhibition of the sensory consequences of self-generated movement. There is evidence for this in auditory cortex, where responses are generally suppressed during self-generated locomotion via a top-down projection that recruits local inhibition (). Consistent with the idea that the effect of these top-down predictions can be modulated in a context-dependent manner, certain forms of response adaptation in visual cortex have been shown to depend on the task relevance of the stimulus ().

In neocortex, early evidence for the predictive processing framework came not from new data but from the demonstration that classical visual phenomena, like end-stopping (), can be explained as prediction errors (). One central idea here was that the suppression of the response that appears when a stimulus extends into the surround of the classical receptive field is a consequence of top-down inhibition. In this way, the stimulus in a given location acts as a prediction of the stimulus in the neighboring region. This prediction, relayed via activation of a higher-level representation, inhibits the responses of neurons with receptive fields in neighboring parts of the visual field to the same stimulus. In layer 2/3 of mouse visual cortex, somatostatin-positive interneurons, likely driven by lateral projections from neighboring cortical neurons, have been shown to play a causal role in surround suppression (). The idea of a top-down prediction that acts to inhibit bottom-up input was later used to demonstrate that a large variety of classical visual receptive field properties can be explained in a predictive processing framework (). This type of comparison is consistent with a positive prediction error: the top-down prediction acts to inhibit the predictable bottom-up input. A top-down prediction that functions to inhibit bottom-up input should result in a response decrease when stimuli become predictable. This is indeed the case, whether predictability results from a learned association with a preceding stimulus () or from frequent presentation of the same stimulus, in which case the suppression is often described as sensory adaptation (). Another simple form of increased predictability of a stimulus is prolonged presentation of the same stimulus, during which sensory responses typically decrease in magnitude.
This form of sensory adaptation occurs at many levels in the sensory processing hierarchy, but certain forms, like contrast adaptation in visual cortex, are thought to be, at least in part, cortical in origin (). Although adaptation would be consistent with top-down inhibition, early experiments studying mechanisms of contrast adaptation using intracellular recordings in visual cortex of anesthetized animals found no evidence of inhibition contributing to contrast adaptation (). More recently, it was found that levels of inhibition increase with stimulus duration () and are selectively suppressed by anesthesia (). Consistent with a strong top-down influence, contrast adaptation has been shown to depend on the behavioral relevance of a stimulus (). Hence, it is possible that certain forms of sensory adaptation in the awake animal are driven by top-down inhibition.
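The suggestion that adaptation could be driven by top-down inhibition can be sketched as a positive prediction error against a prediction that is learned while the stimulus persists. The delta-rule update and the learning rate of 0.3 are assumptions chosen for illustration; this is not a model of the mechanisms of contrast adaptation.

```python
def adapting_responses(stimulus: list[float], learning_rate: float = 0.3) -> list[float]:
    """Positive prediction error to a stimulus sequence when a top-down prediction
    is nudged toward the input after every time step (a simple delta rule)."""
    prediction = 0.0
    responses = []
    for s in stimulus:
        responses.append(max(0.0, s - prediction))  # prediction inhibits predictable input
        prediction += learning_rate * (s - prediction)
    return responses


responses = adapting_responses([1.0] * 6)
print([round(r, 3) for r in responses])  # [1.0, 0.7, 0.49, 0.343, 0.24, 0.168]
```

The response decays because the inhibitory prediction grows, not because the input weakens, which is the distinction the inhibition measurements in awake versus anesthetized animals speak to.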

The idea that our perception of the world is an active and constructive process has intuitive appeal, as it explains much of our everyday experience of the world. Our predictions frequently interfere with what we perceive. Our voice sounds eerily different when we hear it in a recording, and we perceive our own singing to be much closer to pitch than it actually is. In visual illusions, we see color where there is none, simply because we know objects rarely change color (), or we miss things that happen right in front of our eyes (). In these cases, what we expect to hear or see interferes with, and even supersedes, what we actually hear and see. We refer to the conditions in which we can demonstrate that our predictions interfere with perception as sensory illusions. Given how often we disagree with others about the attributes of objects we see, or about what we hear, it is probably appropriate to describe perception as a controlled hallucination (). This tainting of our access to reality must confer some advantage. One advantage of having an internal model of the world is that it allows us to predict the future. We can anticipate not only the sensory consequences of our own movements but also the physical attributes and dynamics of objects and other agents in the world. You can look at a photograph of a football player, with one leg on the ground in front of the player and the other retracted behind the ball, and know instantly and without deliberation what will happen next. Often such predictions are not trivial and depend on detailed knowledge about the physical properties of the objects we are looking at, our model of the intentions or actions of the agents, and their context in the particular moment. These examples, and many others like them, give intuitive support for the idea that the brain is a predictive processing machine. In this section, we highlight the physiological evidence for predictive processing in neural circuits of the sensory neocortex.

So, what could be the implications of describing brain function in terms of an internal representation of the world that is updated through comparison with incoming sensory information? We do not have a definitive answer to this question, but in this section we will attempt to explain where we see the promise of predictive processing. First, temporarily decoupling the internal representation from sensory input would allow one to run the model as a simulation. In this way, one could simulate the consequences of one’s actions without having to perform them. This is likely what we refer to as thinking. Second, we would postulate that perception is based on a finely tuned process that continuously balances internal predictions against bottom-up signals to update an internal representation. If this process is imbalanced such that the internal representation is driven too strongly by top-down predictions, one might perceive things that are not there, or read intentions into the actions of others where there are none. This would likely resemble the positive symptoms of schizophrenia, as has been argued previously (). Conversely, if the effect of top-down predictions were too weak and the internal representation were dominated by bottom-up sensory input, one might be unable to adequately predict sensory input or understand the intentions of others. If the brain lacked the ability to generate an internal model with sufficient predictive capacity, a simple behavioral strategy would be to engage in stereotyped, repetitive behavior that makes the input more predictable. A dysfunction in the brain’s ability to make accurate predictions has been proposed as one of the attributes of autism (). Based on this, one could speculate that schizophrenia and autism are opposite ends of the same circuit imbalance, in which the internal representation of the world is driven either too strongly or too weakly by predictions.
We speculate that alterations in predictive processing circuits may be common to both disorders. Anti-NMDA receptor encephalitis, for example, in which NMDA receptors are targeted by the immune system, results in symptoms that resemble those of schizophrenia when adults are affected, while it results in symptoms that resemble those of autism when children are affected (). In addition, there is a common gene expression network that is dysregulated in the two conditions (), possibly in opposite directions (). Thus, the absence of key molecular regulators of synaptic plasticity (e.g., glutamate receptors) may lead to a failed experience-dependent adjustment of the connections in circuits that maintain an internal representation of the world through a comparison with incoming sensory input. In turn, this may cause aberrations in predictive processing and altered cortical function in these neurodevelopmental disorders.
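One common way to formalize the balance between predictions and bottom-up signals, assumed here for illustration, is precision weighting: the update is the posterior mean of two Gaussian sources, each weighted by its precision (inverse variance). The function name and the precision values are hypothetical, and the sketch is not a model of either disorder.

```python
def update_representation(prediction: float, observation: float,
                          precision_prediction: float, precision_input: float) -> float:
    """Posterior mean of two Gaussian sources: a top-down prediction and a
    bottom-up observation, each weighted by its precision (inverse variance)."""
    weight = precision_input / (precision_prediction + precision_input)
    return prediction + weight * (observation - prediction)


prediction, observation = 0.0, 1.0
print(update_representation(prediction, observation, 1.0, 1.0))   # 0.5: balanced
print(update_representation(prediction, observation, 99.0, 1.0))  # 0.01: predictions dominate
print(update_representation(prediction, observation, 1.0, 99.0))  # 0.99: input dominates
```

In this formulation, the imbalance discussed above corresponds to a mis-set ratio of precisions: the same input barely moves a prediction-dominated representation, while an input-dominated one retains almost nothing of the prediction.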

The immediate appeal of predictive processing is that it could be a basic computational primitive implemented in different variants throughout the brain. Evidence consistent with predictive processing has been found in a variety of different brain regions. The function of the dopaminergic system has been described in terms of reward prediction errors (). Many of the models of cerebellar function are based on the concepts of internal models and prediction errors (). Cerebellum-dependent sensorimotor learning is thought to be driven by sensory prediction errors computed as a comparison between intended and actual sensory feedback (). Similarly, certain forms of cortex-dependent sensorimotor learning are thought to be driven by performance errors (). In vocal learning, these performance errors have been suggested to be computed based on a comparison of intended and actual sensory feedback ().
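As a hedged illustration of the same comparison outside neocortex, a reward prediction error can be sketched with a Rescorla-Wagner-style delta rule, a simplification of the temporal-difference models usually applied to dopamine data. The learning rate and reward sequence are arbitrary. Unlike the cortical scheme in which separate neurons signal positive and negative prediction errors, the error here is a single signed quantity.

```python
def reward_prediction_errors(rewards: list[float], learning_rate: float = 0.5) -> list[float]:
    """Signed prediction errors for a cue followed by the listed rewards
    (Rescorla-Wagner-style delta rule on a single cue value)."""
    value = 0.0
    errors = []
    for r in rewards:
        delta = r - value              # > 0: better than predicted; < 0: worse
        errors.append(delta)
        value += learning_rate * delta
    return errors


print(reward_prediction_errors([1.0, 1.0, 1.0, 1.0, 0.0]))
# [1.0, 0.5, 0.25, 0.125, -0.9375]: the error shrinks as the reward becomes
# predicted and turns negative when the predicted reward is omitted.
```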

The Experiments That Need to Be Done

In this section, we outline experiments that may test, refine, or reject the model of predictive processing in the cerebral cortex using currently available technologies.

(1) One of the core postulates is that there are neurons in each area of cortex that maintain an internal representation of the world in a local coordinate system. To the best of our knowledge, there has been no clear demonstration of the existence of such neurons. The problem with identifying such neurons is that they will exhibit responses that appear driven by bottom-up input in many conditions. However, a few functional and anatomical characteristics might aid in identifying them. First, internal-representation neurons should comprise a class of neurons separate from the prediction-error neurons. It is possible that internal-representation neurons are intermixed with prediction-error neurons in different cortical layers or that they are enriched in deep layers of cortex (Bastos et al., 2012), which are the main source of top-down signals (Felleman and Van Essen, 1991; Markov et al., 2014). Second, internal-representation neurons should provide input to local prediction-error neurons and, either directly or indirectly, give rise to projections that convey predictions to other cortical areas. Third, within a cortical area, the current internal representation should function like a prediction (the current scene is a decent predictor of future scenes). Therefore, local internal-representation neurons should interact with prediction-error neurons in the same way that top-down predictions do: negative prediction-error neurons should be net excited by internal-representation neurons, while positive prediction-error neurons should be net inhibited. Fourth, activity in prediction-error neurons should update the local internal representation. Positive prediction-error neurons, which report more bottom-up input than expected, should net activate the corresponding local internal-representation neurons.
Conversely, negative prediction-error neurons, which report less input than expected, should net inhibit the corresponding internal-representation neurons. Given that we do not have a genetic handle on the different functional neuronal classes, experiments would need to rely on the possibility that one or the other neuron type predominates in different cortical layers (for example, a preponderance of prediction-error neurons in layer 2/3 and of internal-representation neurons in layer 5). In this way, one could test the influence of activating a subset of putative internal-representation neurons, either in a given cortical layer or through targeted photostimulation (Packer et al., 2015), on functionally identified prediction-error neurons in layer 2/3.
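The four sign rules postulated for the local loop can be checked with a small iterative sketch. It assumes a single scalar feature, one hypothetical internal-representation (IR) unit, and an arbitrary update `rate` and number of `steps`.

```python
def settle(bottom_up: float, representation: float = 0.0,
           rate: float = 0.5, steps: int = 20) -> tuple[float, float, float]:
    """Iterate a hypothetical loop between one internal-representation (IR)
    unit and a pair of prediction-error (PE) units for a scalar feature.

    Sign rules follow the postulates in the text: the IR acts as a local
    prediction (excites negative PE, inhibits positive PE); positive PE
    excites the IR and negative PE inhibits it.
    """
    positive_pe = negative_pe = 0.0
    for _ in range(steps):
        positive_pe = max(0.0, bottom_up - representation)
        negative_pe = max(0.0, representation - bottom_up)
        representation += rate * (positive_pe - negative_pe)
    return representation, positive_pe, negative_pe


print(settle(1.0))                      # IR (first element) rises toward 1.0
print(settle(1.0, representation=2.0))  # an over-predicting IR is pulled down toward 1.0
```

Once settled, the IR unit tracks the bottom-up input while both error units fall silent, which illustrates why such neurons would appear bottom-up driven under most conditions.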

(2) Assuming that cortex is built from a canonical circuit motif (Douglas et al., 1989), we should find prediction-error neurons for every instance of a behaviorally meaningful correlation of activity across any two cortical areas. To illustrate this, consider the interaction between motor areas and visual areas. Every movement that is coupled to a predictable change in visual input (whole-body translation; eye, head, limb, or whisker movements; etc.) should have a corresponding set of prediction-error neurons in a sensory cortex. We think that a subset of layer 2/3 neurons in mouse V1 computes prediction errors between whole-body translation and visual input (Attinger et al., 2017; Zmarz and Keller, 2016). Similar predictions of sensory input may be based on spatial location, head direction, or sensory input in other modalities. Consistent with this, a subset of neurons in layer 2/3 of mouse V1 signals a prediction error between a prediction of visual feedback based on spatial location and visual input (Fiser et al., 2016). Similar prediction-error neurons may exist for predictions based on auditory input, vestibular input, etc.

(3) Most work on cortical circuits for predictive processing has focused on primary sensory areas. The advantage of working in a primary sensory area is that there is some experimental control over bottom-up input. Assuming predictive processing describes a canonical cortical computation, we should find similar prediction-error signals in other cortical areas. These prediction-error signals would be encoded in the same coordinate system as the bottom-up input to the area. For example, there should be neurons in prefrontal cortex that signal deviations from a conceptual rule the animal has learned, or deviations from the social patterns the animal expects to encounter in its conspecifics. There is some evidence that a subset of neurons in motor cortex signals a deviation between intended and actual motor state, given proprioceptive or other sensory feedback (Heindorf et al., 2018; Inoue et al., 2016).

(4) In neocortex, it is likely that the prediction-error circuits are shaped by experience, as they are in primary visual cortex for sensorimotor and spatial predictions (Attinger et al., 2017; Fiser et al., 2016). What is still unclear is which synapses in this circuit undergo experience-dependent plasticity. In the case of the negative prediction-error circuit in layer 2/3 of mouse V1, we can constrain the site of plasticity to some extent. The activity of the somatostatin-positive inhibitory interneurons, which mediate the bottom-up inhibition, does not depend on sensorimotor experience (Attinger et al., 2017). Hence, experience-dependent plasticity must modify at least one of the other connections in the circuit. This could be the synapse from the inhibitory neuron onto the prediction-error neuron, or the synapse from the top-down predictive input onto the prediction-error neuron. The site of plasticity could be identified by preventing experience-dependent plasticity in specific cell types during sensorimotor learning (Sawtell et al., 2003).
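The constraint can be illustrated with a sketch in which only the top-down synapse onto the negative prediction-error neuron is plastic. This is one of the two candidate sites named above, chosen arbitrarily; the balancing learning rule, the weights, and the running speeds are hypothetical.

```python
def train_top_down_weight(trials: int = 200, eta: float = 0.1,
                          w_inh: float = 1.0) -> float:
    """Learn the top-down weight onto a negative PE neuron during closed-loop
    (coupled) experience, with SST-mediated bottom-up inhibition held fixed."""
    w_td = 0.0                                   # naive animal: no learned prediction
    for t in range(trials):
        speed = 0.5 + 0.25 * (t % 5)             # varied running speeds
        flow = speed                             # closed loop: flow equals speed
        sst = flow                               # SST activity set by visual input only
        imbalance = w_inh * sst - w_td * speed   # net inhibition minus excitation
        w_td += eta * imbalance * speed          # hypothetical balancing rule
    return w_td


def mismatch(w_td: float, speed: float, flow: float, w_inh: float = 1.0) -> float:
    """Negative PE response: top-down excitation minus bottom-up inhibition, rectified."""
    return max(0.0, w_td * speed - w_inh * flow)


w_trained = train_top_down_weight()
print(mismatch(0.0, speed=1.0, flow=0.0))        # naive: 0.0, no mismatch response
print(mismatch(w_trained, speed=1.0, flow=0.0))  # trained, flow halted: ~1.0
print(mismatch(w_trained, speed=1.0, flow=1.0))  # trained, coupled flow: ~0.0
```

Here the mismatch response to a halt emerges only after coupled experience while SST-mediated inhibition never changes. The sketch assumes its plasticity site rather than revealing it, which is why a cell-type-specific block of plasticity is needed to decide between the candidates.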

(5) To explain the phenomenon of attention and the fact that passive experience does not modify the motor program during sensorimotor learning, we predict the existence of a modulating signal that can attenuate or amplify prediction-error signals. In vocal learning, for example, listening to conspecific vocalizations should not generate prediction errors that update the motor program for vocalization. Hence, prediction-error signals would have to be gated on only during self-vocalization. A similar modulation mechanism is necessary to explain attention-related phenomena. We would expect such a signal to have at least three defining characteristics. First, it should selectively alter the coupling between prediction-error neurons and internal-representation neurons. Second, in sensory and motor regions, the modulating signal should be correlated with movement. Third, manipulations of the modulating system should shift the balance between top-down and bottom-up inputs, which may change the gain of responses in prediction-error neurons according to internal state. Possible sources of modulatory signals include classical neuromodulatory systems (e.g., acetylcholine, noradrenaline) and the thalamus, both of which have been shown to change the operating regime of cortical circuits (Fu et al., 2014; Polack et al., 2013; Purushothaman et al., 2012; Wimmer et al., 2015; Pinto et al., 2013).
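The first characteristic, a signal that gates the coupling from prediction errors to the internal representation, can be sketched for the vocal-learning example. For simplicity, the quantity updated is a scalar model of expected auditory feedback rather than a motor program; the feedback values, learning rate, and function name are hypothetical.

```python
def learn_internal_model(episodes: list[tuple[float, bool]], gated: bool = True,
                         learning_rate: float = 0.5) -> float:
    """Update a scalar internal model of expected auditory feedback.

    Each episode is (heard_feedback, self_generated). When `gated` is True, a
    hypothetical modulating signal lets prediction errors update the model
    only during self-generated vocalization.
    """
    model = 0.0
    for heard, self_generated in episodes:
        gate = 1.0 if (self_generated or not gated) else 0.0
        prediction_error = heard - model
        model += learning_rate * gate * prediction_error
    return model


# Own vocalizations sound like 2.0; a conspecific's song sounds like 5.0.
episodes = [(2.0, True), (5.0, False), (2.0, True), (5.0, False), (2.0, True)]
print(learn_internal_model(episodes))               # 1.75: converging on own feedback
print(learn_internal_model(episodes, gated=False))  # 2.875: contaminated by others' songs
```

Without the gate, every heard song drives learning and the model drifts toward a mixture of self and other; with it, passive listening leaves the model untouched.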

(6) If psychosis is a state of imbalanced processing in which the internal representation is not updated by sensory feedback and is thus dominated by predictions, with prediction errors that are either too strong or too weak, we would expect drugs that reduce psychosis to share a common functional signature. It is conceivable that antipsychotic drugs act by changing the balance between positive and negative prediction errors or by changing the balance between top-down and bottom-up input. Testing this hypothesis requires systematic characterization of the effects of anti- and pro-psychotic drugs on prediction errors, predictions, and bottom-up signals.

Predictive processing in the form we are proposing here will very likely not provide a complete description of cortical function. Hence, our intention should be to identify the limits and shortcomings of the framework in order to formulate a more complete theory. One thing is certain: we need to move away from a purely representational understanding of the cortex if we aim to make conceptual progress in this endeavor.