In this paper, we suggest that cortical anatomy recapitulates the temporal hierarchy that is inherent in the dynamics of environmental states. Many aspects of brain function can be understood in terms of a hierarchy of temporal scales at which representations of the environment evolve. The lowest level of this hierarchy corresponds to fast fluctuations associated with sensory processing, whereas the highest levels encode slow contextual changes in the environment, under which faster representations unfold. First, we describe a mathematical model that exploits the temporal structure of fast sensory input to track the slower trajectories of their underlying causes. This model of sensory encoding or perceptual inference establishes a proof of concept that slowly changing neuronal states can encode the paths or trajectories of faster sensory states. We then review empirical evidence that suggests that a temporal hierarchy is recapitulated in the macroscopic organization of the cortex. This anatomic-temporal hierarchy provides a comprehensive framework for understanding cortical function: the specific time-scale that engages a cortical area can be inferred from its location along a rostro-caudal gradient, which reflects the anatomical distance from primary sensory areas. This is most evident in the prefrontal cortex, where complex functions can be explained as operations on representations of the environment that change slowly. The framework provides predictions about, and principled constraints on, cortical structure–function relationships, which can be tested by manipulating the time-scales of sensory input.

Currently, there is no theory that explains how the large-scale organization of the human brain can be related to our environment. This is astonishing because neuroscientists generally assume that the brain represents events in our environment by decoding sensory input. Here, we propose that the brain models the entire environment as a collection of hierarchical, dynamical systems, where slower environmental changes provide the context for faster changes. We suggest that there is a simple mapping between this temporal hierarchy and the anatomical hierarchy of the brain. Our theory provides a framework for explaining a wide range of neuroscientific findings by a single principle.

Introduction

Our brains navigate our bodies, including our sensory apparatus, through a dynamically changing environment. This is a remarkable achievement, because a specific behaviour might be optimal in the short-term, but suboptimal over longer time periods. It is even more remarkable that the brain selects among different behaviours quickly and online. Causal dynamics and structure in the environment are critical for selecting behaviour, because the brain can learn this structure to predict the future, and exploit these predictions to negotiate the environment adaptively. Ontogenetically, there is good reason to believe that the brain learns regularities in the environment from exposure to sensory input and internally generated signals [1],[2]. Similarly, over evolutionary time, one can argue that selective pressure ensures the brain has the capacity to represent environmental structure [3]–[5]. In the following, we will first review the ‘free-energy principle’ [6], which suggests that ‘adaptive agents’ like the brain, in a dynamic environment, minimize their surprise about sensory input. We will then motivate the hypothesis that the environment exhibits temporal structure, which is exploited by the brain to optimise its predictions. This optimisation transcribes temporal structure in the environment into anatomical structure, lending the brain a generic form of structure-function mapping.

For an adaptive agent, surprise means sampling unexpected input given the expectations of the agent. Mathematically, surprise or improbability is quantified by −ln p(y(a)|m), where y(a) is sensory input sampled under action a and m represents the agent. Minimizing surprise depends on the agent's expectations about its sensory input and the behaviour it chooses. If these expectations (e.g., being warm but not on fire) are consistent with survival, an agent that minimizes free-energy will exhibit behaviour that is adapted to its environment. If an agent did not minimize surprise, it would sooner or later encounter surprising interactions with the environment, which may compromise its structural or physiological integrity (e.g., walking into a fire). Both action and perception can be understood as trying to minimize surprise about sensory input. An agent cannot minimize surprise directly because the agent does not have full knowledge about its environment [6]. However, an agent can minimize its so-called free-energy F≥−ln p(y(a)|m), which is an upper bound on surprise: if an agent minimizes its free-energy, it implicitly minimizes its surprise about sensory input.
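
The bound on surprise can be made explicit using the standard variational construction, which the text above does not spell out. In the sketch below, ϑ (the environmental causes) and q(ϑ) (the agent's approximate density over those causes) are notation assumed here for illustration; only y(a), m and F come from the text.

```latex
% Assumed notation: \vartheta = environmental causes, q(\vartheta) = the agent's
% approximate (recognition) density over those causes.
\begin{align}
F &= -\ln p\big(y(a)\mid m\big)
     + D_{\mathrm{KL}}\!\big[\, q(\vartheta) \,\big\|\, p\big(\vartheta \mid y(a), m\big) \big] \\
  &\ge -\ln p\big(y(a)\mid m\big),
  \qquad \text{because } D_{\mathrm{KL}}[\,\cdot\,\|\,\cdot\,] \ge 0 .
\end{align}
```

Under these assumptions, F equals surprise only when q(ϑ) matches the posterior density over causes; otherwise it lies strictly above it, which is why reducing F implicitly reduces surprise.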

To predict extero- and interoceptive input online, an agent must entertain dynamic expectations about its input using an internal model of environmental causes and their trajectories. These models reduce high-dimensional input to a few variables or ‘causes’ in the environment. These environmental causes do not need to be physical objects but can be any quantity that predicts the agent's past and future sensory input (we use prediction here in reference to the mapping between causes and their sensory consequences; this mapping subsumes but is more than a forecast of future events). Critically, from the point of view of an agent, its body is a part of the environment. Therefore, internal models embed an agent's knowledge about how environmental dynamics, including its own movements, generate sensory input [6]. The concept of ‘internal models’ which predict future sensory input due to the agent's own action is a key element of many related theoretical accounts: for example, the ‘corollary discharge hypothesis’ [7], predictive coding [8],[9], and motor control theory [10],[11].

In general, the sensory consequences of environmental causes are mediated by dynamical systems. This necessarily induces delays in the mapping between causes and their sensory consequences. How can an agent accommodate this temporal dislocation to explain causes after they are expressed in the sensorium [12],[13]? In this paper, we suggest that agents model sensory input using representations or ‘concepts’ that provide temporally stable predictions about future sensory input. Throughout, we use ‘concept’ to refer to a representation of an environmental cause or state that endures for about a second or more, and ‘percept’ for representations that are more transient. In terms of dynamical systems, concepts could be regarded as control parameters that shape the attractor or manifold on which lower-level representations unfold. This attractor provides constraints on the expected trajectories, which enable fast dynamics to be predicted by supraordinate representations that change more slowly (see Results). This rests on the assumption that the world can be modelled as a hierarchy of autonomous dynamical systems, where the output of one system controls the motion of another's states. In principle, an agent may be able to model the evolution of environmental states over milliseconds, seconds, or much longer periods of time using generative or forward models at various time-scales. For example, speech could be decomposed at various time-scales (from fast to slow): instantaneous frequency (acoustics); spectral profiles (phonemes); phoneme sequences (lexical); lexical sequences (semantics); syntactical structure (pragmatics), and so on [14].
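
The idea that the output of one dynamical system controls the motion of another's states can be illustrated with a toy two-level hierarchy. The following is a minimal sketch under assumed dynamics, not the model used in this paper: both levels are simple harmonic oscillators, and the function names, frequencies and coupling rule are hypothetical choices made only for illustration.

```python
import math


def rotate(state, angle):
    """Rotate a 2-D state by `angle` radians (an exact harmonic-oscillator step)."""
    x, y = state
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def simulate(n_steps=5000, dt=0.001, slow_freq=0.5, fast_base=20.0):
    """Simulate a slow oscillator whose output sets the frequency of a fast one.

    All parameter values are illustrative assumptions (frequencies in Hz, dt in s).
    """
    slow = (1.0, 0.0)  # high-level state: changes slowly
    fast = (1.0, 0.0)  # low-level state: changes quickly
    slow_out, fast_out = [], []
    for _ in range(n_steps):
        slow = rotate(slow, 2.0 * math.pi * slow_freq * dt)
        v = slow[0]  # output of the slow level, in [-1, 1]
        # The slow output acts as a control parameter of the fast dynamics:
        # here it modulates the fast frequency between 0.5x and 2.5x fast_base.
        fast_freq = fast_base * (1.5 + v)
        fast = rotate(fast, 2.0 * math.pi * fast_freq * dt)
        slow_out.append(v)
        fast_out.append(fast[0])
    return slow_out, fast_out


def mean_abs_change(series):
    """Average absolute change per time step: a crude measure of time-scale."""
    return sum(abs(b - a) for a, b in zip(series, series[1:])) / (len(series) - 1)


if __name__ == "__main__":
    slow_out, fast_out = simulate()
    print("slow level, mean change per step:", mean_abs_change(slow_out))
    print("fast level, mean change per step:", mean_abs_change(fast_out))
```

In this sketch the slow state never receives input from the fast state, so the coupling is purely top-down; an observer who sees only the fast output could, in principle, track the slow state through the changing frequency of the fast trajectory.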

Predictions about sensory input at fast time-scales become imprecise when projected too far into the future. One way to deal with this uncertainty is to use concepts to guide representations at shorter time-scales. If predictions of sensory input remain veridical at a fast time-scale and action ensures these predictions are fulfilled, the agent will avoid surprising input. The ensuing behaviour would be consistent with the agent's concepts. Note that an agent following this principle can still handle novel, unexpected input, although the agent might experience a large prediction error and adapt its internal model accordingly (see simulations). If the high-level representations or concepts prove correct in predicting sensory input, this confirms their validity. Therefore, concepts can be seen as self-fulfilling prophecies, which, given a compliant environment, would appear to mediate goals, plans and long-term strategies for exchange with the world [15]. Conflict among competing explanations (i.e., concepts) for sensory data has to be resolved to avoid surprise. This conflict can arise within similar time-scales; e.g., between the visual and auditory streams when experiencing the McGurk effect [16]. Conflict could also exist between different time-scales; e.g., between eating a chocolate cake and maintaining a strict diet. In robotics and motor control theory, conflict resolution among different time-scales has been addressed using hierarchical control structures [17]–[22]. These hierarchies are ordered according to the temporal scales of representations, where the slowest time-scale is at the top (cf. ‘slow feature analysis’ [23],[24]). A hierarchical model enables a selection of predictions that is accountable to all time-scales, such that concepts and percepts are nested and internally consistent.

The novel contribution of this paper is to consider hierarchical models, in which high-level states change more slowly than low-level states, and to relate these models to structure-function relationships in the brain. The basic idea is that temporal hierarchies in the environment are transcribed into anatomical hierarchies in the brain; high-level cortical areas encode slowly changing contextual states of the world, while low-level areas encode fast trajectories. We will present two arguments in support of this hypothesis. First, using simulations, we will demonstrate that hierarchical dependencies among dynamics in the environment can be exploited to recognise the causes of sensory input. The ensuing recognition models have a hierarchical structure that is reminiscent of cortical hierarchies in the brain. Second, we will consider neuroscientific evidence that suggests the cortical organisation recapitulates hierarchical dependencies among environmental dynamics.

Note that this paper is not about hierarchies of neuronal dynamics; see e.g. [25]–[27]. Rather, we consider neuronal dynamics under hierarchical models of the environment, which, according to the principles outlined above, should be represented in the brain to predict sensory input.