It is generally thought that skilled behavior in human beings results from a functional hierarchy of the motor control system, within which reusable motor primitives are flexibly integrated into various sensori-motor sequence patterns. The underlying neural mechanisms governing the way in which continuous sensori-motor flows are segmented into primitives and the way in which series of primitives are integrated into various behavior sequences have, however, not yet been clarified. In earlier studies, this functional hierarchy has been realized through the use of explicit hierarchical structure, with local modules representing motor primitives in the lower level and a higher module representing sequences of primitives switched via additional mechanisms such as gate-selecting. When sequences contain similarities and overlap, however, a conflict arises in such earlier models between generalization and segmentation, induced by this separated modular structure. To address this issue, we propose a different type of neural network model. The current model neither makes use of separate local modules to represent primitives nor introduces explicit hierarchical structure. Rather than forcing architectural hierarchy onto the system, functional hierarchy emerges through a form of self-organization that is based on two distinct types of neurons, each with different time properties (“multiple timescales”). Through the introduction of multiple timescales, continuous sequences of behavior are segmented into reusable primitives, and the primitives, in turn, are flexibly integrated into novel sequences. In experiments, the proposed network model, coordinating the physical body of a humanoid robot through high-dimensional sensori-motor control, also successfully situated itself within a physical environment. 
Our results suggest that it is not only the spatial connections between neurons but also the timescales of neural activity that act as important mechanisms leading to functional hierarchy in neural systems.

Functional hierarchy in neural systems, defined as the principle that complex entities may be segmented into simpler elements and that simple elements may be integrated into a complex entity, is a challenging area of study in neuroscience. Such a functional hierarchy may be thought of intuitively in two ways: as hierarchy in space, and as hierarchy in time. An example of hierarchy in space is visual information processing, where elemental information in narrow receptive fields is integrated into complex features of a visual image in a larger space. Hierarchy in time is exemplified by auditory information processing, where syllable-level information within a short time window is integrated into word-level information over a longer time window. Although extensive investigations have illuminated the neural mechanisms of spatial hierarchy, those governing temporal hierarchy are less clear. In the current study, we demonstrate that functional hierarchy can self-organize through multiple timescales in neural activity, without explicit spatial hierarchical structure. Our results suggest that multiple timescales are an essential factor leading to the emergence of functional hierarchy in neural systems. This work could contribute to providing clues regarding the puzzling observation of such hierarchy in the absence of spatial hierarchical structure.

Introduction

Functional hierarchy, defined broadly as the principle that complex entities may be segmented into simpler elements and that simple elements may be integrated into a complex entity, is a ubiquitous feature of information processing in biological neural systems [1]–[4]. For example, in primary sensory areas such as VI and SI, the receptive field of neurons is relatively small, and these neurons respond to features of the stimulus that are simpler than those responded to by higher associative areas. Determining how these functional hierarchies are implemented in neural systems is a fundamental challenge in neuroscience.

The human motor control system is a representative example of a system with functional hierarchy. Humans acquire a number of skilled behaviors through the experience of repeatedly carrying out the same movements. Certain components of such movements, through repetitive experiences, are segmented into reusable elements referred to as “primitives”. In adapting to various situations, series of motor primitives are in turn also integrated into diverse sequential behavior. The idea underlying this basic process was proposed by Arbib in terms of “schema theory” [5], and has since been used as the basis for many studies (e.g. [6],[7]).

The action of drinking a cup of coffee, for example, may be broken down into a combination of motor primitives such as the motion of reaching for a cup on the table, and the motion of grasping the cup and bringing it to one's mouth. Ideally, these motor primitives should be represented in a generalized manner, in the sense that the representation should adapt to differences in the location and shape of the cup. Primitives must also be flexible with respect to changes in the sequence of actions; for example, after grasping a cup, one sometimes brings the cup to one's mouth to drink, but one also sometimes takes the cup off the table to wash it. It is this adaptability (intra-primitive level) and flexibility (inter-primitive level) of primitives that allow humans to generate countless patterns of sequential behavior.

A number of biological observations suggest the existence of motor primitives. At the behavioral level, Thoroughman [8] for example showed that humans learn the dynamics of reaching motions through a flexible combination of movement elements. Sakai showed that, in visuomotor sequential learning, human subjects spontaneously segmented motor sequences into elementary movements [9]. At the level of animal muscle movement, Giszter [10], through observations of muscle movement in the frog's leg, found that there are a finite number of linearly combinable modules, organized in terms of muscle synergies on limbs. At the brain level, meanwhile, it has been shown that electrical stimulation in the primary motor and premotor cortex of the monkey brain evokes coordinated movements, such as reaching and grasping [11].

These observations strongly suggest that the diversity of behavior sequences in animals is made up of flexible combinations of reusable movement elements, i.e. motor primitives. What is not yet clear, however, is what underlying neural mechanisms govern the segmentation of continuous sensori-motor flows into primitives, and how series of primitives are combined into a variety of different behavior sequences.

To address this issue, we propose a neural network model for describing the neural mechanisms of segmentation and integration in continuous sensori-motor flows. This work can, as such, be seen as one possible neural implementation of schema theory. In experiments, the proposed network model was tested through the interaction of a humanoid robot, which requires high-dimensional sensori-motor control, with a physical environment. The robotics experiment is important in light of the idea of the embodied mind proposed by Varela [12], who explained that cognitive functions of neural systems emerge not only in the brain, but also in dynamic interactions between the physical body and the environment (see also a recent review [13]). This idea is also related to the so-called “synthetic approach” to neuroscience (or “robotic neuroscience”), an approach that aims to extract essential mechanisms of neural systems using a variety of neuro-cognitive robotics experiments [14],[15].

Earlier studies have addressed the computational modeling of functional hierarchy in sequences of motor primitives, representative examples being the “mixture of experts” model [16] and the “MOSAIC” model [17]. In these studies, functional hierarchy is realized through the use of explicit hierarchical structure, with local modules representing motor primitives in the lower level, and a higher module representing the order of motor primitives switched via additional mechanisms such as gate-selection (Figure 1A). We refer to this type of model as the “local representation” model.
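As a rough illustration of what gate-selection over local modules can look like, the following minimal sketch scores a few hand-written predictors by their one-step prediction error and converts the errors into gate weights with a softmax. It is not the published mixture-of-experts or MOSAIC algorithm: the three modules, the `sharpness` parameter, and the function names are all hypothetical and chosen only for illustration.

```python
import numpy as np

def gate_weights(errors, sharpness=100.0):
    """Softmax over negative prediction errors: lower error, higher weight.

    `sharpness` is a hypothetical parameter controlling how crisply
    the gate commits to a single module.
    """
    scores = -sharpness * np.asarray(errors, dtype=float)
    scores -= scores.max()          # subtract the max for numerical stability
    w = np.exp(scores)
    return w / w.sum()

# Hypothetical local modules: each is a one-step predictor for one primitive.
modules = [
    lambda x: x + 0.1,   # "reach": state drifts upward
    lambda x: x - 0.1,   # "withdraw": state drifts downward
    lambda x: 0.9 * x,   # "hold": state decays toward zero
]

def gated_prediction(x_prev, x_now):
    """Score each module on the last transition, then blend their next-step predictions."""
    errors = [(m(x_prev) - x_now) ** 2 for m in modules]
    w = gate_weights(errors)
    prediction = sum(wi * m(x_now) for wi, m in zip(w, modules))
    return w, prediction

# The observed transition 0.5 -> 0.6 matches the "reach" module best.
w, pred = gated_prediction(0.5, 0.6)
winner = int(np.argmax(w))
```

In this toy setting the gate concentrates on whichever module best explains the most recent transition. When two modules make similar predictions their errors are close and the gate weights become ambiguous, which gives some intuition for why overlapping sequences are awkward for modular schemes.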

Figure 1. Schematic drawings of (A) local representation model and (B) multiple timescale model. (A) Curves colored red, blue, and green represent sensori-motor sequences corresponding to motor primitives. Output of the system consists of behavior sequences made up of combinations of these primitives. In the local representation model, functional hierarchy is realized through the use of explicit hierarchical structure, with local modules representing motor primitives in the lower level, and a higher module representing the order of motor primitives switched via additional mechanisms such as gate-selection. (B) In the multiple timescale model, primitives are represented by fast context units whose activity changes quickly, whereas sequences of primitives are represented by slow context units whose activity changes slowly. https://doi.org/10.1371/journal.pcbi.1000220.g001

There are a number of possible advantages to the local representation. First, the learning of one module would seem not to affect other modules. Second, based on this independence in the learning process, it would seem that increasing the number of local modules would lead to an increase in the number of acquirable primitives. An earlier study using multiple sensori-motor sequences, however, demonstrated that difficult problems arise in the local representation model as a result of its local nature [18]. Similarities in learned sensori-motor sequences create competition in the learning process between corresponding modules. Generalization requires similar patterns to be represented in the same module as the same primitive, even if subtle differences exist between such patterns. On the other hand, for the purposes of achieving “crisp” segmentation of sensori-motor flow, different patterns must be represented as separate primitives in distinct modules. This conflict between generalization and segmentation poses serious problems in the treatment of sets of multiple sensori-motor sequences within which there are similarities and overlap. Due to the difficulty of this problem, it is not possible to increase the number of acquirable primitives simply by increasing the number of local modules [18]. In addition, due to the explicit hierarchical structure of the local representation, learning of the lower module (primitives) and learning of the higher module (sequences of primitives) have to be explicitly separated through subgoals arbitrarily set by the experimenter [15],[16].

In order to overcome difficulties associated with the local representation model, we introduce in the current study a different type of representation for functional hierarchy. The representation we use neither makes use of separate local modules to represent primitives, nor introduces explicit hierarchical structure to manipulate these primitives. Instead of setting up an explicit hierarchy, we attempt to realize the self-organization of a functional hierarchy by means of neural activity with multiple timescales. This functional hierarchy is made possible through the use of two distinct types of neurons, each with different temporal properties. The first type of neuron is the “fast” unit, whose activity changes quickly over the short term. The second type of neuron is the “slow” unit, whose activity changes over the long term (Figure 1B).
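The difference between “fast” and “slow” units can be illustrated with a minimal sketch in which two leaky-integrator units receive the same rapidly switching input but have different time constants. The update rule is a generic Euler-discretized leaky integrator, and the time constants (2 and 50) and the square-wave input are assumptions chosen for illustration, not values taken from the study.

```python
import numpy as np

def leaky_step(u, drive, tau):
    """One Euler step (unit step size) of tau * du/dt = -u + drive."""
    return (1.0 - 1.0 / tau) * u + (1.0 / tau) * drive

def simulate(drive, tau):
    """Run a single unit, starting from u = 0, over the whole drive signal."""
    u = 0.0
    trace = []
    for d in drive:
        u = leaky_step(u, d, tau)
        trace.append(u)
    return np.array(trace)

# Square-wave drive that flips between +1 and -1 every 10 steps.
steps = 200
drive = np.where((np.arange(steps) // 10) % 2 == 0, 1.0, -1.0)

fast = simulate(drive, tau=2.0)    # hypothetical "fast" unit
slow = simulate(drive, tau=50.0)   # hypothetical "slow" unit

# Peak-to-peak excursion of each unit's internal state.
fast_swing = fast.max() - fast.min()
slow_swing = slow.max() - slow.min()
```

The unit with the small time constant tracks each flip of the input almost fully, whereas the unit with the large time constant responds only weakly to the same flips and so mainly reflects slower trends. This is the intuition behind assigning short-lived primitives to fast units and longer sequences to slow units.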

The idea that multiple timescales may carry advantages for neural systems in interacting with complex environments is intuitively understandable. Indeed, the importance of multiple timescales in neural systems has been emphasized in a number of earlier studies from various fields. For example, at the level of behavior, it has been shown that the process of acquiring motor skills develops through multiple timescales [19],[20]. Biological observations on motor adaptation, such as saccade adaptation and force field adaptation, likewise suggest that these processes involve distinct subsystems with differing timescales [21],[22]. At the level of neural synchrony, meanwhile, it is thought that differing timescales in neural synchrony are involved at different levels of information processing, such as in local and global interactions of brain regions [23],[24]. These previous studies strongly suggest the possibility that multiple timescales may be essential for the emergence of functional hierarchy in neural systems.

At the neuron level, the use of timescale variation has also been proposed as a means of representing different levels of functionality. In a study of auditory perception, for example, Poeppel [25] hypothesized that different temporal integration windows in neural activities correspond to a perceptual hierarchy between formant transition level and syllable level. In a study of an evolutionary neural network model using a mobile robot, Nolfi [26] showed that a model with differing temporal integration windows is superior to the normal model in cases in which the robot is required to achieve two different tasks: collision avoidance, which requires short-term sensori-motor control, and self-localization, which requires long-term sensory integration. Furthermore, Paine [27] showed that, using a similar evolutionary neural network model with a mobile robot, it was possible to achieve hierarchical functionality of motor primitives (wall avoidance) and execution of a given sequence of primitives (global goals) through a particular constraint on neural connectivity. In this model, one part of the network evolved so as to be responsible for primitives with fast dynamics, whereas another part of the network evolved so as to be responsible for sequences of primitives with slower dynamics. Paine's study is similar to the current study in that, in the functional hierarchy between motor primitives and behavior sequences, no separate local modules are used to represent primitives, and neither is any explicit hierarchical structure used to manipulate these primitives.

In the current study, however, our focus is on the impact of multiple timescales on neural activity. Unlike the earlier study by Paine, in which multiple timescales evolved as a result of an explicit requirement for different levels of functionality, in the current study we investigate whether functional hierarchy can self-organize through the imposition of constraints on timescales of the network. The proposed model will show that, through repetitive execution of skilled behavioral tasks, continuous sensori-motor flows are segmented into reusable motor primitives (adaptable to differences in location), and segmented primitives are flexibly integrated into new behavior sequences. The model does this without setting up explicit sub-goals or functions such as gate-selection for manipulating primitives in the lower module, deriving this functional hierarchy instead through the use of distinct types of neurons, each with different temporal properties.

The main focus of the current study is on the question of how temporal behavior sequences can arise from neural dynamics. Thus we chose a dynamical systems approach [28] using a neural network model rather than a statistical model, the latter of which is often used as a powerful tool for studying mechanisms of neural systems [29]–[31]. Among dynamical systems models, the use of physiologically detailed models with spiking neurons has become popular in explaining accumulated neurophysiological findings [32]–[34]. It is nonetheless still difficult to reproduce diverse sequential behavior in robots starting at the level of models with spiking neurons. In the current study, in order to mediate between the conceptual level of schema theory and the physiologically detailed level of models using spiking neurons, we propose a macro-level neural dynamics model.

The main component of the current model is a continuous time recurrent neural network (RNN). Thanks to its capacity to preserve the internal state, which enables it to reproduce complex dynamics, the RNN is often used for modeling temporal sequence learning [35]–[37]. The continuous time RNN (CTRNN) is a type of RNN which implements a feature of biological neurons, namely that the activities of neurons are determined not only by current synaptic inputs but also by the past history of neural states. Due to this characteristic, according to which activation changes continuously, the CTRNN is superior to discrete time RNN models in modeling mechanisms for producing continuous sensori-motor sequences [38],[39].
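The dependence on both current inputs and past state can be written compactly. The following is the standard textbook form of a CTRNN, given here only as a sketch: the symbols $u_i$ (internal state of unit $i$), $x_j$ (activation of unit $j$), $w_{ij}$ (connection weight), $\tau_i$ (time constant), and the activation function $f$ are notational assumptions, not taken from this section.

```latex
% Continuous-time dynamics of unit i (standard CTRNN form; notation assumed)
\tau_i \frac{du_i}{dt} = -u_i + \sum_j w_{ij}\, x_j , \qquad x_j = f(u_j)

% Euler discretization with unit step size
u_i(t+1) = \left(1 - \frac{1}{\tau_i}\right) u_i(t) + \frac{1}{\tau_i} \sum_j w_{ij}\, x_j(t)
```

In the discretized form, the first term carries over a fraction $1 - 1/\tau_i$ of the previous state and the second term adds the current synaptic input scaled by $1/\tau_i$. A large $\tau_i$ therefore makes a unit's state depend mostly on its own history, while a small $\tau_i$ makes it follow the current input closely.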

The model of neurons is a conventional firing rate model, in which each unit's activity represents the average firing rate over a group of neurons. Spatio-temporal patterns of behavior arise from dynamics of neural activities through neural connectivity. The CTRNN is as such considered to emulate characteristic features of actual neural systems, and the current model is considered consistent with biological neural systems at the level of macro-level mechanisms. For this reason, consistency in physiological details, such as features of neural activity at the level of individual neurons and characteristics of individual synapses, is not considered in detail. It is not our intention in the current study to map directly between model components and actual brain structures. Possible biological implications of the current results are discussed only at an abstract level, in terms of the model employed in the current study.