Recurrent activation of multiple memories within an instance-based system can be used to discover links between experiences, supporting generalization and memory-based reasoning.

Replay of experiences from this system supports interleaved learning and can be modulated by reward or novelty, which acts to rebalance the general statistics of the environment towards the goals of the agent.

Both natural and artificial learning systems benefit from a second system that stores specific experiences, centred on the hippocampus in mammals.

Recent work shows that once structured knowledge has been acquired in such networks, new consistent information can be integrated rapidly.

Discovery of structure in ensembles of experiences depends on an interleaved learning process both in biological neural networks in neocortex and in contemporary artificial neural networks.

We update complementary learning systems (CLS) theory, which holds that intelligent agents must possess two learning systems, instantiated in mammals in the neocortex and hippocampus. The first gradually acquires structured knowledge representations while the second quickly learns the specifics of individual experiences. We broaden the role of replay of hippocampal memories in the theory, noting that replay allows goal-dependent weighting of experience statistics. We also address recent challenges to the theory and extend it by showing that recurrent activation of hippocampal traces can support some forms of generalization and that neocortical learning can be rapid for information that is consistent with known structure. Finally, we note the relevance of the theory to the design of artificial intelligent agents, highlighting connections between neuroscience and machine learning.



Glossary

networks with recurrent connectivity that have stable states which persist in the absence of external inputs, and afford noise tolerance. Discrete/point attractor networks can be used to store multiple memories as individual stable states. Continuous attractor networks have a continuous manifold of stable points which allow them to represent continuous variables (e.g., position in space).

the storage within an attractor network of an input pattern constituting an experience, such that elements of the input pattern are linked together through plasticity within the recurrent connections of the network. The operation of recurrent connections supports functions such as pattern completion, whereby the entire input pattern (e.g., memory of a birthday party) can be retrieved from a partial cue (e.g., the face of a friend).

exemplar models in cognitive science, related to instance-based models in machine learning, operate by computing the similarity of a new input pattern (i.e., presented as external sensory input) to stored experiences. This single round of similarity computation yields the output of the model (e.g., a predicted category label for the new input pattern), at which point the process terminates.

we use this term to refer to algorithms in which each experience or datapoint has its own set of coordinates, so that capacity can be increased as required and the number of parameters may grow with the amount of data. K-nearest neighbor classification is one common example of such a non-parametric, instance-based method.

we use this term to refer to algorithms that do not store each datapoint, but instead directly learn a function that (for example) predicts the output value for a given input. The number of parameters is typically fixed.

a paradigm in which items are organized into (e.g., a hundred) sets of triplets (e.g., ABC) or larger sets (e.g., sextets: ABCDEF). Participants view item pairs (e.g., AB, BC) during the study phase and are tested on their ability to appreciate the indirect relationships between items that were never presented together (e.g., A and C).

a paradigm where item pairs are experienced during study (e.g., word pairs such as ‘dog–table’ in a human experiment, or flavor–location pairs in a rodent experiment), and at test the individual must recall the other item (e.g., specific location) from a cue (the specific flavor, e.g., banana).

recurrent similarity computation allows the procedure performed by exemplar models to iterate: that is, the retrieved products from the first step of similarity computation are combined with the external sensory input, and a subsequent round of similarity computation is performed. This process continues until a stable state (i.e., basin of attraction in a neural network) is reached. This allows the model to capture higher-order similarities present in a set of related experiences, where pairwise similarities alone are not informative.

spontaneous neural activity occurring within the hippocampus during periods of rest and slow wave sleep, evident as negative potentials (i.e., sharp waves). Transient high-frequency (∼150 Hz) oscillations (i.e., ripples) occur within these sharp waves, which can reflect the replay (i.e., reactivation) of activity patterns that occurred during actual experience, sped up by an order of magnitude.

the proportion of neurons in a given brain region that are active in response to a given stimulus (‘population sparseness’). Sparse coding, where a small (e.g., 1%) proportion of neurons is active, is contrasted with densely distributed coding where a relatively large proportion of neurons are active (e.g., 20%).