In our new paper in Nature Neuroscience, we use the meta-reinforcement learning framework developed in AI research to investigate dopamine's role in helping the brain learn. Dopamine, commonly known as the brain's pleasure signal, has often been thought of as analogous to the reward prediction error signal used in AI reinforcement learning algorithms, which learn to act by trial and error guided by reward. We propose that dopamine's role goes beyond using reward to learn the value of past actions: it plays an integral role, specifically within the prefrontal cortex, in allowing us to learn new tasks efficiently, rapidly and flexibly.

We tested our theory by virtually recreating six meta-learning experiments from the field of neuroscience, each requiring an agent to perform tasks that share the same underlying principles (or set of skills) but vary in some dimension. We trained a recurrent neural network (representing the prefrontal cortex) using standard deep reinforcement learning techniques (representing the role of dopamine) and then compared the activity dynamics of the recurrent network with real neural data from previous neuroscience experiments. Recurrent networks are a good proxy for meta-learning because they can internalise past actions and observations and then draw on those experiences while training on a variety of tasks.
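To make this concrete, here is a minimal sketch of the key architectural ingredient: at each step, the recurrent network receives not just the current observation but also its own previous action and the previous reward, so its hidden state can accumulate task statistics within an episode. The dimensions, weight initialisation, and the use of a plain recurrent cell are illustrative simplifications, not details from the paper (which trains its network with standard deep RL techniques):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper).
obs_dim, n_actions, hidden_dim = 4, 2, 16

# A single vanilla recurrent cell standing in for the recurrent network
# that models prefrontal cortex.  The slow, reward-driven outer loop
# (the dopamine-like learner) would shape these weights across tasks;
# within an episode, the hidden state itself acts as a fast learner.
W_in = rng.normal(0, 0.1, (hidden_dim, obs_dim + n_actions + 1))
W_h = rng.normal(0, 0.1, (hidden_dim, hidden_dim))
W_pi = rng.normal(0, 0.1, (n_actions, hidden_dim))

def step(h, obs, prev_action, prev_reward):
    # The meta-RL input: observation + one-hot previous action + previous reward.
    a_onehot = np.eye(n_actions)[prev_action]
    x = np.concatenate([obs, a_onehot, [prev_reward]])
    h = np.tanh(W_in @ x + W_h @ h)
    # Softmax over action logits gives the policy for this step.
    logits = W_pi @ h
    probs = np.exp(logits - logits.max())
    return h, probs / probs.sum()

# One rollout step with random inputs.
h = np.zeros(hidden_dim)
h, probs = step(h, rng.normal(size=obs_dim), prev_action=0, prev_reward=1.0)
```

Because the previous reward is part of the input, the network can, in principle, condition its next choice on what just paid off, without any weight update happening inside the episode.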

One experiment we recreated is known as the Harlow Experiment, a psychology test from the 1940s used to explore the concept of meta-learning. In the original test, a group of monkeys was shown two unfamiliar objects to select from, only one of which gave a food reward. The two objects were presented six times, with the left-right placement randomised on each trial, so the monkeys had to learn which object, rather than which position, gave the reward. They were then shown two brand-new objects, again with only one yielding a food reward. Over the course of this training, the monkeys developed a strategy for selecting the reward-associated object: choose randomly on the first trial, then use the reward feedback to pick that particular object, regardless of its left or right position, on every trial thereafter. The experiment shows that monkeys could internalise the underlying principles of the task and learn an abstract rule structure: in effect, learning to learn.
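The trial structure described above can be sketched as a tiny simulated environment. The class and method names here are our own illustrative choices, and the hand-coded policy simply demonstrates the "learned-to-learn" strategy the monkeys converged on, not the agent from the paper:

```python
import random

class HarlowTask:
    """Simplified Harlow task: each episode presents two new objects for
    six trials; one object is always rewarded, and the left/right
    placement is re-randomised on every trial.  (Illustrative sketch.)
    """
    def __init__(self, n_trials=6, seed=None):
        self.n_trials = n_trials
        self.rng = random.Random(seed)

    def new_episode(self):
        # Two fresh "objects" (here just integer IDs); one is rewarded.
        self.objects = [self.rng.randrange(10_000), self.rng.randrange(10_000)]
        self.rewarded = self.rng.choice(self.objects)

    def present(self):
        # Randomise left/right placement each trial.
        left, right = self.rng.sample(self.objects, 2)
        return left, right

    def choose(self, obj):
        return 1.0 if obj == self.rewarded else 0.0


# A hand-coded policy implementing the monkeys' strategy: pick randomly
# on the first trial, then stick with the rewarded object (or switch
# away from the unrewarded one) regardless of its position.
task = HarlowTask(seed=0)
task.new_episode()
best_guess = None
rewards = []
for _ in range(task.n_trials):
    left, right = task.present()
    if best_guess in (left, right):
        choice = best_guess
    else:
        choice = task.rng.choice([left, right])
    r = task.choose(choice)
    best_guess = choice if r > 0 else (left if choice == right else right)
    rewards.append(r)
```

After at most one unrewarded guess, every subsequent choice is correct, which is exactly the one-shot behaviour the experiment tests for.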

When we simulated a very similar test using a virtual computer screen and randomly selected images, we found that our 'meta-RL agent' appeared to learn in a manner analogous to the animals in the Harlow Experiment, even when presented with entirely new images it had never seen before.