Neurobiologists have captured, for the first time, the moment a mouse’s brain learns something new: in this case, that a beep signals a delicious droplet of sugar water. The results, recently reported in Nature Neuroscience, support a famous theory of learning first proposed in the 1940s.

Researchers at the Howard Hughes Medical Institute’s Janelia Research Campus in Ashburn, Va., sought to distinguish between two competing models for learning governed by the dopamine-producing nerve cells in the midbrain. These neurons, found in all vertebrates, are involved in both movement and learning. One theory of learning, proposed by Donald Hebb in the 1940s, is that the brain learns from the animal’s successes, firing in response to a good thing happening. Another, to which many neurobiologists subscribe, is that animals learn by making mistakes, in a sense comparing their erroneous expectations to reality.
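The difference between the two models can be made concrete with a toy sketch. The code below is purely illustrative, assuming a single cue paired with a single reward and arbitrary learning rates; it is not the study's model. A Hebbian-style rule strengthens the cue–reward association whenever the two occur together, while an error-based rule updates only in proportion to the mismatch between expectation and reality.

```python
def hebbian_update(weight, cue, reward, lr=0.1):
    """Success-driven (Hebbian-style) rule: strengthen the association
    whenever cue and reward occur together."""
    return weight + lr * cue * reward

def error_based_update(weight, cue, reward, lr=0.1):
    """Error-based rule: update in proportion to the gap between the
    predicted reward (weight * cue) and the actual reward."""
    prediction = weight * cue
    return weight + lr * cue * (reward - prediction)

w_hebb, w_err = 0.0, 0.0
for _ in range(200):  # repeated tone-then-sugar trials
    w_hebb = hebbian_update(w_hebb, cue=1.0, reward=1.0)
    w_err = error_based_update(w_err, cue=1.0, reward=1.0)

# The Hebbian weight keeps growing with every success; the error-based
# weight converges to the true reward value (1.0) as mistakes vanish.
print(round(w_hebb, 2), round(w_err, 2))  # → 20.0 1.0
```

Note the key difference the article turns on: an error-based learner needs a prediction to be wrong about, which is exactly the problem a completely naive animal poses.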

Luke Coddington, a research scientist in Josh Dudman’s lab, saw a problem with the error-based theory: In the very first stages of learning a new fact or skill, how would an ignorant animal know it’s made a mistake? In previous experiments, neuroscientists studied animals that had already been trained in a task. But Coddington wanted to see that first spark of insight, when the mouse realized that a sound indicated a tasty treat. He and Dudman report that there are two distinct dopamine neuron signals as the mice learn. The results support the Hebbian model of learning early on, with the animals switching to error-based learning once they’ve grasped the task.

Coddington set up the mice in front of a dropper that delivered the sweetened water. A tone would sound; a second and a half later, the treat arrived. The researchers could tell the mouse knew the sugar water was coming if it licked the dropper. Coddington also mounted the mice in a tube dangling from springs, attached to an accelerometer, so he could assess the animals’ degree of fidgeting when they got excited about a potential reward.

During these experiments, Coddington used a glass pipette to reach and eavesdrop on the dopamine-producing neurons in the midbrain. Since those neurons are rare, and intermingled with other types, he used genetically engineered mice with dopamine neurons that responded to a flash of light. By sending that light through the pipette, he could find the right neurons. Then he used the same pipette to record their activity. Neuronal signaling changed as the mouse went from clueless to confident that the tone announced an incoming treat.

Before playing any tones or providing any sugar water, Coddington collected data on baseline neuron activity. When the mouse fidgeted, he saw that activity go down. It’s as if the neurons noted the movement and registered that nothing good happened as a result. The result was an early hint that the brain could sense rewards, or in this case, the lack thereof.

Then, during the first couple hundred trials, Coddington observed the mouse responding to the sugary reward. The mice could smell the treat and would lick for it after it appeared. This resulted in two peaks in dopamine neuron activity: one when the mouse noticed the sugar water, and another right before the moment of licking. In the case of the latter, it’s as if the brain was saying, “I’m about to lick, and I’m pretty sure there’s going to be sweet water,” says Coddington. Imagine, says Dudman, the moment a child reaches for a piece of Halloween candy, knowing it’s going to be good. That’s when the dopamine-making neurons fired—at the moment of certain success.

Over the next few hundred trials, the mouse started to learn that the tone indicated an upcoming reward. As this happened, Coddington observed an additional peak in dopamine neuron firing, when the mouse heard the tone and anticipated the reward. With time, the licking signal moved earlier and earlier, converging with the timing of the tone, so the mouse would start licking as soon as it knew the water was on its way. “That’s when you get your biggest learning,” says Coddington.
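This backward migration of the prediction, from the reward itself toward the earlier tone, is the signature behavior of error-based (temporal-difference) learning models. A minimal sketch, with a made-up two-step trial structure and learning rate that are not taken from the study, shows how repeated trials pull the reward prediction back to the cue:

```python
# Toy TD(0) sketch: over trials, the reward prediction propagates
# backward from the sugar water to the earlier tone.
lr, gamma = 0.2, 1.0
values = {"tone": 0.0, "delay": 0.0}  # predicted future reward at each step

for _ in range(100):  # repeated tone -> delay -> sugar-water trials
    # step 1: tone -> delay, no reward yet; error is the change in prediction
    err_tone = 0.0 + gamma * values["delay"] - values["tone"]
    values["tone"] += lr * err_tone
    # step 2: delay -> sugar water arrives (reward = 1)
    err_delay = 1.0 - values["delay"]
    values["delay"] += lr * err_delay

# Both estimates converge to the reward value, so the prediction
# (and the prediction-error burst) now sits at the tone.
print(round(values["tone"], 2), round(values["delay"], 2))  # → 1.0 1.0
```

In this picture, once the tone fully predicts the reward, the reward itself generates no surprise and the learning signal fires at the cue instead, matching the shift Coddington describes.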

The authors, then, had observed dopaminergic neurons involved in both sensing the signs of an upcoming reward, and noting the movement to get it. They suggest that untrained mice, by combining these two signals, could learn from their successes as Hebb proposed. Only after the mice had mastered their task did the learning process start to look like an error-based one.

Neir Eshel, a psychiatrist and neuroscientist at the Stanford University School of Medicine in California, praised the study for its innovations in monitoring very early learning as well as mouse movement. “It gives us a new way of thinking about the ultimate goal of the system,” he says. “The goal is to figure out a way to maximize those behaviors that lead to better outcomes.”

But Eshel isn’t ready to say early learning is exclusively Hebbian. “I don’t think their paper conclusively disproves error-based learning at any point during training,” he says. “You could actually have both, simultaneously.”