So a brain could learn from reinforcement with or without an explicit signal for errors in predicting that reinforcement. But the brain does have an explicit error signal encoded by dopamine neurons. What does this tell us?

I think it tells us three interesting ideas about how the brain works. Though I'm fully prepared to be wrong about this, and for there to be a water-tight argument for why you can't build a brain without an explicit signal for errors in predicting reward.

The first idea is that the existence of an explicit error signal implies the existence of a simple representation of the world in the brain. A so-called "model-free" representation that does not represent every possible outcome of an action, and likely does not use probability either. A quickly accessible look-up table of the values of actions, used to choose actions when time is pressing or the world is unchanging. We already have some good ideas of where such representations live in the brain. And all forms of such simple representations we know about require an explicit signal for the error between actual and predicted values.
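To make this concrete, here's a minimal sketch of such a look-up table, the kind of model-free learner textbooks describe. The names (`values`, `update`, the learning rate `alpha`) are my illustrative choices, not anything from the brain; the point is only that the update rule cannot run without computing the explicit error between actual and predicted value.

```python
def update(values, action, reward, alpha=0.1):
    """Delta-rule update: nudge the stored value toward the reward received."""
    error = reward - values[action]   # the explicit prediction error
    values[action] += alpha * error   # the table cannot learn without it
    return error

# A look-up table of action values: no outcomes, no probabilities,
# just one number per action.
values = {"lever_A": 0.0, "lever_B": 0.0}

# Repeatedly rewarding lever_A drives its stored value toward 1.0
for _ in range(50):
    update(values, "lever_A", reward=1.0)
```

After 50 rewarded trials the stored value of `lever_A` sits just below 1.0, while `lever_B`, never tried, stays at zero.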

A second idea is that what is one concept in reinforcement learning is actually two processes in the brain. The one concept is that you use the error in your prediction to change your estimate of an action's value. Why is this two processes in the brain? Because the brain might want to separately control short-term and long-term changes in its estimates of an action's value. And having an explicit error signal carried by dopamine lets it do both with one signal.

To get long-term changes we could adjust our estimate of an action's value by strengthening or weakening the connections onto neurons representing that action. Adjusting our estimate of value in this way changes long-term behaviour. And the rapid dopamine signal is indeed thought to control whether and in which direction some connections in the brain are allowed to change their strengths. Here you need the sign of the error signal to tell the connections which direction to change in.

But the brain doesn't necessarily want each and every bit of feedback it gets to change a connection between neurons. For that locks it into a path from which it might be difficult to recover. Indeed, when we try to change the strengths of these connections ourselves, by stimulating the inputs to a neuron, some of them can prove remarkably difficult to shift. Which raises the possibility that, in the short term, the brain may want to hedge its bets, by changing its estimates of an action's value without changing any connection strengths. And it can do this by instead changing how responsive neurons are to their inputs. If you make the neuron for action A more likely to fire, then you've increased its predicted value; and vice versa. Guess which transmitter in the brain has many hundreds of papers showing it changes the responsiveness of neurons that control action? Yep, dopamine.
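A toy sketch of the two-timescale idea, under my own simplifying assumptions (a single input, a multiplicative gain standing in for responsiveness, a gain that relaxes back toward baseline): one signed error signal drives both a slow change in connection strength and a fast, transient change in how strongly the neuron responds.

```python
class ValueNeuron:
    """Toy neuron whose output estimates an action's value."""

    def __init__(self):
        self.weight = 0.5   # long-term store: connection strength
        self.gain = 1.0     # short-term store: responsiveness to input

    def predict(self, inp=1.0):
        return self.gain * self.weight * inp

    def learn(self, reward, slow=0.01, fast=0.2, decay=0.9):
        error = reward - self.predict()   # one explicit, signed error signal...
        self.weight += slow * error       # ...slowly re-weights the connection
        self.gain += fast * error         # ...and quickly shifts responsiveness
        # the gain change is a hedge: it relaxes back toward baseline
        self.gain = 1.0 + decay * (self.gain - 1.0)
        return error

n = ValueNeuron()
err = n.learn(reward=1.0)   # one surprising reward
```

After a single surprising reward the gain has jumped well above baseline while the weight has barely moved: the short-term estimate changes immediately, and only repeated feedback commits the change to the connection itself. Both directions of change come from the sign of the same error.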

Put together, the argument here is that the explicit error signal exists to let the brain control changes of predicted value on two time-scales, using one error signal carried by dopamine: changing connection strengths in the long term, and changing how responsive neurons are in the short term.

The third idea is that an explicit error signal in the brain is evolutionary happenstance. Building a system to learn from feedback is easier with an explicit error signal than with representations of probabilities across a group of neurons. Ancient animals likely had a neuron or two that spritzed dopamine, or something similar, as part of their control of movement. We can find plenty of invertebrates with just a few thousand neurons in which dopamine alters movement by changing the ways neurons respond to their inputs. With this dopamine system in place, perhaps the path of least resistance for evolution was to co-opt this broadcast signal to change the coupling between neurons following an error. Which seems potentially easier than, from the same crude beginnings, first evolving a distributed system for representing information that does not require an explicit error signal.