Abstract Recent research in neuroeconomics has demonstrated that the reinforcement learning model of reward learning captures the patterns of both behavioral performance and neural responses during a range of economic decision-making tasks. However, this powerful theoretical model has its limits. Trial-and-error is only one of the means by which individuals can learn the value associated with different decision options. Humans have also developed efficient, symbolic means of communication for learning without the necessity for committing multiple errors across trials. In the present study, we observed that instructed knowledge of cue-reward probabilities improves behavioral performance and diminishes reinforcement learning-related blood-oxygen level-dependent (BOLD) responses to feedback in the nucleus accumbens, ventromedial prefrontal cortex, and hippocampal complex. The decrease in BOLD responses in these brain regions to reward-feedback signals was functionally correlated with activation of the dorsolateral prefrontal cortex (DLPFC). These results suggest that when learning action values, participants use the DLPFC to dynamically adjust outcome responses in valuation regions depending on the usefulness of action-outcome information.

Maximizing reward obtained over time can be a daunting challenge for any organism (1). Without concrete instruction, an animal can only develop and fine-tune its reward-harvesting strategy through trial and error. Reinforcement learning (RL) theory has formalized this intuition, associating prediction error with phasic changes in the activity of dopaminergic neurons that track the ongoing difference between experienced and expected reward (2). Under this framework, the prediction error signal is thought to be broadcast to valuation structures, such as the striatum and ventromedial prefrontal cortex (vmPFC), where it directs learning and is integrated with other streams of information to facilitate decision making (3–6).
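For readers unfamiliar with the RL formalism, the core of the prediction-error account described above can be written in a few lines. The sketch below is a minimal, Rescorla-Wagner-style illustration of a prediction-error-driven value update; the variable names and learning rate are illustrative and do not correspond to the specific model fit in this study.

```python
# Minimal sketch of prediction-error-driven value learning
# (Rescorla-Wagner-style update; names and parameters are illustrative).

def update_value(value, reward, alpha=0.1):
    """Update the estimated value of a cue after observing a reward."""
    prediction_error = reward - value          # experienced minus expected reward
    return value + alpha * prediction_error    # update scaled by the learning rate

# Example: a cue that is rewarded on most trials gradually acquires a high value.
v = 0.0
for r in [1, 1, 0, 1, 1, 1, 0, 1]:
    v = update_value(v, r)
print(round(v, 3))
```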

Recent research in neuroeconomics has demonstrated that this theoretical model of trial-and-error reward learning captures the patterns of both behavioral performance and blood-oxygen level-dependent (BOLD) signals during a range of economic decision-making tasks (3, 7–10), demonstrating important cross-species similarities in the mechanisms of reward learning (11). Because of this finding, RL models have become a primary means of characterizing neural responses in neuroeconomics. However, this powerful theoretical model has its limits. Trial and error is only one of the means by which individuals can learn the value associated with different decision options. Humans have also developed efficient, symbolic means of communication, namely language, that allow information about value to be communicated socially, without the need to commit multiple errors across trials in order to learn. Little is known about how this explicit, symbolic knowledge infiltrates the valuation structures mentioned above and influences action selection, or how the brain's implementation of the RL algorithm differs in the face of instructed knowledge.

To address these questions, we used functional MRI (fMRI) together with a probabilistic reward task (Fig. 1) to assess the relative contributions of trial-and-error feedback and instructed knowledge to choice selection (12, 13). We designed an experiment with two sessions. In the “feedback” session, participants’ choices could be based only on win/loss feedback, whereas in the “instructed” session participants could also incorporate correct cue-reward probability information, provided by the experimenter, to guide choice behavior. We hypothesized that: (i) RL is a robust algorithm for explaining and predicting choice behavior and BOLD responses in an environment where trial-and-error feedback is the only information available to guide learning and influence choices (13–17), and (ii) when instructed knowledge about reward probabilities is also available, participants use this extra information to achieve better performance by modulating the degree to which RL algorithms are engaged. We also explored which brain systems may implement the influence of instructed knowledge by modulating the patterns of BOLD responses in brain areas typically implicated in RL, valuation, and choice selection (13, 18–26).

Fig. 1. Experimental design. In the feedback session, the number 5 and a specific visual cue were displayed on the screen. In the instructed session, additional probability information was displayed on top of the visual cue.
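As a concrete illustration of the task structure shown in Fig. 1, the sketch below simulates a single session using the cue probabilities and number set described in Methods. The guess-and-feedback rule and the win/loss coding are simplifying assumptions made for illustration; exact trial timing and payoffs are described in the SI Appendix.

```python
import random

# Illustrative simulation of the probabilistic reward task (Fig. 1).
# Cue labels, the guess rule, and win/loss coding are assumptions; trial
# timing and payoffs are described in the SI Appendix.

CUE_PROBS = {"cue_A": 0.25, "cue_B": 0.50, "cue_C": 0.75, "cue_D": 1.00}  # P(number > 5)
LOW_NUMBERS, HIGH_NUMBERS = [1, 2, 3, 4], [6, 7, 8, 9]

def run_trial(cue, guess_higher):
    """Draw the hidden number for this cue and return (number, win)."""
    number = random.choice(HIGH_NUMBERS if random.random() < CUE_PROBS[cue] else LOW_NUMBERS)
    win = (number > 5) == guess_higher
    return number, int(win)

# One 80-trial session: each cue is presented 20 times in random order.
trials = [cue for cue in CUE_PROBS for _ in range(20)]
random.shuffle(trials)
outcomes = [run_trial(cue, guess_higher=True) for cue in trials]
```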

Methods Participants. Twenty participants were recruited and tested in compliance with the guidelines of the University Committee on Activities Involving Human Subjects (UCAIHS). The experiment was approved by the UCAIHS at New York University, and all participants provided informed consent before the experiment. Of the 20 participants, 7 were male, 9 were non-Caucasian, and the group had an average age of 21.6 y (SD = 3.72).

Experimental Procedures. Each participant played two sessions of the task: a “feedback” session and an “instructed” session (Fig. 1). For both sessions, participants were told that they would see different visual cues, each representing how likely the number underneath the cue was to be greater or less than 5 (underlying number ∈ {1, 2, 3, 4, 6, 7, 8, 9}). The order of the two sessions was randomized across participants, such that 10 of the 20 participants experienced the feedback session first. In both sessions, participants were presented with four different visual cues representing different probabilities (P ∈ {25, 50, 75, 100%}) that the number underneath the cue would be greater than 5. On each trial of both sessions, participants saw a cue next to the number 5. Each cue was randomly presented 20 times, for a total of 80 trials per session (see SI Appendix for details).

Functional MRI Image Acquisition. Scanning was performed on all 20 participants with a 3-T Siemens Allegra head-only scanner and a Siemens standard head coil at New York University's Center for Brain Imaging (see SI Appendix for details).

Behavioral Analysis. Participants’ choice behavior in both sessions was modeled with a simple RL algorithm (see SI Appendix for details). We tested our model against others suggested in the literature on the basis of behavioral data from similar tasks (27, 28), using the Bayesian information criterion (BIC) for model selection. For the feedback session, the simple RL model with one learning rate (α) for both positive and negative prediction errors (PEs) fit participants’ behavior better. However, an RL model with different learning rates (α+ and α−) for positive and negative PEs (δ+ and δ−) fit participants’ choices best in the instructed session (see SI Appendix for details).

Imaging Analysis. To identify brain areas whose activity correlated with the computation of PE, we first regressed the PEs generated for both the feedback and instructed sessions, using the best-fitting parameters, against whole-brain BOLD signals at the revelation of the monetary outcome. Monetary outcomes were also included as dummy regressors to account for the effect of the magnitude of the reward value. A two-way repeated-measures ANOVA with factors session and learning phase was performed on the functional imaging data at the onset of feedback. The finite impulse response from 0 to ∼12 s (TR0 to TR6) was generated by resampling the BOLD time series of each voxel in the brain and averaging across the 40 trials in each of the early and late learning phases in both sessions. Because the canonical hemodynamic response function typically peaks 6–8 s after stimulus onset, the two-way ANOVA was performed at both TR3 (6 s) and TR4 (8 s). These whole-brain analyses were performed on each voxel to identify brain regions showing a significant interaction between learning phase (early vs. late) and session (feedback vs. instructed).
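As a schematic of the model comparison described in Behavioral Analysis above, the sketch below computes the likelihood of a choice sequence under an RL model with separate learning rates for positive and negative PEs and scores it with the BIC. The softmax choice rule, parameter names, bounds, and optimizer are assumptions made for illustration; the actual fitting procedure is described in the SI Appendix.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative fit of a dual-learning-rate RL model with a softmax choice rule.
# choices[t] in {0, 1} indexes the option chosen on trial t; rewards[t] is the
# outcome. This is a simplified two-option sketch, not the exact study model.

def neg_log_likelihood(params, choices, rewards):
    alpha_pos, alpha_neg, beta = params
    q = np.zeros(2)                                    # value estimates for the two options
    nll = 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(beta * q) / np.exp(beta * q).sum()  # softmax choice probabilities
        nll -= np.log(p[c] + 1e-12)
        delta = r - q[c]                               # prediction error
        alpha = alpha_pos if delta > 0 else alpha_neg  # asymmetric learning rates
        q[c] += alpha * delta
    return nll

def bic(nll, n_params, n_trials):
    return 2 * nll + n_params * np.log(n_trials)

# Usage (choices and rewards are arrays of length 80); the single-learning-rate
# model is the special case alpha_pos == alpha_neg:
# res = minimize(neg_log_likelihood, x0=[0.3, 0.3, 3.0], args=(choices, rewards),
#                bounds=[(0, 1), (0, 1), (0, 20)])
# print(bic(res.fun, n_params=3, n_trials=len(choices)))
```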
Finally, we conducted a psychophysiological interaction (PPI) analysis to investigate connectivity between brain regions that may modulate the impact of instructed knowledge on RL signals (see SI Appendix for technical details).
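For readers unfamiliar with PPI, the sketch below illustrates only the generic logic of the approach: an interaction regressor formed from a seed region's time course and a psychological variable, entered into a regression alongside both main effects. It is a schematic of the standard method, not the authors' pipeline, and it omits steps (e.g., hemodynamic deconvolution) that a full analysis would include; see SI Appendix for the actual procedure.

```python
import numpy as np

# Schematic of a PPI (psychophysiological interaction) regression. The
# interaction term is the product of a (mean-centered) seed time course and a
# psychological variable; both main effects are included in the model.
# Names and preprocessing are assumptions for illustration only.

def ppi_betas(seed_ts, psych_var, target_ts):
    """Regress a target voxel's time course on seed, psychological, and PPI terms."""
    seed = (seed_ts - seed_ts.mean()) / seed_ts.std()
    psych = psych_var - psych_var.mean()
    ppi = seed * psych                                     # interaction regressor
    X = np.column_stack([np.ones_like(seed), seed, psych, ppi])
    betas, *_ = np.linalg.lstsq(X, target_ts, rcond=None)
    return betas                                           # betas[3] is the PPI effect
```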

Acknowledgments We thank K. Sanzenbach and the Center for Brain Imaging at New York University for technical assistance. This study was funded by a James S. McDonnell Foundation grant and National Institute of Mental Health Grants MH 080756 (to E.A.P.) and MH 084081 (to M.R.D.).

Footnotes 1 To whom correspondence should be addressed. E-mail: liz.phelps@nyu.edu.

Author contributions: J.L., M.R.D., and E.A.P. designed research; J.L. performed research; J.L. contributed new reagents/analytic tools; J.L. analyzed data; and J.L., M.R.D., and E.A.P. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1014938108/-/DCSupplemental.