Diverse behaviors in response to rewards and punishments

To test how single ACC neurons (n = 329) signal information about reward, punishment, and uncertainty, two monkeys were conditioned with an appetitive–aversive behavioral procedure that contained two separate contexts, or blocks. One block contained 12 trials in which three visual fractal objects (CS) predicted rewards (juice) with 100, 50, and 0% chance. The second block contained 12 trials in which three visual fractal CSs predicted punishments (air puffs) with 100, 50, and 0% chance (Fig. 1a, b). The monkeys did not have to fixate the CSs to complete the trial (Methods). The design was such that the 100% reward CS had the highest value in the reward block and the 0% punishment CS had the highest value in the aversive block. Hypothetical encoding strategies of reward and punishment predictions are shown in Fig. 1c.

Fig. 1 The reward-punishment behavioral procedure. a Monkeys experienced two types of distinct blocks of trials in which three visual fractal conditioned stimuli (CSs) predicted rewards and punishments with 100, 50, and 0% chance. The reward block consisted of 12 trials in which reward was possible, and the punishment block consisted of 12 trials in which punishment was possible. b Structure of a single trial. CS could appear in the center (as shown) or peripherally, 10 degrees to the left or right of center. c Theoretical valence-coding strategies in the reward-punishment procedure. d–f Monkeys’ behaviors were motivated by reward, punishment, and uncertainty. d Cumulative probability functions of peripheral-CS gaze acquisition time. In reward or punishment block, the speed of CS acquisition was correlated with outcome probability (Spearman’s rank correlations; p < 0.01). e Proportion of trials in which the monkeys oriented to the location of peripheral CSs, shown across the entire CS epoch for reward-block and punishment-block trials. During the last 1000 ms of the trial, the monkeys’ gaze was preferentially attracted to the 50% reward CS (p < 0.01; rank sum test). f Proportion of trials in which the monkeys blinked, shown across the entire CS epoch for reward-block and punishment-block trials. Insets show mean proportion of time monkeys blinked during the last 500 ms of the CS epoch. g In distinct blocks we included an abort cue during approximately one half of the trials (Methods). Structure of trials with an abort cue. (g, left): trials in which the monkeys did not abort (eye position is schematized by the red arrow); (g, right): trials in which the monkeys aborted the trial. h Proportion of trials aborted in the reward block (top-left) and punishment block (top-right). The speed of trial aborting decreased with increasing punishment probability (bottom). The black–gray color legend for d–f is defined in a. Error bars indicate standard error. Single asterisks indicate significance at a 0.05 threshold and double asterisks indicate significance at a 0.01 threshold.

Analyses across all recorded behavioral sessions showed that the monkeys understood the behavioral procedure. Their CS-related anticipatory mouth movements (e.g., licking of the juice spout) and anticipatory blinking were correlated with the probability of reward and punishment, respectively (Spearman’s rank correlations; p < 0.001; rho = 0.24 for licking; rho = 0.37 for blinking; Supplementary Fig. 1). While these anticipatory responses were related to the expected value of the CSs, other behaviors, such as gaze, were driven by the absolute expected value of the CSs (often called motivational intensity or salience43) and by outcome uncertainty. In particular, the monkeys’ gaze was initially drawn to the CSs associated with the more probable outcome (Fig. 1d), irrespective of valence. Target acquisition times during the 100 and 50% CS trials were faster than during the 0% trials, within either the reward or the punishment block (p < 0.001; rank sum tests; single trials from no-abort cue blocks from all behavioral sessions = 31,838 trials). Within the reward block, target acquisition times were faster during 100% reward CS trials than during 50% reward CS trials (p < 0.01; rank sum tests).

Later in the trial the monkeys’ gaze was most strongly attracted towards the 50% reward CS (Fig. 1e; rank sum test; p < 0.001; measured during the last 1 s of the trial). We could not reliably observe CS-related gaze behavior during the last second of the punishment-predicting CSs because the gaze signal was quenched by defensive blinking (Fig. 1f).

To study punishment, it is crucial to verify that the outcome or unconditioned stimulus is aversive. It could be that the blinking behaviors depicted in Fig. 1f reflect conditioning, but not aversion. To address this issue, we utilized distinct reward and punishment blocks that contained abort cues during one-half of the trials (Methods). This paradigm is a type of active avoidance used to test the aversiveness of cues, outcomes, and contexts in humans and experimental animals44,45,46,47,48. If monkeys made a saccade to the abort cue, the trial was aborted (Fig. 1g). Because these reward and punishment blocks alternate, the optimal reward-driven strategy would be to rapidly abort every trial in which the 100 and 50% reward CSs were not presented. In contrast, the monkeys’ aborting behaviors were influenced by reward, punishment, and uncertainty. In the punishment block, monkeys actively aborted punishment-predicting CSs more often than the 0% punishment CS, confirming that the air puff was an aversive outcome (Fig. 1h). While the 100% punishment CS was initially associated with faster target acquisition than the 0% CS (suggesting that it was more motivationally salient because it strongly attracted gaze43), later in the trial (when abort cues were presented), monkeys aborted the 100% CS faster than the 50% and 0% punishment CSs. The data show that the monkeys’ motivation to abort was positively related to the probability of punishment.

Also, monkeys aborted 50% reward CS trials less often than 100% reward CS trials (note, though, that the proportion of aborted trials during either 100 or 50% reward CSs was extremely low). This decrease in abort error rate is consistent with the observation in Fig. 1e that the 50% reward CS captured attention (despite having a lower expected value than the 100% reward CS), reducing the number of saccades to the location of the abort cue.

In sum, the behavioral data suggest that monkeys utilized different representations (or encoding strategies) of rewards and punishments to influence their behaviors.

Next, we asked if the ACC contains distinct representations of rewards and punishments or if ACC neurons encode rewards and punishments with a common currency, such as a general value signal (Fig. 1c). The locations of ACC neuronal recordings are shown in Supplementary Fig. 2 and match the locations of neuronal recordings in previous studies of macaque ACC5, 9, 32, 49, 50.

Many ACC cells signal value of either reward or punishment

The neuronal recordings revealed both that many ACC neurons represent expected value in a valence-specific manner, displaying greatest sensitivity to the probability of either rewards or punishments; and also that many ACC neurons prefer uncertain CSs in a valence-specific manner, often displaying preference for either reward or punishment uncertainty.

To summarize the valence-encoding strategies of ACC neurons, a correlation analysis was performed for each recorded neuron (n = 329) to assess the relationship between CS responses and outcome probability (Methods; Supplementary Fig. 3 and Supplementary Table 1). The correlation coefficients are shown in Fig. 2b-inset. Most neurons that displayed significant correlations with outcome probability (Spearman’s rank correlation; statistical threshold: p < 0.05) did so in a valence-specific manner, for either reward or punishment probability.
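The per-neuron classification described above can be illustrated with a minimal sketch. It assumes one response value per trial together with the outcome probability of the CS shown on that trial, computes Spearman's rho within a block, and estimates significance by shuffling responses across trials. The function names (`spearman_rho`, `permutation_p`, `classify_neuron`), the example responses, and the two-sided shuffle test are illustrative assumptions rather than the authors' analysis code; the Methods define the actual procedure.

```python
import random


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided p-value from shuffling y relative to x (assumed test)."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman_rho(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def classify_neuron(p_reward, p_punish, alpha=0.05):
    """Label a neuron by the block(s) in which its correlation is significant."""
    if p_reward < alpha and p_punish < alpha:
        return "both"
    if p_reward < alpha:
        return "reward-specific"
    if p_punish < alpha:
        return "punishment-specific"
    return "neither"


# Hypothetical single-neuron data: three trials per CS in each block.
probs = [0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0]
punish_resp = [2, 3, 4, 6, 7, 8, 11, 12, 13]  # rises with punishment probability
reward_resp = [5, 5, 5, 5, 5, 5, 5, 5, 5]     # does not differentiate reward CSs

label = classify_neuron(
    permutation_p(probs, reward_resp),
    permutation_p(probs, punish_resp),
)
print(label)
```

With these made-up responses the neuron is labeled as punishment-specific, which corresponds to a cell that tracks punishment probability while ignoring reward probability.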

Fig. 2 Many ACC neurons signal outcome probability in a valence-specific manner. a Two single ACC neurons, one that signals punishment probability (left) and one that signals reward probability (right). b Scatter plot shows the difference in CS responses for 100 and 0% CSs in the reward block (x-axis) vs. difference in CS responses for 100 and 0% CSs in the punishment block (y-axis). Green: neurons that displayed significant differences between 100 and 0% CSs in the punishment block only; blue: neurons that displayed significant differences between 100 and 0% CSs in the reward block only; red: neurons that displayed significant differences between 100 and 0% CSs in both blocks, with both differences of the same sign; purple: neurons that displayed significant differences between 100 and 0% CSs in both blocks, with differences of opposite sign. Colors in pie chart (inset) correspond to colors in scatter plot and in Fig. 1c. Uncertainty-selective neurons are studied separately in Fig. 5. (b, inset) Correlation coefficients of all recorded neurons that displayed significant correlations with either reward or punishment probabilities, or both. Significance of each correlation was tested by 10,000 permutations (p < 0.05). c Single neuron responses (gray) shown separately for four major groups of outcome probability coding neurons in the scatter plot in b. Single neurons’ CS responses were normalized to the maximum CS response; from 0 to 1. Red asterisks indicate that the neurons’ responses varied significantly across the CSs within the reward block (Kruskal–Wallis test; p < 0.01) and displayed significant correlation with outcome probability (Spearman’s rank correlation; p < 0.01); blue asterisks indicate that the neurons’ responses varied significantly across the CSs within the punishment block (Kruskal–Wallis test; p < 0.01) and displayed significant correlation with outcome probability (Spearman’s rank correlation; p < 0.01).

Two valence-specific example neurons are shown in Fig. 2a. The first neuron displayed greatest excitation to punishment-predicting CSs and did not differentiate among 100, 50, and 0% reward CSs. The second neuron displayed greatest excitation to reward-predicting CSs (strongest to 100% reward) and did not differentiate among 100, 50, and 0% punishment CSs. To assess if such specific valence coding strategies were common among ACC neurons, we visualized the differences in neuronal responses for predictions of good outcomes and bad outcomes in the reward and punishment blocks (Fig. 2b). These analyses confirmed that many value-coding ACC neurons display valence-specific responses (Fig. 2).

Notably, other studies showed that within reward loss or reward gain trials, distinct ACC neurons signaled predictions and deliveries of reward gains and losses7, 28, 51, 52. Our data replicate these studies. Among reward-value neurons, many were excited or inhibited by increasing probabilities of rewards (Fig. 2c), signaling positive or negative reward values, respectively. In contrast, among the punishment-value neurons, most were excited by increasing probabilities of punishments.

Also, these ACC neurons did not display trial-by-trial correlations with conditioned responses; see Supplementary Table 2.

The second most common valence coding strategy seen among ACC neurons signaled the motivational intensity or unsigned value of the CSs (red in Fig. 2). Few neurons showed a generalized value signal that rank-ordered the reward or punishment predictions according to their expected value (100 > 50 > 0% rewards and 0 > 50 > 100% punishments; Fig. 2). The time courses of neuronal responses among these neuronal types are shown in Supplementary Figs. 4–5.

Following 50% CSs, many ACC neurons discriminated between outcome deliveries and omissions. Many of these prediction error responses also tended to be valence specific. For example, distinct ACC neurons signaled reward omissions, reward deliveries, punishment omissions, and punishment deliveries (Supplementary Fig. 3). Interestingly, the value of the CSs (during the CS epoch) and their associated outcomes were often signaled by different neurons (Supplementary Fig. 3). To summarize, the data thus far suggest that ACC neurons can signal the values of predictions and deliveries of rewards and punishments through the activity of distinct populations of neurons.

ACC neurons are known to discriminate among different behavioral tasks33 and contexts53, and integrate task-related information over long time scales31, 54. Hence, it was important to assess if ACC contains valence-specific neurons in a behavioral procedure in which reward and punishment CSs are presented in the same context or block of time. To this end, in an additional control procedure, Monkey B was conditioned with nine distinct visual fractal CSs that predicted reward with 100, 75, 50, 25, and 0% chance and punishment with 100, 75, 50, and 25% chance. The monkey displayed conditioned responses that suggested that it understood the meaning of the CSs. The monkey’s licking and blinking behaviors were correlated with the probability of reward and punishment, respectively (Supplementary Fig. 6A). Among 95 ACC neurons recorded in this procedure, 16 were selectively enhanced by increases in the probability of punishment, but were insensitive to changes in the probability of reward. An example of one such neuron and their average responses are shown in Fig. 3a, b. Oppositely, 9/95 neurons were enhanced by increases in the probability of reward, but were insensitive to changes in the probability of punishment. An example of one such reward-specific value-coding neuron and their average activity are shown in Fig. 3c, d. Therefore, in the blocked reward-punishment behavioral procedure (Fig. 1) and in the single-block control procedure, the ACC contained neurons that signaled reward and punishment values in a valence-specific manner. Consistent with this observation, population coding strategies (summarized by plotting correlation coefficients of correlations that assessed the relationship of neuronal activity and probabilities of either reward or punishment) were qualitatively similar across the blocked and the single-block behavioral procedures (Fig. 2b; Supplementary Fig. 6B).

Fig. 3 Valence specificity in a single-block reward/punishment procedure. a The activity of a single punishment-sensitive ACC neuron shown separately for the nine CSs in the single-block reward/punishment procedure. b The average activity of 16 punishment-enhanced neurons in ACC. c The activity of a single reward-sensitive ACC neuron shown separately for all nine CSs. d The average activity of nine reward-enhanced neurons in ACC. Spearman’s rank correlations of the average responses with reward and punishment probabilities are indicated in b and d. Error bars denote standard error. Classification of neurons as punishment-enhanced value coding neurons and reward-enhanced value coding neurons was the same as in Fig. 2 (Methods). Before averaging, single neurons’ CS responses were normalized to the maximum CS response; from 0 to 1 (same as in Fig. 2c). Red: activity in reward trials; blue: activity in punishment trials.

In the two-block appetitive/aversive procedure, monkeys and ACC neurons were highly sensitive to the nearing of their preferred context over many trials (Fig. 4). To show this, the analyses were concentrated on the behavioral and neuronal responses to the trial start cue. Though trial start cues were presented 3.5 s before the trial’s outcome would be experienced by the monkeys, their anticipatory orienting behavior (the duration it took them to foveate the trial start cue) was significantly faster in the reward vs. punishment block (Fig. 4a). Also, during the punishment block the speed of target acquisition changed as a function of trial number, decreasing as the reward block neared (Fig. 4a). Similarly, reward and punishment-sensitive neurons signaled the nearing of their preferred blocks. Reward-sensitive neurons displayed gradual and systematic changes in the aversive block in anticipation of the reward block (Fig. 4b). And, punishment-sensitive neurons displayed systematic changes in the reward block in anticipation of the aversive block (Fig. 4b).

Fig. 4 Monkeys and ACC neurons anticipate the nearing of their preferred block. a Reward and punishment blocks of 12 trials were separated into four sub-blocks. The blocks were separated relative to the monkeys’ knowledge about which block they were in. For example, the reward block starts after the first reward CS trial after which the monkeys knew they were in the reward block. It ends during the first punishment CS trial (because the monkeys had not yet seen the punishment CS). Hence, the block trial numbers in this figure are numbered 2 through 13. The average times at which monkeys foveated the trial start cue (relative to the time of the trial start cue presentation) are shown for the four sub-blocks in the reward block (red, left) and aversive block (blue, right). The results of Spearman’s rank correlations assessing the relationship of sub-block number and speed of orienting to the trial start cue are indicated above each block. An asterisk marks a significant correlation. Error bars denote standard error. b Responses of the three groups of valence-specific neurons identified in Fig. 2 during the last 500 ms of the trial start cue epoch. Conventions are the same as in a. In all four plots, the data in the reward block (red) differed from the data in the punishment block (blue; rank sum test; p < 0.05).

Therefore, ACC neurons can signal valence specifically or non-specifically (Fig. 2). And, valence-specific neurons can signal the nearing of their preferred contexts or events over long time scales (Fig. 4).

Representation of reward and punishment uncertainty in ACC

Next, we assessed if and how single ACC neurons encode uncertainty. In contrast with the hypothesis that ACC encodes economic or general common currency values, some recent neuroimaging studies have highlighted the ACC as a central hub for processing outcome uncertainty and guiding behaviors aimed at reducing this uncertainty. However, single neuron evidence for uncertainty processing in ACC has been missing.

We found 88 uncertainty-selective neurons in the ACC (Methods), some of which responded to punishment uncertainty. An example neuron is shown in Fig. 5a (left). This neuron did not respond selectively to reward-predicting CSs or reward outcomes. In the punishment block, the neuron was most strongly activated by the 50% punishment CS until the trial outcome. Among uncertainty-selective neurons, 23/88 neurons were selectively excited by punishment uncertainty but not reward uncertainty (Fig. 5b; Supplementary Fig. 7). None were selectively inhibited by punishment uncertainty (Fig. 5b, c). Among the punishment-uncertainty neurons, nine neurons displayed significant variability (Kruskal–Wallis test; Methods) among the reward block CSs (Fig. 5c) but there was no obvious trend for positive or negative reward value coding among them. These results show that some ACC neurons signal punishment uncertainty in a selective manner, and that a minority of them can also multiplex this uncertainty signal with information about rewards.

Fig. 5 Distinct coding of reward and punishment uncertainty by many ACC neurons. a Two single ACC neurons, one that signals punishment uncertainty (left) and one that signals reward uncertainty (right). b Single neuron discrimination indices of punishment and reward uncertainty-selective neurons (left and right). Inset shows discrimination of eight neurons that displayed uncertainty selectivity in both reward (x-axis) and punishment blocks (y-axis). Discrimination indices were obtained by ROC analyses (Methods). Values below 0.5 indicate selective suppression, indices above 0.5 indicate selective enhancement. c Punishment uncertainty-enhanced neurons in the reward block (left-scatter plot) and punishment block (right-scatter plot). Scatter plots show, for individual neurons, the differences between 50 vs. 100% CS responses (x-axis) and 50 vs. 0% CS responses (y-axis). Red stars, neurons showing significant differences in CS response in both 50–100% comparison and 50–0% comparison. Green stars, neurons showing significant differences in only 50–0% comparison. Blue stars, neurons showing significant differences in only 100–50% comparison. Black stars, neurons showing no significant differences. Single neuron and average CS responses (right) for two major groups of punishment uncertainty enhanced neurons (black and blue stars on the left). d Reward uncertainty-enhanced neurons. Conventions are the same as in c.
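The ROC-based discrimination index mentioned in the Fig. 5 caption can be sketched as follows. The area under the ROC curve equals the probability that a response drawn from one set of trials exceeds a response drawn from another, so 0.5 means no discrimination, values above 0.5 mean enhancement, and values below 0.5 mean suppression. Which trial sets are compared (here, 50% CS trials against pooled 100% and 0% CS trials) is an assumption for illustration, as are the function name `roc_area` and the example responses; the Methods specify the actual comparison.

```python
def roc_area(preferred, other):
    """Area under the ROC curve, computed as the probability that a
    response in `preferred` exceeds a response in `other` (ties count 0.5)."""
    wins = 0.0
    for a in preferred:
        for b in other:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(preferred) * len(other))


# Hypothetical spike counts for one neuron.
uncertain_trials = [9, 11, 12, 10]     # 50% CS trials
certain_trials = [4, 6, 5, 7, 3, 6]    # pooled 100% and 0% CS trials (assumed pooling)

enhanced_index = roc_area(uncertain_trials, certain_trials)    # above 0.5: enhancement
suppressed_index = roc_area(certain_trials, uncertain_trials)  # below 0.5: suppression
print(enhanced_index, suppressed_index)
```

The pairwise formulation is equivalent to the Mann-Whitney U statistic divided by the number of trial pairs, which keeps the sketch free of external dependencies.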

Consistent with the observation that ACC can signal task-related information in a valence-specific manner (Fig. 2), we also found neurons that preferred reward uncertainty but not punishment uncertainty (Fig. 5a, b). An example neuron is shown in Fig. 5a-right. This neuron responded to the 50% reward CS and maintained tonic firing until the time of the trial-outcome (reward or no reward). It was not modulated by punishment predicting CSs. 45/88 uncertainty selective neurons were excited by reward uncertainty, while 12/88 uncertainty selective neurons were inhibited. Some reward uncertainty excited neurons also carried information about punishment probability; Fig. 5d and Supplementary Fig. 7. Among these, almost all were enhanced by increased probability of punishment. Finally, a minority of uncertainty selective neurons (8/88) were selective for both reward and punishment uncertainty (Fig. 5b).

Like all other populations of uncertainty or risk-coding neurons, ACC uncertainty neurons discriminated between 100 and 0% CSs in their preferred block37, 55,56,57,58. Within the aversive block, 9/23 punishment uncertainty neurons responded more to 100 than 0% punishment CSs, and 3/23 responded more to 0 than 100%. 22/45 reward uncertainty enhanced neurons responded more to 100 than 0% reward CSs, and 0/45 responded more to 0 than 100% reward CSs.

Outcome deliveries following uncertainty elicit prediction errors43, 59, 60. However, the majority of ACC uncertainty neurons did not signal prediction errors (Supplementary Fig. 8). This observation provides further evidence for the notion that within ACC, there may be distinct groups of neurons tracking predictions and outcome-related or feedback-related prediction errors (Supplementary Fig. 3).

In sum, ACC can signal uncertainty about either rewards or punishments in a valence-specific manner, or can multiplex information about uncertainty and value in multiple ways (Supplementary Fig. 9). Also, prediction errors following uncertain epochs were often signaled by distinct populations of neurons.

Uncertainty can arise due to variability in outcomes or due to the possibility of making an error. For example, in our behavioral procedure aborting a reward-associated CS is an error because it reduces the probability of reward on that trial to 0. Though we observed few such errors (Fig. 1), the anticipation of the abort cue resulted in increases of overt attention towards the reward-possible CSs and the activity of uncertainty neurons in ACC (Supplementary Fig. 10). Therefore, ACC uncertainty neurons are sensitive to uncertainty arising due to internal processing (that increases attention27) as well as due to uncertain stimulus-outcome associations.

Recent studies show that several subcortical brain regions in the septum and the striatum contain populations of neurons that signal the graded level of reward uncertainty56, 57. Because these brain regions receive inputs from the ACC61, 62, an important question is, do ACC reward uncertainty neurons also signal graded levels of uncertainty and reward size? To answer this question, ACC reward enhanced uncertainty neurons were recorded while monkeys participated in the reward probability/reward amount behavioral procedures used in our previous studies55,56,57, 63. The reward-probability block contained five objects associated with five probabilistic reward predictions (0, 25, 50, 75 and 100% of 0.25 ml of juice). The reward-amount block contained five objects associated with certain reward predictions of varying reward amounts (0.25, 0.1875, 0.125, 0.065 and 0 ml). The expected values of the five CSs in the probability block matched the expected value of the five CSs in the amount block. The block design was used to remove the confounds introduced by risk-seeking related changes in subjective value processing of the CSs37, 55,56,57.
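The matching of expected values across the two blocks can be checked with a few lines of arithmetic. The sketch below computes the expected value of each CS in the reward-probability block and compares it with the amounts listed for the reward-amount block. It also computes the outcome variance of a probabilistic reward, p(1 − p)r², as one possible index of uncertainty; treating variance as the uncertainty measure is an assumption made here for illustration and is not stated in the text. Note that the 0.065 ml amount given in the text differs from the 0.0625 ml expected value of the 25% CS by 0.0025 ml; the sketch keeps the value as listed.

```python
REWARD_ML = 0.25  # juice volume delivered on rewarded trials in the probability block

probabilities = [0.0, 0.25, 0.5, 0.75, 1.0]
amounts_ml = [0.0, 0.065, 0.125, 0.1875, 0.25]  # amount block, as listed in the text

# Expected value of each probabilistic CS: probability x reward size.
expected_value = [p * REWARD_ML for p in probabilities]

# Outcome variance of a reward of size r delivered with probability p
# (assumed uncertainty index): p * (1 - p) * r^2.
variance = [p * (1 - p) * REWARD_ML ** 2 for p in probabilities]

# Difference between each listed amount and the matching expected value.
mismatch = [round(a - ev, 4) for a, ev in zip(amounts_ml, expected_value)]

for p, ev, var, diff in zip(probabilities, expected_value, variance, mismatch):
    print(f"p={p:.2f}  EV={ev:.4f} ml  variance={var:.6f}  amount - EV={diff:+.4f} ml")
```

Under this assumption the variance is zero for the 0 and 100% CSs, largest for the 50% CS, and equal for the 25 and 75% CSs, while expected value rises linearly with probability.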

Consistent with our previous reports using the same procedure, after conditioning, the monkeys’ choices rank-ordered the CSs in either block according to their expected values (Fig. 6a, see refs 55,56,57, 63), indicating that they understood the meanings of the CSs. After training, 58 ACC reward uncertainty enhanced neurons were identified and studied in the single CS reward probability/reward amount behavioral procedure.

Fig. 6 Reward uncertainty ACC neurons are sensitive to the level of uncertainty. a Monkeys’ choice preference between CSs associated with different reward amounts and probabilities. Choice percentage of a single reward probability CS (x-axis) vs. all other reward probability CSs (top). Choice percentage of a single reward amount CS vs. all other reward amount CSs (bottom). Data are compiled from a data set of 13,407 choice trials. b Average responses of 58 reward uncertainty-enhanced ACC neurons in the reward-probability block (left) and reward amount block (right). c Average normalized responses of the same neurons in the probability (red) and amount (black) CSs. Asterisks indicate differences between adjacent CSs (p < 0.05; paired sign-rank test). The result of a Spearman’s rank correlation assessing the relationship of neuronal firing and reward amount in the reward amount block is indicated below. Before averaging, single neuronal responses were normalized to the maximum CS response, from 0 to 1, across the ten different CSs. Error bars indicate standard error.

The online identification of reward uncertainty neurons was the same as in our previous studies56, 57, 63. The average response of ACC reward uncertainty enhanced neurons is shown in Fig. 6b, c. During the reward-probability block, the neurons responded most strongly to the most uncertain CS (50% CS) and more weakly to the 25 and 75% CSs. The same neurons displayed the weakest response to the certain CSs (0 and 100%). In the reward amount block, their responses encoded reward size in a roughly linear manner, displaying the highest firing for the CSs associated with the largest reward sizes (Fig. 6c; rho = 0.3; p < 0.001; Spearman’s rank correlation; n = 58). These data indicate that, on average, ACC reward-uncertainty neurons signal the level of reward uncertainty and can also signal reward amounts in a roughly linear manner (Fig. 6b).