Behaviour

Subjects’ choices in the card game were faster when choosing the objectively better deck as opposed to the worse deck (P<10−5, two-tailed t-test, n=1,268 and 592 trials, Supplementary Fig. 1a). They were faster following immediately preceding observed wins than following immediately preceding observed losses (P=0.003<0.05, t-test, n=987 and 873, Supplementary Fig. 1b) and following increasing numbers of coherent previous outcomes (for example, the left card losing and the right card winning both coherently predict the right deck to be the better one; P=0.016, Spearman test, n=1,860, Supplementary Fig. 1c). Previous outcomes in the chosen deck did not influence reaction times significantly (Supplementary Fig. 1d–f). Choice time was significantly higher during the first round of a game, when no prior knowledge was available (P<10−5, ANOVA, n=1,860), but not significantly different between subsequent rounds 2–5 (P>0.05, ANOVA, n=1,488, Supplementary Fig. 1g), and no effect on choice times was found for the side from which a card was drawn (P>0.05, t-test, n=966 and 894, Supplementary Fig. 1h).

For a more in-depth analysis, we also constructed a normative hierarchical Bayesian reversal-learning model (Fig. 1d and ‘Methods’ section). The model-derived trial-by-trial difference in expected value between the card decks and the choice entropy (a measure of choice difficulty that captures the model-estimated uncertainty about a choice) both reliably predicted the likelihood that subjects would pick a card from a particular deck (logistic regression analysis predicting choices of the left deck over the right deck: expected value difference: P<10−5; choice entropy: P<0.005, one-tailed t-test, n=10 subjects). Furthermore, choice entropy also predicted how long subjects would take to make the respective choice (multiple linear regression predicting choice time: expected value difference: P>0.05; choice entropy: P<10−5, one-tailed t-test, n=10), indicating that subjects’ decisions were slower when the model predicted they were more difficult. These analyses confirm that the model quantitatively captured trial-by-trial variation in subject behaviour in this task (Supplementary Fig. 2a).
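The logic of this analysis can be sketched with simulated data: fit a logistic regression of left-deck choices on a model-derived expected value difference and the resulting choice entropy. Everything below (the generative gain of 2.0, the trial count, the noise model) is a hypothetical placeholder, not the study’s actual model output.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_trials = 500

# Hypothetical model-derived regressor: EV(left) - EV(right) per trial
ev_diff = rng.normal(0.0, 1.0, n_trials)
# Model choice probability and its entropy (choice difficulty)
p_left = 1.0 / (1.0 + np.exp(-2.0 * ev_diff))   # 2.0 is an arbitrary gain
entropy = -(p_left * np.log2(p_left) + (1 - p_left) * np.log2(1 - p_left))

# Simulated subject choices driven by the EV difference
chose_left = (rng.random(n_trials) < p_left).astype(float)

def neg_log_lik(beta, X, y):
    """Negative log-likelihood of a logistic regression model."""
    z = X @ beta
    return np.sum(np.logaddexp(0.0, z) - y * z)

X = np.column_stack([np.ones(n_trials), ev_diff, entropy])
fit = minimize(neg_log_lik, np.zeros(3), args=(X, chose_left))
b_const, b_ev, b_entropy = fit.x   # b_ev should come out clearly positive
```

With choices generated from the EV difference, the fitted EV coefficient recovers a clearly positive weight, mirroring the direction of the reported group-level effect.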

To test more directly whether subjects’ choices were explained by past win/loss outcomes and prediction errors, we performed further regression analyses. These analyses showed that subjects did in fact learn from both the previous win/loss outcomes of their own self-experienced choices and those of the other players whose choices they observed (logistic regression analysis predicting subject choices on current trial t from the previous two win/loss outcomes of each player, spanning previous trials t-1 to t-6; one-tailed t-test, averaging over the previous two choices for self-experienced trials: P=0.0006 and, separately, for observed trials: P=0.0001, n=10; Supplementary Fig. 2b; see ‘Methods’ section). The results from this analysis imply that subjects’ choices were a function of prediction errors computed from both self-experienced and observed past outcomes. To test this relationship more directly, we used the full prediction error term [win/loss minus the choice expected value (computed from the reversal-learning model)] from the most recent past trial for both self-experienced and observed outcomes in the same regression model to predict subject choices in the current trial t. This analysis furnished strong evidence that subjects’ choices in the current trial could indeed be predicted by the most recent self-experienced and observed prediction errors (self-experienced: P<10−7; observed: P<10−5, n=10; Supplementary Fig. 2c; see ‘Methods’ section), thereby motivating our attempts to identify neuronal correlates of self-experienced and observational prediction errors in the human brain.
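The prediction error term used here is, in spirit, the outcome minus the current expected value. As a minimal stand-in for the hierarchical Bayesian model, a delta-rule learner can generate such PEs from a win/loss sequence; the learning rate and toy outcome sequences below are hypothetical.

```python
import numpy as np

def prediction_errors(outcomes, alpha=0.1):
    """Delta-rule PEs: pe_t = outcome_t - EV_t, with EV updated after each
    trial. A simplified stand-in for the Bayesian reversal-learning model."""
    ev, pes = 0.0, []
    for r in outcomes:
        pe = r - ev          # prediction error: outcome minus expected value
        pes.append(pe)
        ev += alpha * pe     # incremental value update
    return np.array(pes)

# Toy win (+1) / loss (-1) sequences (hypothetical)
self_pe = prediction_errors([+1, +1, -1, +1])   # self-experienced outcomes
obs_pe = prediction_errors([-1, +1, +1, -1])    # observed outcomes
```

The two PE series would then enter a common logistic regression predicting the current choice, one regressor per outcome source.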

Neuronal response properties

While subjects performed the experimental paradigm we recorded neuronal spiking activity using microwires implanted in their AMY, rmPFC and rACC13 (Fig. 1e). From 842 recorded units, we isolated 358 single neurons (42.5%, Supplementary Fig. 3) and all subsequent analyses were conducted using these single-units only (125 neurons in the AMY with a mean firing rate of f=2.51 +/− 0.22 Hz; 95 in the rmPFC, f=1.72 +/− 0.18 Hz; and 138 in the rACC, f=2.28 +/− 0.19 Hz; f was not found to be significantly different across areas in an ANOVA, P>0.05, n=358). During task performance the mean firing rate in all three brain areas was elevated, albeit non-significantly (f(AMY)=2.88 +/− 0.35 Hz, f(rmPFC)=2.2 +/− 0.35 Hz, and f(rACC)=2.81 +/− 0.28 Hz, P>0.05/3, Bonferroni corrected t-test, measured when the cards appeared on the table at the beginning of each round, n=125, 95 and 138, Supplementary Fig. 4). No significant difference in firing rate was observed in response to the low-level difference in the individual card decks (suit/colour) in any of the three brain areas (P>0.05/3, Bonferroni corrected t-test, n=125, 95 and 138, Supplementary Fig. 4).

To initially compare the mean neuronal response profiles across the three brain areas, both before and after outcome, we selected only units that showed a significant increase in their mean firing rate across all card game trials at outcome, independent of the trial type or outcome (self-experienced/observed and win/lose, respectively). For this comparison we used a conservative response criterion based on the h-coefficient14, which returned 32 units in the AMY, 9 in the rmPFC and 24 in the rACC (5,760, 1,620 and 4,320 trials, respectively), and analyzed three time periods: the choice period (−500–0 ms), an early response period (500–1,000 ms) and a late response period (1,500–2,000 ms); the outcome was revealed at t=0 ms. In the AMY we recorded a higher mean firing rate during self-experienced trials than during observed trials during the early response period (P=0.002<0.05/3, Bonferroni corrected t-test, n=1,920 and 3,840) and the late response period (P=0.003<0.05/3, Bonferroni corrected t-test, n=1,920 and 3,840). In the rmPFC, we found the same numerical difference, but the effect was not significant (P>0.05/3, Bonferroni corrected t-test, n=540 and 1,080). Conversely, in the rACC the response properties were reversed, displaying a higher firing rate during observed trials than during self-experienced trials. These rACC firing rates were significantly different between the two trial types during the choice period, before the outcome was revealed (P=0.006<0.05/3, Bonferroni corrected t-test, n=1,440 and 2,880; Fig. 2a).

Figure 2: Comparing response envelopes in outcome responsive neurons. (a) Peristimulus time histograms (300 ms bin width, 10 ms step size) were calculated across selected units for self-experienced (light colours, mean +/− standard error of the mean, s.e.m.) and observed trials (dark colours; balanced for high win/loss trials and low win/loss trials, respectively) and the mean firing rates in the two trial types were compared with each other in three different time intervals: choice (magenta), early response (cyan) and late response (yellow; *P<0.05/3, t-test, n(AMY)=1,920 and 3,840, n(rmPFC)=540 and 1,080, n(rACC)=1,440 and 2,880). (b) Combining the self-experienced and observed trials revealed a significant decrease in the rmPFC firing rate during choice and a significant increase in all three brain areas during the early and late response periods (*P<0.05/9, t-test compared with −3,000 to −1,000 ms, n(AMY)=5,760, n(rmPFC)=1,620, n(rACC)=4,320). (c) The same data as in b, smoothed and plotted with the bootstrapped 95% c.i., further emphasized the sharp hiatus in the rmPFC neurons’ firing rate at −370 ms (half-minimum onset at −470 ms and offset at −230 ms, left panel). The response in the rACC was significantly earlier (half-maximum, vertical lines) than in both the AMY and the rmPFC, and higher than in the AMY (peak, horizontal lines; right panel, zoomed in on the orange highlight in the left panel).

To further compare the mean response envelopes across the three brain areas we analyzed all trials combined (self-experienced and observed) and compared the mean firing rate during the three time periods to a pre-choice period serving as baseline (−3,000 to −1,000 ms). This analysis revealed a significant, sharp cessation of activity in the rmPFC shortly before the outcome was revealed (P=0.0009<0.05/9, Bonferroni corrected t-test, n=1,620; Fig. 2b). We generated 10,000 bootstrapped, smoothed15 mean response envelopes, which further emphasized the sharp cessation of firing during the choice period in the rmPFC and were used to measure response onset times (half-maximum) and response amplitudes (Fig. 2c). The response onset in the rACC (249.986 +/− 30 ms, 95% c.i.) was significantly earlier than in the AMY (380.325 +/− 35 ms, P<10−5<0.05/3, Bonferroni corrected t-test, n=10,000) and the rmPFC (385.19 +/− 85 ms, P=0.0026<0.05/6, Bonferroni corrected t-test, n=10,000), while no difference in onset time was observed between the AMY and the rmPFC (P=0.5>0.05/6, Bonferroni corrected t-test, n=10,000). The amplitude of the responses was higher in the rACC than in the AMY (P=0.001<0.05/6, Bonferroni corrected t-test, n=10,000) but not significantly different between the rACC and the rmPFC or between the rmPFC and the AMY (P>0.05/6, Bonferroni corrected t-test, n=10,000).
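The onset-time estimate can be illustrated with a bootstrap over hypothetical trials: resample trials with replacement, recompute the mean response envelope, and record where it first crosses half its maximum. The ramp shape, noise level and trial counts below are invented for illustration and omit the smoothing step of the original analysis.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(0, 1000, 10)   # time after outcome (ms), hypothetical grid

def half_max_onset(envelope, t):
    """First time point at which the envelope reaches half its maximum."""
    return t[np.argmax(envelope >= envelope.max() / 2.0)]

# Hypothetical single-trial responses: noisy ramps rising from 250 to 400 ms
ramp = np.clip((t - 250) / 150.0, 0.0, 1.0)
trials = ramp + rng.normal(0.0, 0.2, (200, t.size))

# Bootstrap: resample trials with replacement, re-measure the onset each time
onsets = np.array([
    half_max_onset(trials[rng.integers(0, 200, 200)].mean(axis=0), t)
    for _ in range(1000)
])
ci_low, ci_high = np.percentile(onsets, [2.5, 97.5])
```

The bootstrap distribution of onsets directly yields the confidence intervals used to compare onset times between areas.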

Outcome encoding

After finding specific differences in response envelopes and onset times between the three brain areas, we investigated the three complete neuronal populations’ general response properties to winning versus losing. For this analysis, we measured the absolute mean difference in each individual neuron’s firing rate between winning and losing trials (subtracting the mean differences before the outcome was revealed, −1,500–0 ms, cf. Fig. 2a right panel). In self-experienced trials this mean response difference increased in all three brain areas after the outcome was revealed (t=0 ms, Fig. 3a). However, only the neuronal population in the rACC also showed an increase of the mean response difference after outcome in both observed and slot machine trials, while in the AMY and rmPFC this effect was only very weak or absent (repeated t-tests and mean response difference higher than the 95% of 10,000 bootstrapped means calculated over the pre-response period, Fig. 3a). We then asked whether it is the same rACC neurons that encode outcome across all three different trial types. If this were the case, we would expect the rACC population’s mean response difference values to be correlated between any given pairing of trial types (for example, self-experienced versus observed trials). While we found such a correlation in all three brain areas between self-experienced and observed outcomes, only in the rACC did we find that the mean response difference values were indeed correlated across all three trial type pairings (self-experienced versus observed, self-experienced versus slot machine, and observed versus slot machine; P<0.01, Pearson correlation over time and also during a predefined response period, n=138 neurons; Fig. 3b,c and Supplementary Fig. 5). 
These results indicate not only that rACC neurons encode winning versus losing in all three trial types, but also that a subset of rACC neurons individually encoded all three outcome types: self-experienced outcomes, observed outcomes, and even outcomes in the slot machine trials, an entirely different win/lose task (for an example see Supplementary Fig. 6). Notably, in some cases a reversal of the response direction between trial types could also be observed. For example, the unit in Fig. 3d,e displayed what might be termed a Schadenfreude response, increasing its firing rate for self-experienced wins and observed losses and decreasing its firing rate for self-experienced losses and observed wins.
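The cross-trial-type test in Fig. 3b,c amounts to correlating per-neuron win-minus-loss response differences between pairs of trial types. A sketch with simulated neurons that share a common outcome-coding component (all numbers hypothetical):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n_neurons = 138   # matches the rACC count, but the data are simulated

# Hypothetical per-neuron win-minus-loss response differences (Hz) sharing
# a common outcome-coding component across the three trial types
shared = rng.normal(0.0, 1.0, n_neurons)
d_self = shared + rng.normal(0.0, 0.8, n_neurons)
d_obs = shared + rng.normal(0.0, 0.8, n_neurons)
d_slot = shared + rng.normal(0.0, 0.8, n_neurons)

r_so, p_so = pearsonr(d_self, d_obs)     # self-experienced vs observed
r_ss, p_ss = pearsonr(d_self, d_slot)    # self-experienced vs slot machine
r_os, p_os = pearsonr(d_obs, d_slot)     # observed vs slot machine
```

When a shared component is present, all three pairwise correlations come out positive and significant, which is the signature reported for the rACC population.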

Figure 3: Outcome encoding. (a) The mean response difference between winning and losing (at t=0 ms) in self-experienced trials in the AMY, rmPFC and rACC (n=125, 95 and 138, respectively). Top panels: percentage of neurons with values outside of the 95-percentile of values recorded during the 1 s immediately before outcome (bold lines show percentages which are higher than expected based on a binomial test with α=0.01). Middle panels: the mean (+/− s.e.m.) response difference values (arrowheads and horizontal lines mark the upper end of the 95% c.i. of 10,000 bootstrapped means during the 1 s before outcome). Bottom panels: time points where the mean response difference values were significantly different from those before outcome (P<0.01/3, t-test, n(AMY)=125, n(rmPFC)=95, n(rACC)=138). (b) Pearson correlation coefficients between the individually plotted mean response difference values from a; bold traces highlight a significant correlation between the two variables (P<0.01/3, n(AMY)=125, n(rmPFC)=95, n(rACC)=138). (c) The same analysis as in b in the rACC but with values averaged over the time window highlighted there in orange. Every data point represents the mean response difference between winning and losing in a single rACC neuron; in all three comparisons we found a significant Pearson correlation (p(S-E versus Obs)<10−4<0.01, r(S-E versus Obs)=0.425; p(S-E versus Slot)<10−4<0.01, r(S-E versus Slot)=0.812; and p(Obs versus Slot)<10−4<0.01, r(Obs versus Slot)=0.311, n=138). (d) Waveform of all spikes recorded from an rACC neuron (n=10,828, mean +/− s.e.m. in red, middle 95% of values in black; insert: interspike interval frequency plot). (e) For the same neuron as in d, raster plots and peristimulus time histograms of wins (top) and losses (bottom) in self-experienced (left), observed (middle) and slot machine trials (right).
This example neuron not only responds differentially to winning and losing in all three trial types, but notably does so inversely for observed outcomes compared with self-experienced and slot machine outcomes.

Amount encoding

In the unsigned and averaged population analysis of mean response difference values, more subtle directional coding, or the encoding of task variables within subpopulations of neurons, may remain unobserved. We therefore additionally investigated whether a subpopulation of neurons, specifically selected for directionally encoding the amount won or lost in observed trials (−$100, −$10, +$10 or +$100), also encoded that same parameter in self-experienced trials. Only neurons whose firing rate increased as the observed amounts increased in at least one time point after the observed outcome was revealed were included in this analysis (300–900 ms, P<0.05/2, Bonferroni corrected for positive and negative regression coefficients; n(AMY)=30, n(rmPFC)=25, n(rACC)=40). In the AMY and rmPFC this selected subpopulation of neurons on average also positively encoded self-experienced amounts, increasing their firing rate as the subject gained higher amounts (P<0.01, cluster statistical analysis over 10,000 equivalent but label-shuffled datasets, Fig. 4a; the same selected subpopulation of neurons in the AMY also fired significantly more in response to self-experienced outcomes than to observed outcomes, P<10−5<0.05/3, Bonferroni corrected t-test, n=1,800 and 3,600 trials, Fig. 4b). In the rACC, however, the selected subpopulation of neurons on average encoded self-experienced amounts with the opposite, negative sign, decreasing their firing rate as the subject’s gains increased (P<0.01, cluster statistical analysis over 10,000 equivalent but label-shuffled datasets, Fig. 4a; Supplementary Fig. 7).
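The cluster statistical analysis over label-shuffled datasets can be sketched as a cluster-based permutation test: sum supra-threshold t-values over contiguous time bins and compare the largest observed cluster sum with the same statistic computed on shuffled data. The threshold, effect window and data below are hypothetical, and the sign-flipping scheme is a generic stand-in for the authors’ label shuffling.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_neurons, n_time = 40, 50   # hypothetical dimensions

# Simulated per-neuron regression statistics over time, with a genuine
# positive effect confined to time bins 20-29
data = rng.normal(0.0, 1.0, (n_neurons, n_time))
data[:, 20:30] += 0.8

def max_cluster_sum(x, thresh=2.0):
    """Largest sum of one-sample t-values over contiguous supra-threshold bins."""
    t = stats.ttest_1samp(x, 0.0, axis=0).statistic
    best = run = 0.0
    for v in t:
        run = run + v if v > thresh else 0.0
        best = max(best, run)
    return best

observed = max_cluster_sum(data)

# Null distribution: randomly flip each neuron's sign (a generic stand-in
# for the label shuffling used in the study)
null = np.array([
    max_cluster_sum(data * rng.choice([-1.0, 1.0], (n_neurons, 1)))
    for _ in range(500)
])
p_cluster = (null >= observed).mean()   # cluster-level P value
```

Because the cluster statistic is computed on the shuffled data exactly as on the real data, the resulting P value controls for the multiple comparisons across time bins.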

Figure 4: Amount encoding. In this analysis only neurons showing a positive regression of their firing rate to the observed amount were included (n=30, 25 and 40 in the AMY (left), rmPFC (middle) and rACC (right), respectively). (a) Top panels: same as in Fig. 3. Middle panels: the mean (+/− s.e.m.) t-statistic of the regression coefficients of the firing rates to self-experienced amounts and observed amounts (revealed at t=0 ms). Bottom panels: time points after outcome with mean regression coefficient t-statistic values, for the sum of which a cluster statistical analysis across 10,000 label-shuffled datasets was significant (α<0.01). We note that an increase in the regression coefficient for observed amounts after outcome is caused by the implicit selection bias; for self-experienced amounts, however, no selection bias was present. (b) The peristimulus time histograms of the selected populations of units were analyzed in the same way as in Fig. 2a, revealing a significant difference in the AMY response amplitudes between self-experienced and observed trials (*P<0.05/3, n=1,800 and 3,600). (c) The mean values of the regression coefficients in a during the outcome period (300–900 ms, orange line in a). Post-hoc testing revealed a significant difference in the distance between observed and self-experienced values between the AMY and the rACC (*P<0.05/3, t-test, n=30 and 40, right panel). (d) Spike waveform of an example neuron presented as in Fig. 3d (n=10,536). (e) Raster plots and peristimulus time histograms for the same neuron as in d showing higher firing rates for self-experienced losses than for self-experienced wins and higher firing rates for observed wins than for observed losses, reflecting the findings in a, left panel.

We compared this amount encoding across the three brain areas using the mean t-statistics of the regression coefficients for the amount won or lost over the whole response period (300–900 ms). In this comparison, we found no significant effect for observed amount encoding across the three areas (F(2, 92)=1.315, P=0.274>0.05, ANOVA, n=95 neurons; Fig. 4c), but we did measure an effect for self-experienced amount encoding (F(2, 92)=5.484, P=0.006<0.05, ANOVA, n=95). The distance between self-experienced and observed values provides a measure of the asymmetry between self and observed amount encoding; this measure also showed a significant effect across the three brain areas (F(2, 92)=5.484, P=0.006<0.05, ANOVA, n=95). Post-hoc comparisons of this distance measure revealed no significant difference between the AMY and the rmPFC (P=0.238, Bonferroni corrected t-test, n=30 and 25), nor between the rmPFC and the rACC (P=0.067, Bonferroni corrected t-test, n=25 and 40), but a significantly larger distance was measured in the rACC than in the AMY (P=0.003<0.05/3, Bonferroni corrected t-test, n=40 and 30). These results suggest that the difference in amount encoding between the three brain areas was driven by differential encoding of self-experienced and observed outcomes primarily between the AMY and the rACC: in the selected AMY units there was little difference between amount encoding for self and other, while in the rACC amount was encoded with opposite signs for self-experienced and observed outcomes (for an example of a single rACC neuron displaying this type of encoding see Fig. 4d,e; for the localization of the neurons selected in this analysis see Supplementary Fig. 8a). When instead selecting for self-experienced amount encoding, the same directionality was observed but only reached significance in the AMY (P<0.01, cluster statistical analysis over 10,000 equivalent but label-shuffled datasets, Supplementary Fig. 8b).
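The across-area comparison of the asymmetry measure follows a standard one-way ANOVA with Bonferroni-corrected post-hoc t-tests. A sketch with simulated per-neuron distance values (group means and spreads chosen arbitrarily to mimic an AMY < rmPFC < rACC ordering; none of these numbers come from the study):

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

rng = np.random.default_rng(4)

# Hypothetical per-neuron asymmetry ("distance") values for each area
amy = rng.normal(0.2, 1.0, 30)
rmpfc = rng.normal(1.0, 1.0, 25)
racc = rng.normal(1.8, 1.0, 40)

# One-way ANOVA across the three areas, as in the Fig. 4c comparison
f_stat, p_val = f_oneway(amy, rmpfc, racc)

# Bonferroni-style post-hoc comparison between the extreme groups
t_ar, p_ar = ttest_ind(racc, amy)
p_ar_corrected = min(1.0, p_ar * 3)   # correct for three pairwise tests
```

With a genuine group difference, the omnibus ANOVA and the corrected AMY-versus-rACC post-hoc test both come out significant, mirroring the structure of the reported result.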

Encoding of observational learning parameters

Having found evidence of observed outcome and amount encoding, we investigated whether these effects may contribute to the encoding of observational PEs. We therefore tested whether neurons not only encoded how rewarding an observed event was (outcome amount) but also how rewarding it was expected to be (expected value). In particular, we tested specific predictions of algorithms originally developed in artificial intelligence, known as temporal difference learning models16. The prediction error term from these learning algorithms has been shown to closely resemble the activity of a subpopulation of primate dopamine neurons during self-experienced reinforcement learning2,5,17. Beyond the PE when the outcome is revealed, temporal difference models additionally postulate that a PE signal should occur at the earliest predictive event, signalling the difference between the new expected value and the immediately preceding expected value. In observed trials this occurs at the point at which the other player’s choice is revealed (card highlighted, Fig. 1b) and, in our task setting, is approximated by the expected value of the observed choice. We therefore tested whether neurons encoded the following tripartite coding scheme during observed trials: first, a positive expected value signal before the outcome was revealed (a positive correlation of their firing rate with the expected value at the point of choice); second, a positive signal of the amount won or lost by the observed player (a positive correlation of their firing rate with the amount); and third, a negative expected value signal after the outcome was revealed (a negative correlation of their firing rate with the expected value after the outcome was revealed). The combination of the second and third signals constitutes the full PE signal: the amount obtained minus the expected value.
We selected neurons based on the first, predictive component of this scheme, including only units that showed a positive effect of the expected value at choice (P<0.05/2, Bonferroni corrected for positive and negative regression coefficients, in at least one time point during the choice period of −900 to −300 ms; n(AMY)=14, n(rmPFC)=9, n(rACC)=22), and then tested for the two remaining components: a positive effect of the amount and a negative effect of the expected value after the outcome was revealed. Note that selecting these units during choice means that there is no selection bias for statistical tests performed during outcome.
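The temporal-difference logic behind this tripartite scheme can be written out for a single observed trial: one PE occurs when the other player’s choice is revealed (the new expected value minus the preceding value) and another at outcome (amount minus expected value). The numerical values below are hypothetical.

```python
# A temporal-difference view of a single observed trial (all values
# hypothetical). State sequence: trial start -> other player's choice
# revealed -> outcome revealed.
v_start = 0.0    # value before any predictive information is available
v_choice = 0.6   # expected value of the observed player's choice
amount = 1.0     # normalized amount the observed player wins

# PE at the earliest predictive event (the choice being highlighted):
pe_at_choice = v_choice - v_start
# PE when the outcome is revealed (amount minus expected value):
pe_at_outcome = amount - v_choice

# The firing-rate predictions tested in the text follow directly: a
# positive EV effect at choice, then a positive amount effect and a
# negative EV effect at outcome, whose combination equals pe_at_outcome.
```

Separating the outcome-period PE into its amount and expected-value components is what allows the regression analysis to test each sign prediction independently.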

Using these criteria, the selected neurons in the AMY and rmPFC did not show a significant observational PE signal (compared with 10,000 equivalent but label-shuffled datasets, Fig. 5a; as above, these selected AMY neurons also responded with a higher firing rate during self-experienced trials than during observed trials during the early and late response periods, P=0.002 and P=0.003<0.05/3, Bonferroni corrected t-tests, n=840 and 1,680, respectively, Fig. 5b). In the rACC, however, during observed trials the selected subpopulation of units did encode a positive PE, encoding the amount positively (P<0.01) and the expected value negatively at outcome (P<0.01, cluster statistical analysis over 10,000 equivalent but label-shuffled datasets, Fig. 5a; Supplementary Fig. 9). The selected rACC neurons encoding this observational PE also significantly decreased their firing rate during the self-experienced choice period, before the outcome was revealed (P=0.006<0.05/3, Bonferroni corrected t-test, n=1,320 and 2,640, Fig. 5b). We did not observe any significant PE encoding when selecting units in the same way but for self-experienced trials, although the encoding profile in the rmPFC was suggestive (Supplementary Fig. 10). Comparing the mean t-statistics of the regression coefficients over the whole response period (300–900 ms), we found no difference between the three brain areas for observed expected value encoding (F(2, 42)=1.561, P=0.222>0.05, ANOVA, n=45) or observed amount encoding (F(2, 42)=2.919, P=0.065>0.05, ANOVA, n=45; Fig. 5c) on their own. We did, however, measure a significant difference in the observational prediction error effect, defined as the amount effect minus the expected value effect (the distance between the two), across the three brain areas (F(2, 42)=3.964, P=0.026<0.05, ANOVA, n=45).
Post-hoc t-test comparisons revealed no significant difference in this PE term between the AMY and the rmPFC (P=0.189, n=14 and 9) or between the rmPFC and the rACC (P=0.26, n=9 and 22), but a significant difference was found between the AMY and the rACC (P=0.011<0.05/3, n=14 and 22).

Figure 5: Observational learning in the rACC. This analysis only included neurons with a positive regression of their firing rate to the expected value during the observed choice period (−900 to −300 ms). The data are presented in the same way as in Fig. 4. (a) In the AMY (left, n=14) and rmPFC (middle, n=9) the selected neurons did not show any significant PE encoding after the outcome was revealed. In the rACC (right, n=22), however, besides the selected-for predictive encoding of the expected value, neurons additionally encoded the amount positively and the expected value negatively at outcome, as postulated by formal learning theory. (b) The selected neurons in the AMY fired more during self-experienced trials than during observed trials in the early and late response periods, while in the rACC the firing rate during self-experienced trials was significantly reduced during the choice period (*P<0.05/3, t-test, n=1,320 and 2,640). (c) Post-hoc testing revealed a significant difference in the distance between the amount term and the expected value term between the AMY and the rACC (*P<0.05/3, t-test, n=14 and 22, right panel). (d) Localization of the recording sites of the neurons selected in this analysis (MNI space, Supplementary Table 1). (e) Three examples of individual neurons from the rACC subpopulation selected in this analysis, presented as in Fig. 3d,e but showing the mean (+/− s.e.m.) for the upper and lower quartiles (25%) of trials ordered according to their PE values. All three units show a higher firing rate for the upper quartile than for the lower quartile after the outcome was revealed (at t=0 ms; the middle panel shows the same unit as in Fig. 4).

The rACC subpopulation of neurons encoding observational learning parameters (22 out of 138 rACC units) was localized predominantly in the rostral gyral subdivision of the cingulate cortex (Fig. 5d). The analysis of the regression coefficients (Fig. 5a) demonstrates a linear relationship between firing rate and prediction error. In a more conservative analysis, which however discards this linear relationship, we additionally investigated to what extent the same coding scheme could still be observed by simply comparing the firing rates during high and low PE value trials with each other. In this test, 15 of the original 22 selected neurons still showed the same effect (time points with a higher firing rate and non-overlapping s.e.m. in the upper as compared with the lower quartile of PE trials during the response period; for an example see Fig. 5e, where the neuron in the middle panel is the same unit as in Fig. 4). While the number of units selected in this less sensitive analysis is relatively small (15 out of 138 rACC units), it is still significantly higher than expected by chance (P=0.0015<0.01 in a binomial test).
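The chance-level comparison can be reproduced with an exact binomial test. The per-neuron false-positive rate of 0.05 below is our assumption for illustration, not a value stated in the text.

```python
from scipy.stats import binomtest

n_selected, n_total = 15, 138
p_chance = 0.05   # assumed per-neuron false-positive rate (our assumption)

# Exact one-sided binomial test: is 15/138 more than expected by chance?
result = binomtest(n_selected, n_total, p_chance, alternative='greater')
```

Under this assumed chance rate, roughly 7 of 138 neurons would be expected by chance, and the one-sided test rejects that null at the 0.01 level.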