Neuronal signals in the prefrontal cortex have been reported to predict upcoming decisions. Such activity patterns are often coupled to perceptual cues indicating correct choices or the values of different options. How does the prefrontal cortex signal future decisions when no cues are present and decisions are made based on internal valuations of past experiences with stochastic outcomes? We trained rats to perform a two-arm bandit task, in which they successfully adjusted choices between a certain small reward and a possible big reward as the long-term advantage changed. We discovered specialized prefrontal neurons whose firing during encounters of no-reward predicted the animals' subsequent choice, even for unlikely or uncertain decisions and several seconds before choice execution. Optogenetic silencing of the prelimbic cortex, timed exclusively to encounters of no-reward, provoked excessive gambling for large rewards. Thus, the firing of prefrontal neurons during outcome evaluation signals subsequent choices during gambling and is essential for dynamically adjusting decisions based on internal valuations.

Neuronal signatures of economic choice have been reported in the lateral orbitofrontal cortex (). In rodents, recent evidence suggests that medial parts of the prefrontal cortex may be paramount (), but the distinct neuronal underpinnings are yet to emerge. A series of findings has identified the prelimbic cortex as a key structure in value-guided decision making (), in line with findings linking the medial prefrontal cortex to top-down cognitive control based on internal valuations across species (). The prelimbic cortex has also been suggested to contribute to behavioral flexibility, which enables adaptive control and allows spontaneous choices based on internal valuation (). During choices under risk, dopaminergic cells have been reported to reflect the utility function of decisions (). We aimed to unravel neuronal signals in the prelimbic cortex of rats that combine various relevant signals and reflect a binary choice output during gambling. For this purpose, we adopted a two-arm bandit-task design to incite inherent valuation processes for decision optimization under dynamically changing gambling conditions. We measured and manipulated neuronal activity in the prelimbic cortex of rats to uncover firing patterns that, in the absence of perceptual cues or offers, signal upcoming decisions based on the internal valuation of stochastic past experiences.

Discoveries of neuronal firing patterns reflecting economic and subjective value (), risk-taking (), or reward prediction () in distinct synaptic circuits () have provided mechanisms and inspired several models for value-based decision making (). In contrast to standard behavioral task designs, many of our decisions are not guided by external perceptual cues informing us about a correct or incorrect choice, nor are they based on perceptually presented stimuli with deterministic consequences. Instead, choices must regularly rely on experience-based internal valuations of different options with probabilistic outcome distributions. Gambling tasks with changing reward contingencies serve as a model in which flexible decision making relies on internal valuations without external cue guidance and aims toward reward maximization and individual satisfaction. During gambling, encounters of choice options under uncertainty often lead to seemingly unpredictable decisions, and the neuronal mechanisms driving such unguided decision making based on the internal valuation of probabilistic outcomes remain poorly understood.

As prelimbic firing patterns during the occurrence of no-reward are predictive of subsequent choices, we tested whether optogenetic silencing of the prelimbic cortex, exclusively timed to no-reward encounters, impeded optimal decision making. We used a novel viral approach to express channelrhodopsin2 exclusively in GABAergic neurons () of the prelimbic cortex (Figures 7A and S8K). Delivering 1-ms pulses of blue light at 66 Hz via optic fibers into the prelimbic cortex activated putative interneurons and inhibited the activity of 90% of prelimbic neurons (Figure 7B). Such bilateral and spatially restricted optogenetic silencing (Figures S8A–S8D and S8K) of the prelimbic cortex during task performance, timed only to no-reward encounters during gamble trials, impaired the performance of rats (Figures 7C and 7D). Rats persisted in choosing the gamble-arm even under highly unfavorable reward contingencies, in contrast to the near-optimal choice selection observed with control stimulations (delivered during the run1 episode, or during the reward episode of rewarded gamble-arm or safe-arm trials) or without stimulation (Figures 7E–7H). The significantly increased number of gambles under prelimbic silencing during no-reward resulted in an increased number of disadvantageous decisions, as animals continued to gamble for big rewards after non-rewarded trials even when the evidence indicated the opposite (Figures 7E–7G and S8E). This alteration in choice behavior resulted in fewer arm changes (Figure S8F) and was reflected in parameters of the expected value and the reinforcement learning model (Figures S8G and S8H). The increased tendency toward riskier choices is unlikely to result from reward-inducing effects of the optogenetic intervention itself, as further control experiments showed that animals did not prefer locations where such optogenetic stimulations occurred (Figures S8I and S8J).
Thus, the firing of prelimbic neurons during no-reward encounters is required for adjusting decisions based on negative feedback.

(H) Optogenetic inactivation during reward experience on gamble (RR) or safe-arm (SafeR) trials or during the run1 episode on any trial (R1) did not alter performance significantly. (E–G: data as mean ± SEM; n = 22, 17, 11, 10, and 9; one-way ANOVA, post hoc Tukey multiple comparison test, ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001; data from 4–7 rats; see Supplemental Information and Figure S8.)

(D–G) Inactivation of the prelimbic cortex, timed to the experience of no-reward (Nor), increased the number of gambles (D) compared to controls (E: F4 = 8.403, p < 0.001). Ctr, no stimulation; R1, stimulation during the run1 episode; RR, stimulation during reward experience on the gamble arm; SafeR, stimulation during the reward episode on the safe arm. Compared to controls, a persistent increase in (F) gamble choices following earlier non-rewarded gamble-arm trials (F4 = 6.940, p < 0.001) results in (G) a higher number of disadvantageous actions when evidence for the safe arm was high (EV < 1; F4 = 7.177, p < 0.001).

(C) Compared to control, bilateral optogenetic silencing of the prelimbic cortex, exclusively timed to the occurrence of no-reward during gamble-arm trials, increased gamble-arm choices when safe-arm choices would be favorable (first block) and after an unannounced decrease in reward probability on the gamble arm (arrow).

Next, we tested whether the activity of those choice-predicting cells might also allow inference about future choice when the animal consumes a small reward on the safe arm. However, these cells did not exhibit a differentiating firing pattern for future choice during safe-arm trials (Figure 5G). Using an elastic-net regression, we identified a distinct population of 88 cells that significantly differentiated their firing rate during the reward episode of safe-arm trials depending on the choice of the animal in the following trial during ambiguous and low choice evidence for gambling (Figure 6A). These neurons did not carry predictive power for upcoming choices during non-rewarded gamble-arm trials (Figure 6B). Thus, distinct sets of neurons in the prelimbic cortex provide a reliable, predictive firing-rate-based signal indicating the upcoming choice on the following trial in an arm-specific manner. This aligns with the observation that neurons in our task design carry strong arm-identity-dependent information (Figures S6A and S6B). When gamble- and safe-arm identity was switched halfway through the task (without physically moving the goal arms), neurons generally did not respond to the change in spatial location of the goal arms but maintained their firing according to gamble- or safe-arm identity (Figures S7A–S7F). This further confirms that the firing of prelimbic neurons was related more to task and cognitive content than to spatial or motoric parameters.

(B) During unrewarded gamble-arm trials, these neurons did not differentiate their firing for distinct future choices (left panel: choice ep.: Z = −2.6146, p = 0.0089; reward ep.: Z = −0.6321, p = 0.527, n.s.; n = 81. Right panel: reward ep.: Z = −1.2040, p = 0.2286, n.s.; n = 79).

(A) Elastic-net regression identified 88 cells that significantly differentiated their firing rate during the reward episode of safe-arm trials depending on the choice of the animal in the following trial. Low choice evidence: Z = −5.7220, ∗∗∗p < 0.001; ambiguous choice evidence: Z = −5.0456, ∗∗∗p < 0.001; n = 81 cells (left) and 86 cells (right); alpha at 0.00166 (Bonferroni corrected).

During the Encounter of Reward on the Safe Arm, the Firing of Another Subset of Prelimbic Neurons Indicates the Upcoming Choice of the Animal

Figure 6 During the Encounter of Reward on the Safe Arm, the Firing of Another Subset of Prelimbic Neurons Indicates the Upcoming Choice of the Animal

In contrast to the reinforcement learning model, which is based on behavioral parameters only, a prediction model based on the firing of these choice-predicting cells achieved accurate and persistent forecasts of future choices even during ambiguous choice evidence (Figure 5F). In fact, the predictive power of the no-reward activated cell population benefited from the population of choice-predicting cells (Figure S5L).

If neurons indeed differentiate their firing rate during reward evaluation dependent on future choice but independent of changes in goal value and reward prediction error, then they should continue to do so independent of prior experiences. First, we confirmed that the outcome during the trial before the non-rewarded gamble trial had no significant impact on the firing rate differentiation of choice-predictive cells (Figure S5C). Then, we analyzed recurring choice scenarios across four consecutive trials, with defined outcomes during the first three trials, and analyzed the firing rate of choice-predicting cells as a function of the choice the animal would make on the fourth trial (Figures 5E, S5D, and S5E). Irrespective of whether the animal had experienced no or repeated reward omissions in the two previous trials, the choice-predicting cells always differentiated their firing in the third trial according to the animal's choice in the fourth trial. This confirms that previous outcomes and choices have no influence on the choice-predictive signal, suggesting its independence from expectancy and surprise. Within this population of choice-predicting cells, the predictive increase in firing is temporally restricted to the reward episode and does not persist significantly earlier or later in the non-rewarded trial (Figures S5D and S5E).

To adjust for the choice evidence-modulated firing of no-reward activated cells, we subtracted the mean firing rate of no-reward activated cells during trial t−1 (excluding the reward episode) from the firing rate during the reward episode of trial t. Using these relative firing rate values as input predictors for future choice in trial t+1, the elastic-net regression selected a population of cells whose recorded firing patterns exhibited stable, high predictive power for future choices. Timed to the encounter of no-reward on the gamble-arm, these cells significantly differentiated their recorded firing rate according to the subsequent choice of the animal (Figures 5A–5D, S5A, and S5B). Even during periods of ambiguous choice evidence or for unlikely upcoming choices (safe-arm choices during periods of high choice evidence for gambling), these "choice-predicting cells" exhibited significantly higher firing during the reward episode when the animal would change its strategy in the next trial and select the safe option, as opposed to choosing a further gamble (Figures 5A, middle panel, 5B, 5C, and S5B). We observed that this firing differentiation was independent of reward, location, motoric confounds, different levels of value, choice evidence, and reward prediction error (Figures S4A–S4C and S5F–S5H). Nevertheless, many of the latter variables present trial-to-trial variations, which could still account for firing rate variance in the investigated trial scenarios. Thus, we used the residuals of a multivariate regression (controlling for reward prediction error [RPE], action value for choosing the gamble-arm, trajectory changes, head-directional changes, and choice evidence) instead of the firing rate as input to an elastic-net regression analysis.
The selected cells differentiated their firing according to future choice (Figures 5D and S5K), confirming that the future-choice prediction of these cells is independent of reward prediction error, action value of the gamble-arm, trajectory changes, head-directional changes, and choice evidence. Furthermore, the identification of choice-predictive signals remained independent of the reinforcement model used (an alternative RL model yielded similar predictive power; Figure S5I) and of reinforcement model parameters during ambiguous choice situations, when expected reward values were similar on both arms (Figure S5F) or the probability of gamble was close to 0.5 (Figure S5G).
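The residual-based confound control described above can be sketched as follows. This is a minimal illustration with synthetic data, not the authors' analysis code; all variable names (rpe, action_val, head_turn, evidence) and effect sizes are invented for the example. It regresses trial-by-trial firing rates on the confound variables with ordinary least squares and shows that a genuine future-choice effect survives in the residuals, which can then be fed to an elastic-net selection step.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 300

# Hypothetical per-trial confound regressors (names and scales illustrative).
rpe         = rng.normal(size=n_trials)     # reward prediction error
action_val  = rng.normal(size=n_trials)     # action value of the gamble arm
head_turn   = rng.normal(size=n_trials)     # head-directional change
evidence    = rng.normal(size=n_trials)     # choice evidence for gamble
future_safe = rng.integers(0, 2, n_trials)  # upcoming choice (1 = switch to safe arm)

# Simulated firing rate driven by confounds AND by the upcoming choice.
rate = (5.0 + 0.8 * rpe + 0.5 * evidence
        + 1.5 * future_safe + rng.normal(0.0, 0.5, n_trials))

# Regress out the confounds (ordinary least squares) and keep the residuals.
X = np.column_stack([np.ones(n_trials), rpe, action_val, head_turn, evidence])
beta, *_ = np.linalg.lstsq(X, rate, rcond=None)
residuals = rate - X @ beta

# The future-choice effect survives in the residuals, so a subsequent
# regression on residuals isolates choice-predictive firing.
diff = residuals[future_safe == 1].mean() - residuals[future_safe == 0].mean()
```

Because the confound regressors are uncorrelated with the simulated upcoming choice, the residual group difference stays close to the injected 1.5 Hz effect.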

(G) On the safe arm, the firing of these choice-predicting cells does not indicate the upcoming choice in the next trial (n = 84; same statistics as in A, all episodes n.s.).

(E) Predictive firing rate differentiation was not influenced by differences in choice and outcome of the two preceding trials (two-way RM ANOVA, all group and time variables significant; left: p = 0.002, middle: p = 4.07e, right: p = 0.015; 1st- and 2nd-trial comparisons all n.s.; mean ± standard deviation; see Supplemental Information and Figures S5D and S5E).

(D) A multivariate regression of the firing rate of choice-predicting cells against variance changes of major task variables (choice evidence, reward prediction error, head direction, movement, and action value of the gamble arm) was performed. The resulting residuals were subjected to a lasso regression analysis. The firing rate of the selected cells (shown here) maintains a significant difference (p = 1.83e−4; n = 20 sessions) for distinct future choices, indicating their independence from these task variables.

(C) Different visualization of a choice-predicting cell across all non-rewarded gamble arm-trials (see Figures S5 A and S5B for more examples).

(B) Firing of a choice-predicting cell during reward episodes across 48 consecutive trials with mostly ambiguous choice evidence. Gold and black ticks indicate no-reward occurrence on the gamble-arm with a safe-arm or gamble-arm choice in the next trial, respectively.

(A) At the time of no-reward, the firing of choice-predicting cells indicates the rat's choice in the next trial, even during ambiguous choice evidence, before unlikely choices of the safe arm during high choice evidence for gambling, or before unlikely choices of the gamble arm during low choice evidence for gambling. Neurons significantly increase firing during the occurrence of no-reward on gamble-arm trials when the animal will change its strategy to the safe arm in the next trial, compared to a subsequent gamble-arm choice (signed rank test, adjusted for multiple comparisons, ∗∗∗p < 0.0001; left panel: Z = −5.9879, right panel: Z = −5.0817). Note that only unrewarded gamble-arm trials are considered here.

A Differentiating Firing of Choice Predicting Cells during the Encounter of No-Reward Indicates the Choice of the Animal in the Subsequent Trial

Observing an influence of primary task and decision variables on prelimbic neuronal activity during gambling behavior, we asked whether prelimbic firing patterns might be predictive of future choices during task performance. We applied an elastic-net regression as a feature selector to identify the best predictors and evaluated their power with a general linear prediction model. Using the firing rates of either all recorded prelimbic neurons or only the activity of no-reward activated cells as input for the regression, the resulting models, on average, correctly predicted 78.1% ± 1.3% and 75% ± 1.3% (mean ± SEM), respectively, of future choices across all behavioral conditions (Figures S3A–S3C). To explore which neuronal signals might be responsible for successful predictions, we focused on a key situation during gambling: an animal chooses the gamble-arm but does not receive a reward. What will the animal decide to do in the next trial: continue gambling or play it safe? In this situation, the animals chose the safe arm in the subsequent trial with nearly the same likelihood (45% ± 5%, mean ± SEM) as the gamble-arm. This intriguing scenario allowed us to control for goal-arm location and reward information, as in all instances the animal is located on the same goal arm and receives no reward, while retaining high unpredictability about which choice the animal will make in the following trial. We observed that changes in track trajectory, head direction (relative speed), and heading (degrees) during the reward episode were independent of the choice the animal would make on the subsequent trial (see Figures S4A–S4C). We then analyzed whether the firing of no-reward activated cells in this scenario of non-rewarded gamble-arm trials is indicative of future choice.
During periods of high choice evidence for gamble, these cells fired at higher rates when the animal would change its strategy and select the safe arm in the subsequent trial, compared to no-reward encounters after which the animal decided to continue gambling (Figure 4A). Indeed, no-reward activated cells predicted future choices under such conditions and provided more predictive information toward a change in strategy than the other recorded cells (Figures 4B–4D). The accuracy of the prediction increased with the number of co-recorded cells (r = 0.374, p = 0.023). However, during periods of ambiguous choice evidence for gamble (with multiple choice fluctuations), no-reward activated cells, on average, did not significantly differentiate their firing rate according to future choices in upcoming trials (Figure 4A, right panel). Thus, the predictive power of no-reward activated cells appears at least partly linked to the previously described correlation of firing rate with choice evidence. Consequently, the firing of no-reward activated cells as a population may not allow a reliable prediction of subsequent trial choices on a trial-by-trial basis or during periods of ambiguous choice evidence.
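The elastic-net feature-selection and prediction pipeline can be illustrated with a toy example. The data below are synthetic, the hyperparameters (l1_ratio, C) are placeholders rather than the values used in the study, and scikit-learn's elastic-net-penalized logistic regression stands in for the authors' elastic-net/general-linear-model combination.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Toy stand-in: 200 non-rewarded gamble-arm trials x 30 co-recorded cells.
n_trials, n_cells = 200, 30
rates = rng.poisson(5.0, size=(n_trials, n_cells)).astype(float)
future_choice = rng.integers(0, 2, n_trials)    # 0 = gamble again, 1 = switch to safe
rates[:, :5] += 3.0 * future_choice[:, None]    # make 5 cells "choice-predicting"
rates = (rates - rates.mean(0)) / rates.std(0)  # z-score each cell

# Elastic-net-penalized logistic regression as feature selector and predictor.
model = LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=0.1, max_iter=5000)
accuracy = cross_val_score(model, rates, future_choice, cv=5).mean()

model.fit(rates, future_choice)
selected = np.flatnonzero(model.coef_[0])       # cells retained by the sparse penalty
```

The sparse (L1) part of the penalty drives uninformative cells' coefficients to zero, so `selected` plays the role of the identified choice-predicting population, while cross-validated accuracy quantifies predictive power.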

(D) No-reward activated cells (blue, n(nor) = 77) better predict the change of choice compared to other recorded cells (red, n(other) = 74), as indicated by their elastic-net coefficients after non-rewarded gamble trials. Left panel: two-sample Kolmogorov-Smirnov test; right panel: Mann-Whitney U test, U = 1,676.000, p < 0.001. (A–C: n = 37 sessions.)

(B) Receiver-operating characteristics of successful predictions of future choices during non-rewarded gamble trials based on the firing of no-reward activated cells (mean curve ± SEM). Inset: prediction accuracy per session; mean = 74.2%, median = 73.3%.

(A) During non-rewarded gamble-arm trials, no-reward activated cells differentiate their average firing according to the subsequent choices of the animal for trials with high, but not ambiguous, choice evidence for gamble. Note: the difference in absolute firing rate between the left and right panels contributes to prediction power. Wilcoxon signed rank test, alpha = 0.00167 (Bonferroni corrected); left: reward episode: Z = −2.864, p = 0.0261; all other episodes n.s.; n = 308. Right: run1 episode: Z = −5.533, p < 0.0001; reward episode: Z = −3.245, p = 0.0012; all other episodes n.s.; n = 339.

We performed tetrode recordings in four rats during task performance and measured the activity of 1,006 neurons across 45 behavioral gambling sessions. A demixed principal-component analysis (dPCA) () and a multiple regression analysis (Figure 2) revealed that neuronal firing in the prelimbic cortex differentiates according to task episode, occurrence or absence of reward, and modeled choice evidence. Strikingly, a major proportion of recorded neurons in the prelimbic cortex significantly increased their firing during the experience of no-reward at the gamble-arm. The firing of these classified no-reward activated cells (n = 402) was significantly correlated with the occurrence of no-reward during any three consecutive time bins of the reward episode (Figure 2B, right panel). Additionally, these cells changed their firing according to choice evidence and exhibited higher firing rates during trials with low and ambiguous compared to high choice evidence for gamble (Figures 3A–3D and S2A–S2C). This distinction in firing rate according to different levels of choice evidence was restricted to the gamble-arm and was not observed during safe-arm choices, which were always rewarded (Figure 3B). The increased firing rate during low and ambiguous choice evidence for gamble could not be explained by speed-related changes and is thus unlikely to reflect motivational biases (Figures S1F and S1G). No-reward activated neurons were similarly prevalent across animals (∼40%, Figure S2D); their firing reflected current but not past reward information (Figure S2E) and allowed a correct prediction of reward occurrence on 90.97% ± 1.60% (mean ± SEM) of trials (Figure S2F).
Although the firing of some no-reward activated neurons was not modulated by choice evidence (Figures 3D and S2D), the activity of most of these cells exhibited additional correlations with choice evidence and arm and/or chosen goal in a task-episode-dependent manner (Figures 3C, 3D, and S2A–S2C), likely integrating context, goal value, and reward as reported earlier ().

(D) Correlations between firing rate and choice evidence during four indicated task episodes (unrewarded trials only, 402 neurons). Bottom: large fractions of no-reward activated cells exhibit significantly correlated firing (unrewarded trials) with choice evidence for gamble arm for at least one trial episode (run1, run2, reward, and/or intertrial episode, n = 402).

(C) Normalized firing of no-reward activated neurons for unrewarded (left, sorted for peak firing) and rewarded (right) gamble-arm trials. Note, cells with peak firing during but also outside of the reward episode differentiate firing during the reward episode according to reward occurrence (see Figure S2 B for individual examples; for visualization purposes, maxima and minima outside the color range are omitted).

(B) Firing of no-reward activated cells (n = 402) according to reward occurrence, modeled choice evidence, and arm choices. Note an increased firing during non-rewarded trials and during periods of ambiguous and low choice evidence for gambles exclusively on the gamble-arm.

(A) Firing rate during the reward episode of two no-reward activated neurons increases during unrewarded gamble trials (blue) and depends on choice evidence. Red: rewarded gamble-arm trials.

Note: to be classified as a no-reward activated neuron, the firing rate had to exhibit a significant correlation in three (out of nine) consecutive time bins during the reward episode. Dark gray depicts chance level.
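The classification criterion in the note above (a significant correlation in at least three consecutive time bins out of nine) reduces to detecting a run of True values in a per-bin significance vector. A small helper of the assumed form, written from the description rather than the authors' code:

```python
def has_consecutive_run(sig_bins, run_len=3):
    """Return True if `sig_bins` (per-bin significance flags across the
    reward episode) contains at least `run_len` consecutive True values."""
    streak = 0
    for significant in sig_bins:
        streak = streak + 1 if significant else 0
        if streak >= run_len:
            return True
    return False

# Nine reward-episode bins: bins 4-6 significant -> classified as no-reward activated.
is_no_reward_cell = has_consecutive_run(
    [False, False, False, True, True, True, False, False, False])
```

Requiring a consecutive run, rather than any three significant bins, guards against isolated bins crossing the threshold by chance.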

(B) Multiple regression analysis indicates correlations of reward occurrence, modeled choice evidence, and spatial arm location with firing of prelimbic neurons along trial episodes (left panel). The firing of large subsets of prelimbic neurons is correlated with reward omission or reward occurrence in a time-dependent manner (right panel).

(A) Demixed principal-component analysis (dPCA) reveals the major contributors to firing rate variance on gamble-arm trials. Time and episode modulation contribute most to the firing rate variance, followed by choice evidence and reward, as denoted in the pie chart segments. The first 15 principal components of the demixed PCA and their contributing variables for gamble-arm firing rate modulation are depicted in the bar graph (top left panel). The first two component contributions are cut for comparison purposes. The upper-right triangle of the top right panel depicts dot products between all pairs of the first 15 demixed principal axes. Stars denote significantly non-orthogonal principal components; note components 5 and 4, indicating an interaction of choice evidence- and reward omission-related neuronal activity. The bottom left triangle shows correlations between all pairs of the first 15 principal components. A selection of the main principal components indicates time-dependent reward modulation (C. #5 and #6), choice evidence modulation (C. #4 and #8), and task episode modulation (C. #1 and #3). Black lines indicate significant periods.

Inspired by bandit tasks for humans, we trained rats to choose freely and without cue guidance between a certain small reward on the "safe-arm" of a Y-maze and a possible big reward on the "gamble-arm" (Figure 1A). The likelihood of reward on the gamble-arm was changed twice during a session, altering the advantage between the two arms or allocating similar merit to both options (Figures 1B and S1A). We observed that animals were able to adjust their choices to maximize the long-term amount of reward and to follow the changes in reward contingencies (Figures 1C, 1D, and S1B–S1E). The choice behavior of animals adapted based on the diverse reward experiences during individual behavioral sessions (Figures S1B and S1C). Comparing the animals' behavior to an optimal agent provided only a measure of performance and did not allow adequate tracking of subjective values and goal preferences during individual behavioral sessions (Table S1). Among several tested behavioral models (Table S1), we applied a reinforcement-learning (RL) model () to estimate subjective goal values, which predicted 80.4% (±2.7% SEM) of the animals' choices. We refer to the modeled probability of a subsequent choice of the gamble-arm as "choice evidence for gamble" (Figure 1B). Modeled choice evidence allowed a more refined representation of subjective goal value changes during the task (Figures S1B and S1C). As expected, we observed fewer choices of the gamble-arm during episodes of low choice evidence for gamble (Figure 1E). During periods of ambiguous choice evidence, rats adjusted their strategy more often and made significantly more changes between the two arms (Figure 1F), while running speed did not correlate with different levels of choice evidence (Figures S1F and S1G).
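The specific RL model and its fitted parameters are given in the paper's supplement, not here. As a hedged sketch, a standard delta-rule value update combined with a softmax choice rule captures how "choice evidence for gamble" can be derived from stochastic outcomes; the alpha and beta values below are arbitrary, and for simplicity both arm values are updated on every trial rather than only the chosen arm's.

```python
import math

def update_value(v, reward, alpha=0.2):
    """Delta-rule (Rescorla-Wagner) update of an arm's subjective value."""
    return v + alpha * (reward - v)

def choice_evidence(v_gamble, v_safe, beta=3.0):
    """Softmax probability of choosing the gamble arm ('choice evidence')."""
    return 1.0 / (1.0 + math.exp(-beta * (v_gamble - v_safe)))

# Illustrative block: gamble arm pays 4 pellets with p = 0.25, safe arm pays 1 always.
v_gamble, v_safe = 0.0, 0.0
gamble_outcomes = [0, 4, 0, 0, 0, 0, 4, 0]  # one sampled outcome sequence
for r in gamble_outcomes:
    v_gamble = update_value(v_gamble, r)
    v_safe = update_value(v_safe, 1.0)

p_gamble = choice_evidence(v_gamble, v_safe)  # modeled choice evidence for gamble
```

A trial-by-trial trace of `p_gamble` is the kind of signal that was classified into low, ambiguous, and high choice evidence in Figure 1B.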

Data as mean ± SEM; post hoc multiple comparison Student-Newman-Keuls method, ∗p < 0.05, ∗∗∗p < 0.001; n(amb) = 45, n(low) = 39, n(high) = 45 sessions for (D) and (E); 4 rats for (C), (E), and (F).

(C) Each animal (R#1–R#4) was able to maximize reward (see also Figure S1A) according to reward occurrence on the gamble arm (blue curve: logistic fit function; gray: only gamble-arm choices above an expected gamble-arm reward value [EV] of 1 are optimal).

(B) For an individual session, choice evidence for going to the gamble-arm was calculated with a reinforcement learning model and classified as low (purple), ambiguous (turquoise), or high (blue). Animals had to explore both arms in forced trials at the beginning of each block of trials with a defined reward probability on the gamble-arm. Dotted line: expected value for the gamble arm based on reward occurrence. Ticks indicate observed choice for each trial.

(A) Running on a Y-maze, rats choose between a small and always-available or a big reward given only with a 12%, 25%, or 75% probability.

Discussion

Our adapted two-arm bandit-task framework required animals to explore and integrate probabilistic reward outcomes and to adapt policy selection accordingly, striving toward reward maximization. During this task, animals often exhibited volatile choice behavior, and the experience of a negative outcome on the gamble arm regularly presented the animal with an ambiguous decision on whether to choose the gamble arm again or the safe arm in the next trial. Taking advantage of this behavioral scenario, we discovered neuronal firing activity in the prelimbic cortex that differentially informs about the animal's future choices in economic decision making. We identified a specialized subset of neurons whose firing patterns during the evaluation of a negative outcome signal the choice in the subsequent trial, even for unlikely choices and more than 5 s before the decision is executed. Optogenetic inhibition of prelimbic activity, exclusively timed to the encounter of no-reward, resulted in an increased number of gambles and confirmed the importance of the underlying prelimbic neuronal activity for optimal decision making. These results suggest that prelimbic neurons reliably reflect intrinsic evaluation processes of negative outcomes and signal future choice. Such signals are well suited for optimizing action adaptations, particularly during conflicting choice situations.

The predictive firing-rate increase of choice-predicting cells precedes an upcoming change to the safe arm on the next trial by several seconds and is highly time-restricted to the encounter of no-reward. Thus, it contrasts with working-memory signals observed in the prelimbic cortex in tasks where animals are required to hold goal-relevant cue information in memory (). There is no evidence that choice-predicting cells in the prelimbic cortex remain informative for subsequent choices for prolonged periods lasting into the next trial (see Figure S5E). Furthermore, the temporal span of at least 5–6 s between firing rate differentiation and decision manifestation, together with the resulting mix of motoric behaviors during those time windows (transfer, waiting time, and run initiation), makes it highly unlikely that the signal corresponds to preparatory pre-action or pre-motoric signals ().

Future choice prediction has been reported for a cohort of neurons in the anterior cingulate cortex (). The predictive firing rate of these cingulate neurons peaked just before the execution of the decision of which goal arm to choose. This would be comparable to the run1 episode in our task, suggesting distinct signals for choice execution in cingulate cortices and the choice-predictive signals for the subsequent trial in the prelimbic cortex described here during reward evaluation. Decisions in our task design mainly reflect changes in internal goal valuations; they are not associated with manipulations of external perceptual and sensory cues informing about optimal choice, as in cue-dependent and perceptual-decision tasks (). Similarly, seminal work from Padoa-Schioppa and colleagues described neuronal responses in the primate lateral OFC signaling the upcoming "chosen juice" following stimulus presentation (). The cue presented during each trial distinctively informed the animals about goods and outcome. In light of our data, it might be interesting to explore how those OFC neurons in monkeys fire when subjective good values are matched almost equally and outcomes are distributed stochastically.

Choice-predicting cells increase their firing during a negative outcome on the gamble arm only when the animal will go to the safe arm in the subsequent trial. Thus, they do not merely signal a negative, unsatisfactory outcome but rather act as a stable indicator that an imminent choice adaptation is favored. This presents a signal that provides a potent driver of behavioral change during reward evaluation, independent of choice evidence and goal value. The increase in firing rate might influence goal value updating in interconnected networks to guide choice toward an alternate goal in the future. This signal is reminiscent of "regret," a long-standing concept of decision making in economics (). Although regret has already been linked to the dorsomedial and dorsolateral prefrontal cortex (PFC) in humans (), a neuronal mechanism has not yet been identified. Regret, in contrast to disappointment, carries self-blame about one's choice and thus a stronger negative affective reaction to the outcome of the agent's choice: a better choice could have been made, which potentially carries a more direct effect on subsequent choices. Regret can only be experienced once the agent can either infer the likely outcome of alternative options or is informed about the outcome of the not-taken alternative. Our task design fulfills this requirement, as during the reward episode of non-rewarded gamble-arm trials the animal can infer that it would have received a small reward on the safe-arm. Prelimbic regret-based signals may link reciprocal OFC and ventral striatal signals () and potentially drive or reinforce decisions contrary to value but in favor of utility, complementing utility-based decision processes specifically during decision making under uncertainty ().

In conclusion, the firing patterns of choice-predicting cells reflect an intrinsic evaluation process of negative outcomes that signals future choice and provide a neuronal framework for understanding individual decisions during economic choice. The apparent independence of these signals from value representations and reward prediction error signals provides a nuanced view of neuronal correlates and models of decision-making processes.