Summary of Findings

In the present study, we explored how cells in NCL encode value as an integrative sum of reward amount and delay to reward. We recorded from 207 NCL cells during a task where birds were required to peck a stimulus that predicted either a small or large reward, following either a short or long delay period. We examined the firing of these cells during the period that the birds were presented with each stimulus, and in the delay period prior to reward.

When cells were filtered on the basis of showing a significant Stimulus effect in the Sample period, 35 of 207 cells (16.9%) fired in a pattern that closely mirrored the birds’ stimulus preferences (note that a shorter response latency to a stimulus indicated that the birds valued that stimulus more highly). Specifically, the birds responded faster to the stimulus that predicted a short delay followed by a large reward (S3) than to the stimulus that predicted a short delay followed by a small reward (S1), and both of these stimuli were preferred over the two stimuli that predicted a long delay (L3 and L1), which did not differ from each other. The neural data in the Sample period of these 35 cells mirrored these stimulus preferences. When the period immediately following the stimulus presentation was analysed, cells still fired in preference for S3; however, the differentiation between S1 and the other stimuli disappeared. Given that it cannot be guaranteed that the bird is looking at the stimuli at this time, the first analysis is likely to be a more accurate representation of the cells’ response to the stimulus. Despite evidence for value coding in the Sample period, the stimulus preferences of these sample-selective cells were not carried over into either the Delay-1 or Delay-2 periods. However, in the first 500 ms of the reward period, these cells fired significantly more to S3 than to L1. This difference in firing was not evident in the second 500 ms of the reward period.

When cells were filtered based on activity in the Delay-1 period, 25 of 207 cells (12.1%) showed a significant effect of Stimulus. At the population level, there was no difference in firing to any of the four stimuli in the Sample, Delay-1, or Delay-2 periods of these 25 cells. In other words, although a particular cell might have shown a preference for one stimulus over another, a different cell would have shown a different pattern of preference, such that when all these cells were combined, as in our population plot, no clear stimulus preference emerged. The cells filtered based on the Delay-1 period did, however, show some difference in firing in the first 500 ms of the reward period. Cells fired more to S3 than to either L1 or L3, but the difference in firing disappeared in the second 500 ms of the reward period.

Finally, when the cells were filtered based on activity in the Delay-2 period, 39 of 207 cells (18.8%) displayed a significant effect of Stimulus. Interestingly, these 39 cells, which were selected on the basis of their firing during the Delay-2 period, did display some evidence of value coding in the Sample period, in that the cells fired significantly more to S3 than to all stimuli except L1. As with the cells filtered on the Delay-1 period, there was no difference in firing to any of the four stimuli in the Delay-1 or Delay-2 periods. However, we found that differentiation did occur to some extent in the last 500 ms of the Delay-2 period: the neural responses to S1 and S3 did not differ from each other, S1 differed from both L1 and L3, and S3 differed from L3, while the difference between S3 and L1 only neared significance. In other words, cell firing in the last half of the Delay-2 period began to differentiate between short and long delays. While there was no evidence of actual value coding from the Delay-2 period cells, the change in firing pattern seems to reflect anticipation of reward.

There was also very little evidence for value coding in the reward period. The only evidence for value coding was in the first 500 ms of the reward period expressed generally as an increased firing to S3, but the effect was weak. By the second 500 ms of the reward period there was no evidence of any value coding.
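The proportions of stimulus-selective cells reported above can be re-derived from the raw counts. The short Python sketch below is only an arithmetic check and not part of the original analysis; the counts are those given in the text, and the variable and function names are hypothetical.

```python
# Counts are taken from the text; all names here are illustrative only.
TOTAL_CELLS = 207
selective_cells = {"Sample": 35, "Delay-1": 25, "Delay-2": 39}


def percentage(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Percentage of recorded cells with a significant Stimulus effect, per filtering period.
proportions = {
    period: percentage(count, TOTAL_CELLS)
    for period, count in selective_cells.items()
}

for period, pct in proportions.items():
    print(f"{period}: {selective_cells[period]}/{TOTAL_CELLS} cells = {pct}%")
```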

Comparison to Other Studies

Previous studies showed that NCL appears to encode reward amount9 as well as the subjective reward value of a stimulus by integrating reward amount and delay-to-reward in choice paradigms10. In both of these studies, NCL activity showed modulation during the delay period prior to the delivery of the reward. The findings of the current study support the role of NCL in detecting the value of conditioned stimuli and rate coding the “best” option (short delay-large reward) during the Sample period when the bird is shown the stimulus. Our study also finds that the modulation of NCL activity based on reward amount and delay-to-reward is not exclusive to choice paradigms. Instead, NCL cells respond to a single stimulus that indicates a reward and its temporal cost, so any change in activity cannot be explained by factors that could be at play in a choice paradigm, such as an integrated value assessment based on being presented with two different options.

Previously, NCL activity has been found to be modulated by reward delivery9,16. At most, the firing patterns observed in the reward period in the current study reflected to some extent the “best” reward outcome when cells were filtered by activity in the Sample and the Delay-1 periods. The fact that activity reflected the best option is consistent with findings on neural activity in the mammalian OFC. The OFC is thought to play an important role in updating information about expected rewards4,15. Our finding that the activity of NCL cells during reward delivery was modulated to some extent by the value of a reward suggests that NCL may play a role similar to the OFC in updating expected reward outcomes to optimise future behaviours.

Our findings also have implications for a recent study by Kasties et al.11, whose design was on the surface similar to that of the current study, yet failed to find any evidence for value coding in NCL. They found that while NCL cells responded differently to four different stimuli, there was no evidence of modulation in NCL activity in response to reward amount, delay length, or an interaction between reward amount and delay length. The absence of value coding in NCL neural activity occurred despite the fact that, based on latencies, their subjects showed clear preferences for the stimuli that were similar to those of our own birds. In contrast, we did find modulation in NCL activity in response to stimuli that predicted different reward amounts. What may have accounted for the differences between the two studies? We incorporated a number of small changes that we believe helped the pigeons to differentiate between the different stimuli and the reward outcomes they predicted. First, in contrast to Kasties et al.11, who delivered reward on only 50% of trials, we delivered reward on 100% of trials. Furthermore, in contrast to Kasties et al.11, who manipulated reward amount by increasing the duration for which access to food was available, we manipulated reward amount by increasing the number of reward presentation periods. We believe that by increasing the number of reward delivery periods, and by delivering reward on 100% of the trials, the birds were better able to determine the “best” option. As a result, both the behavioural data and the neural data during the Sample period clearly reflected the most valuable outcome, followed by the next most valuable outcome.

Our finding of no value coding in the delay seems to stand in contrast to that of Koenen et al.9, who reported reward modulation during the delay period. Closer inspection of our data, however, revealed the emergence of reward modulation in the last half of the Delay-2 period. The fact that Koenen et al.9 reported more evidence of reward modulation in the delay period may be due to the fact that they employed a 3 s delay whereas we only analysed a 2 s delay period. More to the point, it is more likely that the activity we began to witness towards the end of the Delay-2 period, and that observed by Koenen et al.9, is better classified as activity related to the anticipation of reward than as value coding. The reason in the case of the current study is that the increase in neural activity was observed on short delay trials irrespective of whether one (S1) or three (S3) rewards were imminent. Thus, the neural activity in the last half of the Delay-2 period was more likely coding an upcoming reward than the value of the recently seen stimulus. The same outcome was reported by Koenen et al.9 in the no-choice condition, where reward modulation was observed in that the cells displayed much less firing to 0 upcoming rewards than to 1 or 3 upcoming rewards, the latter two being neurally indistinguishable. It thus appears that reward modulation in the delay is more a representation of whether a reward is imminent than a code for the value of the recently seen stimulus.

Implications for Value Coding in NCL

The literature supporting NCL as a functional analogue to the mammalian PFC, at least with respect to value coding, is relatively small. Based on the mammalian literature, the PFC, and in particular the OFC, is involved in encoding value. Studies of the mammalian OFC find that firing is modulated by reward magnitude and in anticipation of reward delivery2,3,4. Furthermore, OFC activity is modulated in response to changes in reward value achieved by manipulating the delay to reward only at the presentation of a stimulus, during the delay period and when reward is delivered5,6. To date, the research in the avian brain has confirmed that NCL is implicated in encoding reward amount and in encoding delay to reward9,10. The present study adds to the current knowledge in that NCL appears to show properties similar to the OFC with respect to encoding value based on both delay to reward and reward amount. We can now add value coding relating to temporal discounting to the list of functional similarities with the mammalian PFC.