Animals

All experimental procedures were performed in accordance with guidelines established by the Okinawa Institute of Science and Technology Experimental Animal Committee. Serotonin neuron-specific ChR2(C128S)-expressing mice were produced by crossing Tph2-tTA mice with tetO-ChR2(C128S)-EYFP knock-in mice5,6. Seven male bigenic and five male WT mice, aged >4 months at the beginning of the behavioral training period, were used in the study. Animals were housed with one mouse per cage at 24 °C on a 12:12 h light:dark cycle (lights on 07:00–19:00 h). Seven bigenic (one for experiment 1 only, one for experiment 2 only, five for both experiments 1 and 2) and five WT animals contributed to the data reported here. Training and test sessions were conducted during the light period 5 days per week. Mice were deprived of food in their home cage and received their daily food ration during the experimental sessions only (~2–3 g per day). Food was freely available during the weekend and removed >15 h before the experimental sessions started. Water was freely available in the home cage.

Surgery

After mice had mastered the sequential tone-food waiting task, they were anesthetized with equithesin (3 ml/kg, i.p.), and an optical fiber (400 μm diameter, 0.48 NA, 4 mm length, Doric Lenses) was stereotaxically implanted above the DRN (from bregma: posterior, −4.6 mm; lateral, 0 mm; ventral, −2.6 mm). The optical fiber was fixed to the skull and anchored with dental acrylic and stainless steel screws. Animals were housed individually after surgery and were allowed at least 1 week to recover.

Reconstruction of optical stimulation sites

Mice were deeply anesthetized with 100 mg/kg sodium pentobarbital i.p. and were then perfused with 0.9% NaCl, followed by 10% formalin. Their brains were removed and stored in 10% formalin for a minimum of 24 h before being sliced into 60 μm coronal sections. Cresyl violet staining was used to help verify placements of optical fiber tracks (Fig. 1c).

Behavioral apparatus and training

A free operant task that we designated the sequential tone-food waiting task was used. Mice were individually trained and tested in an operant-conditioning box (Med-Associates) measuring 21.6 cm × 17.8 cm × 12.7 cm. The box could be illuminated with a single 2.8 W house light located in the top center of the rear wall, and one speaker was positioned in the top right side of the rear wall. The box was enclosed in a sound-attenuating chamber equipped with a ventilation fan. Three 2.5 cm square apertures were positioned 2 cm above the floor. The rear stainless steel wall of the chamber contained one aperture, defined as the tone site. On the front wall, two apertures, defined as the food sites, were positioned 7 cm apart. Both apertures on the front wall were connected to a food pellet dispenser that delivered a food pellet (20 mg) to these apertures. In all experiments, only the right food site was used, and the left aperture was covered with an opaque window to prevent nose poking. An infrared photo-beam, positioned at a depth of 0.5 and 1 cm from the bottom of the aperture, crossed the entrance of each aperture; a nose poke into an aperture in the back or front wall interrupted the beam and was recorded as a response. A nose poke at the tone site induced an 8 kHz tone (0.5 s, 85 dB) from the speaker, and at the food site the food pellet was delivered into the aperture through the food dispenser. All experimental data were recorded with an EPSON personal computer that was connected to the operant box via an interface using MED-PC IV software (Med-Associates).

The beginning of the sequential tone-food waiting task was signaled by turning on the house light, and termination was indicated by turning off the house light. The behavioral instrumental response in this task was for the mouse to hold its nose in a fixed posture in either the tone site aperture while waiting for the conditioned reinforcer tone or the reward site aperture while waiting for the food reward. This task required the mice to perform alternate visits and nose pokes to the tone site and the reward site. The mouse initiated a trial by nose poking in a fixed posture to achieve continuous interruption of the photo-beam at the tone site during a delay period until the tone was presented, signaling that a food reward was available at the reward site. After the tone was presented, the mouse was required to continue nose poking at the reward site during another delay period until the reward was delivered. The delay period that preceded the tone was called the tone delay and that which preceded the food was termed the reward delay. During the initial training period, the tone delay and the reward delay were fixed at 0.2 s.

Two types of error were possible in this task: the tone wait error and the reward wait error. The tone wait error and the reward wait error occurred when the mouse failed to keep its nose in a fixed posture during the delay period while waiting for the tone and the food, respectively. After a tone wait error, the mouse could restart the trial until it succeeded in waiting for the tone. A trial ended when the mouse received the food or made a reward wait error. During a trial, the tone wait error could occur multiple times. By contrast, the reward wait error could occur only once. Occurrences of tone and reward wait errors were not signaled. Mice could start the next trial at any time after food consumption or after making a reward wait error. Mice were trained daily for a period of 2 h. In 2 weeks or less, mice learned the sequential tone-food waiting task.

In vivo optical stimulation during the task

During the test session, an external optical fiber (400 μm diameter, 0.48 NA, Doric Lenses) was coupled to the implanted optical fiber with a zirconia sleeve. The optical fiber was connected to an optic swivel (Doric Lenses) that allowed unrestricted in vivo illumination. The optic swivel was connected to 470 nm blue and 590 nm yellow LEDs (470 nm: 35 mW, 590 nm: 10 mW, Doric Lenses) to generate the blue and yellow light pulses through the optical fiber (960 μm diameter, 0.48 NA, Doric Lenses). Blue and yellow light power intensities at the tip of the optical fiber, as measured by the power meter, were 1.2–2.8 mW and 1.4–1.8 mW, respectively. The LED was controlled by the transistor-transistor-logic pulses generated by a MED-PC IV.

Experiment 1: effect of reward probability and reward value

To examine whether reward prediction modulates the effect of serotonin on patience during waiting, we prepared six tests in which the reward probability (RP) and the reward amount were changed (75% reward one-pellet, 75% reward two-pellet, 25% reward one-pellet, 25% reward three-pellet, 50% reward one-pellet, and 50% reward three-pellet tests) (Supplementary Fig. 1). The tone and reward delays were fixed at 0.3 and 3 s, respectively. Each test of experiment 1 lasted 3000 s or until the mouse completed 40 trials. The tones in the 75% one-pellet, 75% two-pellet, 25% one-pellet, 25% three-pellet, 50% one-pellet, and 50% three-pellet tests were set at 8 kHz (0.5 s), white noise (0.5 s), 2 kHz (0.25 s) followed by 7 kHz (0.25 s), click (0.5 s), 7 kHz (0.25 s) followed by 2 kHz (0.25 s), and 2.5 kHz (0.5 s), respectively. Removing the nose for >500 ms before the end of the reward-delay period caused a reward wait error, in which no reward was presented. Trials in which serotonin neurons were or were not optogenetically stimulated were named serotonin activation trials and serotonin no-activation trials, respectively (Supplementary Fig. 1). For serotonin activation trials, 0.8 s of blue light was applied in a randomly selected half of the trials at the onset of the nose poke to the reward site following the tone presentation. For serotonin no-activation trials, 0.8 s of yellow light was applied for half of the trials at the onset of the nose poke to the reward site following the tone presentation. Each trial was ended by applying 1 s of yellow light at the onset of food presentation or the reward wait error (Supplementary Fig. 1).

We executed 75, 25, and 50% reward tests separately. The sequence of 75, 25, and 50% tests was changed for each mouse. During the 75% reward test, 1 or 2 days were used for training in the one-pellet and two-pellet tests and then the recording sessions were started. Each mouse experienced both the one-pellet and two-pellet tests at least once per day. During recording sessions, the order of the one-pellet and two-pellet tests was counterbalanced by daily recording. During both the 25 and 50% reward tests, 1 or 2 days were used for training in the one-pellet and three-pellet tests and then recording sessions were started. Each mouse experienced both one-pellet and three-pellet tests at least once per day. During the recording sessions, the order of the one-pellet and three-pellet tests was counterbalanced by daily recording.
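Each of the six tests of experiment 1 pairs a reward probability with a pellet count, so the expected number of pellets per trial is a simple product. The sketch below tabulates that product. Reading the "EP" levels used in the data analysis as this expected pellet count is an assumption, and the names `TESTS` and `expected_pellets` are illustrative only.

```python
# Illustrative sketch: expected pellets per trial for the six tests of experiment 1.
# Assumption: expected pellets = reward probability x pellets delivered on rewarded trials.

# (reward probability, pellets per rewarded trial), as listed for experiment 1
TESTS = {
    "75% one-pellet": (0.75, 1),
    "75% two-pellet": (0.75, 2),
    "25% one-pellet": (0.25, 1),
    "25% three-pellet": (0.25, 3),
    "50% one-pellet": (0.50, 1),
    "50% three-pellet": (0.50, 3),
}


def expected_pellets(reward_probability, pellets):
    """Expected number of pellets obtained per trial."""
    return reward_probability * pellets


expected = {name: expected_pellets(rp, n) for name, (rp, n) in TESTS.items()}

for name, value in expected.items():
    print(f"{name}: {value} pellets per trial")
```

Under this reading, the six tests collapse onto four distinct expected values, which is consistent with the four levels of expected reward value listed in the data analysis.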

Experiment 2: effect of reward timing uncertainty

To examine whether the timing of presentation of an expected reward influences promotion of patience by serotonin, we prepared four delayed reward tests with 75% RP, in which the timing of reward delivery was changed: (i) the reward delay was fixed at 6 s (D6 test) (Supplementary Fig. 6a); (ii) the reward delay was randomly set to 4, 6, or 8 s (D4-6-8 test) (Supplementary Fig. 6b); (iii) the reward delay was randomly set to 2, 6, or 10 s (D2-6-10 test) (Supplementary Fig. 6c); and (iv) the reward delay was fixed at 10 s (D10 test). Each test of experiment 2 lasted 3000 s or until the mouse completed 40 trials. The tone was 0.5 s at 8 kHz and was the same across the four reward-delay conditions. Removing the nose for >500 ms before the end of the reward-delay period caused a reward wait error, in which no reward was presented. Light stimulation patterns during the serotonin activation and serotonin no-activation trials were the same as in experiment 1. In the D4-6-8 and D2-6-10 tests, the eight trial patterns (two light conditions multiplied by four delay lengths) were randomly selected without repetition until all items were selected, and then this selection was repeated five times. In the D6 and D10 tests, eight trials (three fixed-delay trials with serotonin activation, one omission trial with serotonin activation, three fixed-delay trials without serotonin activation, and one omission trial without serotonin activation) were randomly selected without repetition until all items were selected, and then this selection was repeated five times.
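The trial selection described for the D6 and D10 tests (eight trial patterns drawn without repetition, with the draw repeated five times) amounts to block randomization. The following is a minimal sketch of that scheme, not the MED-PC code used in the study; the function name, the pattern labels, and the fixed seed are assumptions made for illustration.

```python
import random

# Eight trial patterns of a fixed-delay test (D6 or D10), as (light condition, outcome).
FIXED_DELAY_PATTERNS = (
    [("activation", "reward")] * 3
    + [("activation", "omission")]
    + [("no-activation", "reward")] * 3
    + [("no-activation", "omission")]
)


def block_randomized_trials(patterns, n_blocks, rng):
    """Concatenate n_blocks independently shuffled copies of `patterns`.

    Within a block every pattern is used exactly once (selection without
    repetition); a new shuffled block starts once all patterns are used.
    """
    sequence = []
    for _ in range(n_blocks):
        block = list(patterns)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence


rng = random.Random(0)  # hypothetical seed, only to make the sketch reproducible
trials = block_randomized_trials(FIXED_DELAY_PATTERNS, n_blocks=5, rng=rng)
print(len(trials), "trials;", sum(o == "omission" for _, o in trials), "omission trials")
```

Five blocks of eight patterns give 40 trials, which matches the 40-trial limit of a test, and each block holds two omission trials out of eight, in line with the 75% RP.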

We executed the D6, D4-6-8, D2-6-10, and D10 test sessions in this order. In each reward-delay test session, the first day was a training session followed by 3 or 4 days of recording sessions. The 1-day recording sessions consisted of at least one reward-delay test. For two mice, D4-6-8 and D6 test sessions were further executed in this order after D2-6-10 test session (one mouse) or D10 test session (one mouse). Since in both D6 and D4-6-8 test sessions, waiting time in omission trials did not differ significantly between first and second sessions, data from first and second sessions were merged for analysis (in the D6 test, P > 0.10 with serotonin activation, P > 0.10 without serotonin activation, Mann–Whitney U-test; in the D4-6-8 test, P > 0.79 with serotonin activation, P > 0.13 without serotonin activation, Mann–Whitney U-test).

Data analysis

No statistical tests were used to determine sample size, but our sample sizes were similar to those employed in our previous study7. To examine how serotonin neuron activation promotes waiting for delayed rewards, we focused on waiting time during omission trials. To quantify the effectiveness of serotonin neuron activation at promoting waiting during omission trials, we calculated the waiting time ratio (waiting time with serotonin neuron activation/waiting time without serotonin neuron activation) for each test. Statistically significant differences (in waiting time or waiting time ratio) between two groups were assessed by Mann–Whitney U-test. To compare waiting time in serotonin activation and serotonin no-activation trials using within-animal averages, we used a paired t-test. For analysis of ChR2-expressing group (ChR2) data and control group (WT) data, two-way ANOVA was used, with light effect (two levels; yellow and blue) as the within-subject factor and group effect (two levels; ChR2 and WT) as the between-subject factor. The normality of data for the paired t-test and two-way ANOVA was assessed by Shapiro–Wilk test. We checked the homogeneity of variance of the waiting time ratio data in experiments 1 and 2. Since the data did not satisfy homogeneity of variance in either experiment, non-parametric statistical tests were used. To examine the main effect of RP (three levels; 75, 50, and 25%) and that of expected reward value (four levels; 0.25, 0.5, 0.75 and 1.5 EPs per trial) on promoting waiting time, the Scheirer–Ray–Hare test, a non-parametric method equivalent to two-way ANOVA, followed by Bonferroni correction for multiple comparisons was used for analysis of the waiting time ratio. A linear mixed model analysis was performed, taking the waiting time ratio (Y) as the dependent variable, RP and EP as independent variables with fixed effects, and MI as an independent variable with a random effect. We fitted the model to the data using the R package {lme4} with the formula Y ~ RP + EP + (1|MI). To test differences of means, we used the Z-value instead of the t-value because the degrees of freedom of the t-value are not readily available for an unbalanced mixed model. Further, to test whether the variance among mice is zero, it is not appropriate to use a χ²-test because the null hypothesis lies on the boundary of the domain of the variance. We therefore used a parametric bootstrap instead. Kruskal–Wallis test followed by Bonferroni correction for multiple comparisons was used for analysis of the waiting time ratio in experiment 2. In Bonferroni correction for multiple comparisons, P-values of pairwise Mann–Whitney U-tests were multiplied by m, where m was the number of pairwise Mann–Whitney U-tests. Differences were considered statistically significant when P-value × m < 0.05. m was 15 and 10 in the Scheirer–Ray–Hare test and the Kruskal–Wallis test, respectively. Data collection and analysis were not performed blind during the experiment, and no randomization was used. In a very small number of omission trials, mice removed the nose from the reward site within 1.5 s (in the 75% one-pellet test, 2 for serotonin activation trial and 2 for serotonin no-activation trial; in the 50% three-pellet test, 3 for serotonin activation trial and 4 for serotonin no-activation trial; in the 50% one-pellet test, 1 for serotonin activation trial; in the 25% three-pellet test, 4 for serotonin activation trial and 2 for serotonin no-activation trial; in the 25% one-pellet test, 1 for serotonin activation trial and 1 for serotonin no-activation trial; in the D10 test, 2 for serotonin no-activation trial). These data were excluded from the analysis. Statistical analyses were performed using SPSS, Matlab (MathWorks), and R.
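The Bonferroni rule stated above (multiply each pairwise P-value by m and call it significant when P × m < 0.05) can be written in a few lines. This is a sketch with hypothetical P-values, not the SPSS, Matlab, or R procedure the authors ran; the function names are illustrative. The helper `n_pairwise_tests` simply counts the distinct pairs among a number of groups, and linking m = 15 to six groups is an assumption about how m arose.

```python
from math import comb


def n_pairwise_tests(n_groups):
    """Number of distinct pairwise comparisons among n_groups groups."""
    return comb(n_groups, 2)


def bonferroni(p_values, m=None, alpha=0.05):
    """Apply the rule P * m < alpha to each raw P-value.

    Returns a list of (raw P, P * m, significant?) tuples. If m is not
    given, it defaults to the number of P-values supplied.
    """
    if m is None:
        m = len(p_values)
    return [(p, p * m, p * m < alpha) for p in p_values]


# Hypothetical raw P-values from pairwise Mann-Whitney U-tests (not study data).
raw_p = [0.001, 0.01, 0.2]
m = n_pairwise_tests(6)  # assumption: six groups compared pairwise
for p, adjusted, significant in bonferroni(raw_p, m=m):
    print(f"P = {p}, P x m = {adjusted:.3f}, significant: {significant}")
```

With six groups the helper returns 15 comparisons, which agrees with the m reported for the Scheirer–Ray–Hare follow-up tests.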

Bayesian decision model of waiting

Each trial had a hidden state X = {reward, no-reward}, and for a reward trial, the timing of reward delivery was given by a Gaussian distribution N(t; μ, σ²). Given an observation that a reward had not been delivered by time t, the likelihood for a reward trial was 1 – f(t; μ, σ²), where f is the Gaussian cumulative distribution function, whereas the likelihood for a no-reward trial was one. The posterior probability for a reward trial, given the observation of no reward by time t, is

$$P\left(\mathrm{reward}\mid t\right) = \frac{P\left(\mathrm{reward}\right)\left(1 - f\left(t;\mu,\sigma^{2}\right)\right)}{P\left(\mathrm{reward}\right)\left(1 - f\left(t;\mu,\sigma^{2}\right)\right) + P\left(\mathrm{no\ reward}\right)},$$

where P(reward) and P(no reward) are prior probabilities of reward and no-reward trials.

The expected reward to keep waiting was V(wait|t) = P(reward|t) for a unit of reward, while the expected reward for quitting was V(quit|t) = 0 as no reward is obtained by quitting. By assuming a softmax action selection, the choice probability to keep waiting at time t is

$$P\left(\mathrm{wait}\mid t\right) = \frac{1}{1 + \exp\left[-\beta \times P\left(\mathrm{reward}\mid t\right)\right]},$$

where β is the inverse temperature parameter regulating the stochasticity of choice. The distribution of the time of quitting, P_quit(t), is given by sequential decisions:

$$\begin{array}{l}{P}_{{\mathrm{wait}}}\left( {\mathrm{0}} \right) = {\mathrm{1,}}\\ {P}_{{\mathrm{wait}}}\left( {{t}} \right) = {P}_{{\mathrm{wait}}}{(t}-{\tau}) \times {P}\left( {{\mathrm{wait|}t}} \right){\mathrm{,}}\\ {P}_{{\mathrm{quit}}}\left( {t} \right) = {P}_{{\mathrm{wait}}}{{(t}}-{{\tau)}} \times \left( {{{1}}-{P}\left( {{\mathrm{wait|}t}} \right)} \right){\mathrm{,}}\end{array}$$

where P_wait(t) is the probability of continuing to wait until time t and τ is the interval between repeated decisions to wait or to quit. In Fig. 7, we used parameters τ = 0.1 s and β = 50. The code for the Bayesian waiting decision model was written in Python.
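The three equations of the model can be chained into a short simulation. The sketch below is a reconstruction from the formulas in this section, not the authors' code. τ = 0.1 s and β = 50 follow the text, whereas the prior P(reward) = 0.75, the timing parameters μ = 6 s and σ = 1 s, and the horizon `t_max` are assumptions chosen only for illustration.

```python
import math


def gaussian_cdf(t, mu, sigma):
    """Gaussian cumulative distribution function f(t; mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))


def posterior_reward(t, p_reward, mu, sigma):
    """P(reward | no reward delivered by time t), by Bayes' rule."""
    not_yet = 1.0 - gaussian_cdf(t, mu, sigma)  # likelihood for a reward trial
    # The likelihood for a no-reward trial is one.
    return p_reward * not_yet / (p_reward * not_yet + (1.0 - p_reward))


def p_wait_given_t(t, p_reward, mu, sigma, beta):
    """Softmax probability of continuing to wait at time t, with V(quit|t) = 0."""
    return 1.0 / (1.0 + math.exp(-beta * posterior_reward(t, p_reward, mu, sigma)))


def quit_time_distribution(p_reward, mu, sigma, beta=50.0, tau=0.1, t_max=20.0):
    """Return (times, P_quit(t) at each time, probability of still waiting at t_max)."""
    p_wait_prev = 1.0  # P_wait(0) = 1
    times, p_quit = [], []
    for k in range(1, int(round(t_max / tau)) + 1):
        t = k * tau
        stay = p_wait_given_t(t, p_reward, mu, sigma, beta)
        times.append(t)
        p_quit.append(p_wait_prev * (1.0 - stay))  # P_quit(t)
        p_wait_prev *= stay                        # P_wait(t)
    return times, p_quit, p_wait_prev


# Assumed illustration values: 75% prior reward probability, reward expected at 6 +/- 1 s.
times, p_quit, still_waiting = quit_time_distribution(p_reward=0.75, mu=6.0, sigma=1.0)
peak_time = times[p_quit.index(max(p_quit))]
print(f"most likely quitting time: {peak_time:.1f} s")
```

Because each step splits the remaining waiting probability into "quit now" and "keep waiting", the quitting probabilities and the residual waiting probability sum to one, which is a convenient sanity check on any implementation of the recursion.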

Code availability

The code used to generate the results reported in this study is available from the corresponding author upon reasonable request.

Data availability

Data from the experiments presented in this study are available from the corresponding author upon reasonable request.