All experimental procedures were carried out in accordance with institutional animal welfare guidelines and were licensed by the UK Home Office and the Swiss cantonal veterinary office. A virus expressing GCaMP6f or GCaMP6m (AAV2/1-hsyn-GCaMP6-WPRE) was injected into the primary visual cortex (V1) of the right hemisphere of C57Bl/6J mice (P49–P57). Imaging and behavioral training started approximately 3 weeks after surgery. We imaged GCaMP6-labeled neurons in layer 2/3 in 93 training sessions and 12 recording sessions under isoflurane anesthesia in 11 mice, using a custom-built resonant scanning two-photon microscope with a frame rate of 32 Hz. Supplemental Experimental Procedures contain further details about surgical and imaging procedures.

Mice were head-fixed and trained to run on a styrofoam cylinder. A reward delivery spout was positioned near the snout of the mouse, and licks were detected using a piezo disc sensor. Mice were then trained in a visual discrimination task in which the running speed on the cylinder was detected with an optical mouse and used to control the speed at which mice moved through a virtual environment presented on two screens in front of them. A trial started when the mouse was positioned at a random starting point in an approach corridor with walls showing black and white circles on a gray background. When the mouse reached a specific point in the corridor, it was randomly teleported to one of two grating corridors with either a vertical or an angled grating on the walls. In the vertical grating corridor, the mouse was rewarded with a drop of soya milk for licking the spout after it had entered a “reward zone,” a short distance into the grating corridor. No punishment was given for licking in the angled grating corridor.

A subset of mice was trained to switch between blocks of an olfactory and a visual discrimination task. In olfactory blocks, mice performed an analogous go/no-go discrimination task in which they were rewarded for licking in response to one of two odors. During this task, mice were also presented with the vertical and angled grating corridors at different positions in the approach corridor. Mice learned to ignore these irrelevant grating stimuli while accurately discriminating the odors. On switching to the visual block, mice again licked selectively in response to the rewarded (vertical) grating. See Supplemental Experimental Procedures for further details about the visual stimulus, behavioral tasks, and training.

Bilateral silencing of V1 was carried out in four transgenic mice (three males, one female) expressing channelrhodopsin-2 in parvalbumin-expressing interneurons. Additionally, three male wild-type C57Bl/6J mice underwent identical surgical and experimental procedures. Mice were implanted with two cranial windows, one over each visual cortex. Intrinsic imaging was used to determine the extent of V1, and all regions outside V1 were covered with black paint. In expert mice (>90% performance), V1 was silenced by illuminating both cranial windows with 470 nm light at one of four intensities shortly before and during presentation of the grating corridor. In 30% of trials, no light stimulation was applied. The same mice were also trained on the olfactory discrimination task described above (but without grating stimuli), and V1 was silenced shortly before and during presentation of the odors. For further details, see Supplemental Experimental Procedures.

Data Analysis

Image stacks were corrected for motion, and regions of interest (ROIs) were selected for each cell in each session. Raw fluorescence time series $F(t)$ were obtained for each cell by averaging across pixels within each ROI. Baseline fluorescence $F_0(t)$ was computed by smoothing $F(t)$ (causal moving average of 0.75 s) and determining for each time point the minimum value in the preceding 60 s time window. The change in fluorescence relative to baseline, ΔF/F, was computed by taking the difference between $F$ and $F_0$ and dividing by $F_0$.
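The baseline and ΔF/F steps can be sketched as follows for a single ROI trace. This is a minimal illustration, not the authors' code: the function name `delta_f_over_f` is hypothetical, the 32 Hz default comes from the imaging frame rate given above, and the 60 s window is assumed to include the current frame.

```python
from collections import deque

def delta_f_over_f(f, frame_rate=32.0, smooth_s=0.75, window_s=60.0):
    """dF/F for one ROI trace f (list of raw fluorescence values, one per frame)."""
    n_smooth = max(1, round(smooth_s * frame_rate))  # frames in the 0.75 s average
    n_window = max(1, round(window_s * frame_rate))  # frames in the 60 s window

    # 1) Causal moving average: mean of the current frame and up to
    #    n_smooth - 1 preceding frames (no future samples are used).
    smoothed, running = [], 0.0
    for i, x in enumerate(f):
        running += x
        if i >= n_smooth:
            running -= f[i - n_smooth]
        smoothed.append(running / min(i + 1, n_smooth))

    # 2) Baseline F0(t): minimum of the smoothed trace over the last n_window
    #    frames (including t), tracked with a monotonic deque of indices.
    baseline, dq = [], deque()
    for i, s in enumerate(smoothed):
        while dq and smoothed[dq[-1]] >= s:
            dq.pop()
        dq.append(i)
        while dq[0] <= i - n_window:
            dq.popleft()
        baseline.append(smoothed[dq[0]])

    # 3) dF/F = (F - F0) / F0, using the raw (unsmoothed) trace F.
    return [(x - b) / b for x, b in zip(f, baseline)]
```

The deque keeps the sliding-window minimum at O(1) amortized cost per frame, which matters for long sessions where a 60 s window at 32 Hz spans 1,920 frames.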

To analyze responses to the vertical and angled grating corridors, neuronal activity was aligned to the onset of the grating corridor for each trial. A Wilcoxon rank-sum test was used to determine whether responses (the average ΔF/F in a 1 s time window after grating onset) differed significantly between the two conditions (p < 0.05), and the sign of the difference determined the response preference. The persistence of stimulus preference (Figure 2F) was defined as the probability that a cell that significantly preferred one of the two gratings on one day also preferred the same grating on the next day. Recruitment of non-selective cells (Figure 2G) was defined as the probability that a cell with no stimulus preference on one day became selective for one of the two gratings on the next day. We computed these measures for three stages of learning, based on the behavioral d-prime (bDP) of two consecutive sessions: before learning (bDP of both sessions < 1 and ΔbDP < 0.5, $N_{session}$ = 14), during learning (bDP of the first session < 2, bDP of the second session > 0.5, and ΔbDP > 0.5, $N_{session}$ = 14), and after learning (bDP of both sessions > 2 and absolute change in bDP < 0.5, $N_{session}$ = 19). Varying the criteria used to define the stages of learning led to similar results (data not shown).
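The stage criteria and the two day-to-day measures can be written out as below. This is a sketch under stated assumptions: ΔbDP is taken as the signed difference (second session minus first), cells are assumed to be already matched across days by a shared identifier, and the per-cell preference labels (`'V'`, `'A'`, or `None`) are assumed to come from the rank-sum test described above. The function names are hypothetical.

```python
def learning_stage(bdp_first, bdp_second):
    """Assign a pair of consecutive sessions to a learning stage by bDP."""
    delta = bdp_second - bdp_first  # assumption: signed change in bDP
    if bdp_first < 1 and bdp_second < 1 and delta < 0.5:
        return "before"
    if bdp_first < 2 and bdp_second > 0.5 and delta > 0.5:
        return "during"
    if bdp_first > 2 and bdp_second > 2 and abs(delta) < 0.5:
        return "after"
    return None  # the pair does not meet any stage criterion

def persistence_and_recruitment(pref_day1, pref_day2):
    """pref_dayX maps cell id -> 'V', 'A', or None (no significant preference).

    Persistence: P(same preference on day 2 | selective on day 1).
    Recruitment: P(selective on day 2 | non-selective on day 1)."""
    shared = pref_day1.keys() & pref_day2.keys()  # cells tracked on both days
    selective = [c for c in shared if pref_day1[c] is not None]
    nonselective = [c for c in shared if pref_day1[c] is None]
    persistence = (sum(pref_day2[c] == pref_day1[c] for c in selective) / len(selective)
                   if selective else float("nan"))
    recruitment = (sum(pref_day2[c] is not None for c in nonselective) / len(nonselective)
                   if nonselective else float("nan"))
    return persistence, recruitment
```

The three stage criteria are mutually exclusive (they differ in the sign or size of ΔbDP, or in whether the first bDP is above or below 2), so each session pair receives at most one label.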

To quantify the selectivity of neural responses, we computed a response selectivity index (SI) for individual cells: the difference between the mean responses to the vertical and angled grating corridors in the first second after grating onset, divided by the pooled standard deviation of the responses:

$$SI = \frac{\bar{R}_V - \bar{R}_A}{s_{pVA}}, \qquad s_{pVA} = \sqrt{\frac{\sum_{i=1}^{k} (n_i - 1)\, s_i^2}{\sum_{i=1}^{k} (n_i - 1)}}, \quad k = 2,$$

where $\bar{R}_V$ and $\bar{R}_A$ are the mean responses in the vertical and angled grating corridor conditions, $s_i^2$ is the variance of the responses in condition $i$, and $n_i$ is the number of trials in condition $i$ for $k$ conditions. Positive values therefore indicate a preference for the vertical grating corridor and negative values a preference for the angled grating corridor. Note that in the manuscript text the term selectivity refers to SI. To obtain a combined measure of grating discriminability for simultaneously imaged populations of neurons, population selectivity was computed by taking the average of the squared selectivity index across cells and then taking the square root:

$$\text{population selectivity} = \sqrt{\frac{1}{N_{cell}} \sum_{i=1}^{N_{cell}} SI_i^2}.$$
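Both formulas translate directly into a few lines of code. The sketch below assumes each response is already the mean ΔF/F in the first second after grating onset, one value per trial; the function names are hypothetical.

```python
import math
from statistics import mean, variance

def selectivity_index(resp_v, resp_a):
    """SI for one cell from per-trial responses to the vertical / angled corridor."""
    n_v, n_a = len(resp_v), len(resp_a)
    # Pooled variance across the k = 2 conditions; variance() divides by n - 1.
    pooled_var = ((n_v - 1) * variance(resp_v) + (n_a - 1) * variance(resp_a)) / (
        (n_v - 1) + (n_a - 1))
    return (mean(resp_v) - mean(resp_a)) / math.sqrt(pooled_var)

def population_selectivity(si_values):
    """Root mean square of the SI values of simultaneously imaged cells."""
    return math.sqrt(sum(si ** 2 for si in si_values) / len(si_values))
```

Because the SI values are squared, cells preferring either grating contribute equally to population selectivity, so a population split evenly between vertical- and angled-preferring cells is not averaged down to zero.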

A bootstrap test (Efron, 1979) was used to test for significant differences between conditions that contained both dependent and independent data points. To test whether changes in the proportion of cells preferring the vertical grating, preferring the angled grating, or without preference across two conditions (typically before and after learning) were significant, we first computed for each session the proportion of cells in each category. Next, we randomly picked the same number of sessions (the minimum across conditions) from both conditions and repeated this 10,000 times. For each repetition, we computed the average cell proportion across sessions in both conditions, and we also computed it after randomly assigning sessions to one of the two conditions. The p value was given by the fraction of bootstrap repetitions in which the proportion change with randomly assigned condition labels was at least as large as the proportion change in the actual data. Similarly, bootstrapping was used to assess the significance of differences in population selectivity, decoding performance, and pre-stimulus activity increase, by comparing the difference in the original data to the difference with randomly assigned condition labels.
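One way to implement this resampling scheme is sketched below. Several details are assumptions, since the text does not specify them: sessions are resampled with replacement, the change is measured as an absolute (two-sided) difference in mean proportion, and the shuffled labels are applied to the same resampled sessions. The function `bootstrap_proportion_test` is hypothetical.

```python
import random

def _mean(xs):
    return sum(xs) / len(xs)

def bootstrap_proportion_test(props_a, props_b, n_boot=10_000, seed=0):
    """p value for a change in a per-session cell proportion between conditions.

    props_a, props_b: for each session of condition A / B, the proportion of
    cells in one category (e.g. vertical-preferring)."""
    rng = random.Random(seed)
    m = min(len(props_a), len(props_b))  # same number of sessions per condition
    n_exceed = 0
    for _ in range(n_boot):
        # Assumption: sessions are resampled with replacement.
        sample_a = rng.choices(props_a, k=m)
        sample_b = rng.choices(props_b, k=m)
        observed = abs(_mean(sample_b) - _mean(sample_a))
        # Same sessions with condition labels randomly reassigned.
        pooled = sample_a + sample_b
        rng.shuffle(pooled)
        shuffled = abs(_mean(pooled[m:]) - _mean(pooled[:m]))
        if shuffled >= observed:
            n_exceed += 1
    return n_exceed / n_boot
```

For example, five sessions at 10% versus five at 50% give a small p value, because only the two perfectly separating label assignments (out of 252) reproduce the observed change.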

To control for the effect of running speed and optic flow on neural responses and selectivity across learning, grating responses were compared specifically in trials that were matched for running speed across sessions and stimulus conditions (Figures 5A and 5B). First, the average running speed was determined in sliding 200 ms time windows from −0.5 to +0.5 s around the onset of the grating corridor (50 ms step size). Then, responses in each time window of each trial were assigned to one of three groups depending on running speed (three equally wide bins spanning the 2.5th to the 97.5th percentile of the average running speed across all sessions). Data for a time window were included only if they contained at least ten trials of both grating conditions. In the highest speed bin, not enough matched data were available across learning, so the analyses were restricted to the lowest speed bin (referred to as “slow”) and the intermediate speed bin (“fast”).
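The speed binning and the inclusion rule can be sketched as follows. The percentile uses linear interpolation between sorted values (an assumption; the method is not stated), speeds outside the 2.5th to 97.5th percentile range are assumed to be excluded, and all function names are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def speed_bin_edges(all_speeds):
    """Edges of three equally wide bins between the 2.5th and 97.5th percentiles."""
    lo, hi = percentile(all_speeds, 2.5), percentile(all_speeds, 97.5)
    width = (hi - lo) / 3
    return [lo, lo + width, lo + 2 * width, hi]

def speed_bin(speed, edges):
    """0 = slow, 1 = intermediate ('fast'), 2 = highest; None if out of range."""
    if speed < edges[0] or speed > edges[3]:
        return None  # assumption: speeds outside the percentile range are dropped
    for b in range(3):
        if speed <= edges[b + 1]:
            return b

def window_included(bins_v, bins_a, target_bin, min_trials=10):
    """bins_v / bins_a: speed bin of each vertical / angled trial in one window."""
    return (sum(b == target_bin for b in bins_v) >= min_trials
            and sum(b == target_bin for b in bins_a) >= min_trials)
```

Computing the edges once across all sessions, rather than per session, is what makes the bins comparable before and after learning.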

To quantify the accuracy with which two conditions could be classified at time t relative to grating onset (either trials with vertical and angled grating corridors, Figures 3B, 5C, and 5D, or FA and CR trials, Figure 6D), a cumulative decoder was employed. From training data (30 trials of both conditions), the decoder constructed for each neuron n a model of the response, using as parameters the mean response to the vertical ($\mu_n^V(t)$) and angled grating corridor ($\mu_n^A(t)$) and the noise variance $\sigma_n^2$, chosen to maximize the observed log-likelihood of the data under a Gaussian noise model. On test trials (the remaining trials that were not used as training data), the log-likelihood at time t that trial k belongs to condition C (where C was, for instance, the vertical (V) or angled grating corridor (A) condition) is proportional to

$$L_C(t) = -\sum_{n=1}^{N_{cell}} \sum_{\tau = T_{start}}^{t} \frac{\left( D_{n,k}(\tau) - \mu_n^C(\tau) \right)^2}{2\sigma_n^2},$$

where D indicates deconvolved ΔF/F (see Supplemental Experimental Procedures). If $L_V > L_A$, the trial was assigned to the vertical condition, otherwise to the angled grating condition. To obtain the cumulative likelihood $L_C$ at each time point t, the summation included only time points from $T_{start}$, the time of grating onset, up to time t. Note that without the temporal accumulation of log-likelihood, the decoder would be equivalent to a linear discriminant analysis.

To determine the time point at which there was a detectable divergence of running speed between vertical and angled grating trials, we performed a Wilcoxon rank-sum test on the average speed in nonoverlapping, consecutive 50 ms windows. The time of divergence was defined as the center of the first window with p < 0.01 that was followed by at least four consecutive windows with p < 0.01. For Figure 5D, we defined post-learning sessions with delayed divergence as sessions with behavioral d-prime > 2 and a time of running speed divergence greater than 400 ms (n = 8 sessions in 7 mice, average d-prime 2.59). We paired each of these sessions with a unique session that had the smallest difference in behavioral d-prime but a time of divergence < 400 ms (n = 8 sessions in 6 mice, average d-prime 2.61).
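The cumulative decoder and the divergence criterion can be sketched in plain Python. Assumptions: trials are nested lists indexed as `trial[n][t]`, with time bin 0 at grating onset ($T_{start}$); one noise variance per neuron is estimated by maximum likelihood from residuals pooled over conditions, trials, and time bins; and the p values passed to `divergence_time` are assumed to come from the per-window rank-sum tests. All function names are hypothetical.

```python
def fit_decoder(train):
    """train: dict condition -> list of trials; trial[n][t] is the deconvolved
    dF/F of neuron n in time bin t (t = 0 at grating onset).
    Returns mean traces mu[C][n][t] and one noise variance per neuron."""
    first = next(iter(train.values()))[0]
    n_cells, n_time = len(first), len(first[0])
    mu = {c: [[sum(tr[n][t] for tr in trials) / len(trials) for t in range(n_time)]
              for n in range(n_cells)]
          for c, trials in train.items()}
    var = []
    for n in range(n_cells):
        # Gaussian ML estimate: mean squared residual around the condition mean,
        # pooled over conditions, trials and time bins (assumption).
        sq = [(tr[n][t] - mu[c][n][t]) ** 2
              for c, trials in train.items() for tr in trials for t in range(n_time)]
        var.append(max(sum(sq) / len(sq), 1e-12))  # floor avoids division by zero
    return mu, var

def cumulative_log_likelihood(trial, mu_c, var):
    """L_C(t) for every t, accumulating terms from onset (t = 0) up to t."""
    total, out = 0.0, []
    for t in range(len(trial[0])):
        total -= sum((trial[n][t] - mu_c[n][t]) ** 2 / (2 * var[n])
                     for n in range(len(trial)))
        out.append(total)
    return out

def decode(trial, mu, var):
    """Condition with the largest L_C(t) at each t (ties go to the first key)."""
    ll = {c: cumulative_log_likelihood(trial, mu[c], var) for c in mu}
    return [max(ll, key=lambda c: ll[c][t]) for t in range(len(trial[0]))]

def divergence_time(p_values, centers, alpha=0.01, run=4):
    """Center of the first window with p < alpha that is followed by at least
    `run` consecutive windows with p < alpha; None if there is none."""
    for i in range(len(p_values) - run):
        if all(p < alpha for p in p_values[i:i + run + 1]):
            return centers[i]
    return None
```

Decoding accuracy at each time bin would then be the fraction of held-out trials whose decoded label matches the true condition. Because the per-neuron variance is shared across conditions, each time bin adds a term that is linear in the data, which is why dropping the accumulation leaves a linear (diagonal-covariance) discriminant.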

To analyze responses during FA and CR trials, only sessions with at least 15 FA trials were included in the analysis (Figure 6). These were predominantly sessions at intermediate learning stages, as most expert mice made very few mistakes by the end of training (see Figure S1). Behaviorally modulated cells were defined as cells with significantly different activity for FA and CR trials in the first second after grating corridor onset (p < 0.05, Wilcoxon rank-sum tests). To obtain average responses for cells preferring the vertical or the angled grating corridor (Figure 6E), neurons were classified as vertical (or angled) preferring if they significantly preferred the vertical (or the angled) grating corridor in at least one session and never switched preference, and responses of such cells were averaged across the sessions in which they showed a significant preference.
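The cross-session classification rule and the averaging step can be sketched as below, assuming each cell has one preference label per imaged session (`'V'`, `'A'`, or `None` when not significant) and one response trace per session. The function names are hypothetical.

```python
def classify_cell(session_prefs):
    """Return 'V' or 'A' if the cell was significant in at least one session and
    never switched preference; otherwise None."""
    significant = {p for p in session_prefs if p is not None}
    return next(iter(significant)) if len(significant) == 1 else None

def average_preferred_response(responses, session_prefs, label):
    """Average a cell's per-session response traces over the sessions in which
    it significantly preferred `label` (traces must share a time base)."""
    chosen = [r for r, p in zip(responses, session_prefs) if p == label]
    return [sum(vals) / len(chosen) for vals in zip(*chosen)]
```

A cell labeled `'V'` in one session and `'A'` in another yields a two-element set and is therefore excluded, which implements the "never switched preference" condition.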