Participants

Thirty-three right-handed participants were recruited from the student population at the Radboud University in Nijmegen. Sample size was decided on before the collection of data and was aimed at being able to detect experimental effects that had at least moderate effect size (Cohen’s d>0.6). Participants gave written informed consent in accordance with the institutional guidelines of the local ethical committee (CMO region Arnhem-Nijmegen, The Netherlands) and received monetary compensation for their participation. All participants were invited for two separate scanning sessions taking place within maximally 2 weeks’ time. Three participants completed only one of the two sessions and 1 participant was excluded due to excessive head motion. Only the remaining 29 participants (19 female, mean age=23 years) were included in all analyses. All participants had normal or corrected-to-normal visual acuity. Four additional participants (1 female, mean age=28 years) participated in a control experiment.

MRI acquisition

Functional and anatomical images were acquired using a 3T Skyra MRI system (Siemens, Erlangen, Germany) equipped with a 32-channel headcoil. Each of two MRI sessions lasted between 1.5 and 2.5 h, during which we acquired (i) a T1-weighted anatomical scan (1 mm isotropic), (ii) two whole-brain functional localizer runs during the behavioural training of the stimulus sequence (2 mm isotropic; TR=1,800 ms). These runs were used for an online analysis at the MRI scanner console to aid the slice positioning of the ultra-fast functional runs, (iii) eight ultra-fast functional runs to measure BOLD activity during the experimental paradigm (2 mm isotropic; TR=88 ms), where four runs consisted of the left-right and right-left stimulus sequence, respectively. After four runs, participants were trained with the opposite sequence. The order of left-right and right-left sequences was randomized across sessions and participants. Functional slices were positioned based on an anatomical landmark (along the calcarine sulcus) and based on a functional localizer (stimulus versus rest) that was acquired during training. (iv) Three to four functional runs with a moving bar sequence, to estimate pRFs9. In one session, participants performed a task on the dot sequence (attended condition), and in the second session participants performed a task at fixation (unattended condition). The order of sessions was randomized across participants. The retinotopic mapping was only performed once, either during the first or during the second session.

BOLD activity for the localizer runs and the retinotopic mapping was measured using T2*-weighted gradient echo planar imaging sequence (TR/echo time (TE)=1,800/30 ms, 26 transversal slices, voxel size 2 × 2 × 1.7 mm, 60° flip angle; slice gap=20%). BOLD activity for the experimental runs was measured using a standard T2*-weighted gradient echo planar imaging sequence (TR/TE=88/25 ms, 2 transversal slices, voxel size 2.3 × 2.3 × 4.0 mm, 60° flip angle; slice gap=10%). Notably, the fast TR was possible because only 2 slices were acquired that were carefully positioned to cover the relevant parts of primary visual cortex (Supplementary Fig. 4). Anatomical images were acquired with a T1-weighted magnetization prepared rapid gradient echo (MP-RAGE) sequence (TR/TE=2,300/3.03 ms, voxel size 1 × 1 × 1 mm, 8° flip angle). BOLD measurements in the control experiment used a multiband sequence (acceleration factor=3; TR/TE=262/35.80 ms, 9 transversal slices, voxel size 2.4 × 2.4 × 2.4 mm, 38° flip angle; slice gap=20%).

Visual stimuli

Stimuli were rear projected on a screen located 80.5 cm from the participant’s eyes at the head of the scanner table. The screen was viewed using a mirror attached to the headcoil. We presented a moving dot sequence consisting of four dots at spatial locations evenly spaced between x=−10° and +10° and y=+2° above fixation on a black background. Each dot had a diameter of 1.2° and was shown for 100 ms followed by a blank screen of 33 ms. The total duration of the moving dot sequence amounts to 502 ms. A fixation cross (0.6°) was shown at the centre of the screen and participants were instructed to maintain fixation throughout the experiment.

Experimental design

The experiment consisted of four parts. First, participants were shown 108 trials of either the dot sequence left-to-right or right-to-left. The sequences were presented in six blocks of 36 s and each trial lasted 2 s. The blocks were followed by a 36 s rest period, only displaying the fixation cross. This exposure period served to familiarize participants with the dot sequence and was further used to compute an online t-contrast (stimulation>fixation) at the MR console to aid the slice positioning of the following ultra-fast sequence. During this part only the full stimulus sequences were shown and no starting-point-only or end-point-only trials. During the exposure period participants performed the same task as during the main experiment, detecting letters at fixation (unattended condition) or performing a task on the dot (attended condition).

Second, there followed four runs with the ultra-fast fMRI sequence. Two slices were positioned in the sagittal axis with alignment along the calcarine sulcus, the anatomical location of V1, and adjusted, if necessary, to cover as much as possible of the online t-contrast map. Each run contained 24 trials, consisting of 10 stimulus sequence trials that matched the previous exposure period (that is, either left-to-right or right-to-left), 5 preplay trials with a presentation of the starting dot only, 5 control trials with a presentation of the end dot only and 4 stimulus sequence ‘oddball’ trials where the last dot was shown after 167 ms instead of 33 ms. The order of stimulus conditions was pseudo-randomized for a given run with the restriction that the first trial of a run was always a full sequence trial, and preplay and control trials were always preceded and followed by a full sequence trial. Each trial lasted 11.97 s (corresponding to 136 fMRI volumes), consisting of a 502 ms stimulus sequence (638 ms for the oddball trials) and 11.47 s ITI. The long ITI duration was chosen to provide time for the BOLD response to return back to baseline. Each run lasted 4.8 min and started with 8 s of fixation that was discarded from the analysis.

Third, participants were shown 108 trials of the other dot sequence, that is, left-to-right if the previous sequence was right-to-left and vice versa. Which sequence was shown first was randomly chosen and counterbalanced across participants. Fourth, after exposure with the new stimulus sequence, participants underwent four runs with the ultra-fast sequence again with the same parameters as described above. In total, the functional experiment lasted 45 min and contained 80 stimulation trials (10,880 volumes), 40 preplay trials and 40 control trials (5,440 volumes, respectively).

Each participant performed the experiment twice in separate sessions, at least 2 days, but no longer than 14 days apart. In one session (attended condition) participants had to detect rare occasions (20%), when the last dot of the sequence followed slightly later (167 ms) after the previous dot than expected (33 ms). In the other session (unattended condition), participants were presented with a sequence of rapidly changing letters at fixation and had to report whenever target letters ‘X’ or ‘Y’ (target probability=10%) appeared in a stream of non-target letters (‘A’, ‘T’, ‘N’, ‘U’, ‘V’, ‘Y’, ‘H’ and ‘R’). Letters were presented for 400 ms each, separated by 400 ms intervals in which only the fixation point was presented. Apart from the different tasks, the two sessions were identical. The order of sessions was counterbalanced across participants. In a post-experimental interview participants were asked whether they perceived a ‘stripe’ after being exposed to only the starting location of the dot sequence (attended condition: 38%; unattended condition: 28%).

A control experiment was conducted to rule out carry-over effects of the BOLD signal from one trial to the next and to test whether the preplay still persists when the stimulus sequence moves through fixation (Supplementary Fig. 7). Stimuli and timing in the control experiment were virtually identical to the main experiment. Changes were as follows: (i) instead of a fixed ITI of 11.47 s, a variable ITI ranging from 11.79 to 15.98 s was used and each block contained three null events of 10.22 s in which only a fixation cross was displayed. The null events occurred at random positions throughout the block; (ii) we presented the moving dot sequence consisting of four dots at spatial locations evenly spaced between x=−8° and +8° and y=0°, moving through fixation; (iii) instead of the letter task, participants had to detect a dimming of the fixation cross (reduction of stimulus contrast by 30%). The dimming occurred at random locations in time, on average once per trial.

Participants were exposed to the stimulus sequence for 108 trials. Directly after that two blocks with the left-to-right sequence were shown. Each block consisted of 20 trials in which the full stimulus sequence was shown, and 10 preplay trials with the starting point only. The end-point trials were left out to keep the scanning time to a minimum.

In addition, four blocks with an apparent motion paradigm were shown in which two dots were presented alternating at 2.3 Hz. Stimulus duration was 150 ms with an inter-stimulus interval of 67 ms. The timing was based on a previous study by Muckli et al.13. The stimuli were shown for 6 s followed by a variable ITI ranging from 11.79 to 15.98 s. The dots were presented at x=−8° and +8°, respectively. In two blocks, the dots were shown at y=6° and in the remaining two blocks the dots were shown at y=0°, with the apparent motion path going through fixation. Each block consisted of 20 trials with the flickering apparent motion stimulus sequence, and 10 trials in which only the starting point was shown. The latter condition was used to test whether the starting point of the apparent motion paradigm would also elicit an anticipatory BOLD activity wave.

pRF estimation

The data from the moving bar runs were used to estimate the pRF of each voxel in the functional volumes using MrVista (http://white.stanford.edu/software). In this analysis, a predicted BOLD signal is calculated from the known stimulus parameters and a model of the underlying neuronal population. The model of the neuronal population consisted of a two-dimensional Gaussian pRF, with parameters x0, y0 and σ, where x0 and y0 are the coordinates of the centre of the receptive field, and σ indicates its spread (s.d.) or size. All parameters were stimulus-referred and their units were degrees of visual angle. These parameters were adjusted to obtain the best possible fit of the predicted to the actual BOLD signal. This method has been shown to produce pRF size estimates that agree well with electrophysiological receptive field measurements in monkey and human visual cortex. For details of this procedure, see refs 9, 41. Once estimated, x0 and y0 were converted to eccentricity and polar-angle measures and co-registered with the functional images using linear transformation. For the following analyses, only voxels with a model fit of R2>1% were considered. Group-averaged receptive field properties are shown in Supplementary Fig. 2.

Functional localizer

Images of the localizer blocks were preprocessed using FSL42, including motion correction (six-parameter affine transform), temporal high-pass filtering (128 s) and spatial smoothing using a Gaussian kernel (full-width at half maximum=5). Onsets and durations of the moving dot exposure blocks were convolved with a single-gamma haemodynamic response function (HRF) and fitted using a general linear model. For each subject a two-sided t-contrast was calculated comparing stimulation and baseline periods. Resulting statistical maps were thresholded at P<0.001, uncorrected. Significant effects were also tested on a group-level via random-effects analysis using FSL’s FLAME43 and corrected for multiple comparisons using cluster correction with FSL’s cluster command (z=2.58, cluster significance threshold P<0.05). In the visual cortex this resulted in separate significant clusters in hMT+ and early visual cortex.

Selection of V1 and hMT+ regions

V1 was determined using the automatic cortical parcellation provided by FreeSurfer44 based on individual T1 images. The V1 mask was further restricted to voxels with receptive fields along the moving dot path. To this end, only voxels with a receptive field centre ranging from y=1.4° to 2.6°, and x=−10° to 10° (that is, along the stimulus path) were considered (Supplementary Fig. 2b). With increasing receptive field size, voxels will respond to multiple dot locations. To prevent overlap in response profiles the V1 mask was further restricted to voxels with a pRF size ≤3.5°.

hMT+ was determined for individual participants as follows. Significant voxels from the individual contrast (stimulus>baseline) were only considered when they overlapped with significant voxels in the hMT+ cluster from the statistical group analysis. This way, individual hMT+ activations were constrained by the group results.

fMRI preprocessing

Images were preprocessed using FSL (Oxford, UK) including motion correction (six-parameter affine transform, albeit limited due to low number of slices), temporal high-pass filtering (128 s) and Savitzky–Golay low-pass filter45 for each run separately. Individual voxel time courses were then transformed into per cent signal change in reference to the mean over time. No spatial smoothing was performed, and all analyses were carried out in the native subject space.

Next, time courses were temporally averaged over trials and runs separately for each task condition (stimulus, preplay and control), stimulus sequence (left-to-right and right-to-left) and sessions (attended and unattended). In the following analyses only voxels within individual V1 and hMT+ masks were considered.

Amplitude and peak latency

fMRI response amplitude and peak latency were computed by fitting a conventional single-gamma HRF function46 to individual voxel time courses for each participant, task condition, stimulus sequence, session and each region of interest (ROI). This was done using the function curve_fit as implemented in SciPy 0.18 (ref. 47; based on the Trust Region Reflective algorithm for a constrained least-squares fit). The objective function was the sum of squared errors between the predicted and observed response. We allowed the baseline, amplitude and peak delay to vary as free parameters. The peak delay was constrained to a range between 3 and 11.5 s peak latency to prevent pathological fits.

Owing to cortical magnification, the receptive field density is highest around the fovea and rapidly decreases towards the periphery. As a consequence V1 voxels have disproportionally more receptive fields close to the fovea, compared to the periphery (Supplementary Fig. 2a). To give an unbiased estimate of the propagation of visual activity, we binned the receptive fields into 28, evenly spaced bins, covering the relevant eccentricities from x=−12° to +12°. Note, the binning procedure only aids the visualization of the V1 activity wave, but had no influence on the reported BOLD amplitude and peak latency statistics. Fitted HRFs were averaged within each bin and resulted for each subject, task condition, stimulus sequence and session in a M=N × T matrix, where N is the number of bins and T is the number of functional volumes (136). For the different stimulus sequences, that is, left-to-right and right-to-left, matrices M were aligned by reversing the order of N for the right-to-left sequences (effectively changing it into a left-to-right sequence) and then averaged. Note that due to missing pRF coverage, single bins might be empty. Importantly, while averaging these empty bins were not treated as zero, but as ‘not a number’ (Nan) and the averaging was restricted to real values (for example, average(5.3+2.1)=7.4; average(5.3+Nan)=5.3; average(Nan+Nan)=Nan). Individual response matrices are shown in Supplementary Fig. 3. The relationship of BOLD amplitude and BOLD peak delay was tested using Pearson’s correlation across subjects for the stimulation condition. Confirming previous reports48,49, no significant relationship was found (r=−0.12; P=0.81).

To what extent does the detection of the reported BOLD delay differences depend on our ultra-fast scanning sequence? To answer this question, we down-sampled the existing data and repeated the same analyses (Supplementary Fig. 6).

For the BOLD amplitude and delay analyses, only V1 voxel along the stimulus path were selected. We additionally used a pRF-based stimulus reconstruction analysis10,11 based on all voxel in V1 with a pRF size ≤3.5°, to show the spatial specificity of the activity spread. To this end, for each condition, the BOLD amplitude estimates of each voxel were first multiplied with a two-dimensional Gaussian defined by each voxels’ pRF estimates (x0, y0 and s0) and then averaged over the voxel dimension, resulting in a stimulus reconstruction. This stimulus reconstruction was divided by a stimulus reconstruction for which the amplitude for all voxels was set to 1, effectively reducing the distortion due to the uneven distribution of receptive fields throughout the visual field. To illustrate the temporal and spatial dynamics of BOLD activity, we calculated pRF-based stimulus reconstructions (N=29) for each time point (Supplementary Movie 1). These reconstructions were created for the unattended condition. The same pRF-based stimulus reconstruction analysis was used for the control experiment with apparent motion stimuli (Supplementary Fig. 7).

BOLD amplitude for hMT+ was calculated using the same fitting procedure, except that responses were averaged over all hMT+ voxels and no binning was performed. Correlations between V1 and hMT+ amplitude estimates were calculated using Pearson’s product–moment correlation across subjects. To this end, the V1 amplitudes of the stimulation condition were averaged across all four stimulation locations (that is, dot locations 1–4). For the preplay and control condition V1 amplitudes were averaged across the non-stimulated dot locations (that is, dot locations 2, 3 and 4 for preplay and dot locations 1, 2 and 3 for the control condition). Event-related averages of the BOLD responses are shown in Fig. 2. Resulting correlation coefficients were transformed to Fisher z values and tested for significance from zero using one sample t-tests (non-parametric permutation test with 10,000 permutations).

Granger correlation of hMT+ and V1

We used a Granger ‘causality’ analysis, hereby referred to as Granger correlation (GC) analysis to probe the ‘directionality’ of the correlation between hMT+ and V1. GC was calculated in terms of vector autoregressive models in the frequency domain 0.0078–0.25 Hz (that is, the available frequency spectrum of the BOLD signal after temporal filtering). The influence measures F hMT+→V1 , F V1→hMT+ and F hMT+·V1 were computed from the average time course of all voxels in hMT+ and V1 for the stimulation and preplay condition, respectively. As suggested by Roebroeck et al.50 we assessed GC as the difference of the influence terms (F hMT→V1 and F V1→hMT ). The null hypothesis is F hMT+→V1 −F V1→hMT+ =0. The order of the autoregressive model was set to p=1 according to Roebroeck et al.50.

Within the GC framework, a difference value <0 can be interpreted as ‘hMT+ is driving V1’, and a difference value >0 can be interpreted in the opposite direction, that is, that ‘V1 is driving hMT+’. GC difference scores were tested against zero using a two-sided one-sample t-test (Supplementary Fig. 8).

RT analysis

On the basis of a median split, for each participant, delayed sequence trials were divided into fast versus slow RT trials. BOLD time courses of voxels corresponding to the retinotopically defined location of the last dot in the stimulus sequence were averaged and fitted with a HRF for each participant and slow versus fast RT trials separately. Estimated BOLD peak latency differences were tested across participants with a paired-sample t-test.

Control for eye movements

Participants were instructed to maintain fixation throughout the whole experiment. Eye positions were recorded with a video camera at 50 Hz sampling rate under infrared illumination (MEye Track-LR camera unit, SMI, SensoMotoric Instruments). Eyeblink artefacts were identified by differentiating the signal to detect eye pupil changes occurring too rapidly (<60 ms) to represent actual dilation. Blinks and samples in which the corneal reflection was not reliably detected were removed from the signal using linear interpolation. Eyetracking data gathered in the scanner for 22 of the 29 participants during the attended session and for 21 of the 29 participants for the unattended session. We calculated the mean gaze as a function of the four stimulus locations and task conditions. Mean horizontal gaze position did not vary with stimulus position for the scanning session with ‘task on stimulus’ (attended condition; ANOVA, P=0.58; N=22) and for the scanning session with ‘task at fixation’ (unattended condition; (ANOVA, P=0.33; N=21).

Data availability

All data will be made available freely on request. All code will be made available freely on request.