Participants

Twenty healthy human volunteers (13 female, 7 male; age range 19–28 years, mean age = 23.1 years) participated in the study. This sample size was chosen to match a previous study that used the same multivariate decoding methodology in the context of a purely perceptual task and obtained robust effects of expectation17. An additional three participants were tested, but their data were not included in the final sample due to excessive movement during scanning (one), premature termination of the experiment (one), or technical issues during scanning (one). All participants had normal or corrected-to-normal vision, and reported no current neurological or psychiatric illness. All participants provided written informed consent prior to participation and were reimbursed £10/h. All experimental procedures were reviewed and approved by the Birkbeck, University of London and University College London Ethics Committees.

Procedure

Stimuli were displayed against a black background on a rear-projection screen using a JVC DLA-SX21 projector (26 × 19.5 cm, 60 Hz). Observed hand stimuli were generated in Poser 10 (Smith Micro Software) and consisted of a gender-neutral right hand viewed from a canonical first-person perspective (height ~ 13 degrees, width ~ 9 degrees, see Fig. 2). Participants lay supine in an MRI scanner with both hands placed on MR-compatible button boxes. The right-hand box was positioned across the midline of the participant’s body, such that the index finger was above the little finger on the dorsal-ventral axis. Participants depressed two buttons on the right-hand box with their index and little finger except when executing movements. The left-hand button box was positioned below the right-hand box on the participant’s left leg, and participants placed their left thumb between two response keys.

Each trial began with the presentation of a white fixation cross, which remained present throughout stimulus presentation. After 750 ms, a neutral hand image was presented behind the fixation cross. On congruent and incongruent trials, this neutral hand image was accompanied by a white shape (square or circle) indicating which action (index or little finger abduction) the participant was required to perform. The display remained on screen until participants executed the cued action, measured by the release of the depressed key on the button box. Upon release of the key, the neutral hand image was immediately replaced by an image of the avatar hand abducting either its index or little finger. This sequence created apparent motion of the observed finger that could be congruent or incongruent with the participant’s own action, and that always appeared in synchrony with it. Congruency therefore reflects relative expectation: congruent action outcomes are more expected than incongruent ones, based either on inherited evolutionary expectations or a prior lifetime of learning about what is likely2,5,8 (note that all statements are therefore relative throughout the manuscript, and a congruent suppression is equivalent to an incongruent facilitation). The movement of the avatar hand also revealed a coloured dot (red or blue) in the previous fingertip location (see Fig. 1). On no move trials, an imperative shape cue did not appear with the neutral hand stimulus and the apparent motion sequence occurred after a fixed delay of 438 ms, matched to the average action execution reaction time in a pilot experiment. This fixed delay ensured that movement onset had approximately comparable temporal predictability to that on trials where stimulus onset was yoked to the participant’s action. On all trials, the hand image was removed after 500 ms and the screen was blanked for 1000 ms.

Participants completed one of two tasks, either making judgements about the identity of the observed finger abduction (e.g., ‘Did the INDEX finger move?’; finger-judgement trials) or about the colour of the dot revealed by the finger movement (e.g., ‘Was the dot BLUE?’; colour-judgement trials). On each trial the question was presented for 1500 ms, within which time participants were required to indicate their response via a keypress with their left thumb. The next trial began after a jittered inter-trial interval of 2–6 s.

The experiment was conducted in eight scanning sessions. Each session comprised 48 trials. On two-thirds of these trials participants executed index or little finger abductions with equal probability, and subsequently observed either congruent or incongruent action outcomes with equal probability (16 each). The remaining third of trials were no move trials (16), where participants observed index or little abductions without moving themselves. The task was blocked within each scanning session, such that one half of the session comprised finger-judgement trials and the other half colour-judgement trials. The task order alternated across sessions and was counterbalanced across participants. At the beginning of each block, participants were reminded of the task they were performing, as well as the mapping between imperative shape cues and executed actions. This mapping was counterbalanced across participants, and was also reversed halfway through the experiment (i.e., at the beginning of the fifth scanning session) to remove any confound between action-outcome congruency and cue-outcome congruency over the experiment.
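The per-session trial arithmetic above can be checked in a few lines (a minimal sketch; the counts and constant names are taken or assumed from the text):

```python
# Per-session trial composition described above (48 trials per session).
TRIALS_PER_SESSION = 48
SESSIONS = 8

move_trials = TRIALS_PER_SESSION * 2 // 3   # 32 executed-action trials
congruent = move_trials // 2                # 16 congruent outcomes
incongruent = move_trials // 2              # 16 incongruent outcomes
no_move = TRIALS_PER_SESSION // 3           # 16 passive-observation trials

assert congruent + incongruent + no_move == TRIALS_PER_SESSION

# Across the eight scanning sessions:
print(congruent * SESSIONS)                 # 128 congruent trials in total
```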

Before beginning the main experiment, participants completed two practice blocks of 48 trials in a room outside the scanner. These practice blocks contained identical ratios of each trial type.

Behavioural performance analysis

A congruency (congruent, incongruent) by task (finger-judgement, colour-judgement) ANOVA on participant accuracies revealed that participants were significantly more accurate when making judgements about dot colour than about finger identity (F(1,17) = 40.656, p < 0.001, ηp² = 0.705). There was also an interaction between congruency and task (F(1,17) = 11.130, p = 0.004, ηp² = 0.396), reflecting superior accuracy on congruent trials relative to incongruent trials when making judgements about finger stimuli (t(17) = 3.954, p = 0.001), but no effect of congruency when making judgements about dot colour (t(17) = −0.944, p = 0.358). This pattern resembles that obtained in previous studies in the sensory cognition literature, where expectations facilitate behavioural performance only when they are task-relevant17. Due to a technical fault, choice data could not be recovered on > 40% of trials for two participants, who were therefore excluded from the above analyses. Including the available data from these participants did not alter any of the statistical patterns observed.

fMRI acquisition and preprocessing

Images were acquired using a 3T Trio MRI scanner (Siemens, Erlangen, Germany) using a 32-channel head coil. Functional images were acquired using an echo planar imaging (EPI) sequence (ascending slice acquisition, TR = 3.36 s, TE1/TE2 = 30/30.25 ms, 48 slices, voxel resolution 3 mm isotropic). Structural images were acquired using a magnetisation-prepared rapid gradient-echo (MP-RAGE) sequence (voxel resolution: 1 mm isotropic).

Images were preprocessed in SPM12. The first six volumes of each participant’s data in each scanning run were discarded to allow for T1 equilibration. All functional images were spatially realigned to the first image and temporally realigned to the 24th (middle) slice. Each participant’s structural image was then coregistered to the mean functional scan and segmented to estimate forward and inverse deformation fields, used to transform data from the participant’s native space into Montreal Neurological Institute (MNI) space, and vice versa.

Multivariate decoding analyses

MVPA analyses were implemented using the TDT toolbox. In each analysis, a linear support vector machine (SVM) was trained to discriminate which stimulus (index or little) was observed on a given trial from patterns of BOLD activity across voxels. The initial step in each analysis was the specification of a general linear model (GLM) in SPM12 including a separate regressor for each stimulus type (e.g., observed index movement) in each experimental condition (e.g., congruent trials) in each scanning run. All regressors were modelled to the onset of the observed stimulus, movement parameters were included as nuisance regressors, and all model regressors were convolved with the canonical haemodynamic response function. This GLM generated eight beta images (one for each scanning run) for each stimulus type (index or little) in each experimental condition, which were used for subsequent decoding analyses.
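The regressor-construction step can be illustrated schematically as follows. This is not the SPM12 implementation itself: the double-gamma HRF below uses conventional default parameters, and the onset times and run length are invented for illustration.

```python
import numpy as np
from math import gamma

def canonical_hrf(t):
    """Conventional double-gamma HRF (standard default parameters, assumed here)."""
    peak = t ** 5 * np.exp(-t) / gamma(6)
    undershoot = t ** 15 * np.exp(-t) / gamma(16)
    return peak - undershoot / 6.0

TR = 3.36                        # repetition time from the acquisition section, s
t = np.arange(0, 32, TR)         # HRF support, seconds

# Stick function marking observed-stimulus onsets within one run (times invented).
n_scans = 100
onsets_s = [10.0, 45.0, 80.0, 150.0]
sticks = np.zeros(n_scans)
for onset in onsets_s:
    sticks[int(round(onset / TR))] = 1.0

# Convolving the sticks with the HRF yields one GLM regressor for this run.
regressor = np.convolve(sticks, canonical_hrf(t))[:n_scans]
```

Fitting such a GLM per run, stimulus type, and condition is what produces the beta images described above.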

Separate SVMs were trained and tested on the 16 beta images (eight index and eight little) in each experimental condition (congruent and incongruent), using a leave-one-run-out cross-validation procedure. For each decoding step, 14 images from seven scanning runs were used to estimate a linear discriminant function separating index and little movements, which was then applied to the remaining two beta images to classify them as either index or little. This procedure resulted in eight decoding steps, where each step reserved the beta images from one of the eight scanning runs for classifier testing. The SVM’s accuracy was calculated as the proportion of correctly classified images across all decoding steps.
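The cross-validation scheme can be sketched as follows, with scikit-learn's `LinearSVC` standing in for the toolbox's SVM and synthetic data standing in for the real beta patterns (all names and dimensions here are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# 16 beta images per condition: one index and one little image per run, 8 runs.
n_runs, n_voxels = 8, 200
X = rng.normal(size=(2 * n_runs, n_voxels))   # stand-in beta patterns
y = np.tile([0, 1], n_runs)                   # 0 = index, 1 = little
runs = np.repeat(np.arange(n_runs), 2)        # run label for each image

# Leave-one-run-out: train on 14 images from 7 runs, test on the held-out 2.
accuracies = []
for train, test in LeaveOneGroupOut().split(X, y, groups=runs):
    clf = LinearSVC().fit(X[train], y[train])
    accuracies.append(clf.score(X[test], y[test]))

decoding_accuracy = np.mean(accuracies)       # proportion correct over 8 folds
```

With random data, `decoding_accuracy` should hover near the 50% chance level; real stimulus-evoked patterns are what push it above chance.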

Defining regions of interest

All analyses used a ‘searchlight’ approach18, which involved building a separate SVM for each voxel in the brain using the beta values falling within a searchlight radius of 3 voxels (9 mm), and assigning the SVM’s accuracy to the voxel upon which the searchlight was centred. This procedure yielded decoding maps in each participant’s native space indicating each voxel’s decoding accuracy relative to chance level (50%; i.e., a decoding accuracy of 60% is treated as 10%). To allow comparison across participants, these decoding maps were normalised into MNI space using the forward deformation fields estimated in preprocessing and smoothed using a 4 mm FWHM Gaussian kernel in SPM12.
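The searchlight neighbourhood itself is simply the set of voxel offsets within a 3-voxel radius of the centre. A minimal sketch of that neighbourhood and of the chance-centring described above (the full searchlight loop and TDT internals are omitted):

```python
import numpy as np

RADIUS = 3  # voxels (9 mm at 3 mm isotropic resolution)

# All integer offsets within a Euclidean radius of 3 voxels of the centre voxel.
grid = np.mgrid[-RADIUS:RADIUS + 1, -RADIUS:RADIUS + 1, -RADIUS:RADIUS + 1]
offsets = grid.reshape(3, -1).T
sphere = offsets[(offsets ** 2).sum(axis=1) <= RADIUS ** 2]
print(len(sphere))          # 123 voxels contribute to each searchlight SVM

# Accuracy maps are expressed relative to chance (50%) before group analysis,
# so a raw accuracy of 0.60 becomes 0.10.
raw_accuracy = 0.60
centred = raw_accuracy - 0.5
```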

To maximise sensitivity, MVPA analyses were initially conducted collapsing across finger judgement and colour judgement trials. Searchlight analyses from no move trials were used to define regions of interest. Maps from each participant were normalised and smoothed (described above), and subjected to a one-sample t-test in SPM12, using cluster-wise inference to identify contiguous voxels where decoding accuracy was significantly above chance at the group level19. This involved identifying individual voxels that passed a ‘height’ threshold (p < 0.001 uncorrected) and an ‘extent’ threshold applied to contiguous voxels that pass the height threshold (FWE p < 0.05)31. This combination of thresholds has been shown to control appropriately for false-positive rates20. We restricted this contrast to occipital and temporal areas using the SPM12 atlas, to limit analyses to regions putatively involved in different aspects of visual processing32, and analyses were not constrained to clusters of a minimum size. This analysis revealed three clusters in bilateral occipital cortex (bOC, 1825 voxels), left occipital cortex (lOC, 703 voxels) and right occipitotemporal cortex (rOTC, 374 voxels, see Fig. 2). Note that these specific ROIs may not generalise beyond the participants that we scanned. This is because one-sample t-tests on decoding measures do not support population inference, given that below-chance decoding accuracies are not meaningful33. However, the analyses used below to test our hypotheses investigate the difference in decoding accuracy between two conditions, rather than comparing against chance performance, and so do support population inference. It is also worth noting that similar findings were obtained when defining the ROIs according to a permutation test approach33 (see Supplementary Note 1).

Effects of expectation on stimulus decoding

To investigate effects of expectation during action on decoding accuracy, we extracted and averaged the decoding accuracies within each cluster separately for congruent and incongruent trials. These mean accuracies were then subjected to a cluster (bOC, lOC, rOTC) by congruency (congruent, incongruent) ANOVA.
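Extracting a cluster's mean accuracy reduces to masking the chance-centred accuracy map with a binary cluster image. A sketch with synthetic stand-ins (the map values, grid dimensions, and mask are all invented; the real inputs are the normalised searchlight maps and ROI clusters described above):

```python
import numpy as np

rng = np.random.default_rng(2)

# Chance-centred decoding-accuracy map for one participant and condition
# (synthetic stand-in on an assumed MNI-sized grid).
accuracy_map = rng.normal(loc=0.02, scale=0.05, size=(53, 63, 52))

# Binary mask for one ROI cluster (e.g. bOC); here a random stand-in.
cluster_mask = rng.random((53, 63, 52)) > 0.99

# Mean decoding accuracy within the cluster: one cell of the ANOVA.
cluster_mean = accuracy_map[cluster_mask].mean()
```

Repeating this per cluster and condition yields the values entered into the cluster-by-congruency ANOVA.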

To investigate possible interactions between expectations during action and top-down attentional relevance29, additional searchlights were conducted separately for each combination of congruency (congruent, incongruent) and task (finger judgement, colour judgement). This procedure was identical to that described above, though segregating trials by task halved the number of stimulus events used to model each beta image for decoding. Mean decoding accuracies for each participant were calculated for each condition in each cluster, and these values were analysed using a cluster (bOC, lOC, rOTC) by congruency (congruent, incongruent) by task (finger judgements, colour judgements) ANOVA.

Effects of expectation on stimulus-specific activity

We investigated how expectations during action change the profile of activity across sensory populations by examining how stimulus-specific patterns of univariate BOLD activity varied between congruent and incongruent trials, within the same regions of interest. Using the same unnormalised, unsmoothed images used for multivariate decoding, we conducted a t-contrast in SPM12 for each participant comparing activity for observed index finger stimuli and observed little finger stimuli across all conditions. This contrast yielded a t-map for each participant where positive and negative values reflected a voxel’s preference for either index or little finger stimuli, respectively.

After assigning a preferred stimulus to each voxel, we extracted univariate BOLD signal (beta values) from each voxel separately for congruent and incongruent trials as a function of whether the stimulus was the preferred or non-preferred stimulus for a given voxel. For example, if a voxel was classified as ‘index preferring’, the univariate signal on congruent trials where an index finger was presented was congruent-preferred, whereas signal on the same trials was congruent-non-preferred for voxels classified as ‘little preferring’. Univariate BOLD signal was extracted from each voxel in each of the clusters used for decoding and analysed with a cluster (bOC, lOC, rOTC) by congruency (congruent, incongruent) by preference (preferred stimulus, non-preferred stimulus) ANOVA. Analyses examining univariate main effects of congruency are reported in the Supplementary Information (see Supplementary Note 2).
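The voxel-wise sorting logic above can be sketched as follows (synthetic t-values and betas; the real inputs are the per-participant SPM12 t-maps and beta images):

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels = 500

# Per-voxel t-values for the index > little contrast:
# positive = index-preferring, negative = little-preferring.
t_map = rng.normal(size=n_voxels)
prefers_index = t_map > 0

# Beta values for index-stimulus and little-stimulus trials
# within one condition (e.g. congruent).
beta_index = rng.normal(size=n_voxels)
beta_little = rng.normal(size=n_voxels)

# For each voxel, the 'preferred' signal is taken from its preferred stimulus
# and the 'non-preferred' signal from the other stimulus.
preferred = np.where(prefers_index, beta_index, beta_little)
non_preferred = np.where(prefers_index, beta_little, beta_index)
```

Doing this separately for congruent and incongruent trials yields the congruency-by-preference cells analysed in the ANOVA.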

Statistical information

Regions of interest were identified using cluster-wise inference on group-level decoding maps, combining a primary voxel-level ‘height’ threshold (p < 0.001 uncorrected) with a cluster-level ‘extent’ threshold (FWE p < 0.05), a combination that appropriately controls false-positive rates20. For alternative information prevalence analyses, see Supplementary Note 1 and Supplementary Fig. 1. All inferential statistics evaluating differences between experimental conditions used an alpha level of 0.05. Assumptions of parametric tests were met. All error bars show 95% within-participant confidence intervals of the mean difference between conditions34.