All procedures were approved by the ethics committees of the School of Psychology and of the School of Sport, Health, and Exercise Science at Bangor University. Participants gave written informed consent prior to all testing.

Participants

Eighteen female participants (age 21.6 ± 4.1 years; one left-handed) took part in this study. Data from two participants were excluded because they had fewer than 20 clean trials, and data from a third were excluded due to a technical problem with data collection, leaving a final sample of 15 participants. All reported no neurological or psychiatric diagnoses and all were non-smokers. Four took oral contraceptives and one used a NuvaRing. Participants reported average caffeine use of 124 ± 109 mg/day.

Apparatus

Fifty-nine Ag/AgCl ring electrodes in a 10–10 montage and two infra-orbital electrodes were used to record direct-current EEG. Prior to data collection, impedance at each electrode was reduced to ≤ 5 kΩ using Abralyt high-chloride gel (EasyCap, Germany). Cz and FPz served as the recording reference and ground electrodes, respectively. Two BrainAmp DC amplifiers amplified the data before they were digitised and recorded using BrainVision Recorder (both Brain Products, Germany). Stimuli were presented on a 17″ LCD monitor with an electrically shielded power source, and the whole recording took place in a sound-attenuated Faraday cage.

Procedure

Data presented here were collected during the same sessions as a study of the effects of caffeine on the cortical correlates of perceived effort during isometric leg contractions. For brevity, the full procedure of that study is not reported here except where relevant; the interested reader is referred to de Morree et al. (2014) for further details.

Participants were recruited for two EEG sessions, exactly 1 week apart. The study employed a randomised counterbalanced crossover design whereby participants were administered caffeine during one session and placebo in the other. Caffeine was administered via capsules containing 6 mg/kg body weight of caffeine powder and 6 mg/kg body weight of dried milk, while in the placebo condition participants were given capsules containing 12 mg/kg body weight of dried milk. Both the participant and the experimenter were blind to condition. As participants might recognise the effects of caffeine, effectively unblinding them and reducing the potency of the placebo effect, they were told that they would be given caffeine in one session and taurine in the other. Participants were debriefed and told the truth after both sessions were complete.
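For concreteness, the dosing scheme amounts to the following calculation. This is an illustrative Python sketch, not code used in the study; the function name is ours, and the observation that both capsules contain the same total mass of powder is an inference from the matched 12 mg/kg totals.

```python
def capsule_contents(body_mass_kg, condition):
    """Illustrative sketch of the dosing scheme described in the text:
    6 mg/kg caffeine plus 6 mg/kg dried milk in the caffeine condition,
    12 mg/kg dried milk in the placebo condition (equal total mass)."""
    if condition == "caffeine":
        return {"caffeine_mg": 6 * body_mass_kg, "milk_mg": 6 * body_mass_kg}
    return {"caffeine_mg": 0, "milk_mg": 12 * body_mass_kg}

# For a 70 kg participant:
print(capsule_contents(70, "caffeine"))  # {'caffeine_mg': 420, 'milk_mg': 420}
```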

Participants were asked to maintain their habitual levels of caffeine use throughout the testing period and to have a good night’s sleep before each session. They were also asked to avoid alcohol and intense exercise prior to each session, and to eat a light meal about 2 h prior to testing.

The capsules were administered prior to electrode attachment and the cognitive task described here was conducted after the leg contraction task. The mean duration between capsule administration and the start of the cognitive task was 2 h and 6 min (range 1:50–3:02).

The cognitive task was a three-stimulus oddball design. Participants saw a series of coloured circles (~ 3 cm diameter) appear on a black background for 500 ms. The majority were white (standards, 70%), while minorities were green (targets, 15%) or red (distractors, 15%). Participants were asked to make a left-handed keyboard response to all white circles, a right-handed keyboard response to green circles, and no response to red circles. Stimuli were presented in three blocks of 340 trials. The inter-trial interval varied uniformly across 1350, 1475, 1600, 1725, and 1850 ms.
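The trial structure described above can be sketched as follows. This is an illustrative Python example, not the stimulus-presentation code actually used; the function and variable names are ours.

```python
import random

def make_block(n_trials=340, seed=0):
    """Illustrative sketch: build one oddball block with 70% standards,
    15% targets, 15% distractors, and a jittered inter-trial interval
    drawn uniformly from the five values given in the text."""
    rng = random.Random(seed)
    n_target = round(n_trials * 0.15)              # 51 green targets
    n_distract = round(n_trials * 0.15)            # 51 red distractors
    n_standard = n_trials - n_target - n_distract  # 238 white standards
    trials = (["standard"] * n_standard
              + ["target"] * n_target
              + ["distractor"] * n_distract)
    rng.shuffle(trials)
    itis = [rng.choice([1350, 1475, 1600, 1725, 1850]) for _ in trials]
    return list(zip(trials, itis))
```

Across the three blocks this yields 1020 trials, of which roughly 153 are targets entering the single-trial analysis.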

Data analysis

A script for the portion of our analysis conducted in R (R Core Development Team 2012) is available as supplementary materials to this paper. Readers who would like any clarification on this script are very welcome to contact the corresponding author.

Each session’s data were preprocessed using BrainVision Analyser 2 (Brain Products, Germany). All three blocks were concatenated and sections where amplitude ranged by < 0.5 μV or > 1500 μV in any 200 ms window were excluded. Independent components analysis (ICA) using the Infomax algorithm was run on a 3-min stretch of data, starting 1 min into each dataset, and the weightings derived from the ICA were applied to the whole dataset. Components reflecting electro-ocular or electro-cardiographic artefacts were removed before the data were back-projected. Channels showing significant residual artefacts limited to that channel alone were interpolated using spherical splines (order = 3, Legendre polynomials = 10), after which the data were average referenced and 0.05–50.00 Hz filters (25 dB/octave roll-offs) were applied. A second artefact rejection stage with a more stringent criterion (amplitude ranging > 150 μV in any 200 ms window) was run to exclude any data still contaminated by artefacts.
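The amplitude-range rejection criterion can be sketched as follows. This is an illustrative Python example; the actual rejection was performed in BrainVision Analyser 2, and the sampling rate `fs` here is an assumption.

```python
import numpy as np

def range_artefacts(signal, fs=500, win_ms=200, lo=0.5, hi=1500.0):
    """Illustrative sketch: flag samples falling in any window whose
    peak-to-peak range is implausibly flat (< lo uV) or large (> hi uV),
    mirroring the criteria described in the text. `fs` is assumed."""
    win = int(fs * win_ms / 1000)
    bad = np.zeros(len(signal), dtype=bool)
    for start in range(0, len(signal) - win + 1):
        seg = signal[start:start + win]
        rng = seg.max() - seg.min()
        if rng < lo or rng > hi:
            bad[start:start + win] = True
    return bad
```

A flat-lined channel (range near 0 μV) and a gross movement artefact (range above 1500 μV) are both caught by the same test, applied over a sliding 200 ms window.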

An additional 4 Hz low-pass filter (25 dB/octave roll-off), shown to be optimal for single-trial analysis (Smulders et al. 1994), was then applied to the data. Data from target trials with correct responses between 120 and 1000 ms were cut into segments from 600 ms pre-stimulus to 1800 ms post-stimulus. Data were baseline corrected using the period 600–400 ms pre-stimulus. Shorter stimulus-locked (600 ms pre-stimulus to 1400 ms post-stimulus) and response-locked (700 ms pre-response to 400 ms post-response) segments sharing a common baseline were then cut from the longer segments. These data were then exported to R (R Core Development Team 2012) for single-trial analysis.
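The segmentation and baseline-correction steps amount to the following. This is an illustrative Python sketch; the actual processing was done in BrainVision Analyser 2, and the sampling rate and continuous-data layout are assumptions.

```python
import numpy as np

def segment_and_baseline(data, events, fs=500,
                         tmin=-0.6, tmax=1.8, base=(-0.6, -0.4)):
    """Illustrative sketch: cut segments around each stimulus onset and
    subtract the mean of the -600 to -400 ms baseline, as in the text.
    `data` is a 1-D continuous channel; `events` are onset sample indices."""
    epochs = []
    for ev in events:
        s0 = ev + round(tmin * fs)
        s1 = ev + round(tmax * fs)
        seg = data[s0:s1].astype(float).copy()
        b0 = round((base[0] - tmin) * fs)   # baseline start within segment
        b1 = round((base[1] - tmin) * fs)   # baseline end within segment
        seg -= seg[b0:b1].mean()
        epochs.append(seg)
    return np.array(epochs)
```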

Single-trial analysis followed the approach reported by Saville et al. (2011, 2012, 2015a), which is summarised here. Averaged stimulus-locked ERPs for each participant in each condition were computed from the single-trial data, and these averages were concatenated along the time axis. Spatial principal components analysis (Dien 2010a) with Infomax rotation was then conducted, implemented using the prcomp function from the core stats package and the infomax function from the GPArotation package (Bernaards and Jennrich 2005). Six factors were retained based on a parallel scree test, as recommended for principal components analysis of EEG data by Dien (2010b), implemented using the fa.parallel function from the psych package (Revelle 2016). Factor 1 showed a P3b topography, so data from all electrodes were summed at each timepoint, weighted by Factor 1’s loadings, to produce single virtual-electrode time-courses reflecting this factor’s activity. The topography of Factor 1 is displayed in Fig. 1 (this figure, and all others, was made using the ggplot2 package for R (Wickham 2009)).
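The virtual-electrode construction can be sketched as follows. This is an illustrative Python example using unrotated PCA via singular value decomposition; the analysis itself was run in R, with an additional Infomax rotation (via GPArotation) that is omitted here.

```python
import numpy as np

def virtual_electrode(erps, n_factors=6):
    """Illustrative sketch of spatial PCA over electrodes (unrotated;
    the authors additionally applied an Infomax rotation in R).
    `erps`: (timepoints, electrodes) array of concatenated averages.
    Returns the retained loadings and the virtual-electrode time course
    for Factor 1 (the loading-weighted sum over electrodes)."""
    centred = erps - erps.mean(axis=0)           # centre each electrode
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    loadings = vt[:n_factors].T                  # (electrodes, n_factors)
    factor1 = loadings[:, 0]
    return loadings, erps @ factor1              # weighted sum per timepoint
```

The same Factor 1 weights are then applied to every single trial, yielding one time course per trial on which peaks can be picked.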

Fig. 1 Topography of the first Infomax-rotated PCA factor used in subsequent analyses. Point size reflects the weighting of each electrode, with positive weightings shown in red and negative in blue. Some locations were modified slightly to prevent overlapping points

Peaks were identified for each trial as the time-point with maximal amplitude 250–750 ms post-stimulus for stimulus-locked data, and 250 ms pre-response to 250 ms post-response for response-locked data. Trials where (a) the stimulus- or response-locked peak was identified at the very first or last millisecond of the peak-picking window, or (b) the response-locked peak occurred before stimulus onset, were excluded as they likely reflected a misidentified peak. Before models were fitted to the data, P3b latencies were centred and scaled by z-scoring latencies separately within each participant.
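The peak-picking rule, the window-edge exclusion, and the within-participant z-scoring can be sketched as follows (illustrative Python; the analysis itself was conducted in R):

```python
import numpy as np

def pick_peak(trial, times, lo=250, hi=750):
    """Illustrative sketch: latency of maximal amplitude in the
    250-750 ms stimulus-locked window; returns None when the maximum
    lands on a window edge (such trials were excluded as likely
    misidentified peaks)."""
    mask = (times >= lo) & (times <= hi)
    window = trial[mask]
    win_times = times[mask]
    idx = int(np.argmax(window))
    if idx == 0 or idx == len(window) - 1:
        return None
    return float(win_times[idx])

def z_within(latencies):
    """Centre and scale one participant's latencies (z-score)."""
    lat = np.asarray(latencies, float)
    return (lat - lat.mean()) / lat.std(ddof=0)
```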

Mixed effects models, implemented using the lme4 package (Bates et al. 2012) for R (R Core Development Team 2012), were used to test that the assumptions for mediation were met, namely that caffeine predicted P3b latency and that P3b latency predicted RT. For this purpose, a model predicting P3b latency with a fixed effect of condition (caffeine/placebo) and a random slope of condition for each participant was compared to a null model omitting the fixed effect but with the same random effects structure. Likewise, a model predicting RT with a fixed effect of P3b latency and a random intercept and slope of P3b latency for each participant was compared to a null model with no fixed effects. Both comparisons were made using Akaike information criteria. Mediation assumptions were tested separately for stimulus- and response-locked P3b latencies.
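The logic of the AIC-based comparison can be illustrated with a deliberately simplified example, using ordinary least squares in Python rather than the lme4 mixed models actually fitted:

```python
import numpy as np

def gaussian_aic(y, yhat, k):
    """Illustrative sketch of the model-comparison logic: Akaike
    information criterion (AIC = 2k - 2 log-likelihood) for a Gaussian
    model with k free parameters and maximum-likelihood error variance."""
    n = len(y)
    rss = float(np.sum((np.asarray(y) - np.asarray(yhat)) ** 2))
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    return 2 * k - 2 * loglik

# Compare a slope model against an intercept-only null model:
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)
b = np.polyfit(x, y, 1)
aic_full = gaussian_aic(y, np.polyval(b, x), k=3)   # slope, intercept, sigma
aic_null = gaussian_aic(y, np.full_like(y, y.mean()), k=2)
print(aic_full < aic_null)   # the model with the predictor should win
```

A lower AIC for the model containing the fixed effect is evidence that the predictor improves fit beyond its cost in parameters; the same decision rule applies to the mixed-model comparisons in the text.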

Mediation analysis was conducted using the mediation package (Tingley et al. 2014) for R. A mediation model was fitted to the data predicting RT on a single-trial basis using the predictor of condition (placebo = 0, caffeine = 1) and centred P3b latency as a mediating variable. The inputs to this model were two linear mixed effects models. The first predicted RT with fixed effects of condition and P3b latency, with random intercepts and a random slope of P3b latency for each participant (the model did not converge when a random slope for condition was added). The second predicted P3b latency using a fixed effect of condition, with a random intercept for each participant (again, models including a random slope for condition did not converge). Both models used maximum likelihood estimation. We planned to fit separate mixed effects models for stimulus- and response-locked P3b latencies if the assumptions for mediation were met.
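The mediation decomposition can be illustrated with a much-simplified sketch: single-level OLS with a nonparametric bootstrap in Python, rather than the multilevel models and quasi-Bayesian procedure of the mediation R package actually used.

```python
import numpy as np

def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def mediation_sketch(condition, latency, rt, n_sim=1000, seed=0):
    """Illustrative sketch of the product-of-coefficients mediation
    logic (NOT the authors' multilevel procedure):
    a = condition -> latency; b = latency -> rt controlling for
    condition; ACME = a * b, with a trial-resampling bootstrap CI."""
    rng = np.random.default_rng(seed)
    n = len(rt)
    acmes = []
    for _ in range(n_sim):
        i = rng.integers(0, n, n)                    # resample trials
        a = ols_slope(condition[i], latency[i])
        X = np.column_stack([np.ones(n), condition[i], latency[i]])
        coef = np.linalg.lstsq(X, rt[i], rcond=None)[0]
        acmes.append(a * coef[2])                    # a * b
    return np.percentile(acmes, [2.5, 97.5])
```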

Model-based mediation analysis was used to estimate the average direct effect (ADE: the effect of caffeine on RT after controlling for P3b latency) and the average causal mediation effect (ACME: the total effect of caffeine on RT minus the direct effect). Quasi-Bayesian Monte Carlo simulation was used to derive 95% confidence intervals for these parameters (Imai et al. 2010).

We also conducted two control analyses to assess possible confounds. Firstly, with analyses of single-trial latencies it is important to assess whether there are amplitude differences between conditions, as these could lead to different signal-to-noise ratios for peak picking in the two conditions, complicating interpretation of apparent latency effects. To do this, a linear mixed effects model was fitted to P3b amplitudes with a main effect of caffeine and a random intercept and slope of caffeine for each participant. This model was compared to a null model omitting the fixed effect using Akaike information criteria.

Secondly, when determining peak picking windows for stimulus and response-locked peaks, there are three options:

1. One can define the two sets of windows separately for stimulus- and response-locked analyses. This allows the identified P3b peak to differ between the two analyses, which can mean that for a given trial the stimulus- and response-locked P3b latencies do not sum to the RT. However, it ensures that the windows are consistent relative to their time-locking events and that the measurement of both latencies is independent of RT (see below).

2. One can use the same stimulus-locked window for both types of peak. This means that in trials with very fast RTs the window for response-locked peaks is much wider post-RT than pre-RT, and in very slow trials the window is much wider pre-RT than post-RT. This confounds measurement error in RT and response-locked P3b latencies, meaning that using the latter to predict the former violates the assumption of independence for regression.

3. One can use the same response-locked window for both types of peak. This has the opposite effect of option 2 (stimulus-locked windows are moved forward for fast RTs and backwards for slow RTs), again violating independence assumptions.

The safest way to address these issues is to run all three analyses and check whether the results hold across all of them. In addition to our main analysis, which used independent windows, we computed inferred response-locked latencies by subtracting RTs from stimulus-locked latencies, and inferred stimulus-locked latencies by adding RTs to response-locked latencies. The mediation models were also fitted to these data in order to assess whether the same pattern held for inferred latencies. Again, these inferred latencies were centred prior to model fitting.
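The inferred-latency arithmetic is simply the following (illustrative Python; latencies in ms, with response-locked latencies expressed relative to the response, so a peak preceding the response has a negative latency):

```python
def inferred_latencies(stim_lat, resp_lat, rt):
    """Illustrative sketch of the control analysis: derive each type
    of latency from the other plus RT."""
    inferred_resp = stim_lat - rt    # response-locked from stimulus-locked
    inferred_stim = resp_lat + rt    # stimulus-locked from response-locked
    return inferred_resp, inferred_stim
```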

Finally, in order to see what value single-trial analysis added compared to traditional averaged ERPs, mediation models were fitted to peak latencies obtained from averaged ERPs of Factor 1, computed from the same trials on which the single-trial analysis was conducted. The mediation models were the same as those used for the single-trial analysis, except that they used a single mean RT for each participant in place of the RTs for all trials, and the peak latency picked from the averaged ERP in place of single-trial peaks. A random intercept of participant was fitted (a random slope of condition would have yielded more parameters than data-points). Again, separate analyses were run for stimulus- and response-locked data.