Participants

Twelve right-handed participants with a Diagnostic and Statistical Manual of Mental Disorders, 5th Edition26 diagnosis of SCZ or schizoaffective disorder were recruited from the South London and Maudsley NHS Trust in London, UK. Patients were required to provide written informed consent and to have been treated with stable doses of antipsychotic medication for the 3 months prior to study enrollment. Participants who met the criteria for alcohol or substance dependence in the previous 6 months were excluded. The inclusion criteria required a score of ≥3 on the hallucinatory behavior (P3) item of the Positive and Negative Syndrome Scale (PANSS)27,28. The study was approved by the Stanmore National Research Ethics Committee (REC number 15/LO/1007); all study procedures were conducted in accordance with the Declaration of Helsinki.

Participants were required to attend five separate study visits (Fig. 1). Baseline clinical assessments were conducted on the first day. The four subsequent visits for MRI scans were completed over a 2-week period. Clinical assessments were also completed during these visits. Assessments included the PANSS at baseline and after the last fMRI visit, and the Psychotic Symptom Rating Scale (PsyRats)29 hallucination subscale at baseline, after each rtfMRI-NF scan (visits 2–4), and 1 week post fMRI.

Fig. 1 Study design. Participants attended five study visits. During the baseline visit participants were assessed on the inclusion/exclusion criteria, and clinical and socio-demographic information was collected. During the first scanning visit a mask of the speech-sensitive left STG was created for each participant using the functional localizer task. inc/exc inclusion/exclusion, WASI Wechsler Abbreviated Scale of Intelligence, PsyRats Psychotic Symptom Rating Scale, PANSS Positive and Negative Syndrome Scale, rtfMRI-NF real-time functional magnetic resonance imaging neurofeedback

Intelligence quotient (IQ) was assessed using two subtests (Matrix Reasoning and Vocabulary) of the Wechsler Abbreviated Scale of Intelligence30. The PANSS and PsyRats instruments were administered by an independent researcher (MS), who was blind to the study protocol and objectives.

Imaging data acquisition

Functional images were acquired on a General Electric MR750 3.0 T MR scanner at the Department of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience, King’s College London (London, UK). A 12-channel head coil was used for radio frequency transmission and reception. A single-shot gradient recalled echo planar imaging sequence was used for fMRI acquisition (64 × 64 matrix over a 21.1 × 21.1 cm² field of view (FOV), giving an in-plane voxel size of 3.3 mm; slice thickness 3 mm with a 0.3 mm slice gap; repetition time (TR) = 2000 ms; echo time (TE) = 30 ms; flip angle = 75°). A high-resolution three-dimensional (3D) T1-weighted enhanced gradient echo scan (256 × 256 matrix over a 27 × 27 cm² FOV, giving an in-plane voxel size of 1.05 mm; slice thickness 1.2 mm; TR = 7.312 ms; TE = 3.015 ms; flip angle = 11°) was acquired for image normalization. The first four volumes of each rtfMRI-NF run were discarded to allow steady-state magnetization to be established.
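The in-plane voxel sizes quoted above follow from dividing the field of view by the matrix size. A minimal sketch of that arithmetic is given here; the function name is illustrative and is not part of the acquisition software.

```python
def in_plane_voxel_mm(fov_cm: float, matrix: int) -> float:
    """In-plane voxel edge length in mm for a square FOV and square matrix."""
    return fov_cm * 10.0 / matrix  # convert cm to mm, then divide by matrix size


epi_voxel = in_plane_voxel_mm(21.1, 64)   # functional EPI: 211 mm / 64
t1_voxel = in_plane_voxel_mm(27.0, 256)   # structural T1: 270 mm / 256
print(f"{epi_voxel:.2f} mm, {t1_voxel:.2f} mm")  # prints "3.30 mm, 1.05 mm"
```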

Localizer scan

During the first visit participants underwent a localizer scan to identify voice-sensitive regions in their left STG. This consisted of a voice perception task comprising blocks of vocal and non-vocal stimuli9. Vocal stimuli were words for everyday objects neutral in semantic and prosodic content, whereas the non-vocal stimuli were based on non-speech digitized sounds with amplitude and energy matched with control sounds. The presentation was alternated, with a total of eight 20 s blocks of vocal and seven 20 s blocks of non-vocal stimuli, lasting approximately 5 min in total. Stimulus loudness was adjusted individually to ensure that participants could hear the stimuli clearly above the scanner noise. To create a functional mask from which to derive the rtfMRI-NF signal, we calculated the effective signal change in areas activated by the functional localizer task using conventional univariate fMRI analysis techniques. This was based on the task contrast of the average BOLD signal between the activation block (vocal stimuli) and the baseline block (non-vocal stimuli). Online pre-processing and analysis were conducted using AFNI software (https://afni.nimh.nih.gov/) and local scripts written and developed by the author V.G. In short, the data were smoothed and corrected for head motion. The contrast vocal > non-vocal was chosen, and the clusters of face-touching voxels with the highest t-statistic were displayed (https://afni.nimh.nih.gov/pub/dist/doc/program_help/3dclust.html). The individual functional masks were created based on the maximally activated cluster in the left posterior STG by manually thresholding the target cluster until the required size/shape was present in the left posterior STG (ROI_STG). The left STG was chosen because this region is typically active during AVH.
We used white matter as a reference region to cancel out non-specific global brain effects (ROI_REF): a white matter mask was created by segmenting the T1-weighted structural image in AFNI, eroding to limit partial volume effects, and mapping it onto the functional localizer mask by reversing the normalization process.
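The block contrast underlying the localizer can be illustrated with a simplified sketch. It assumes that the 15 blocks alternate starting with a vocal block (the only order compatible with eight vocal and seven non-vocal blocks), takes the TR from the acquisition protocol, and ignores hemodynamic lag. It computes a plain difference of block means for one time series and is not the AFNI GLM used online; all function names are hypothetical.

```python
from statistics import mean

TR_S = 2.0      # repetition time from the acquisition protocol
BLOCK_S = 20.0  # localizer block length


def block_labels(n_vocal=8, n_nonvocal=7):
    """Alternating block order, assumed to start (and end) with a vocal block."""
    return ["vocal" if i % 2 == 0 else "nonvocal"
            for i in range(n_vocal + n_nonvocal)]


def volume_labels():
    """One condition label per acquired volume (10 volumes per 20 s block)."""
    vols_per_block = int(BLOCK_S / TR_S)
    return [label for label in block_labels() for _ in range(vols_per_block)]


def vocal_minus_nonvocal(signal):
    """Mean signal in vocal blocks minus mean signal in non-vocal blocks."""
    labels = volume_labels()
    vocal = [s for s, lab in zip(signal, labels) if lab == "vocal"]
    nonvocal = [s for s, lab in zip(signal, labels) if lab == "nonvocal"]
    return mean(vocal) - mean(nonvocal)


# Synthetic time series (not study data): 1 unit higher during vocal blocks.
synthetic = [101.0 if lab == "vocal" else 100.0 for lab in volume_labels()]
print(len(synthetic), vocal_minus_nonvocal(synthetic))  # prints "150 1.0"
```

The 150 volumes correspond to the roughly 5 min localizer (15 blocks × 20 s at TR = 2 s).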

rtfMRI-NF data acquisition and processing

A custom rtfMRI-NF interface system31 and AFNI software32 were used for real-time transfer and analysis of fMRI data. The rtfMRI-NF interface system ran on the scanner hardware to access the fMRI scans as they were reconstructed. The images were then transferred to a Linux workstation where they were pre-processed using AFNI’s built-in real-time capabilities. Once the target STG ROI (ROI_STG) had been identified using the functional localizer described above, the neurofeedback signal was calculated using the formula $\mathrm{NF} = (\mathrm{ROI}_{\mathrm{STG}} - \mathrm{ROI}_{\mathrm{REF}}) - (\mathrm{ROI}_{\mathrm{STG}}^{\mathrm{prev}} - \mathrm{ROI}_{\mathrm{REF}}^{\mathrm{prev}})$, where the "prev" terms are the average activation of the left STG and reference regions in the previous rest block. Thus, the NF signal was the difference between the current ROI_STG activity (averaged over 3 TR periods in order to reduce jitter) and its average over the previous rest block, with all values measured relative to the corresponding white matter signal (changes in which represent global signal variations of no interest).
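The neurofeedback formula above can be sketched as follows. This is a minimal illustration, not the authors' real-time code: the function and argument names are hypothetical, and applying the 3-TR average to the reference ROI as well as to the STG ROI is an assumption (the text specifies it only for the STG signal).

```python
from statistics import mean


def nf_signal(stg, ref, rest_stg, rest_ref, window=3):
    """(ROI_STG - ROI_REF) - (ROI_STG_prev - ROI_REF_prev).

    stg, ref           -- per-volume ROI means so far in the current block
    rest_stg, rest_ref -- per-volume ROI means from the previous rest block
    window             -- number of most recent TRs averaged to reduce jitter
    """
    # Current STG activity relative to white matter, averaged over `window` TRs
    # (averaging the reference over the same window is an assumption).
    current = mean(stg[-window:]) - mean(ref[-window:])
    # Same difference averaged over the whole previous rest block.
    baseline = mean(rest_stg) - mean(rest_ref)
    return current - baseline


# Synthetic values (not study data): STG sits 2 units below its rest level.
value = nf_signal(stg=[103.0, 102.0, 101.0], ref=[100.0, 100.0, 100.0],
                  rest_stg=[104.0] * 15, rest_ref=[100.0] * 15)
print(value)  # prints "-2.0"
```

Under this sign convention a negative value indicates STG activity below the preceding rest level, which is the direction participants were trained to achieve.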

rtfMRI-NF training

Each rtfMRI-NF training run comprised a block design similar to that used in previous studies24,33. Each rtfMRI-NF training run alternated between no-regulation “rest” blocks (7 blocks of 30 s) and down-regulation blocks (6 blocks of 50 s), lasting around 9 min per run. To motivate participants during training runs, and to provide a more visually engaging task, we used a visual feedback interface depicting a “vertically orientated space rocket”33. Participants were instructed to land the rocket by bringing it down to Earth. Visual feedback was provided during the down-regulation blocks; during the rest blocks participants observed a fixation cross. After each visit participants reported the strategy that they used to down-regulate their left STG activity. Participants were informed about the inherent delay in feedback due to the hemodynamic response (approx. 6 s). To enhance motivation and the likelihood of successful left STG signal down-regulation, we did not provide any overt instructions or suggest any strategies; participants were asked to devise their own strategy to down-regulate their left STG signal34,35 (Supplementary material). All participants attended four 1-h visits for MRI. The functional localizer task was completed during the first scanner visit. During visits 2, 3, and 4, participants completed between 2 and 6 rtfMRI-NF training runs depending on the time available (mean number of runs per scanner visit: visit 2 = 4.8, visit 3 = 4.5, visit 4 = 3.6). During the fourth (final) visit, participants also undertook a “transfer” run. The transfer run was identical to the training runs except that no visual feedback was given (the picture remained static). This allowed the overall success of the training to be assessed (i.e., participants’ ability to down-regulate their STG signal in the absence of direct feedback). Participants were informed that the picture would remain static, and were asked to employ the same strategies they used during the rtfMRI-NF training.
Transfer runs measure retention of learning and are considered a proximal measure of successful transfer of training strategies to everyday life.
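The block timing of a training run can be laid out as a simple schedule. The sketch below assumes that each run starts and ends with a rest block (the only strict alternation compatible with seven rest and six down-regulation blocks) and uses the TR of 2 s from the acquisition protocol; the names are illustrative.

```python
TR_S = 2.0
REST_BLOCKS, REST_S = 7, 30
REG_BLOCKS, REG_S = 6, 50


def run_schedule():
    """Return (label, onset_s, duration_s) tuples for one training run."""
    schedule, onset = [], 0
    for i in range(REST_BLOCKS + REG_BLOCKS):
        # Even positions are rest blocks, odd positions are regulation blocks.
        label, dur = ("rest", REST_S) if i % 2 == 0 else ("down-regulate", REG_S)
        schedule.append((label, onset, dur))
        onset += dur
    return schedule


schedule = run_schedule()
total_s = sum(dur for _, _, dur in schedule)
volumes = int(total_s / TR_S)
print(total_s / 60, volumes)  # prints "8.5 255"
```

The total of 510 s (8.5 min) matches the run length of around 9 min given above; whether the four discarded volumes are counted within this total is not stated in the text.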

Data analysis

Clinical data

To assess overall clinical change, and any adverse effect of the study procedures on clinical presentation, we conducted a paired t test for PANSS scores pre and post rtfMRI-NF training. To investigate the specific change in AVH symptoms over the rtfMRI-NF training period, we analyzed the total PsyRats AVH symptoms score by specifying a full maximum-likelihood random-effect multilevel model (MLREM)36. Post hoc exploratory analyses examined changes over the rtfMRI-NF training period in individual PsyRats items using one-sided, paired-sample t tests. These results are reported at an uncorrected threshold of p < 0.05; none survived correction for multiple testing (corrected threshold p = 0.05/11 ≈ 0.0045). Data distribution checks and statistical analyses were carried out using STATA 12.1.
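For illustration, the paired t statistic and the corrected threshold can be computed as below. This is a sketch using only the Python standard library rather than the STATA routines used in the study, and the pre/post scores are invented example values, not study data.

```python
from math import sqrt
from statistics import mean, stdev

ALPHA = 0.05
N_ITEMS = 11  # number of PsyRats items tested post hoc


def corrected_threshold(alpha=ALPHA, n_tests=N_ITEMS):
    """Per-test threshold when alpha is divided by the number of tests."""
    return alpha / n_tests


def paired_t(pre, post):
    """Paired-sample t statistic (pre minus post) and degrees of freedom."""
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Invented scores for five hypothetical participants.
t_stat, df = paired_t(pre=[4, 3, 4, 2, 3], post=[3, 3, 2, 2, 1])
print(round(t_stat, 3), df, round(corrected_threshold(), 4))  # prints "2.236 4 0.0045"
```

A positive t here means scores fell from pre to post; the one-sided p value would then be read from the t distribution with the returned degrees of freedom and compared against the corrected threshold.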

fMRI data analysis

All offline data were pre-processed and analyzed using Statistical Parametric Mapping 12 (SPM12). All functional data were slice-time corrected and realigned to correct for volume-to-volume head motion. Following this, the time series was co-registered to the high-resolution T1-weighted image, and normalized into the Montreal Neurological Institute (MNI) template using parameters generated by unified segmentation of the T1-weighted structural image37. The transformed data were smoothed using an 8 mm full-width at half-maximum isotropic Gaussian kernel. For the localizer task, subject-specific fixed-effects models were constructed with regressors encoding the predicted blood oxygenation level-dependent (BOLD) response for vocal stimuli and non-vocal stimuli blocks. For the rtfMRI-NF runs, subject-specific fixed-effects models were constructed with regressors encoding the predicted BOLD response for each of the rtfMRI-NF runs (ranging from 9 to 16 across subjects), with rest blocks serving as the baseline. For both the localizer task and the rtfMRI-NF runs, the six motion parameters for each run, generated during realignment, were included as nuisance regressors. For all first-level models, voxelwise parameter estimates for these regressors were obtained by restricted maximum-likelihood estimation, using a temporal high-pass filter (cutoff = 128 s) to remove low-frequency drifts and modeling temporal autocorrelation across scans with an AR(1) process. Following parameter estimation, contrasts of beta coefficients for the primary contrasts of interest were generated. For the localizer task the contrast was vocal stimuli > non-vocal stimuli. As the number of rtfMRI-NF runs differed between participants (9–16; mean = 14; median = 14), the second-level model comprised contrasts for the first vs. last rtfMRI-NF run acquired during the second, third, and fourth scanner visits, entered into a repeated-measures analysis of variance (ANOVA) (i.e., 6 contrasts).
Finally, to check that down-regulation effects in the left STG were not due to repeated exposure to the rtfMRI-NF task (i.e., habituation) over runs within a single visit, we examined the main effect of “visit” as this would be less prone to habituation confounds, that is, patients would be unlikely to show habituation in the left STG from visit to visit.
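The "predicted BOLD response" regressors in the first-level models described above are, in outline, block boxcars convolved with a hemodynamic response function. The sketch below illustrates the idea with a commonly used double-gamma function (peak shape 6, undershoot shape 16, undershoot ratio 1/6), which is an assumed approximation rather than the exact SPM12 basis function. It samples at the TR instead of a finer time grid, and the down-regulation onsets assume the run starts with a 30 s rest block.

```python
from math import exp, gamma

TR_S = 2.0


def double_gamma_hrf(t):
    """Double-gamma response at time t (s); parameters are assumed values."""
    if t < 0:
        return 0.0
    peak = t ** 5 * exp(-t) / gamma(6)          # gamma density, shape 6
    undershoot = t ** 15 * exp(-t) / gamma(16)  # gamma density, shape 16
    return peak - undershoot / 6.0


def block_regressor(onsets_s, duration_s, n_volumes, tr=TR_S, hrf_len_s=32.0):
    """Boxcar for the given blocks convolved with the HRF, one value per volume."""
    boxcar = [0.0] * n_volumes
    for onset in onsets_s:
        start = int(onset / tr)
        stop = min(n_volumes, int((onset + duration_s) / tr))
        for v in range(start, stop):
            boxcar[v] = 1.0
    kernel = [double_gamma_hrf(k * tr) for k in range(int(hrf_len_s / tr) + 1)]
    # Discrete convolution, truncated to the length of the run.
    return [sum(boxcar[v - k] * kernel[k]
                for k in range(len(kernel)) if v - k >= 0)
            for v in range(n_volumes)]


# Six 50 s down-regulation blocks separated by 30 s rest blocks, 255 volumes.
onsets = [30 + 80 * i for i in range(6)]
regressor = block_regressor(onsets, duration_s=50, n_volumes=255)
```

The regressor is zero during the initial rest block and rises a few seconds after each down-regulation onset, reflecting the hemodynamic delay mentioned to participants.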

Transfer task

Using a separate model, subject-specific first-level models were created using a regressor encoding the predicted response for the first rtfMRI-NF run vs. the transfer run. Motion parameters generated during the realignment for both runs were also included. A second-level paired t test was used to test for effects in the left STG. The association between changes in left STG activity and changes in PsyRats total scores was investigated using a regression model with a single contrast image (first > transfer run) from each subject and their corresponding change in PsyRats scores as a regressor.

In order to focus on signal change in voice-sensitive regions of the left STG, we conducted analyses within the mean left STG ROI created by the localizer task (contrast: vocal stimuli > non-vocal stimuli). First, we adjusted obliqueness to match the individual T1-weighted structural image, removed the white matter control regions, and transformed the individual masks into MNI space, using parameters generated by unified segmentation of the T1 image. The mean mask was then computed using ImCalc in SPM12 based on the transformed individual functional localizer masks (Fig. 2a). For all second-level analyses, significant ROI results are reported at a p value of ≤0.05 following family-wise error correction on the basis of response amplitude (i.e., peak-level family-wise error (FWE)).

Fig. 2 a 3D SPM render of mean STG ROI based on localizer task (vocal stimuli > non-vocal stimuli). b 3D SPM render showing effect of rtfMRI-NF training in left STG ROI. c Plot showing effect of rtfMRI-NF training in left STG ROI (first/last run from visits 2, 3, and 4 plus transfer scan (visit 4))

Psychophysiological interactions