Participants

All subjects were English-speaking, medically healthy and right-handed. Patients with schizophrenia or bipolar disorder were diagnosed according to DSM-IV criteria [18] by consultant psychiatrists on the basis of clinical interviews, medical chart review, and consultation with the patients' psychiatrists. All patients with schizophrenia were in remission, as assessed by the Scale for the Assessment of Positive Symptoms (SAPS) [19] and the Scale for the Assessment of Negative Symptoms (SANS) [20]. All patients with bipolar disorder had Type I bipolar disorder and were in a euthymic state, as assessed by the Beck Depression Inventory [21], Hamilton Depression Rating Scale [22], Altman Self-Rating Mania Scale [23], and Young Mania Rating Scale [24]. Exclusion criteria were a co-morbid psychiatric or neurological disorder in the patient groups, including substance abuse or dependence within the previous 6 months, or a history of a psychiatric or neurological disorder in healthy volunteers. All participants provided written, informed consent with approval from the South London and Maudsley (SLAM) NHS Trust (Research) ethics committee. There were a total of 104 subjects: 32 patients with schizophrenia in remission, 32 patients with bipolar disorder in a euthymic state, and 40 healthy controls (Table 1). Subject MRI scans were acquired from fMRI studies conducted at the Institute of Psychiatry, SLAM NHS Trust. Data were obtained from 4 studies: 1) a verbal fluency study of schizophrenia and healthy controls [9, 13]; 2) the Maudsley Family study of patients with schizophrenia or bipolar disorder and their family members [25]; 3) the Maudsley Schizophrenia Twin study; and 4) the Maudsley Bipolar Twin study. The two twin studies involved twin pairs concordant and discordant for schizophrenia and bipolar disorder, respectively, and healthy control twins [26].
From the Family study samples, 1 subject was randomly selected from each family, and from the Twin studies, only 1 subject from each twin set was included, so that each individual could be considered statistically independent of the other subjects in the final sample. The inclusion of non-independent subjects could have reduced the variance within each group, thereby artificially increasing the separation between diagnoses. Groups were matched on verbal fluency task performance, measured as the number of correctly produced words during the fMRI scan. Of the patients with schizophrenia, 20 were taking atypical antipsychotics, 10 were taking conventional antipsychotics, and 2 were not receiving any medication. The mean chlorpromazine equivalent dosage was 625.9 mg daily (SD = 411.2 mg). The mean SAPS rating was 9.52 (SD = 8.85) and the mean SANS rating was 8.31 (SD = 4.96), reflecting their clinical status as being in remission. In the bipolar patient group, 26 patients were receiving medication and 6 were medication-free: 24 were taking mood stabilizers, which was lithium in 14 cases (mean dosage of 817.86 mg daily, SD = 207.91 mg); 8 were also taking regular doses of antipsychotic medication; and 8 were taking antidepressants. The 16 bipolar patients from the Maudsley Family study had a Beck Depression Inventory mean of 7.76 (SD = 7.16) and an Altman Self-Rating Mania Scale mean of 3.65 (SD = 2.69). For the bipolar patients from the Maudsley Bipolar Twin study, the mean rating was 5.44 (SD = 8.61) on the Hamilton Depression Rating Scale and 2.00 (SD = 3.71) on the Young Mania Rating Scale. All of the bipolar patients were in a euthymic state: none fulfilled criteria for a major depressive or manic episode or had any active psychotic symptoms.

Table 1 Demographic and clinical characteristics

Verbal Fluency Task

The experimental condition was a phonological letter fluency task [10] with 2 levels of difficulty [9]. Subjects were instructed to overtly generate a word in response to a visually presented letter shown at a rate of one every 4 seconds, while avoiding proper names, repetitions and grammatical variations of previous words [10]. If subjects were unable to think of a response, they were asked to say "pass". The difficulty of the condition depended on which set of letters was presented. The letters were categorized as "easy" or "difficult" according to the mean number of erroneous responses that subjects generated in a previous study [9]. There were 7 presentations of each letter within a 28-second experimental block, followed by the control condition, which was repetition of the word "rest" presented at the same rate (28-second control block). The "easy" sets of letters were T, L, B, R, S or T, C, B, P, S, and the "difficult" sets of letters were O, A, N, E, G or I, F, N, E, G. The order of presentation was randomized between subjects. Verbal responses during scanning were recorded.
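The block timing described above can be sketched as a short schedule generator. This is only an illustration: the function name and the fixed letter order are assumptions (the order was in fact randomized between subjects), and the sketch covers a single letter set.

```python
# Illustrative reconstruction of the block timing described in the text.
LETTER_INTERVAL_S = 4          # one stimulus every 4 seconds (from the text)
PRESENTATIONS_PER_BLOCK = 7    # 7 presentations per 28-second block (from the text)


def block_onsets(letters, start_s=0):
    """Return (onset_seconds, condition, stimulus) tuples for alternating
    fluency and control ("rest") blocks, one pair of blocks per letter.
    Hypothetical helper: the fixed letter order is an assumption."""
    block_len = LETTER_INTERVAL_S * PRESENTATIONS_PER_BLOCK  # 28 seconds
    events = []
    t = start_s
    for letter in letters:
        for condition, stimulus in (("fluency", letter), ("control", "rest")):
            for i in range(PRESENTATIONS_PER_BLOCK):
                events.append((t + i * LETTER_INTERVAL_S, condition, stimulus))
            t += block_len
    return events


easy_set = ["T", "L", "B", "R", "S"]   # one of the two "easy" sets listed above
schedule = block_onsets(easy_set)
```

Under these assumptions, five letters yield 70 stimulus onsets, with each 28-second fluency block followed by a 28-second control block.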

Data Acquisition

All MRI scans were acquired following the same procedure with the same acquisition system [9, 13], which is regularly monitored to ensure the quality and stability of fMRI measurements [27]. Seventy-four T2*-weighted gradient-echo single-shot echo-planar images were acquired on a 1.5-T, neuro-optimized IGE LX System (General Electric, Milwaukee) at the Maudsley Hospital, SLAM NHS Trust. Twelve noncontiguous axial planes (7 mm thickness, slice skip 1 mm) parallel to the anterior commissure-posterior commissure line were collected over 1100 msec in a clustered acquisition sequence, to allow subjects to make overt responses in relative silence (TE = 40 msec, flip angle = 70 degrees). A letter was presented (remaining visible for 750 msec, height: 7 cm, subtending a 0.4-degree field of view) immediately after each acquisition, and a single overt verbal response was made during the remaining silent portion (entire duration = 2900 msec) of each repetition (TR = 4000 msec).
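The timing within each repetition can be checked with a few lines of arithmetic. The sketch below assumes that the clustered acquisition occupies the start of each TR, which the text implies but does not state explicitly. The function and key names are hypothetical.

```python
TR_MS = 4000             # repetition time (from the text)
ACQUISITION_MS = 1100    # clustered acquisition of the 12 slices (from the text)
LETTER_VISIBLE_MS = 750  # letter display duration (from the text)


def tr_timeline(tr_index):
    """Return (start_ms, end_ms) windows for one repetition.
    Assumption: acquisition comes first, then the silent period."""
    start = tr_index * TR_MS
    acq_end = start + ACQUISITION_MS
    return {
        "acquisition": (start, acq_end),
        "letter_visible": (acq_end, acq_end + LETTER_VISIBLE_MS),
        "silent_response_window": (acq_end, start + TR_MS),
    }


# The silent portion is what remains of the TR after the acquisition.
silent_ms = TR_MS - ACQUISITION_MS
```

The resulting silent portion of 2900 msec matches the duration reported above.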

fMRI Data Analysis

The fMRI data were analyzed using SPM5 (Wellcome Department of Imaging Neuroscience, London, UK). MRI scans were realigned to remove motion effects, transformed into standard MNI space, and smoothed with an isotropic Gaussian filter (FWHM = 8 mm). A mask was applied to select intra-cerebral voxels, and the data were high-pass filtered (cutoff 128 sec) to remove low-frequency drifts.

Subject-level models were estimated by modeling correct and incorrect trials separately, with each trial type convolved with a canonical hemodynamic response function. Realignment parameters were included as nuisance covariates in the General Linear Model (GLM) to adjust for residual motion. For each subject, statistical images were computed representing the contrast of word production (correct trials only) minus baseline, separately for easy and difficult letter trials. These subject-level images were entered into a second-level random-effects analysis of variance (ANOVA), which modeled the effect of diagnostic group (schizophrenia, bipolar disorder and control), with task difficulty as an intra-subject factor and gender, age and antipsychotic dosage (chlorpromazine equivalents) as potential confounding factors. Because heterogeneous mood stabilizer drugs cannot easily be converted into a single equivalent value, we did not devise an adjustment strategy for these drugs. Inferences on the model were made using a voxel-level height threshold of p < 0.001 (uncorrected), followed by a cluster-level significance threshold of p < 0.05, corrected for multiple comparisons. For clusters showing a significant main effect of diagnostic group, an exploratory post-hoc analysis was conducted to explore the direction of the group differences: the beta estimate of activation was extracted at the peak voxel of each cluster and entered into analogous repeated-measures ANOVA models.
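The construction of separate regressors for correct and incorrect trials can be illustrated with a minimal sketch. It is not the SPM5 implementation: the double-gamma shape parameters (6 and 16, undershoot ratio 1/6) are the commonly used defaults and are assumed here, the trial onsets are hypothetical, and sampling is done directly at the scan times for brevity.

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def canonical_hrf(t):
    """Double-gamma response (assumed default shape): a peak near 5 s
    minus a later undershoot scaled by 1/6."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0


def build_regressor(onsets_s, scan_times_s):
    """Convolve stick functions at the trial onsets with the HRF and
    sample the result at each scan time."""
    return [sum(canonical_hrf(s - onset) for onset in onsets_s)
            for s in scan_times_s]


TR_S = 4.0
scan_times = [i * TR_S for i in range(14)]

# Hypothetical trial outcomes for one 28-second fluency block.
correct_onsets = [0.0, 4.0, 12.0, 16.0, 20.0, 24.0]
incorrect_onsets = [8.0]

design = {
    "correct": build_regressor(correct_onsets, scan_times),
    "incorrect": build_regressor(incorrect_onsets, scan_times),
}
```

Keeping the two trial types in separate columns is what allows the contrast to be restricted to correct trials.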

Machine learning classification analysis

We additionally conducted a pattern classification analysis to investigate whether clinical diagnosis could be determined on the basis of activation patterns alone. We employed Support Vector Machine (SVM) classification analysis [28], which has been shown to be a powerful tool for statistical pattern recognition. SVM has proven to be a robust and versatile approach for clinical prediction, as demonstrated by its consistently high performance in head-to-head methodological comparisons of diverse machine learning methods performed with fMRI data [29] and other high-dimensional clinical datasets such as proteomics [30] and genomics [31]. Our group has also demonstrated the potential of linear SVM for neuroimaging-based prediction in depression [8, 17]. The inputs to the SVM classification analysis were the activation patterns of each participant during easy and difficult verbal fluency, thresholded using the ANOVA test for group differences. These activation patterns were then fed to a multi-class linear SVM classifier [32], which learned the statistical boundary that best separates the groups. This boundary can then be used to obtain a diagnostic prediction for the scan of an undiagnosed subject. As implemented here, the procedure finds the boundary that maximises the expected overall classification accuracy in new, unclassified examples. This boundary therefore treats two types of errors as equivalent: false positives (FP, e.g. labelling a control as a patient) and false negatives (FN, e.g. misdiagnosing a patient as a control). For some clinical applications, these types of errors may not be equivalent. For example, if the clinical goal is to confirm the presence of a disorder, a better classification rule would be one that ensures a low FP rate (high specificity) while tolerating a higher FN rate (lower sensitivity) and potentially a lower overall classification accuracy.
Our purpose in the present paper, though, was to establish the potential of the neural correlates of verbal fluency as a diagnostic biomarker, and this proof-of-principle goal benefits from optimising the overall diagnostic accuracy rather than sensitivity or specificity.
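The relationship between the two error types and the three summary measures can be made concrete with a small sketch. The counts below are hypothetical round numbers chosen for illustration and are not results from this study.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and overall accuracy from confusion counts
    (tp/fn refer to patients, tn/fp to controls)."""
    sensitivity = tp / (tp + fn)   # patients correctly identified
    specificity = tn / (tn + fp)   # controls correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 50 patients and 50 controls.
sens, spec, acc = binary_metrics(tp=40, fn=10, tn=45, fp=5)
```

With these counts, sensitivity is 0.80, specificity is 0.90 and overall accuracy is 0.85. A rule tuned to reduce false positives would raise specificity at the cost of sensitivity, and possibly of overall accuracy.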

To avoid circularity, i.e. using the same data to create a classification rule and test its performance, which can lead to over-optimistic results in diagnostic studies, we employed leave-one-out cross-validation (LOOCV). LOOCV entails training the model (fitting both the second-level ANOVA and the linear SVM model) with all subjects minus one, and using the remaining single individual to test the accuracy of the prediction. This process is iterated until the sample is exhausted. We used permutation testing to assess the overall model performance, that is, whether the observed performance for the diagnostic classification of bipolar and schizophrenia subjects could have been expected by chance alone. The whole ANOVA model estimation and linear SVM classification process was repeated 1000 times, each time after a random permutation of the diagnostic labels of the subjects. The p-value of the experimental accuracies was computed using the resulting null-hypothesis distributions. Because of the gender imbalance present in our sample, we also repeated this classification procedure for male subjects alone. The cost parameter C of the SVM model was optimized through cross-validation within each training sample. Additional analyses were performed using the following packages of the R statistical software [33]: AnalyzeFMRI, which offers input/output, visualisation and analysis functions for fMRI data, and e1071, which supplies an interface to the libsvm library (http://www.csie.ntu.edu.tw/~cjlin/libsvm/). Coordinates are reported in MNI space.
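The logic of the cross-validation and permutation procedure can be sketched on synthetic data. Several simplifications are assumed and should not be read as the method used here: a nearest-centroid rule stands in for the linear SVM (so there is no cost parameter C to tune), a simple between/within sum-of-squares ratio stands in for the second-level ANOVA threshold, the add-one p-value correction is one common convention, and only 200 permutations are run for speed. All function names are hypothetical.

```python
import random


def f_like_score(values, labels):
    """Between-group over within-group sum of squares for one feature."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    grand = sum(values) / len(values)
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    return between / within if within > 0 else float("inf")


def select_features(X, y, k):
    """Indices of the k features with the largest group effect."""
    scores = [f_like_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def nearest_centroid_predict(X_train, y_train, x_test):
    """Stand-in classifier: assign the label of the closest class mean."""
    best_label, best_dist = None, float("inf")
    for lab in sorted(set(y_train)):
        rows = [r for r, l in zip(X_train, y_train) if l == lab]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x_test, centroid))
        if dist < best_dist:
            best_label, best_dist = lab, dist
    return best_label


def loocv_accuracy(X, y, k):
    """Leave one subject out; feature selection and training are redone
    in every fold using the training subjects only."""
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        keep = select_features(X_tr, y_tr, k)
        X_tr_sel = [[row[j] for j in keep] for row in X_tr]
        x_test_sel = [X[i][j] for j in keep]
        hits += nearest_centroid_predict(X_tr_sel, y_tr, x_test_sel) == y[i]
    return hits / len(X)


def permutation_p_value(X, y, k, n_perm, seed=0):
    """Rerun the whole LOOCV pipeline on label-permuted data and return the
    observed accuracy with an add-one corrected p-value."""
    rng = random.Random(seed)
    observed = loocv_accuracy(X, y, k)
    exceed = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        exceed += loocv_accuracy(X, shuffled, k) >= observed
    return observed, (exceed + 1) / (n_perm + 1)


# Synthetic toy data: 3 groups of 6 subjects, 5 features, only feature 0 informative.
GROUPS = ("SZ", "BD", "HC")
rng = random.Random(42)
labels = [g for g in GROUPS for _ in range(6)]
data = [[5 * GROUPS.index(lab) + rng.uniform(-1, 1)]
        + [rng.uniform(-1, 1) for _ in range(4)] for lab in labels]

observed_acc, p_value = permutation_p_value(data, labels, k=1, n_perm=200)
```

The point of the sketch is that feature selection sits inside the cross-validation loop, so the held-out subject never influences which features are retained. Selecting features on the full sample before cross-validation would reintroduce the circularity that LOOCV is meant to avoid.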