Participants

Forty individuals meeting DSM-IV-TR criteria for schizophrenia (SZ) and 12 healthy comparison (HC) participants were enrolled in a registered clinical trial (identifier: NCT00923078, https://clinicaltrials.gov/) at the time of this analysis. Only data collected at study intake are presented here. The study was conducted under the oversight of the VA Connecticut Healthcare System (VACHS) Human Studies Subcommittee (HHS protocol # 01245) and the Yale University Human Investigation Committee (HIC protocol # 1003006388) institutional review boards. All participants provided written informed consent prior to initiating any study procedures and were compensated $75 for the study intake assessment. HC participants were recruited to match the SZ participants on age, gender, and race. Sample descriptive statistics are presented in Table 1.

Table 1 Sample descriptive statistics

Inclusion was limited to individuals aged 18 to 70 who were native English speakers with stable housing for a minimum of 30 days. In addition, SZ sample members had a minimum of 30 days since discharge from their last hospitalization and 30 days since the last change in psychiatric medications, and were receiving mental health services through VACHS or Yale-affiliated outpatient facilities. Individuals were excluded based on current (past 30 days) diagnosis of alcohol or substance abuse disorders, history of brain trauma or neurological disease, mental retardation or premorbid intelligence ≤ 70, or auditory or visual impairment that would interfere with study procedures. In addition, any current or past DSM-IV Axis I diagnosis was exclusionary for HC sample enrollment.

Clinical assessment measures

All participants underwent a clinical interview to obtain treatment, substance use, medical, legal, employment, and psychosocial background information. Diagnoses of SZ sample participants were confirmed using the Structured Clinical Interview for DSM-IV-TR (SCID-I/P; [27]), administered by a licensed clinical psychologist. The Mini International Neuropsychiatric Interview (M.I.N.I.; [28]) was administered to healthy volunteers to screen for psychiatric conditions that would be exclusionary. The Wechsler Test of Adult Reading (WTAR; [29]) was administered to all participants to obtain an estimate of premorbid intellectual endowment, and the MATRICS Consensus Cognitive Battery (MCCB; [30]) was used to test current cognitive ability across multiple domains. Age- and gender-corrected t-scores for the MCCB Working Memory Composite and the Continuous Performance Test–Identical Pairs (CPT-IP) subtest were used in the current analysis to cross-validate SVM-derived models of EEG activity related to working memory.

EEG data collection procedures

Participants were seated in front of a 24” LCD monitor (1920 × 1200 pixels, 75 Hz refresh rate) at a viewing distance of 100 cm in a dimly lit room. EEG was recorded using a 64-channel BioSemi ActiveTwo (BioSemi B.V., Amsterdam, Netherlands) bio-amplifier and electrode system with sensors located according to the 10–20 system. Additional electrodes were placed bilaterally at the mastoids (reference), at the outer canthi of both eyes (horizontal electrooculogram; HEOG), and above and below the right orbit (vertical electrooculogram; VEOG). Continuous EEG was monitored online in ActiView V6.05 and acquired at a 1024 Hz sampling rate with a bandpass filter setting of 0.16–100 Hz. The Sternberg task was administered using NBS Presentation software (Neurobehavioral Systems, Inc., Albany, CA), with behavioral responses captured using two buttons of a Cedrus RB-834 response pad (Cedrus Corporation, San Pedro, CA). Total EEG setup time was approximately 30 min, and the Sternberg task was administered in three blocks interspersed between blocks of two additional auditory ERP tasks (not included in the current report).

Sternberg working memory task

A version of the Sternberg working memory task (SWMT), modified from Raghavachari et al. [31], was used in the present study. Stimuli consisted of sequentially presented letters (200 ms duration, 1200 ms ISI) in sets of 4–8 letters each, randomly generated from an array of 12 letters. For each trial, the stimulus set was followed by a 3200 ms retention period that terminated with a response probe letter. Participants were instructed to press one of two response pad buttons, using the right or left index finger, to indicate whether the probe letter was or was not presented in the preceding set. The response probe remained present for the duration of the response window, up to 3500 ms, and terminated at the time of button press. Auditory feedback was given to indicate correct, incorrect, or time-out (after 2000 ms) on each trial. Feedback was followed by 1000 ms of black screen and a fixation “+” cross for another 1000 ms preceding the first stimulus of the next set. A total of 90 trials was administered over three blocks of 30 trials, each block lasting approximately 8 min.
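The trial structure described above can be sketched in code. The following is a minimal generator, assuming a hypothetical 12-letter pool (the actual letter array is not specified here) and an even split of in-set versus out-of-set probes:

```python
import random

# Hypothetical 12-letter array; the actual letters used in the study are not specified
LETTER_POOL = list("BCDFGHJKLMNP")

def generate_trial(rng=random):
    """Generate one SWMT trial: a memory set of 4-8 letters plus a response probe."""
    span = rng.randint(4, 8)                    # span width chosen randomly per trial
    memory_set = rng.sample(LETTER_POOL, span)  # letters drawn without repetition
    in_set = rng.random() < 0.5                 # probe drawn from the set on ~half of trials
    if in_set:
        probe = rng.choice(memory_set)
    else:
        probe = rng.choice([c for c in LETTER_POOL if c not in memory_set])
    return memory_set, probe, in_set
```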

EEG signal processing

Data analysis was conducted using BrainVision Analyzer software v2.0 (Brain Products, Munich, Germany). SWMT EEG data was re-referenced offline to the average mastoid, broadband filtered from 1–70 Hz (12 dB/oct) with a notch filter at 60 Hz, and segmented according to four stages of processing (Fig. 1): pre-stimulus baseline (500 to 1200 ms relative to fixation), encoding (−200 to 8000 ms relative to fixation), retention (−3400 to 800 ms relative to probe), and retrieval (−200 to 800 ms relative to probe). The analysis window selected for the encoding stage spanned the first 5 letters (or all 4 when span = 4) of each trial. This window was selected to optimize the amount of information that could be consistently extracted across trials varying in length based on span.

Fig. 1 Example of a Sternberg Working Memory Task (SWMT) trial depicting a span of 4 items and the time spans of the pre-stimulus baseline, encoding, retention, and retrieval stages. Span ranged from 4 to 8 items, with span width and items selected randomly on a trial-by-trial basis

Following segmentation, ocular artifact correction was applied [32] and segments containing activity exceeding ±75 μV at electrodes Fz, Cz, and Oz were excluded. Time-frequency extraction was applied to single-trial data using a Morlet continuous wavelet transform (parameter c = 3.8) over 20 frequency steps from 4 to 50 Hz. Data at the encoding and retrieval stages were averaged to extract event-related spectral perturbations (ERSP) elicited in response to letter memory and probe stimuli, respectively. Encoding stage frequency extraction was baseline normalized to a window of −200 to −50 ms relative to the fixation cross, while retrieval was normalized to a window of −200 to −50 ms relative to response probe onset. The same wavelet transform was applied to EEG data at the pre-stimulus baseline and retention stages without normalization. Time-frequency data was output in the form of squared wavelet coefficients (μV²), binned and averaged according to response accuracy (correct vs. incorrect), and exported in five frequency bands at each of the four stages of processing: Theta 1 (θ1), centered at 4.00 Hz (range: 3.12–4.88); Theta 2 (θ2), centered at 6.42 Hz (range: 5.01–7.83); Alpha (α), centered at 11.26 Hz (range: 8.79–13.73); Beta (β), centered at 18.53 Hz (range: 14.46–22.59); Gamma (γ), centered at 40.32 Hz (range: 31.48–49.16). Time-frequency values were exported for statistical analysis based on the following windows: pre-stimulus baseline (500 to 1200 ms relative to fixation); encoding (1000 to 7000 ms relative to fixation); retention (−3000 to 0 ms relative to probe); and retrieval (0 to 600 ms relative to probe). All statistical analyses were conducted on spectral power measured at three midline electrode locations: Frontal (Fz), Central (Cz), and Occipital (Oz).
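To make the time-frequency step concrete, the following is a minimal single-frequency Morlet power computation in NumPy. It assumes the parameter c scales the Gaussian envelope to roughly c cycles of the carrier; BrainVision Analyzer's exact parameterization and normalization may differ, so this is a sketch rather than a reproduction of the software's output:

```python
import numpy as np

def morlet_power(signal, fs, freq, c=3.8):
    """Single-frequency Morlet wavelet power (squared coefficients) of a 1-D signal."""
    # Gaussian envelope width chosen so that about c cycles fit under the envelope
    sigma_t = c / (2 * np.pi * freq)
    t = np.arange(-4 * sigma_t, 4 * sigma_t, 1 / fs)
    # complex Morlet: complex exponential carrier under a Gaussian envelope
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma_t**2))
    wavelet /= np.sqrt(np.pi) * sigma_t * fs  # rough amplitude normalization (assumption)
    coeff = np.convolve(signal, wavelet, mode="same")
    return np.abs(coeff) ** 2  # squared wavelet coefficients
```

For a 6 Hz sinusoid, power extracted at 6 Hz is far larger than at 40 Hz, reflecting the band selectivity the analysis relies on.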

Machine learning feature selection

From a machine learning point of view, our analysis is a variable selection problem that aimed to identify the EEG features most relevant to SWMT performance and diagnostic group differences. Variable selection methods are often divided into two families: filter and wrapper methods [33]. The filter approach selects variables as a preprocessing step prior to model construction; its main disadvantage is that it ignores the effects of the selected variable subset on the performance of the classification algorithm. The wrapper method searches for optimal variable subsets using, as the measure of goodness, the estimated classification accuracy obtained when the subset of variables is used in classification. The variable selection is thus “wrapped around” a particular classification algorithm. Wrapper methods typically outperform filter methods [34].

For the current analysis, variable selection was conducted using a wrapper method built around the so-called 1-norm SVM [35]. SVM is a supervised learning method that can weigh input features according to their relevance to the classification target, as determined through the learning process. Most SVMs, including the one implemented in this study, construct a linear classifier that predicts, by thresholding the classifier's real-valued output, which of two categories a new case falls into. The classifier used in the current analysis was based on a linear function of the form \( w^Tx+b \), where w is the weight vector to be determined, x is the input vector representing EEG features, and \( w^Tx \) represents the dot product between the two vectors. The best model coefficients in w are obtained by minimizing the following regularized risk function:

$$ \sum_{j=1}^{d}\left|w_j\right| + C\sum_{i=1}^{n}\varepsilon_i $$

where d represents the total number of variables (i.e., EEG features), n represents the number of records in the training set, and \( \varepsilon_i=\max\left\{0,\ 1-y_i\left(w^Tx_i+b\right)\right\} \) denotes the so-called hinge loss measuring the training error [36]. Here \( y_i \) represents the class label, such as “correct response” versus “incorrect response”, of record i, which is numerically characterized by an input vector \( x_i \) (i.e., the vector of features extracted from that record).
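The objective above can be minimized in several ways; the original 1-norm SVM is typically solved as a linear program, which is not detailed here. As an illustration only, a minimal subgradient-descent sketch of the same objective:

```python
import numpy as np

def l1_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Minimize sum_j |w_j| + C * sum_i max{0, 1 - y_i (w.x_i + b)} by subgradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # records with nonzero hinge loss
        # subgradient: sign(w) from the 1-norm term, -y_i x_i from each active hinge term
        grad_w = np.sign(w) - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

On well-separated toy data this recovers a weight vector whose sign correctly classifies nearly all records, while the 1-norm term pulls uninformative weights toward zero.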

A record consisting of 60 features of EEG data was extracted for each participant, including five frequency bands (theta 1, theta 2, alpha, beta, and gamma), three scalp locations (frontal, Fz; central, Cz; occipital, Oz), and four information processing stages (pre-stimulus baseline, encoding, retention, and retrieval). Features were binned separately based on trial accuracy and assigned a binary label indicating whether trials received correct (+1) or incorrect (−1) responses. Accordingly, EEG features receiving positive valence weightings can be interpreted as more highly predictive of correct trial performance, with those receiving negative valence predictive of incorrect performance. The SVM algorithm was applied in two models: (1) to classify correct vs. incorrect trial performance within each sample, referred to hereafter as Model 1, and (2) to classify between SZ and HC groups across correct and incorrect trials, referred to as Model 2.
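The 5 × 3 × 4 layout of the 60-feature record can be made concrete with a small helper; the ordering below (band, then site, then stage) is an assumption for illustration:

```python
import numpy as np

BANDS = ["theta1", "theta2", "alpha", "beta", "gamma"]   # 5 frequency bands
SITES = ["Fz", "Cz", "Oz"]                               # 3 midline electrodes
STAGES = ["baseline", "encoding", "retention", "retrieval"]  # 4 processing stages

def build_record(power):
    """Flatten nested power values power[band][site][stage] into a 60-feature vector."""
    features = [power[b][s][st] for b in BANDS for s in SITES for st in STAGES]
    return np.array(features)  # length 5 * 3 * 4 = 60
```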

Although the current analysis was based on a small study (12 HC and 40 SZ), a large number of EEG features (60) was used to represent each case. This circumstance poses a risk of over-fitting, meaning that the resultant classifier could achieve good accuracy during training but poor validation accuracy. According to statistical learning theory [36], regularization is the most effective way to control over-fitting. SVM methods optimize a regularized loss function to find the best classifier, using either the 2-norm regularizer \( \left\|w\right\|_2^2=\sum_{j=1}^{d}w_j^2 \) or the 1-norm regularizer \( \left\|w\right\|_1=\sum_{j=1}^{d}\left|w_j\right| \). In the current implementation, the 1-norm regularizer was chosen because it enforces sparsity of the weight vector w, meaning many entries of w will be zero. More precisely, although 60 features were used in SVM classifier training, only 3–10 features were actually used by the resulting classifier because the other features received zero weights in the model.
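The sparsity effect of the 1-norm regularizer can be illustrated on synthetic data. The sketch below uses scikit-learn's LinearSVC, which pairs the l1 penalty with a squared hinge loss (a slight departure from the hinge loss above), on 60 features of which only 3 carry signal; the sample size (52) mirrors the study but the data are random:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy illustration of 1-norm sparsity: 60 noisy features, only 3 informative
rng = np.random.default_rng(0)
X = rng.normal(size=(52, 60))                    # 52 records, 60 EEG-like features
y = np.where(X[:, :3].sum(axis=1) > 0, 1, -1)    # labels depend on 3 features only

clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.1)
clf.fit(X, y)
n_active = np.count_nonzero(clf.coef_)           # far fewer than 60 weights survive
```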

The parameter C in the 1-norm SVM was tuned in a 3-fold cross-validation process in which the respective data set was evenly split into 3 disjoint subsets. At each fold, the classifier obtained by SVM from two of the subsets was tested on the remaining subset. Receiver operating characteristic (ROC) curves were used to examine the performance of the classifiers; specifically, the area under the curve (AUC) was reported. The AUC values were averaged over the three folds for each choice of C in a range from 0.1 to 10 with a step size of 0.1. The value of C that produced the best cross-validation performance was used to train the final classifier from all records, and the cross-validation performance for the SVM with the chosen C value was also reported. In addition to AUC values, precision, recall, and F1 score were computed.
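The tuning loop can be sketched as follows, using scikit-learn for the folds and AUC computation. This is a stand-in for the actual pipeline (scikit-learn's l1-penalized LinearSVC uses a squared hinge loss, and the fold assignment here is one arbitrary choice):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

def tune_C(X, y, grid=np.arange(0.1, 10.1, 0.1)):
    """3-fold cross-validation over a grid of C values; mean AUC is the selection criterion."""
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    best_C, best_auc = None, -np.inf
    for C in grid:
        aucs = []
        for train_idx, test_idx in cv.split(X, y):
            clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=C)
            clf.fit(X[train_idx], y[train_idx])
            # rank test records by the real-valued classifier output for the ROC curve
            aucs.append(roc_auc_score(y[test_idx], clf.decision_function(X[test_idx])))
        if np.mean(aucs) > best_auc:
            best_C, best_auc = C, np.mean(aucs)
    return best_C, best_auc
```

The chosen C would then be used to refit the classifier on all records, as described above.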

The analysis of Model 2 presented an unbalanced classification problem due to far fewer HCs (n = 12) than SZs (n = 40). Therefore, a procedure commonly used with SVMs was adopted to balance the sample sizes. Specifically, the analysis penalized errors that occurred in the HC samples 3 times more heavily than errors in the SZ samples, creating an effect similar to up-sampling the HC group three times. Mathematically, this procedure corresponded to revising the regularized loss function as follows: