Participants

Twenty-five patients undergoing intracranial electroencephalographic monitoring as part of clinical treatment for drug-resistant epilepsy were recruited to participate in this study. Data were collected as part of a multi-center project designed to assess the effects of electrical stimulation on memory-related brain function. Data were collected at the following centers: Thomas Jefferson University Hospital (Philadelphia, PA), University of Texas Southwestern Medical Center (Dallas, TX), Emory University Hospital (Atlanta, GA), Dartmouth-Hitchcock Medical Center (Lebanon, NH), Hospital of the University of Pennsylvania (Philadelphia, PA), and Mayo Clinic (Rochester, MN). The research protocol was approved by the Institutional Review Board at each hospital and informed consent was obtained from each participant. Electrophysiological data were collected from electrodes implanted subdurally on the cortical surface as well as depth electrodes within the brain parenchyma. In each case, the clinical team determined the placement of the electrodes so as to best localize epileptogenic regions. Subdural contacts were arranged in both strip and grid configurations. The types of electrodes used for recording and stimulation varied across the data collection sites in accordance with the preferences of the clinicians at each institution. Across the sites, the following models of depth and surface (strip/grid) electrodes were used (electrode diameters in parentheses): PMT Depthalon (0.8 mm); AdTech Spencer RD (0.86 mm); AdTech Spencer SD (1.12 mm); AdTech Behnke-Fried (1.28 mm); AdTech subdural strips and grids (2.3 mm).

Verbal memory task

Each subject participated in a delayed free-recall task in which they were instructed to study lists of words for a later memory test; no encoding task was used. Lists were composed of 12 words chosen at random and without replacement from a pool of high frequency English nouns (http://memory.psych.upenn.edu/WordPools). Each word remained on the screen for 1,600 ms, followed by a randomly jittered 750–1,000 ms blank inter-stimulus interval.

Immediately following the final word in each list, participants performed a distractor task (to attenuate the recency effect in memory, length = 20 s) consisting of a series of arithmetic problems of the form A + B + C = ??, where A, B, and C were randomly chosen integers ranging from 1 to 9. Following the distractor task participants were given 30 s to verbally recall as many words as possible from the list in any order; vocal responses were digitally recorded and later manually scored for analysis. Each session consisted of 25 lists of this encoding-distractor-recall procedure. Some subjects completed sessions of the free recall task using categorized word lists, which were included in the electrophysiological analyses. The categorized recall task is identical to the free recall task, with the exception that the word pool was drawn from 25 semantic categories (e.g., fruit, furniture, office supplies). Each list of 12 items in the categorized version of the task consisted of four words drawn from each of three categories. Subject counts by task: N = 17 free recall only; N = 2 categorized free recall only; N = 6 both free and categorized recall (in separate sessions).
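The list construction described above can be sketched as follows. This is a minimal illustration rather than the task code used in the study: the function names are hypothetical, the sketch assumes that no word repeats within a session, and because the presentation order within a categorized list is not described here, items are simply left grouped by category.

```python
import random

def make_free_recall_lists(word_pool, n_lists=25, list_length=12, seed=0):
    """Draw words at random without replacement and split them into lists."""
    rng = random.Random(seed)
    drawn = rng.sample(word_pool, n_lists * list_length)  # no repeated words
    return [drawn[i:i + list_length] for i in range(0, len(drawn), list_length)]

def make_categorized_list(categories, n_categories=3, words_per_category=4, seed=0):
    """Build one 12-item list: four words from each of three categories.

    `categories` maps a category name to its list of words (hypothetical layout).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(categories), n_categories)
    items = []
    for name in chosen:
        items.extend(rng.sample(categories[name], words_per_category))
    return items

def make_distractor_problem(rng):
    """One arithmetic problem of the form A + B + C with integers from 1 to 9."""
    a, b, c = (rng.randint(1, 9) for _ in range(3))
    return f"{a} + {b} + {c} = ??", a + b + c
```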

Stimulation methods

At the start of each session, we determined the safe amplitude for stimulation using a mapping procedure in which stimulation was applied at 0.5 mA, while a neurologist monitored for afterdischarges. This procedure was repeated, incrementing the amplitude in steps of 0.5 mA, up to a maximum of 1.5 mA for depth contacts and 3.5 mA for cortical surface contacts. These maximum amplitudes were chosen to be below the afterdischarge threshold and below accepted safety limits for charge density50. For each stimulation session, we passed electrical current through a single pair of adjacent electrode contacts. As the electrode locations were determined strictly by the monitoring needs of the clinicians, we used a combination of anatomical and functional information to select stimulation sites. If available, we prioritized electrodes in lateral temporal cortex, in particular the middle portion of the middle temporal gyrus. To choose among these regions in cases in which more than one was available, we selected the electrode pair demonstrating the largest subsequent memory effect (SME) in the high-frequency range (70–200 Hz). In cases in which no lateral temporal cortex contacts were available, we selected an electrode pair at or near the largest SME elsewhere in the brain, targeting the hippocampus, MTL cortex, prefrontal cortex, and parietal cortex if available. Stimulation was delivered using charge-balanced biphasic rectangular pulses (pulse width = 300 μs) at a frequency of 10, 25, 50, 100, or 200 Hz and an amplitude of 0.25 to 2.00 mA (in 0.25 mA steps). The particular amplitude and frequency were chosen based on a pre-test in which we stimulated the brain at each parameter combination while the patient was at rest (no experimental task). The frequency × amplitude combination that maximized the change in classifier output was used in the closed-loop memory task. In the memory task, stimulation was always applied for 500 ms in response to classifier-detected poor memory states (see below).
Participants performed one practice list followed by 25 task lists: lists 1–3 (plus the practice list) were used to collect baseline spectral power data for normalizing input features for the classifier; lists 4–25 consisted of 11 lists each of Stim and NoStim conditions, randomly interleaved. On NoStim lists, stimulation was not triggered in response to classifier output.
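The parameter search described above amounts to evaluating a small frequency × amplitude grid and keeping the best combination. The sketch below is illustrative only. The function names are hypothetical, the optional amplitude cap is an assumption (the text does not state how the mapped safe amplitude constrained the grid), and "change in classifier output" is taken here to be a single signed number per combination.

```python
import itertools

FREQUENCIES_HZ = (10, 25, 50, 100, 200)
AMPLITUDES_MA = tuple(0.25 * k for k in range(1, 9))  # 0.25 to 2.00 mA

def parameter_grid(max_amplitude_ma=2.0):
    """All frequency x amplitude pairs at or below an amplitude cap (assumed)."""
    return [(f, a)
            for f, a in itertools.product(FREQUENCIES_HZ, AMPLITUDES_MA)
            if a <= max_amplitude_ma]

def select_parameters(change_by_params):
    """Return the (frequency, amplitude) pair with the largest change.

    `change_by_params` maps (frequency_hz, amplitude_ma) to the measured
    change in classifier output from the resting pre-test.
    """
    return max(change_by_params, key=change_by_params.get)
```

If the intended criterion were the largest absolute change, the key function would be `lambda p: abs(change_by_params[p])` instead.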

Anatomical localization

Cortical surface regions were delineated on pre-implant whole brain volumetric T1-weighted MRI scans using Freesurfer51 according to the Desikan–Killiany atlas. Whole brain and high resolution MTL volumetric segmentation was also performed using the T1-weighted scan and a dedicated hippocampal coronal T2-weighted scan with Advanced Normalization Tools (ANTS)52 and Automatic Segmentation of Hippocampal Subfields multi-atlas segmentation methods53. Coordinates of the radiodense electrode contacts were derived from a post-implant computed tomography (CT) scan and then registered with the MRI scans using ANTS. Subdural electrode coordinates were further mapped to the cortical surfaces using an energy minimization algorithm54. Two neuroradiologists reviewed cross-sectional images and surface renderings to confirm the output of the automated localization pipeline. Targets that were localized to the left inferior, middle, and superior temporal gyri were classified as lateral temporal cortex. Any target outside these regions was classified as Non-lateral temporal.

Electrophysiological data processing

Intracranial data were recorded using one of the following clinical EEG systems (depending on the site of data collection): Nihon Kohden EEG-1200, Natus XLTek EMU 128 or Grass Aura-LTM64. Depending on the amplifier and the preference of the clinical team, the signals were sampled at either 500, 1,000, or 1,600 Hz and were referenced to a common contact placed intracranially, on the scalp, or on the mastoid process. Intracranial electrophysiological data were filtered to attenuate line noise (5 Hz band-stop fourth order Butterworth, centered on 60 Hz). To eliminate potentially confounding large-scale artifacts and noise on the reference channel, we re-referenced the data using a bipolar montage [1]. To do so, we identified all pairs of immediately adjacent contacts on every depth, strip and grid and took the difference between the signals recorded in each pair. Each resulting bipolar timeseries was treated as a virtual electrode and used in all subsequent analyses. We performed spectral decomposition (8 frequencies from 3 to 180 Hz, logarithmically spaced; Morlet wavelets; wave number = 5) for the 0–1,366 ms epoch relative to word onset. Mirrored buffers (length = 1,365 ms) were included before and after the interval of interest and then discarded to avoid convolution edge effects. The resulting time-frequency data were then log-transformed, averaged over time, and z-scored within session and frequency band across word presentation events.
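The re-referencing and spectral steps above can be sketched in NumPy as follows. This is a simplified illustration, not the study's pipeline: the function names, the channels × samples array layout, the unit-energy wavelet normalization, the truncation of each wavelet at ±4 temporal standard deviations, and the base-10 logarithm are all assumptions made for the example.

```python
import numpy as np

def bipolar_rereference(signals, adjacent_pairs):
    """Difference of each pair of adjacent contacts; signals is (n_contacts, n_samples)."""
    return np.stack([signals[i] - signals[j] for i, j in adjacent_pairs])

def morlet_power(x, sfreq, freqs, n_cycles=5):
    """Power from convolution with complex Morlet wavelets; x is (n_channels, n_samples)."""
    power = np.empty((len(freqs),) + x.shape)
    for k, f in enumerate(freqs):
        sigma_t = n_cycles / (2 * np.pi * f)          # temporal SD of the Gaussian
        n_half = int(4 * sigma_t * sfreq)             # truncate at +/- 4 SD (assumed)
        t = np.arange(-n_half, n_half + 1) / sfreq
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy (assumed)
        for c in range(x.shape[0]):
            power[k, c] = np.abs(np.convolve(x[c], wavelet, mode="same")) ** 2
    return power

def encoding_features(epoch, sfreq, n_buffer, freqs):
    """Mirror-pad, decompose, discard buffers, log-transform, average over time."""
    padded = np.pad(epoch, [(0, 0), (n_buffer, n_buffer)], mode="reflect")
    power = morlet_power(padded, sfreq, freqs)[..., n_buffer:-n_buffer]
    return np.log10(power).mean(axis=-1)              # (n_freqs, n_channels)

def zscore_across_events(features):
    """z-score each frequency x channel feature across events (axis 0)."""
    return (features - features.mean(axis=0)) / features.std(axis=0)

# Eight logarithmically spaced frequencies from 3 to 180 Hz.
FREQS = np.logspace(np.log10(3), np.log10(180), 8)
```

Here `n_buffer` is given in samples, so a 1,365 ms buffer at 1,000 Hz corresponds to 1,365 samples.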

Multivariate classification

We included only data collected in record-only sessions as input to a logistic regression classifier trained to discriminate encoding-related activity predictive of whether a word was later remembered or forgotten. We used spectral power averaged across the time dimension for each word encoding epoch (0–1,366 ms relative to word onset) as the input data (informal investigations suggested that spectral power outperformed features derived from phase-based connectivity measures). Thus, the features for each individual word encoding observation were the average power across time, at each of the eight analyzed frequencies × N electrodes. We used L2-penalization55 and set the penalty parameter (C) to \(2.4 \times 10^{-4}\), based on the optimal penalty parameter calculated across our large pre-existing dataset of free recall subjects37. We then computed AUC to quantify classifier performance and repeated this procedure for all subjects. AUC measures a classifier’s ability to identify true positives while minimizing false positives, where chance AUC = 0.50. To ensure the classifier learned equally from both classes (given the imbalance between recalled and not recalled exemplars), we also weighted the penalty parameter in inverse proportion to the number of exemplars of each class55. We assessed the significance of each classifier within-subject using a permutation test in which we randomized the labels of recalled/not recalled events in the training data, computed AUC and repeated the randomization 1,000 times to generate a null distribution of AUCs. Classification analyses were programmed using the MATLAB implementation of the LIBLINEAR library56.
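To make the evaluation steps concrete, the following sketch computes inverse-frequency class weights, AUC, and a label-permutation null distribution in NumPy. It is a simplification, not the LIBLINEAR-based analysis: the function names are hypothetical, the weights are scaled so that their mean is 1 (an assumed convention), and the permutation step shuffles labels against fixed classifier scores, whereas the procedure above randomizes the training labels. The p-value convention with one added to the numerator and denominator is also an assumption.

```python
import numpy as np

def class_weights(labels):
    """Weights inversely proportional to class counts, scaled to a mean of 1."""
    labels = np.asarray(labels)
    n = labels.size
    return {c: n / (2.0 * np.sum(labels == c)) for c in (0, 1)}

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    # Fraction of (recalled, not recalled) pairs ranked correctly; ties count half.
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def permutation_test(labels, scores, n_permutations=1000, seed=0):
    """Null distribution of AUC under shuffled labels, with a one-sided p-value."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = auc(labels, scores)
    null = np.array([auc(rng.permutation(labels), scores)
                     for _ in range(n_permutations)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, null, p_value
```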

We generated a forward model for each subject34 to assess the importance of different features to classification:

$$\textbf{A} = \frac{\Sigma_{\mathbf{x}}\mathbf{W}}{\sigma^{2}_{\hat{\bf{y}}}}$$

where \(\Sigma_{\bf{x}}\) is the data covariance matrix, \({\mathbf{W}}\) is the vector of weights obtained from the fitted classifier, and \(\sigma^{2}_{\hat{\bf{y}}}\) is the variance of the logit-transform of the vector of classifier outputs across all events, \(\hat{\bf{y}}\). The magnitude and direction of the values in A reflect the strength of the relation between classifier output and each input feature in x. We computed A separately for each subject and plot the average across subjects in Fig. 2b.
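As a worked illustration of the forward-model equation, the sketch below computes A from a feature matrix and a weight vector. The function name and array layout are hypothetical. It relies on the fact that, for logistic regression, the logit of the predicted probability is the linear term X·W plus the intercept.

```python
import numpy as np

def forward_model(X, w, intercept=0.0):
    """Activation pattern A = cov(X) @ w / var(logit of classifier output).

    X is (n_events, n_features); w is (n_features,).
    """
    y_hat = X @ w + intercept                 # classifier output on the logit scale
    cov_x = np.cov(X, rowvar=False)           # (n_features, n_features)
    return cov_x @ w / np.var(y_hat, ddof=1)  # ddof matches np.cov's default
```

Each entry of A equals the covariance between one feature and the logit output, divided by the variance of the logit output, which is why its sign and magnitude can be read as the strength of the relation between that feature and the classifier output.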

Closed-loop decoding

The closed-loop free recall sessions consisted of one practice list followed by 25 task lists: lists 1–3 were unstimulated and used (in addition to the practice list) to collect baseline spectral power data for normalizing the input features to the classifier; lists 4–25 consisted of 11 lists each of Stim and NoStim conditions, randomly interleaved. On Stim lists, we extracted estimates of spectral power from the 0–1,366 ms interval following the onset of each word during encoding. We assessed power using parameters identical to those used to train the record-only classifier (see Electrophysiological data processing): 8 frequencies from 3 to 180 Hz, logarithmically spaced; Morlet wavelets; wave number = 5; log-transformed; averaged over time within frequency and electrode; and z-scored based on the mean and SD of the power features collected during the baseline lists. The distribution of powers used for normalization was updated following each NoStim list. The record-only classifier was applied to the resulting frequency × electrode features to derive an estimated probability of recall. If this probability fell below 0.5, we immediately triggered 500 ms of stimulation (mean interval between stimulation onset and presentation of the next word = 962 ms ± 347 ms). On NoStim lists, spectral power features and classifier output were computed identically to Stim lists, but stimulation was disabled.
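The per-word decision logic described above can be sketched as follows. The class and function names are hypothetical, and the way the normalization distribution is updated (pooling each NoStim list's features with all earlier baseline features) is an assumption, since the text does not specify the update rule.

```python
import numpy as np

class Baseline:
    """Running pool of log-power features used to z-score new observations."""

    def __init__(self, baseline_features):
        # baseline_features: (n_words, n_features) from the practice and baseline lists
        self._rows = [np.asarray(baseline_features, dtype=float)]

    def update(self, nostim_list_features):
        """Add the features from one completed NoStim list (assumed pooling rule)."""
        self._rows.append(np.asarray(nostim_list_features, dtype=float))

    def stats(self):
        data = np.concatenate(self._rows, axis=0)
        return data.mean(axis=0), data.std(axis=0)

def recall_probability(features, weights, intercept, baseline):
    """Apply the record-only logistic classifier to z-scored features."""
    mean, sd = baseline.stats()
    z = (np.asarray(features, dtype=float) - mean) / sd
    logit = float(np.dot(weights, z)) + intercept
    return 1.0 / (1.0 + np.exp(-logit))

def should_stimulate(p_recall, is_stim_list, threshold=0.5):
    """Trigger stimulation only on Stim lists when predicted recall is below threshold."""
    return bool(is_stim_list and p_recall < threshold)
```

On NoStim lists the same probability is computed, and `should_stimulate(p, False)` simply returns `False`, which mirrors the statement that classifier output was computed identically but stimulation was disabled.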

GLME model

We fit generalized linear mixed-effects (GLME) models with MATLAB’s fitglme function to estimate the effect of stimulation on memory performance. In the full model assessing the interaction of stimulation and group on memory, we modeled the recalled/not recalled status of each encoding trial for all subjects as a function of list type (Stim/NoStim) and group (Lateral temporal cortex/Non-lateral temporal), with random slopes and intercepts for the effect of stimulation for each subject and unique stimulation target site (three subjects were stimulated at two different targets in separate sessions). We included in the model stimulated words and matched words from NoStim lists. The NoStim words were matched based on whether classifier output during the closed-loop session was below threshold (i.e., stimulation would have been applied had it been a Stim list). To estimate the effect of stimulation within each of the Lateral temporal cortex/Non-lateral temporal groups, we then fit the same model separately for each group without the group predictor.
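One way to write the full model is given below. This is a reading of the verbal description rather than the fitted specification: the logit link is assumed from the binary recalled/not recalled outcome, the symbols are introduced here for illustration, and the random-effects covariance structure is left unspecified.

$$\operatorname{logit} P(y_{ijk} = 1) = \beta_{0} + \beta_{1} S_{ijk} + \beta_{2} G_{k} + \beta_{3} S_{ijk} G_{k} + \left(u_{0j} + u_{1j} S_{ijk}\right) + \left(v_{0k} + v_{1k} S_{ijk}\right)$$

where \(y_{ijk}\) indicates whether word \(i\) from subject \(j\) at stimulation target \(k\) was recalled, \(S_{ijk}\) codes list type (Stim = 1, NoStim = 0), \(G_{k}\) codes the target group (Lateral temporal cortex = 1, Non-lateral temporal = 0), \(u_{0j}\) and \(u_{1j}\) are the random intercept and stimulation slope for subject \(j\), and \(v_{0k}\) and \(v_{1k}\) are the corresponding random effects for target site \(k\). Under this coding, \(\beta_{3}\) carries the stimulation × group interaction, and the within-group models drop the \(\beta_{2}\) and \(\beta_{3}\) terms.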

We also used a GLME model to assess the influence of stimulation on memory for neighboring trials. Here we identified unrecalled stimulated words (and matched NoStim words) that were flanked in the forward and reverse directions by one unstimulated word (or words that based on classifier output would have been unstimulated, in the case of NoStim lists). We modeled the recall output using a binomial model with the following predictors: Stim/NoStim and Lateral temporal/Non-lateral temporal stimulation target, including random intercepts and slopes for the within-subject factor.

Analysis of post-stimulation classifier

To assess the effect of lateral temporal cortex stimulation on neural activity we used the classifier to decode the stimulation-evoked change in physiology. We fit a GLME model to predict classifier output for the word encoding period immediately following delivery of a stimulation train. We included matched intervals from NoStim lists by identifying periods following words that would have been stimulated, and modeled the Stim/NoStim list status of the observations, with separate slopes and intercepts for each subject and unique stimulation target site.

Statistics

Data are presented as mean ± SEM. Unless otherwise specified, all statistical comparisons were conducted as two-tailed tests. Data distributions were either visually inspected or assumed to be normal for parametric tests. We used linear mixed effects models of the trial-level data to estimate the effect of stimulation on behavior and classifier output (e.g., Figure 3), while accounting for repeated observations within each subject and each subject-stimulation location. Sample sizes were chosen to meet or exceed those of previously reported studies of invasive lateral temporal cortex stimulation.

Data availability

All de-identified raw data and analysis code may be downloaded at http://memory.psych.upenn.edu/Electrophysiological_Data.