Experimental setup

All surgical and animal care procedures were performed in accordance with National Institutes of Health guidelines and were approved by the Stanford University Institutional Animal Care and Use Committee. Experiments were conducted with adult male rhesus macaques (J and L) implanted with 96-electrode Utah arrays (Blackrock Microsystems Inc., Salt Lake City, UT) using standard neurosurgical techniques. Electrode arrays were implanted in dorsal premotor cortex (PMd) and primary motor cortex (M1), as visually estimated from local anatomical landmarks. Monkey J had two arrays, one in M1 and one in PMd, while Monkey L had one array implanted on the M1/PMd border. Monkey J was 11 years old and Monkey L was 16–17 years old during experimentation, with arrays implanted 40 and 60 months before experimentation, respectively. The monkeys made point-to-point reaches in a 2D plane with a virtual cursor controlled by the contralateral arm or by a BMI. The experimental setup has been previously described28,29,30, and an illustration of the experimental setup is shown in Supplementary Fig. 1. The virtual cursor and targets were presented in a three-dimensional (3D) environment (MSMS, MDDF, USC, Los Angeles, CA). Hand position data were measured with an infrared reflective bead tracking system (Polaris, Northern Digital, Ontario, Canada). Spike counts were collected by applying a single threshold, set to −4.5 × root-mean-square of the spike voltage, per neural channel. The raw neural observations used for analyses were threshold crossings counted in non-overlapping 15-ms bins. Behavioural control and neural decoding were run on separate PCs using the Simulink/xPC platform (MathWorks, Natick, MA) with communication latencies of 3 ms. This enabled millisecond timing precision for all computations. Neural data were initially processed by the Cerebus recording system (Blackrock Microsystems Inc., Salt Lake City, UT) and were available to the behavioural control system within 5±1 ms.
Visual presentation was provided via two LCD monitors with 120-Hz refresh rates, yielding frame updates of 7±4 ms. Two mirrors visually fused the displays into a single 3D percept for the user, forming a Wheatstone stereoscope. All tasks presented in this study were restricted to a two-dimensional plane. Because this study deals with the dynamics of reaching, we used an animal model in which the monkey was free to move the contralateral arm. In the context of BMI, we believe this animal model more closely mimics the neural state of a human subject employing a BMI in a clinical study than does a monkey with both arms restrained45. However, as noted in the Discussion, we believe that it is also important to study the dynamics of imagined movements.

Tasks

Two tasks were used for all experiments conducted in this work. The first was a center-out-and-back reaching task, which was used to collect a training set for each decoder. The second was a grid task, which was used to evaluate the performance of each decoder. The grid task was chosen as the evaluation task because it is a selection task that can convey information in a clinically relevant way: it allows the computation of an achieved bitrate, which quantifies the rate at which the BMI can communicate information35.

Center-out-and-back task

In the center-out-and-back task, eight targets were placed with uniform spacing on the circumference of a 12-cm radius circle. The subject was required to acquire the center target, followed by one of the eight (randomly chosen) radial targets. The subject was given 2 s to acquire each prompted target. After successful acquisition of a radial target, or following the failure to acquire any target, the subject was prompted to acquire the center target. Each target had a 4 × 4 cm acceptance window centered on the target. For every target selection, the subject had to hold the cursor within the acceptance window of the target for 500 contiguous milliseconds. Training sets for the decoder comprised 500 successful trials during which the subject repeatedly acquired peripheral and center targets. When necessary, a variant of the center-out-and-back task, with eight targets placed on the circumference of an 8-cm radius circle, was used for cross-validating results.

Grid task

The grid task is a generalization task previously used and described by our group, which allows for the calculation of an achieved communication rate34. In this study, the grid task comprised a 6 × 6 array of targets, each with a 4 × 4 cm acceptance window. The targets were tiled end-to-end contiguously to create a workspace that was 24 × 24 cm. This grid of targets mimics a keyboard task34,35 where the subject can select any of 36 targets at any time by dwelling in the acceptance window of a target for 450 ms. Because any target can be selected at any time, a correct target selection conveys information; for example, the targets could be alphanumeric characters or symbols from a keyboard. To evaluate performance, the subject had to acquire one prompted target out of the 36 potential targets. Although only one target was prompted, every target was selectable if dwelt on for 450 ms. The subject was given 5 s to acquire the prompted target; if no target was selected within 5 s, no target selection was made. Following target selection, a ‘lock-out’ time of 200 ms was enforced, during which the task would continue to run (that is, the next target would be prompted) but dwell time was not counted; this was done to account for the reaction time of the monkey. Targets were randomly chosen according to a uniform distribution, and therefore the information conveyed per target selection is log2(36) bits. To be conservative in the estimation of achieved bitrate, we compensated every incorrect selection with a correct selection, much as an incorrect selection on a keyboard must be corrected by pressing the delete key. Therefore, the information conveyed on the grid task is calculated by considering the net number of correctly selected targets. Hence, performing the task at a success rate of 50% results in a bitrate of 0 bits per second (b.p.s.), so that no information is conveyed through the task.
We calculated an achieved information rate (bitrate) by dividing the amount of information conveyed during target acquisition by the time taken to acquire the targets. Therefore, if c correct selections and ℓ incorrect selections were made in T seconds, the bitrate was calculated to be:

bitrate = log2(36) × (c − ℓ) / T if c > ℓ,
and 0 if c≤ℓ. This is the achieved bitrate of the decoder on the grid task. To evaluate the performance of a decoder, the monkey performed the grid task in blocks of approximately 100 trials, from which a bitrate across those trials was calculated. Because the monkey’s motivation degrades as the experiment progresses, we evaluated the NDF as the last decoder in each block to ensure that its benefits were not due to degrading motivation; the order of decoders tested in each block was therefore deterministic. Each block was run so that decoders were effectively tested in an A–B–A–B–A–... manner (A–B–C–A–B–C–A–... for three decoders, and so forth). The bitrates in each block were paired for statistical testing. We compared the mean performance of each decoder by calculating the mean bitrate across all experimental blocks. Because a positive bitrate can be approximated as a (scaled) sum of 100 binary random variables, which take on values 1 or −1, the distribution of their sum (that is, the bitrate within a block) approaches a normal distribution as more trials are collected. We used the paired t-test to test for a difference in the means of the bitrates. Collecting the bitrate estimates in a blocked setting, and across experimental days, better justified an assumption of independence. We collected at least 1,000 trials (>10 samples) to determine the mean bitrate, based both on experimental constraints and on better justifying an assumption of normality in the mean bitrate. Nevertheless, although we used the parametric t-test in this study, we also performed a Wilcoxon signed-rank test on the paired bitrate differences under the null hypothesis that the median bitrate difference is zero, as well as a Wilcoxon–Mann–Whitney rank-sum test for a difference in bitrate distributions. All bitrate differences were significant under these non-parametric tests (P<0.05).
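As a concrete illustration, the bitrate computation above can be sketched in Python (a minimal sketch; the function name and arguments are illustrative, not part of the original analysis code):

```python
import math

def achieved_bitrate(n_correct, n_incorrect, t_seconds, n_targets=36):
    """Achieved bitrate (b.p.s.) on the grid task: each net correct
    selection conveys log2(n_targets) bits, and the rate is floored at
    zero when incorrect selections match or exceed correct ones."""
    net = n_correct - n_incorrect
    if net <= 0:
        return 0.0
    return net * math.log2(n_targets) / t_seconds
```

For example, a block with 75 correct and 25 incorrect selections in 60 s yields 50 net selections, that is, 50 × log2(36) / 60 ≈ 4.3 b.p.s., while a 50% success rate yields 0 b.p.s.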

Contribution from the dynamical versus innovations process

As in Fig. 1f, we sought to calculate the contribution of the dynamics process versus the innovations process (taking into account the neural observations) in estimating the next neural state. By the definition of the linear dynamical system used in this work, we have that E[x_{k+1} | x_k] = M x_k, where E[·] denotes expectation. That is, the progression of the neural state is on average given by the dynamics update. The innovations process, which is a zero-mean process, updates the estimate of the neural state by using the observed neural data, y_k. The innovations are what cannot be explained in the neural data by our observation process, i.e., e_k = y_k − P M x̂_{k−1}. These innovations are projected by the Kalman filter gain, K_k = Σ_k P^T (P Σ_k P^T + R)^{−1}, where Σ_k is the covariance of the one-step prediction M x̂_{k−1}. Hence, the Kalman filter estimate of the neural state at time k is

x̂_k = M x̂_{k−1} + K_k (y_k − P M x̂_{k−1}).
We calculated the contribution of both the dynamics process and the innovations process in predicting the next state from the previous state, that is, x̂_k − x̂_{k−1}. The contribution from the dynamics process is (M − I) x̂_{k−1}, while the contribution from the innovations process is K_k (y_k − P M x̂_{k−1}). Therefore, to calculate the contribution from the dynamical state update process versus the innovations process, we calculated the ratio:

r_k = ||K_k (y_k − P M x̂_{k−1})|| / ||(M − I) x̂_{k−1}||.
(An intuition for this ratio is that (M − I)/Δt is a first-order approximation to the continuous-time dynamics matrix, M_c, where M = e^{M_c Δt} ≈ I + M_c Δt. Thus, this ratio calculates the relative contribution to the neural state velocity, as implied by Fig. 1f.)

We measured the average of this ratio, r̄ = (1/K) Σ_{k=1}^{K} r_k, where K is the number of neural observations (that is, the number of observations across time). Across 13 experimental days in Monkey J, where a new LDS was learned on each experimental day, we found the average of this ratio, r̄, to be 0.37±0.02 (mean±s.d.), while for 16 experimental days in Monkey L, we found it to be 0.49±0.04.
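The ratio computation can be sketched on a toy linear dynamical system (a minimal NumPy sketch; the matrices, dimensions and noise levels below are illustrative stand-ins, not the LDS fitted to the neural recordings):

```python
import numpy as np

rng = np.random.default_rng(0)
d, q = 2, 5                                    # toy latent dimension and channel count
M = np.array([[0.95, 0.10], [-0.10, 0.95]])    # toy (stable, rotational) dynamics matrix
P = rng.standard_normal((q, d))                # toy observation matrix
N, R = 0.01 * np.eye(d), 0.1 * np.eye(q)       # process / observation noise covariances

# Iterate the Riccati recursion to obtain the steady-state Kalman gain K.
S = np.eye(d)
for _ in range(500):
    Sp = M @ S @ M.T + N                       # predicted-state covariance
    K = Sp @ P.T @ np.linalg.inv(P @ Sp @ P.T + R)
    S = (np.eye(d) - K @ P) @ Sp

# Simulate, filter, and accumulate the innovations/dynamics ratio r_k.
x = np.zeros(d)
xhat = np.zeros(d)
ratios = []
for _ in range(1000):
    x = M @ x + rng.multivariate_normal(np.zeros(d), N)
    y = P @ x + rng.multivariate_normal(np.zeros(q), R)
    dyn = (M - np.eye(d)) @ xhat               # dynamics contribution to x̂_k − x̂_{k−1}
    innov = K @ (y - P @ (M @ xhat))           # innovations contribution
    xhat = M @ xhat + innov                    # Kalman filter update
    if np.linalg.norm(dyn) > 1e-9:
        ratios.append(np.linalg.norm(innov) / np.linalg.norm(dyn))

r_bar = float(np.mean(ratios))                 # average ratio r̄ over the simulation
```

The value of r_bar here depends entirely on the toy matrices; in the experiments, the analogous average computed from the fitted LDS was 0.37 (Monkey J) and 0.49 (Monkey L).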

Decode algorithms

We utilize the following abbreviations for decoders: the neural dynamical filter (NDF), the optimal linear estimator (OLE), the Wiener filter (WF) and the kinematic-state Kalman filter (KKF). In all decoders, the decoded kinematics are the 2D position (p̂_k) and 2D velocity (v̂_k) of a computer cursor. Neural spikes were counted in non-overlapping 15-ms bins and were used as the observations for all decode algorithms. Our choice of bin width is informed by a previous result in online BMI experiments, which demonstrated that smaller bin widths lead to increased performance30. Given that the decoded position and velocity of the cursor at time k were p̂_k and v̂_k, respectively, the decoded position shown to the subject, p_k, was calculated as:

p_k = α (p_{k−1} + v̂_k Δt) + (1 − α) p̂_k,
with α=0.975 and Δt being the bin width of the decoder. Thus, the final decoded position is a weighted sum, with a 2.5% contribution from the directly decoded position and a 97.5% contribution from the integrated velocity. The small position contribution in part stabilizes the position of the decoder in the workspace29. Other work has noted the importance of taking the position contribution of the signal into account28.
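The position update can be sketched as follows (a minimal NumPy sketch; the function name is illustrative):

```python
import numpy as np

def blend_position(p_prev, p_hat, v_hat, dt, alpha=0.975):
    """Blend integrated velocity with the directly decoded position:
    p_k = alpha*(p_{k-1} + dt*v_hat) + (1-alpha)*p_hat, i.e., 97.5%
    integrated velocity and 2.5% decoded position for alpha=0.975."""
    p_prev, p_hat, v_hat = (np.asarray(a, dtype=float)
                            for a in (p_prev, p_hat, v_hat))
    return alpha * (p_prev + dt * v_hat) + (1.0 - alpha) * p_hat
```

For example, with the cursor at the origin, zero decoded velocity, and a decoded position of (1, 0) cm, the displayed position moves only 2.5% of the way toward the decoded position in one 15-ms bin.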

All decoders were trained using data collected while a subject made reaches on a center-out-and-back task for 500 successful trials. Although the decoders were trained using data collected while the subject performed a center-out-and-back task, all decoders were evaluated on the grid task.

Neural dynamical filter. To learn an NDF, we modelled the following latent state linear dynamical system:

x_{k+1} = M x_k + n_k
y_k = P x_k + r_k,
where n_k and r_k are zero-mean Gaussian noise terms with diagonal covariance matrices N and R, respectively. We learned this latent state linear dynamical system in an unsupervised fashion from the sequence of observed neural activity. The time series of neural observations {y_k}_{k=1,2,…,K} was treated as the observed output of a latent state LDS. We did not perform any preprocessing steps on the binned spike counts, y_k. We performed expectation maximization (EM) to learn the parameters (M, P, N, R) of the LDS, as described in a previous report24. Briefly, the E-step involves computing the expected joint log-likelihood of the neural state and the neural observations, which can be deduced from the graph structure of Fig. 5a:

log P({x_k}, {y_k}) = − Σ_{k=1}^{K} ½ (y_k − P x_k)^T R^{−1} (y_k − P x_k) − (K/2) log|R|
  − Σ_{k=2}^{K} ½ (x_k − M x_{k−1})^T N^{−1} (x_k − M x_{k−1}) − ((K−1)/2) log|N|
  − ½ (x_1 − π_1)^T S_1^{−1} (x_1 − π_1) − ½ log|S_1| − (K(N + d)/2) log 2π,
where π_1 and S_1 are the mean and covariance of the initial neural state, and N and d are the number of channels and the dimensionality of the latent state, respectively. The expectations required for the joint log-likelihood, given all parameters, can be computed via Kalman smoothing. The M-step then maximizes the joint log-likelihood with respect to the parameters (M, P, N, R, π_1, S_1). We note that while we computed π_1 and S_1, they were of no practical consequence when running in closed loop after several seconds. The E-step and M-step were alternated to increase the log-likelihood of the observed data. More details can be found in the report by Ghahramani and Hinton24. When performing EM, we utilized an approximation in the E-step: we assumed that the Kalman smoothing parameters remained constant after the estimated state covariance matrix converged within a reasonable tolerance. In the offline analyses of this study, the EM algorithm was initialized with factor analysis. In online prosthetics experiments, we also learned dynamical systems where the EM algorithm was initialized using previously learned dynamical systems. Initialization from a previously learned LDS decreased the convergence time and sometimes resulted in better LDS fits (since EM is subject to local optima). We briefly evaluated the performance of NDF algorithms using each of the learned dynamical systems, and chose the one with the highest performance.

After learning the parameters of the latent state dynamical system via EM, we used the steady-state form of the Kalman filter to estimate the neural state, x̂_k, at each point in time from the sequence of neural observations, y_k, in the training data. It was reasonable to use the computationally efficient steady-state form of the Kalman filter, since convergence to steady state occurred on the order of seconds. We thus had a sequence of decoded neural states, S = [x̂_1, x̂_2, …, x̂_K], and a corresponding sequence of observed training set kinematics, X = [x_1, x_2, …, x_K], where x_k contains the position and velocity of the hand-controlled cursor at time k. We then found the matrix L_s, which minimizes the mean squared error, ||X − L_s [S; 1]||^2, where 1 refers to a row of 1's appended to the bottom of S to allow for a bias to be learned. After defining S_b = [S; 1], the solution is L_s = X S_b^T (S_b S_b^T)^{−1}.
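The closed-form least-squares fit of L_s can be sketched as follows (NumPy, with synthetic stand-in data in place of decoded neural states and recorded kinematics):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, K = 10, 4, 2000                  # toy latent dim, kinematic dim, number of bins
S = rng.standard_normal((d, K))        # stand-in decoded neural states (columns = bins)
L_true = rng.standard_normal((m, d + 1))
X = L_true @ np.vstack([S, np.ones((1, K))])   # synthetic kinematics with a bias term

Sb = np.vstack([S, np.ones((1, K))])   # append a row of 1's for the bias
# Closed-form least-squares solution: L_s = X Sb^T (Sb Sb^T)^{-1}
L_s = X @ Sb.T @ np.linalg.inv(Sb @ Sb.T)
```

With noiseless synthetic data, the regression recovers the generating matrix exactly; in practice, X would be hand kinematics and S the Kalman-filtered neural states from the training set.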

We note two related offline BMI studies: the study by Wu et al.40 utilized a latent state, while the study by Truccolo et al.41 modelled temporal interactions across neurons. In the study by Wu and colleagues, a latent state model was learned in conjunction with the observed kinematics, so that the latent dynamical process is coupled to the kinematics. Interestingly, it was found that with this model the parameters could not be identified with EM when the hidden state dimensionality was >3, which suggests that it does not adequately capture the relatively higher-dimensional neural dynamics10. In the study by Truccolo and colleagues, it was found that the interactions across neurons were only significant for 3–5 ms, a far shorter time scale than those used in this study.

Optimal linear estimator. The OLE31 was fit by solving the least-squares regression problem between the sequence of observed kinematics in the training set, X, and the corresponding sequence of observed neural data, Y = [y_1, y_2, …, y_K]. Analogous to the NDF case, we solved for the matrix L_y minimizing the mean squared error ||X − L_y [Y; 1]||^2. We then defined Y_b = [Y; 1], so that a row of 1's was appended to the bottom of the matrix to account for a bias term. The solution is L_y = X Y_b^T (Y_b Y_b^T)^{−1}. To allow the sequence of neural data to have smoothness, we also convolved every row of Y with causal Gaussian kernels having standard deviations ranging from 25 to 200 ms.
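The causal Gaussian smoothing step can be sketched as follows (a minimal NumPy sketch; the kernel length and helper names are illustrative):

```python
import numpy as np

def causal_gaussian_kernel(sigma_ms, bin_ms=15, n_taps=40):
    """One-sided (causal) Gaussian kernel, normalized to sum to 1."""
    t = np.arange(n_taps) * bin_ms
    k = np.exp(-0.5 * (t / sigma_ms) ** 2)
    return k / k.sum()

def smooth_rows_causal(Y, kernel):
    """Convolve each channel (row) of Y with a causal kernel, so the
    smoothed value at bin k depends only on bins <= k."""
    out = np.empty_like(Y, dtype=float)
    for i, row in enumerate(Y):
        out[i] = np.convolve(row, kernel)[: row.size]
    return out
```

Truncating the full convolution to the first K samples keeps the filter causal: an impulse in the spike counts only affects the smoothed signal at that bin and later.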

Wiener filter. The WF incorporates neural history into the regression problem by finding the optimal coefficients for historical neural data. The Wiener–Kolmogorov filtering approach finds the optimal matrices L_0, L_1, …, L_{p−1} (and bias b_WF) such that x̂_k = Σ_{j=0}^{p−1} L_j y_{k−j} + b_WF, where the difference between the decoded and observed kinematics is minimized in the least-squares sense. Hence, the WF operates on a history of neural data of length pΔt, where Δt is the bin size in which spikes were counted. To fit the WF, we first define X_{[i:j]} = [x_i x_{i+1} … x_j] for i<j (and Y_{[i:j]} analogously), and the following matrix:

Ỹ = [ Y_{[p:K]} ; Y_{[p−1:K−1]} ; ⋮ ; Y_{[1:K−p+1]} ],

where p is a parameter denoting the amount of history used in the decoder, and K is the total number of bins observed. By defining L_W = [L_0 L_1 … L_{p−1} b_WF], the horizontal concatenation of the matrices L_j for j=0, 1,…,p−1 (and a bias term), we could solve for L_W such that the error metric ||X_{[p:K]} − L_W [Ỹ; 1]||^2 was minimized. After defining Ỹ_b = [Ỹ; 1], the solution is L_W = X_{[p:K]} Ỹ_b^T (Ỹ_b Ỹ_b^T)^{−1}. Analogously to least squares, the term X_{[p:K]} Ỹ_b^T represents the time-averaged cross-correlations between the kinematics and the neural activity up to p bins in the past, while the term Ỹ_b Ỹ_b^T represents the time-averaged autocorrelations of the neural data up to p bins in the past. To avoid overfitting, we also regularized the regression, so that we found L_W = X_{[p:K]} Ỹ_b^T (Ỹ_b Ỹ_b^T + λI)^{−1}. Both parameters p and λ were found by optimization in online control, where p and λ were swept and the performance of the WF evaluated. We found that the optimal amount of history to use was ∼250 ms for both subjects, although the minimum was shallow in the range of 200–300 ms. This is a significantly smaller history than that used in previous works33,37,51,52. We observed a degradation in performance when the history was greater than 500 ms, owing to a noticeable lack of fine control of the cursor. This is because assigning significant weight to neural data relatively far in the past causes the decoder to lag significantly in responding to the subject's changing intention.
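The history-stacked, ridge-regularized fit can be sketched as follows (NumPy; the function name and toy dimensions are illustrative):

```python
import numpy as np

def fit_wiener(X, Y, p, lam=0.0):
    """Ridge-regularized Wiener filter fit: stack p bins of neural
    history (plus a bias row of 1's) and solve the normal equations
    L_W = X_hist Yb^T (Yb Yb^T + lam*I)^{-1}."""
    K = Y.shape[1]
    # Row blocks of lagged neural data: Y[p:K], Y[p-1:K-1], ..., Y[1:K-p+1]
    rows = [Y[:, p - 1 - j : K - j] for j in range(p)]
    Yb = np.vstack(rows + [np.ones((1, K - p + 1))])  # append the bias row
    Xh = X[:, p - 1 :]                                # kinematics aligned to bin k
    reg = lam * np.eye(Yb.shape[0])
    return Xh @ Yb.T @ np.linalg.inv(Yb @ Yb.T + reg)
```

With p=1 and λ=0, this reduces to the OLE fit with a bias term; increasing p adds one block of lagged-coefficient columns per bin of history.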

Kinematic-state Kalman filter. The KKF models the kinematics at time k, x_k, as the state of a linear dynamical system, while the simultaneously observed neural population spike counts, y_k, are the corresponding output of the system. This is represented by the two equations,