Understanding the detailed dynamics of neuronal networks will require the simultaneous measurement of spike trains from hundreds of neurons (or more). Currently, approaches to extracting spike times and labels from raw data are time consuming, lack standardization, and involve manual intervention, making it difficult to maintain data provenance and assess the quality of scientific results. Here, we describe an automated clustering approach and associated software package that addresses these problems and provides novel cluster quality metrics. We show that our approach has accuracy comparable to or exceeding that achieved using manual or semi-manual techniques with desktop central processing unit (CPU) runtimes faster than acquisition time for up to hundreds of electrodes. Moreover, a single choice of parameters in the algorithm is effective for a variety of electrode geometries and across multiple brain regions. This algorithm has the potential to enable reproducible and automated spike sorting of larger scale recordings than is currently possible.

Although fully automated spike sorting has been of interest for many years (), and despite prior efforts to automate sorting algorithms (), the majority of laboratories still rely heavily on manual intervention. In this work, we set out to develop a fully automated spike sorting algorithm having error rates that are comparable to or lower than those of existing manual and semi-manual approaches, and with runtimes faster than acquisition times. We introduce MountainSort, a novel spike sorting algorithm and open-source software suite of processing, visualization, and curation tools.

From the standpoint of efficient and reproducible science, any human intervention has disadvantages. Manual sorting can have error rates in excess of 20% (), and there is substantial variability in labeling across different sorting sessions (). Furthermore, the human spike sorter could never keep up with the increasing volume of data arising from increasingly large electrode arrays applied over increasingly long durations ().

The majority of spike sorting algorithms comprise a sequence of steps such as band-pass filtering, spatial whitening, detection of threshold-crossing events, and clustering based on voltage waveform shape in a suitable feature space; this generally involves one or more manual processing steps (). At one extreme, the clustering itself is performed by a human operator, viewing the event features in two-dimensional projections, and drawing cluster boundaries with the assistance of specialized user interfaces (Xclust, M.A. Wilson; MClust, A.D. Redish; Offline Sorter, Plexon). In other situations, clustering is automated, but the user must curate the results by selecting which clusters to reject, merge, or even split (). There also exist post-processing steps that resolve overlapping spikes () and algorithms based on independent component analysis (ICA) that do not explicitly involve clustering (). Presently, despite many available packages and proposed algorithms, no generally adopted software packages offer fully automated sorting that can take in the raw time series data and output spike times and identities without the expectation of further curation.
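The filtering, whitening, and detection front end common to these pipelines can be sketched as follows. This is a minimal illustration only: the filter band, the robust SD estimate, and the 4-SD detection threshold are generic choices for exposition, not the settings of MountainSort or any particular package.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(x, fs, lo=300.0, hi=6000.0, order=3):
    """Band-pass filter each channel of x (channels x samples)."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x, axis=1)

def whiten(x):
    """Spatial whitening: decorrelate channels using the channel covariance."""
    cov = np.cov(x)
    vals, vecs = np.linalg.eigh(cov)
    w = vecs @ np.diag(1.0 / np.sqrt(vals + 1e-10)) @ vecs.T
    return w @ x

def detect(x, thresh_sd=4.0):
    """Return sample indices where any channel first crosses -thresh_sd * SD."""
    # Robust per-channel SD estimate (median absolute deviation / 0.6745)
    sd = np.median(np.abs(x), axis=1, keepdims=True) / 0.6745
    crossing = (x < -thresh_sd * sd).any(axis=0)
    # Keep only the onset sample of each contiguous crossing run
    onsets = np.flatnonzero(crossing & ~np.roll(crossing, 1))
    return onsets
```

In a real pipeline these steps would be followed by clip extraction around each detected event and projection into a low-dimensional feature space before clustering.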

A number of challenges make spike sorting more complex than clustering in many other disciplines. First, there is no simple noise model. The background signal arises from a combination of multiple complex sources, including small spikes from hundreds of distant neurons, and can contain electrical noise mixed with true neural signals (). Second, variation in waveform shapes for a given cell can be highly non-Gaussian and skewed, particularly when bursting occurs () or when the cell positions drift over time relative to the physical electrode (). Third, multiple neurons may fire simultaneously, leading to time-overlapping spike signals; while this may not occur frequently in some brain areas or with small electrode arrays, for large arrays sampling tens to hundreds of neurons, individual spikes will often time-overlap with other events ().

Advances in our understanding of how populations of neurons represent and process information has been enabled by tightly packed multi-electrode arrays that allow for isolation of large numbers of simultaneously recorded neurons (). Data collected from these devices comprise multiple channels of continuously sampled extracellular voltages. A key step in making these data interpretable is spike sorting, the process of detecting spiking events and assigning those events to single units corresponding to putative individual neurons ().

To assess how processing times scale with the duration of recording and number of events, we processed subsets of the 7 hr, 16-channel probe dataset that contained ∼1.1 million events per hour. We found that processing times scaled roughly linearly with the data duration, with a preprocessing time of ∼1 min per hour of the recording and a sorting time of ∼1.3 min per hour ( Figure 7 A). We also wanted to assess the efficiency on machines with fewer processing cores. We varied the number of logical cores between four and 20 for processing a 4-hr subset of the same 16-channel data. We note that, even though the processing is fastest when using 16 or more threads, the processing speed is still much faster than real time when restricting to only four logical cores. Taken together, and even assuming linear scaling with the number of electrodes (sublinear scaling is expected due to parallelization if we also increase the number of cores), our results suggest that 320 channels from the same electrode array could be sorted in real time on a single machine; this is on the order of the graphical processing unit (GPU) rate reported for KiloSort ().

MountainSort was also able to cluster all of these datasets much faster than real time. We carried out timing tests on a Linux workstation with 192 GB RAM and 20 physical processors with hyper-threading capability (although not all cores were used in the experiments, as indicated). The 46-min, four-channel dataset used in Figures 2 and 3 had a total MountainSort runtime of 40 s (including 22 s preprocessing) when run using 16 threads. This is around 70 times faster than real time. The 6.6 hr, 16-channel dataset used in Figure 4 involved over 7 million detected events and had a total MountainSort runtime of 805 s (including 336 s preprocessing) using 16 threads. This is 30 times faster than real time. The 128-channel publicly available dataset with juxtacellular ground truth () was sorted in 249 s (including 128 s preprocessing) using 40 threads. This is around 2.5 times faster than real time.

(A–C) Synthetically generated spike waveforms were superimposed at random times on background signal taken from a real dataset (see STAR Methods ). Accuracies for each simulated cluster (see STAR Methods ) are plotted against the peak amplitude of the waveform. Clusters not matching any true units are not shown. Clusters automatically accepted by MountainSort (using the isolation and noise overlap metrics) are marked using a filled-in blue circle.

In addition, we applied the algorithms to a publicly available tetrode recording from a rat hippocampus () with a juxtacellular ground truth channel. MountainSort found the unit with 10% false-positives and 11% false-negatives. Spyking Circus had only 6% false-positives but 21% false-negatives. KiloSort could not be run (in the current version of the software) because the channel count was too low.

We then applied MountainSort, KiloSort, and Spyking Circus to a publicly available 128-channel dataset with independent juxtacellular firing information for one of the cells (). This dataset is one of ten datasets in the repository exhibiting varying levels of sorting difficulty. We found that only one of these featured a (ground-truthed) cell with sufficiently high amplitude-to-noise ratio to perform accurate sorting using any of the techniques. MountainSort identified this high amplitude unit with very high accuracy (>99%; 24 false-positives and 32 false-negatives out of 4,895 true events) and appropriately marked it as a bursting pair. KiloSort also split the unit into two pieces with >99% accuracy, but has no mechanism to report bursting pairs. Spyking Circus split the true cluster into two pieces, but the larger portion (60% of events) was merged with a low-amplitude cluster that included many false-positives. Instructions for running these algorithms on this publicly available dataset are provided in the software repository for MountainSort.

Based on autocorrelograms and place maps, we believe the clusters found by MountainSort and not by the other algorithms to be valid units, including MountainSort 12, a putative interneuron. Clusters in the KiloSort or Spyking Circus sorting that comprise events from multiple MountainSort clusters ( Figure 5 ) could be the result of a MountainSort false split or a KiloSort or Spyking Circus false merge. For simplicity, attention was focused on the cases where KiloSort and Spyking Circus had the same or overlapping MountainSort cluster subsets (KiloSort 48, Spyking Circus 6; KiloSort 8, Spyking Circus 23; KiloSort 2, Spyking Circus 2). Despite KiloSort and Spyking Circus having a degree of agreement, the MountainSort clusters showed appreciable differences in principal-component analysis (PCA) projections, waveforms, and spatial firing properties ( Figure S5 ), suggesting that the MountainSort clusters were more likely to correspond to well-isolated single units.

Confusion matrices comparing MountainSort with KiloSort and Spyking Circus on the hippocampal dataset are shown in Figure 5 . For visualization purposes, the obviously invalid clusters (based on autocorrelograms) were manually excluded from KiloSort and Spyking Circus prior to assembling the matrices. These matrices make it clear that the three algorithms find many of the same units but also highlight a number of clusters where the algorithms produce different results. These include one cluster that MountainSort identified but KiloSort did not (MountainSort 12) and nine clusters that MountainSort identified but KiloSort merged into other clusters (MountainSort 18, 7, 23, 25, 29, 11, 10, 27, 17), seven clusters that MountainSort identified but Spyking Circus did not (MountainSort 5, 27, 25, 12, 9, 8, 7), and seven clusters that MountainSort identified but Spyking Circus merged into other clusters (MountainSort 10, 23, 18, 17, 16, 11, 4). Taken together, neither KiloSort nor Spyking Circus identified MountainSort 12, and both KiloSort and Spyking Circus merged MountainSort 18, 17, 11, 10, and 23 into other clusters. The seven clusters found by KiloSort but not by MountainSort (appearing on the right side of Figure 5 A) correspond to low-amplitude clusters rejected in the automated annotation stage of MountainSort for having a high noise overlap score.

The comparison to Kilosort is shown in (A), and the comparison to Spyking Circus is shown in (B). See Figure S2 for details on interpreting confusion matrices. For readability, entries with fewer than ten events are marked with a period. MountainSort is fully automated, whereas a number of obviously invalid clusters needed to be manually excluded from the KiloSort and Spyking Circus runs before assembling the confusion matrix.

MountainSort aims to provide a fully automated spike sorting pipeline in the sense that it takes as input a raw time series and generates a set of well-isolated clusters. Most other software packages provide only a degree of automation by producing a set of clusters requiring further curation. This is often done manually: the expectation is that users will discard, merge, and sometimes even split clusters before using the results for downstream analyses. Setting aside the issue of human intervention, we compared MountainSort with two other spike sorting packages: KiloSort () and Spyking Circus (). The three algorithms were applied to (1) real data from our laboratory (the tetrode dataset described above), (2) a publicly available extracellular dataset with known ground truth (128-channel silicon polytrode together with a juxtacellular ground-truth measurement) (), and (3) simulated data obtained from superimposing synthetic waveforms on background signal taken from a real dataset. Each of the three software packages has parameters that can be modified to optimize performance for different applications. However, as our objective is to create a spike sorting environment that is well suited to a diverse set of problems without requiring parameter tuning, we used the recommended set of parameters in each of the three packages (i.e., default values or the settings used in the examples distributed with the software). Importantly, for each algorithm we used the same parameters in all three experimental settings.

Applying MountainSort to this dataset resulted in the identification of 37 putative single units ( Figure 4 B). Importantly, as in Figure 3 , the putative single units have few, if any, refractory period violations ( Figure 4 C). Putative noise clusters (noise overlap >0.03) are shown in Figure S4 . After removing clusters with high noise overlap, there was only one cluster pair flagged as non-isolated (isolation <0.95). This was the one identified bursting cell pair, MountainSort IDs 32 and 33 (isolation = 0.91; Figure S4 C). The cluster pair with the next highest overlap was clusters 30 and 31 (isolation = 0.97), a pair identified as putative single units during automated annotation ( Figure S4 D). These results demonstrate that MountainSort can be applied to a range of datasets without the need for parameter adjustments.

Recall that MountainSort independently applies spike sorting on small electrode neighborhoods and then consolidates across all channels. In the case of the 16-channel probe, each local neighborhood consisted of up to seven electrodes. As a consequence, the feature space in which each electrode’s clustering was done was derived from a different, sometimes non-overlapping, set of electrodes. This is a notable difference from the tetrode dataset, where every channel was included in every neighborhood.

While tetrodes remain in use across many laboratories, new, high-density multielectrode arrays offer the ability to record from much larger ensembles of neurons (). MountainSort has a number of features designed specifically for such arrays. To demonstrate and evaluate these features, we applied our algorithm to 7 hr of data from a 16-channel, polymer probe ( Figure 4 A) () dataset with challenges similar to those of the tetrode dataset used in Figures 2 and 3 . Although this array was placed in a different brain region (prelimbic cortex), had four times as many channels as the tetrode recording shown above, and contained a full 7 hr of continuous recording, we used identical parameters for the clustering pipeline and for the quality metric thresholds.

Finally, it is worth noting that only one of the isolated units could be identified on the basis of individual-channel sorting. This was MountainSort cluster 21. When using a single channel instead of all four channels, we found a noise overlap of 0.052 versus 0.021, isolation of 0.98 versus 0.99, and SNR of 5.05 versus 3.40. Interestingly, this is an interneuron of relatively low amplitude (mean peak height of ∼110 μV). This suggests that single channels are insufficient to isolate single neurons, at least in hippocampal area CA1. We believe this would be true for any sorting algorithm.

Five clusters were tagged as overlapping with noise (using noise overlap >0.03). Visual inspection of the events in these clusters revealed four of the five having broad and symmetric waveforms ( Figure S3 E), characteristic of what one might expect from the summation of activity from many distant neurons crossing the event detection threshold. Furthermore, all five of these clusters have events falling in the refractory period, suggesting that the noise overlap measure effectively identifies clusters with events that should indeed be considered noise ( Figure S3 F). In summary, the findings above indicate that MountainSort can produce high-quality automatic sorting of tetrode datasets.

Overall, we found that an isolation score <0.99 could reflect either activity originating from at least two different neurons ( Figures S3 A and S3B) or activity from a single, bursting neuron with substantial amplitude variation and a multimodal cluster distribution. One such bursting cluster pair was automatically identified in this dataset ( Figures S3 C and S3D; cluster labels 14, 16; isolation = 0.97). Examination of their respective waveform shapes, spatial firing properties ( Figure S3 C), and cross-correlation ( Figure S3 D) suggests that the two clusters come from a bursting neuron, with the higher amplitude cluster often spiking before the lower-amplitude cluster. Indeed, after an automatic merge of MountainSort clusters 14 and 16 based on an identification of burst pairs (see STAR Methods ), we find strong similarity to manually identified clusters (manual 1 cluster 5, manual 2 cluster 2, manual 3 cluster 5, Figure S2 B). Here, we note that automatically joining these two clusters required adding assumptions about relative spike timing and amplitudes into the post-clustering automated annotation stage. Nonetheless, while the merge is done in an automated fashion, the software maintains a record of that annotation alongside the original cluster assignments. This makes it straightforward for other scientists to assess all annotations made during sorting.

We further evaluated the quality of the isolation of these additional units by identifying the cluster pairs that had the most similar waveforms, as quantified by isolation score, and asking whether there was evidence of contamination or low-quality clustering in those pairs. In this dataset, clusters 7 and 9 have the lowest isolation score of 0.96, and clusters 6 and 4 have the next lowest, an isolation score of 0.97. Despite these lower isolation scores, both pairs of clusters have noticeable waveform differences, clear separation in the principal component (PC) space, and a difference in spatial firing preferences ( Figures S3 A and S3B). This suggests that these clusters all represent well-isolated single units that likely correspond to single neurons.

MountainSort also identifies a large number of clusters that were not identified by the manual sorters. For the tetrode data featured in Figure 2 , we used the following metric thresholds: noise overlap <0.03, isolation >0.95, firing rate >0.1 Hz, SNR >1.5, although identical classifications for this dataset would result from using only noise overlap and firing rate. After automated curation and one automated bursting-related cluster merge (below), this resulted in the identification of 24 putative single units ( Figure 3 A). Importantly, these isolated clusters have few, if any, refractory period violations even though MountainSort does not use time information for clustering decisions ( Figure 3 B). Furthermore, 22 of the 24 putative single units have spatially restricted firing properties, consistent with expected behavior from hippocampal CA1 principal cells ( Figure 3 C). MountainSort clusters 12 and 21 do not have spatially restricted firing properties and are relatively low amplitude; however, both have few refractory period violations and also have noise overlap scores that are below threshold (0.01 and 0.02, respectively; noise overlap threshold = 0.03). Further, the high firing rates of these units suggest that they are likely to correspond to one of the subtypes of inhibitory interneurons that can be found near the CA1 cell layer ().

Are these additional events likely to be true spiking events associated with the cell? Ground truth is not available for this dataset, or for the vast majority of other datasets, but in this case we can take advantage of the well-known “place fields” of hippocampal neurons () to infer the accuracy of the sorting. We therefore examined the animal’s location at the times the spikes were detected. If the additional events are correctly classified, then they should congregate in the same location as the bulk of the events. This is indeed the case. As shown in Figures 2 D and S2 B, events detected by MountainSort but not by manual operators have spatial distributions very similar to those of the jointly detected or individually detected events, suggesting that these additional events are likely to be correctly assigned to this cluster.

As shown in Figure 2 C, all six clusters that are identified in two or more manual clusterings are also identified by MountainSort (MountainSort labels 23, 15, 24, 5, 14/16, 28). For example, MountainSort cluster 24 corresponds to clusters 3, 4, and 3 in the first, second, and third manual sortings, respectively, with more than 97% of the manual events also detected by MountainSort. At the same time, MountainSort identifies a large number of events as part of these clusters that are not included in the manual sortings ( Figure 2 C). That is not surprising given that our approach to manual clustering aims to minimize the mistaken inclusion of incorrect events at the expense of missing true events; this approach was chosen because false-positives can lead to incorrect inferences about correlated activity ().

Three different manual operators clustered the dataset using drawn polygons across several different 2D projections (see STAR Methods ). As expected (), while there were clusters that all three operators identified, there was variability across operators, both in which clusters were sufficiently isolated to merit inclusion, as well as in the placement of the boundaries separating clusters, resulting in a range of unmatched events in each cluster ( Figure 2 B). MountainSort was then run; results of the comparison are shown in the confusion matrices of Figure 2 C. A confusion matrix (or contingency table) () summarizes the consistency between two sortings of the same data by showing the pairwise counts ( Figure S2 A). The entry a_ij represents the number of events that were classified as i in the first sorting and as j in the second. To handle the arbitrary ordering of labels, the rows and columns are permuted to maximize the sum of the diagonal entries (). A purely diagonal matrix corresponds to perfect agreement. For compact visualization, 11 of the 24 automatically sorted MountainSort clusters with the best match to the manual sorting results are shown.
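The construction of such a matrix and the diagonal-maximizing permutation can be sketched as follows. Here we use the Hungarian algorithm (via `scipy.optimize.linear_sum_assignment`) as one standard way to find the optimal permutation; the exact method used to generate Figure 2 C may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def confusion_matrix(labels_a, labels_b):
    """Count matrix m[i, j] = number of events labeled i in sorting A
    and j in sorting B (labels assumed to be 0-based integers)."""
    ka, kb = labels_a.max() + 1, labels_b.max() + 1
    m = np.zeros((ka, kb), dtype=int)
    for i, j in zip(labels_a, labels_b):
        m[i, j] += 1
    return m

def align_labels(m):
    """Permute the rows of m so that the diagonal sum is maximized,
    resolving the arbitrary ordering of cluster labels."""
    k = max(m.shape)
    padded = np.zeros((k, k), dtype=int)
    padded[:m.shape[0], :m.shape[1]] = m
    # Hungarian algorithm maximizes the matched counts (negate for min-cost)
    row_ind, col_ind = linear_sum_assignment(-padded)
    perm = np.argsort(col_ind)  # place row r at its matched column position
    return padded[perm], perm
```

A perfectly diagonal aligned matrix then corresponds to perfect agreement between the two sortings.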

If automated spike sorting is to be useful, it should provide cluster labels with accuracy comparable to or exceeding existing standards. We therefore began with a comparison of our automated approach to manual sorting for a dataset that poses serious spike sorting challenges: tetrode recordings from the CA1 region of rat hippocampus ( Figure 2 A). The pyramidal cells in CA1 are densely packed, and thus large numbers of cells can be detected on the same electrode. Furthermore, extracellular recordings from single CA1 neurons show substantial waveform variability as a result of bursting and other history-dependent effects (). We chose a dataset with some electrode drift (45-min recording session) and some artifact contamination resulting from animal movement. The data were derived from a novel exposure to a spatial environment where we expected to see neurons changing their firing rates substantially over time (). The standard in the field for such datasets is either fully manual clustering, or semi-automatic clustering where the algorithm over-clusters the data leaving the user to manually merge clusters or redraw cluster boundaries ().

(D) Occupancy-normalized spatial firing rate color maps for three clusters corresponding across MountainSort (MS) and manual operators. See also Figure S2 . Track outline is shown in gray. Note that the track has no walls, and we used the animal’s head for position tracking. As the animal often looked out over the edge of the track, many of the positions shown are outside the track outline. Scale bar, 50 cm. Note that color-bar scale varies across clusters.

(C) As in (B), except for purpose of compact visualization, only the MountainSort clusters that correspond to one or more manual clusters are shown.

(B) Pairwise confusion matrices ( Figure S2 A) for the three manual sortings. For each matrix, the numbers in the leftmost column and bottommost row correspond to cluster ID number for the respective manual sorter. Shading corresponds to the proportion of each column-labeled cluster that is classified into each row-labeled cluster. For the top matrix, this is the proportion of events in each manual 2 cluster that match the corresponding manual 1 cluster. Each number within the matrix corresponds to the absolute number of matching events. The final column corresponds to the number of events found in the row-labeled clustering not found in the column-labeled clustering. For the top matrix, this corresponds to the events in manual 1 clusters that are not found in any manual 2 clusters. Similarly, the second row from the bottom corresponds to the number of events found in the manual 2 clusters not found in any manual 1 clusters.

While events corresponding to a single unit almost always form a cluster that is well approximated by a unimodal distribution, there are instances where the underlying distribution is multimodal. From our data we see that this most often occurs when the first spike in a burst has a different shape and higher amplitude than subsequent spikes in the burst. Our sorting algorithm by design will separate these events into two or more clusters. Fortunately, such clusters can be readily identified using event timing information, as the smaller spikes will always occur within a short time after the first spike in the burst. In the case of our hippocampal data, this time window is on the order of 15 ms (). In the STAR Methods , we describe the criteria used to label a cluster as having a “bursting parent” cluster.
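The timing criterion described here can be illustrated with a simple score: the fraction of one cluster's events that fall within a short window after events of a putative parent cluster. This is a hypothetical simplification for illustration; the actual bursting-parent criteria (which also involve amplitude relationships) are given in STAR Methods.

```python
import numpy as np

def bursting_parent_score(times_parent, times_child, window_s=0.015):
    """Fraction of child-cluster events occurring within `window_s` seconds
    AFTER the most recent parent-cluster event. A high score, together with
    the parent having the larger spike amplitude, is suggestive of a single
    bursting unit split into two clusters. Times must be sorted (seconds)."""
    idx = np.searchsorted(times_parent, times_child) - 1
    idx = np.clip(idx, 0, len(times_parent) - 1)
    dt = times_child - times_parent[idx]  # lag since preceding parent event
    return float(np.mean((dt > 0) & (dt < window_s)))
```

A threshold on this score (and a check that the parent waveform has the higher amplitude) would then flag the pair for an automated, recorded merge.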

Depending on the nature of signal contamination in the dataset, some clusters may consist primarily of high-amplitude artifactual signals such as those that arise from movement, muscle, or other non-neural sources. In this case, the variation among event voltage clips will be large compared with clusters that correspond to neural units. To automatically exclude such clusters we compute cluster SNR, defined as the peak absolute amplitude of the average waveform divided by the peak SD. The latter is defined as the SD of the aligned clips in the cluster, taken at the channel and time sample where this quantity is largest.
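A direct transcription of this definition might look like the following sketch, where the input is the set of aligned event clips for one cluster:

```python
import numpy as np

def cluster_snr(clips):
    """clips: array of shape (events, channels, samples) of aligned
    voltage clips for one cluster. SNR = peak absolute amplitude of the
    average waveform, divided by the SD across clips taken at the
    (channel, sample) where that SD is largest."""
    mean_wf = clips.mean(axis=0)        # average waveform
    sd = clips.std(axis=0)              # per-(channel, sample) SD across clips
    return np.abs(mean_wf).max() / sd.max()
```

Artifact-dominated clusters, whose clip-to-clip variation is large relative to their mean waveform, yield low values and can be excluded by thresholding this quantity.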

The noise overlap and isolation metrics vary between 0 and 1, and in a sense, represent the fraction of points that overlap either with another cluster (isolation metric) or with the noise cluster (noise overlap metric). However, they should not be interpreted as a direct estimate of the misclassification rate but should rather be considered to be predictive of this quantity. Indeed, due to the way they are computed, these values will depend on factors such as the dimensionality of the feature space and the noise properties of the underlying data. Therefore, the annotation thresholds should be chosen to suit the application. With that said, in this study we used the same sorting parameters and annotation thresholds for all analyses.

Noise overlap estimates the fraction of “noise events” in a cluster, i.e., above-threshold events not associated with true firings of this or any of the other clustered units. A large noise overlap implies a high false-positive rate. The procedure first empirically computes the expected waveform shape for noise events that have by chance crossed the detection threshold. It assesses the extent of feature space overlap between the cluster and a set of randomly selected noise clips after correcting for this expected noise waveform shape.

The isolation metric quantifies how well separated (in feature space) the cluster is from other nearby clusters. Clusters that are not well separated from others would be expected to have high false-positive and false-negative rates due to mixing with overlapping clusters. This quantity is calculated in a nonparametric way based on nearest-neighbor classification.
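Both the isolation and noise overlap metrics reduce to a nearest-neighbor overlap estimate between two point sets in feature space (the cluster versus a neighboring cluster, or the cluster versus random noise clips). The following is a simplified stand-in that omits MountainSort's noise-waveform correction and exact normalization; it is meant only to convey the nonparametric idea.

```python
import numpy as np

def nn_overlap(feats_a, feats_b, k=10):
    """Overlap between two point sets in feature space: pool the points,
    and for each point count what fraction of its k nearest neighbors
    belong to the OTHER set. ~0.5 means the sets are indistinguishable;
    ~0 means they are well separated."""
    x = np.vstack([feats_a, feats_b])
    labels = np.r_[np.zeros(len(feats_a)), np.ones(len(feats_b))]
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # exclude each point itself
    nn = np.argsort(d, axis=1)[:, :k]    # k nearest neighbors per point
    return float((labels[nn] != labels[:, None]).mean())
```

In this simplified picture, isolation would be one minus the overlap with the closest other cluster, and noise overlap would be the overlap with a set of randomly drawn noise clips.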

As stated, a central goal of our approach is to minimize the number of modeling assumptions. Currently available metrics (), while useful, make assumptions about an underlying noise model. Here, we introduce two new metrics that make no such assumptions and are specifically suited to spike sorting: isolation and noise overlap. We also use a measure of cluster signal-to-noise ratio (SNR) to exclude clusters contaminated by artifacts and employ a timing criterion to flag bursting situations.

As part of the overall automation, we developed metrics for determining which clusters are sufficiently isolated from noise and other clusters to be included in the final output. In this way only sufficiently isolated clusters are selected for downstream analysis. Our quality assessment categorizes the clusters into three groups: “single unit,” “noise,” and “non-isolated.” In addition, we automatically identify (based on timing information) clear cases where the events of a bursting unit are split into multiple unimodal clusters (see STAR Methods ). Such decisions have traditionally been handled via case-by-case curation by manual operators. In contrast, our strategy only requires the operator to set thresholds on cluster quality metrics, as defined below. These metrics can be adjusted based on the type of analysis that will be done. Furthermore, the metrics can also be exported alongside the event times and labels, allowing analyses sensitive to unit isolation and noise contamination to be repeated with different thresholds or weighting criteria.

As indicated in Figure 1 , sorting is first performed independently on electrode neighborhoods (one neighborhood per electrode) based on the geometric layout of the array. Redundant clusters are then removed during the next phase, termed “consolidation across neighborhoods” ( Figure S1 ). There are a number of advantages to using such a consolidation approach rather than a more error-prone merging procedure (see STAR Methods ). Importantly, this approach allows scaling to very large electrode arrays, as the neighborhood sizes remain roughly constant.
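One plausible way to form the per-electrode neighborhoods from the array geometry is shown below. The adjacency radius here is an assumed illustrative parameter, not MountainSort's actual default; the key property is that neighborhood size depends on the local geometry, not on the total channel count, which is what allows scaling to large arrays.

```python
import numpy as np

def neighborhoods(geom, radius):
    """geom: (n_electrodes, 2) array of electrode positions (e.g., microns).
    Returns one neighborhood per electrode: the indices of all electrodes
    within `radius` of that electrode (always including itself)."""
    geom = np.asarray(geom, dtype=float)
    d = np.linalg.norm(geom[:, None] - geom[None, :], axis=2)
    return [np.flatnonzero(d[i] <= radius) for i in range(len(geom))]
```

For a tetrode, a radius larger than the bundle diameter puts all four channels in every neighborhood; for a long linear probe, each neighborhood covers only the nearby contacts.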

Technical details for ISO-SPLIT are provided in STAR Methods . The algorithm comprises a series of nonparametric statistical tests for unimodality and makes no assumptions about the shapes of clusters aside from having unimodal one-dimensional projections. It therefore involves few adjustable parameters. Essentially, one need only specify a statistical threshold for rejecting the null hypothesis of unimodality; the clustering output is largely insensitive to this threshold because the test is repeated at every iteration. The method is also insensitive to the initialization (see STAR Methods ). We use the same set of parameters for all examples in this study. Moreover, the algorithm needs no a priori information about the expected number of clusters nor the expected cluster densities.
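The merge-or-cut decision at the core of each iteration can be conveyed with a deliberately crude stand-in: histogram a one-dimensional projection of the two candidate clusters, find the deepest interior valley, and split only if the valley is substantially lower than its flanking peaks. The real ISO-CUT uses isotonic regression and a rigorous unimodality test rather than the ad hoc histogram ratio used here; the function and its thresholds are illustrative assumptions only.

```python
import numpy as np

def iso_cut_1d(x, n_bins=20, dip_ratio=0.7):
    """Crude illustration of a merge-vs-cut decision on 1D projected data.
    Returns (should_split, cut_point). Splits only when the deepest interior
    valley falls below `dip_ratio` of the smaller flanking peak (and that
    peak is non-negligible); otherwise the two clusters would be merged."""
    counts, edges = np.histogram(x, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    best = None
    for i in range(1, n_bins - 1):
        left_peak = counts[:i].max()
        right_peak = counts[i + 1:].max()
        depth = min(left_peak, right_peak) - counts[i]
        if best is None or depth > best[0]:
            best = (depth, i, min(left_peak, right_peak))
    depth, i, peak = best
    if peak > 0.2 * counts.max() and counts[i] < dip_ratio * peak:
        return True, float(centers[i])  # redistribute points at this cut
    return False, None                  # unimodal: merge the two clusters
```

In ISO-SPLIT this test is applied repeatedly to pairs of candidate clusters, merging or redistributing points until every remaining pairwise projection passes the unimodality test.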

At the heart of our sorting pipeline is a new, efficient, nonparametric, density-based clustering algorithm termed ISO-SPLIT ( Figure 1 B), used to sort spike events based on their representations in a low-dimensional feature space. The algorithm makes only two general assumptions about cluster distributions in this space. First, we assume that each cluster arises from a density function that, when projected onto any line, is unimodal, having a single region of highest density. Second, we assume that any two distinct clusters may be separated by a hyperplane, in the neighborhood of which there is a relatively lower density. In our experience, the unimodality hypothesis appears to hold for the large majority of neurons taken from a variety of brain areas—this assumption is also implicit in most neural clustering methods, even those that do not assume a Gaussian shape (). In a minority of cases, we have observed a multimodal cluster distribution, reflecting more complex firing properties of a single neuron. Our strategy for handling this challenging scenario is discussed below.

To satisfy these requirements, we developed the spike sorting pipeline shown in Figure 1A. This involves preprocessing, sorting on electrode neighborhoods, consolidation across those neighborhoods, fitting, derivation of cluster metrics, and automated annotation based on those metrics. The last step replaces manual curation (deciding which clusters to accept) but, importantly, does not modify the clustering itself as other approaches do. Details of all processing steps are provided in STAR Methods. Here, we give an overview of the most critical steps.
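
As one concrete example of the preprocessing stage, spatial whitening decorrelates the channels before feature extraction so that shared noise does not dominate the feature space. The following ZCA-style sketch is a minimal illustration, not MountainSort's exact implementation (see STAR Methods for the actual preprocessing details):

```python
import numpy as np

def spatial_whiten(X, eps=1e-6):
    """ZCA-style spatial whitening of a (channels x samples) array:
    remove per-channel means, then apply the inverse square root of the
    channel covariance so the output channels are decorrelated with
    approximately unit variance. `eps` regularizes tiny eigenvalues."""
    X = X - X.mean(axis=1, keepdims=True)
    cov = X @ X.T / X.shape[1]
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return W @ X
```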

(A) Flow diagram. After preprocessing, sorting is performed on individual electrode neighborhoods. Clusters are then consolidated across neighborhoods (see also Figure S1). Clusters are either accepted or rejected in the automatic annotation phase based on the computed cluster metrics.

(B) Illustration of the final six iterations of the ISO-SPLIT clustering algorithm for a synthetically generated set of points. At each iteration, two clusters are compared using a one-dimensional projection of the union of the two clusters (shown in the histograms at each iteration). The ISO-CUT procedure is applied to the projected data to determine whether the two clusters should be merged (single color in the histogram) or whether the points should be redistributed according to an optimal cut point (two colors in the histogram).

A central goal of our overall spike sorting strategy has been to develop a single algorithm that can be applied to data from different brain regions without the need for brain-region-specific models or parameters. This requires that the procedure be insensitive to differences in dataset properties such as non-Gaussian cluster distributions, electrode densities, and firing rates. Further, selecting parameters such as thresholds and regularization constants is time-consuming and can call into question the objectivity of results. Therefore, our guiding philosophy is to minimize both the number of user-defined parameters and the number of modeling assumptions, while maintaining high spike sorting accuracy and efficiency.

Discussion

The MountainSort software suite provides fully automated spike sorting from electrode arrays of varied sizes and geometries in multiple brain regions, with accuracies comparable to or exceeding existing standards, and computational times much faster than acquisition times on non-specialized hardware. Importantly, MountainSort is also a fully functional package providing an intuitive graphical user interface with the ability to visualize subsets of selected clusters and annotation tools for exporting the data (see the online source code repository for documentation and tutorials). This stands in contrast to many previously developed clustering algorithms, which users would find difficult to test on their own data. Thus, it is relatively straightforward to incorporate MountainSort into a laboratory’s data processing pipeline.

To our knowledge, this is the first time that a fully automated spike sorting approach has been demonstrated to have comparable error rates to existing manual and semi-manual standards. Moreover, MountainSort sorted a dataset from the hippocampus better than humans, despite the presence of both complex spike bursts and noise events that arise from muscle and movement-related artifacts. Indeed, MountainSort identified many more putative single units from a hippocampal CA1 tetrode than manual sorters (24 versus 5–10). We note here that there are principled reasons to expect that MountainSort would outperform manual sorting. Typical manual sorting, such as in this study, is done in a static feature space of four to 16 dimensions (such as amplitude on channel 1, peak-to-trough ratio on channel 3, etc.), computed from filtered waveforms, and usually involves drawing polygons in two-dimensional projections. In contrast, MountainSort operates in a ten-dimensional space corresponding to the first ten PCs, computed from spatially whitened waveforms (effectively more than ten dimensions when the branch method is considered; see STAR Methods). Moreover, each of the ten dimensions corresponds not to a single channel, as in some other clustering approaches, but to a PC computed across all channels of the neighborhood. Further aiding in separation is the recomputation of the PCs when performing cluster comparisons. Due to these differences, it is highly likely that MountainSort can more reliably separate clusters than could a human operator, the current gold standard for tetrode spike sorting.
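
The feature computation described here can be sketched as follows. The function name `neighborhood_features` is illustrative, and the real pipeline additionally recomputes PCs during cluster comparisons and uses the branch method (see STAR Methods):

```python
import numpy as np

def neighborhood_features(clips, n_pcs=10):
    """Project spike waveform clips onto their top principal components.
    clips: (n_events, n_channels, n_samples). Each clip is flattened so
    that every PC spans ALL channels of the neighborhood jointly, unlike
    per-channel manual features such as peak amplitude on one channel."""
    n = clips.shape[0]
    X = clips.reshape(n, -1)
    X = X - X.mean(axis=0)          # center each feature dimension
    # rows of Vt are the principal components, ordered by singular value
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_pcs].T
```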

Comparison of MountainSort to two other recently released packages, KiloSort and Spyking Circus, suggests that MountainSort outperforms both. When applied to the hippocampal dataset, all three algorithms identified a subset of well-isolated units, but MountainSort found units that were not identified by Spyking Circus, and both KiloSort and Spyking Circus merged units with distinct PCA features and spatial firing patterns that MountainSort separated. Comparisons on simulated data revealed the same trends: MountainSort consistently identified units with greater accuracy than the other algorithms.

These comparisons highlight two other features of MountainSort. First, these comparisons were based on the default parameter values for all three algorithms. While it is possible that different parameters could have improved the performance of KiloSort and Spyking Circus, one of our major goals was to develop an algorithm with essentially no free parameters, and our success in sorting data from multiple brain regions as well as multiple simulated datasets using the same parameters illustrates the power of this approach. Second, while these comparisons used the fully automated output from MountainSort, manual curation remained necessary for both KiloSort and Spyking Circus. It is important to note that these algorithms were not designed to be fully automatic, so the need for manual curation is not surprising, but it remains the case that, at the time of this publication, only MountainSort is successful in a fully automated mode.

Automation provides clear benefits for reproducibility among sortings of the same data and comparability across datasets. This is accomplished using novel isolation and noise overlap metrics. By contrast, commonly used cluster metrics, such as isolation distance () and L-ratio (), utilize Mahalanobis distance calculations, which depend on the dimensionality of the selected feature space, making it difficult to compare cluster quality between datasets. Furthermore, the commonly used L-ratio metric, among others (), assumes a multivariate normal cluster distribution, an assumption that is often violated, as in cases of burst firing or electrode drift. By taking into account dataset-specific noise and nearest neighbor distances, our noise overlap and isolation metrics provide a nonparametric means to compare cluster quality across datasets.

This ability to compare quality across datasets allows these cluster quality metrics, combined with the annotation and provenance strategy built into MountainSort, to be propagated to downstream analyses. Maintaining access to all clusters and their cluster quality metrics, including those that likely correspond to multiunit spiking, has a number of important advantages. First, thresholds could be established by examining how known properties of a given set of units (such as the number and size of hippocampal place fields) vary as a function of cluster quality (). Second, simple changes in inclusion thresholds can be used to assess directly whether cluster quality influences a specific finding. Cluster quality could also be used to weight the contribution of each unit to a given analysis, thereby ensuring that the best isolated clusters have a proportionally greater influence on the findings. Third, there are cases where including multiunit spiking improves the quality of the results, as is the case for clusterless decoding of animal position from unsorted hippocampal spiking activity (). Inclusion of all clusters, regardless of quality, greatly simplifies the application of these techniques to the data.
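
Both uses, hard inclusion thresholds and quality weighting, are straightforward to implement downstream. The sketch below is illustrative, with hypothetical inputs (per-unit firing rates and isolation scores):

```python
import numpy as np

def quality_weighted_mean(rates, isolation, threshold=0.9):
    """Two ways of using cluster-quality metrics in a downstream analysis:
    (1) hard inclusion of units whose isolation meets a threshold, and
    (2) weighting every unit's contribution by its isolation score.
    Returns (thresholded mean, quality-weighted mean). Illustrative only;
    assumes at least one unit passes the threshold."""
    rates = np.asarray(rates, dtype=float)
    iso = np.asarray(isolation, dtype=float)
    included = iso >= threshold             # hard threshold
    weights = iso / iso.sum()               # soft, quality-proportional
    return float(rates[included].mean()), float(rates @ weights)
```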

While MountainSort's assumption of unimodality worked well even for the hippocampal cells we applied it to, whose cluster distributions are non-Gaussian, there are still cases where bursting results in multiple unimodal clusters that should be merged. MountainSort therefore includes a post-processing step for automatically detecting clusters that are bursting pairs. The software also provides annotation tools that allow the user to indicate that clusters should be merged, but, importantly, the original automatic clustering is preserved in the sorting output, allowing for identification of all user decisions. This occasional need for manual merging could be further mitigated by additional post-clustering analyses beyond our current, relatively simple automated bursting-pair criteria ().
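
One simple, hypothetical heuristic for flagging candidate bursting pairs (not MountainSort's actual criterion) is an asymmetry score on short-latency cross-intervals: a unit that reliably fires just after another, but rarely just before, may represent the later, attenuated waveforms within a burst:

```python
import numpy as np

def burst_pair_score(times_a, times_b, window=0.01):
    """Cross-interval asymmetry between two spike trains (seconds).
    Counts B spikes within `window` s after vs. before each A spike and
    returns (after - before) / (after + before): near +1 suggests B
    systematically follows A at burst latencies; 0 means symmetric."""
    times_a = np.asarray(times_a, dtype=float)
    times_b = np.asarray(times_b, dtype=float)
    after = before = 0
    for t in times_a:
        dt = times_b - t
        after += int(np.sum((dt > 0) & (dt <= window)))
        before += int(np.sum((dt < 0) & (dt >= -window)))
    total = after + before
    return (after - before) / total if total else 0.0
```

A high score alone is not sufficient for merging; amplitude relationships and waveform similarity would also need to be checked, as in the automated criteria mentioned above.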

MountainSort also runs quickly, even on large datasets. Clustering is always performed on local electrode neighborhoods, so clustering time theoretically scales linearly with the number of channels. Importantly, the clustering stage, along with most other stages of the processing, is implemented as parallel computations that can be run simultaneously on multiple central processing unit (CPU) cores. As a result, the total time required for sorting and metric computation was much shorter than the recording times for the same data on standard computer hardware for both tetrodes (70× real time) and a 16-channel polymer probe (30× real time). These speed-ups are in line with other approaches () where strategies such as hardware acceleration have brought computational times down significantly. However, these alternative approaches lack an automated curation component. As computers become increasingly powerful, the time spent by manual operators remains the largest bottleneck to spike sorting. With the elimination of manual curation, total sorting time becomes faster than acquisition time, and we can explore further directions that would otherwise be infeasible. One is the ability to run the clustering algorithm multiple times, allowing for cross-validation in the absence of ground truth (). Such stability metrics ensure that clustering results are not overly sensitive to realistic noise perturbations.
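
Because neighborhoods are sorted independently, the parallelization is embarrassingly simple. The sketch below distributes hypothetical per-neighborhood jobs across workers (threads are used here for brevity; the actual pipeline parallelizes across CPU cores):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def sort_neighborhood(channel, data):
    """Placeholder per-neighborhood job: in the real pipeline this would be
    detection plus clustering restricted to one electrode's neighborhood.
    Here it just returns a summary statistic as a stand-in result."""
    return channel, float(np.std(data))

def run_parallel(all_data, n_workers=4):
    """Map independent neighborhood jobs onto a worker pool. Since jobs do
    not interact, total work scales roughly linearly with channel count."""
    with ThreadPoolExecutor(n_workers) as ex:
        futures = [ex.submit(sort_neighborhood, ch, d)
                   for ch, d in all_data.items()]
        return dict(f.result() for f in futures)
```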

MountainSort naturally handles some level of drift, where there are systematic changes in waveforms thought to result from movement of electrodes relative to the tissue, or perhaps from glial cells migrating along electrode shanks (). We expect that MountainSort will correctly identify cluster boundaries as long as the resultant time-collapsed cluster is unimodal and does not overlap with another cluster. Errors are expected, however, when these time-collapsed clusters overlap with one another. That said, because MountainSort can be run on overlapping sections of data, it is conceptually straightforward to include augmentations that link clusters across time slices using segmentation-fusion-based algorithms (). Cases where clusters drift in and out of noise or other clusters are more problematic, both for MountainSort and for all other current approaches. Here, the new cluster metrics we have developed can be helpful, as clusters that drift into other clusters or into the noise will tend to have poor isolation and noise overlap scores, allowing identification of the periods when a cluster is sufficiently uncontaminated.
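
A minimal sketch of the linking idea, assuming each segment's clusters are summarized by mean-waveform templates and matched with Hungarian assignment (an illustration of segmentation fusion, not MountainSort's implementation):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def link_segments(templates_t1, templates_t2, max_dist=1.0):
    """Link clusters across two overlapping time segments by matching their
    mean-waveform templates (one template per row). Hungarian assignment
    minimizes the total template distance; pairs farther apart than
    `max_dist` are left unlinked (e.g., a unit that appeared or vanished)."""
    d = np.linalg.norm(
        templates_t1[:, None, :] - templates_t2[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(d)
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if d[r, c] <= max_dist]
```

Chaining such links across consecutive overlapping segments would yield unit identities that persist through slow drift, provided each segment's clusters remain unimodal.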

Taken together, our findings demonstrate that MountainSort is a sensible and efficient automated solution to the spike sorting problem. With a combination of minimal assumptions, fast run times, and a powerful graphical user interface, MountainSort can greatly shorten the time required for spike sorting while increasing the reproducibility and transparency of the process. With one electrode channel neighborhood allocated per available logical core, MountainSort is naturally extensible to distributed computing and large electrode arrays. Our approach is compatible with future refinements and extensions for resolving overlapping spikes and tracking units in the presence of drift.