Although the classification algorithm presupposes the availability of precisely delineated brain regions, our findings suggest that patterns of morphological variation across brain surfaces, extracted from MRI scans alone, can successfully diagnose the presence of chronic neuropsychiatric disorders. Extensions of these methods are likely to provide biomarkers that will aid in identifying biological subtypes of those disorders, predicting disease course, and individualizing treatments for a wide range of neuropsychiatric illnesses.

In MRI datasets from persons with Attention-Deficit/Hyperactivity Disorder, Schizophrenia, Tourette Syndrome, Bipolar Disorder, or persons at high or low familial risk for Major Depressive Disorder, our method discriminated with high specificity and nearly perfect sensitivity the brains of persons who had one specific neuropsychiatric disorder from the brains of healthy participants and the brains of persons who had a different neuropsychiatric disorder.

We have developed an automated method to diagnose individuals as having one of various neuropsychiatric illnesses using only anatomical MRI scans. The method employs a semi-supervised learning algorithm that discovers natural groupings of brains based on the spatial patterns of variation in the morphology of the cerebral cortex and other brain regions. We used split-half and leave-one-out cross-validation analyses in large MRI datasets to assess the reproducibility and diagnostic accuracy of those groupings.
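The two validation schemes named above can be sketched as follows. `nearest_mean` is a toy stand-in classifier, not the paper's semi-supervised algorithm; it is included only so that the cross-validation loops are runnable:

```python
import numpy as np

def leave_one_out_accuracy(X, y, fit_predict):
    """Leave-one-out cross-validation: hold out each participant in
    turn, train on the rest, and predict the held-out label."""
    correct, n = 0, len(y)
    for i in range(n):
        train = np.arange(n) != i
        pred = fit_predict(X[train], y[train], X[i:i + 1])
        correct += int(pred[0] == y[i])
    return correct / n

def split_half_accuracy(X, y, fit_predict, rng):
    """Split-half validation: train on a random half of the sample,
    test on the remaining half."""
    n = len(y)
    idx = rng.permutation(n)
    a, b = idx[:n // 2], idx[n // 2:]
    preds = fit_predict(X[a], y[a], X[b])
    return np.mean(preds == y[b])

def nearest_mean(X_train, y_train, X_test):
    """Toy stand-in classifier: assign each test vector to the class
    with the nearest training-class mean."""
    labels = np.unique(y_train)
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in labels])
    d = ((X_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return labels[np.argmin(d, axis=1)]
```

Any classifier with the same `fit_predict` signature, including a clustering-based one, can be dropped into either loop.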

Diagnoses using imaging-based measures alone offer the hope of improving the accuracy of clinical diagnosis, thereby reducing the costs associated with incorrect treatments. Previous attempts to use brain imaging for diagnosis, however, have had only limited success in diagnosing patients who are independent of the samples used to derive the diagnostic algorithms. We aimed to develop a classification algorithm that can accurately diagnose chronic, well-characterized neuropsychiatric illness in single individuals, given the availability of sufficiently precise delineations of brain regions across several neural systems in anatomical MR images of the brain.

One alternative to VBM, termed “surface morphometry”, may be more accurate in identifying the point-to-point correspondences that are required to define the feature space upon which classification algorithms operate. Surface morphometry first delineates the surface of each brain region precisely and independently of other brain regions, and then it independently registers points on the surface of each brain region to the corresponding points on the surface of the same region in the template brain. One of its major advantages is that point correspondences on brain surfaces are independent of the morphological features of points remote from those surfaces. In addition, its brain measures are defined on far fewer points than are VBM-like measures, which are defined at all points throughout the entire three-dimensional volume of the brain, thereby dramatically reducing the dimensionality of the feature space on which the classification algorithm is developed and applied. Despite their limitations, however, VBM-based algorithms afford the promise of providing completely automated techniques for diagnosing neuropsychiatric disorders, making them potentially attractive for applications within clinical settings. Comparing the performance of VBM-based methods with that of ours and others is therefore essential for future applications of imaging-based diagnostic algorithms, and in the future we will rigorously compare the performance of our algorithm with that of VBM-based classifications.

Extant methods for machine-based classification of individual brains in imaging datasets [8]–[19] generally use quantitative imaging data and known clinical diagnoses to learn optimal decision boundaries in the feature space of the imaging dataset that best separate individuals who have differing illnesses. The quantitative features that enter the training dataset (e.g., the Jacobian matrix of the deformation field) have usually been extracted from images using voxel-based morphometry (VBM) [20], [21], a technique that provides automated, quantitative, and fine-grained morphological information about the brain. To ensure that the imaging measures vary smoothly across the brain, the estimated deformation fields are constrained to be locally smooth. These smoothness constraints, however, make it difficult to use the resulting imaging measures to classify individuals accurately into diagnostic categories. First, the methods assume that a voxel in template space (a brain from a healthy individual to which all other brains in the sample are point-wise matched) represents the corresponding anatomical region in the brains of all other individuals in the dataset. This assumption is unlikely to be true, even when the brains of differing individuals have been spatially normalized to the template using high-dimensional deformations, because the smoothness constraints employed when warping a brain into template space, together with the variability in anatomy across individual brains, limit the ability of the normalization procedures to match precisely the corresponding anatomical regions across individuals. Second, the smoothness constraint causes deformations in a brain region to be influenced by deformations at all neighboring brain regions, and therefore brain measures derived from these deformation fields do not provide accurate measures of local change in the brain region of interest.
Third, brain measures derived using VBM are defined at each voxel in the brain, and therefore the dimensionality of the feature space is very high, requiring either techniques that reduce the dimensionality substantially or datasets from a much larger number of participants in order to train and validate the classification algorithms adequately. Together, these limitations introduce imprecision into the point-to-point correspondences across brains that are required for accurate classification, because they allow measures at any point in a brain warped to template space to be influenced by the variable features of brain regions remote from that point. Although more recently developed algorithms for high-order nonlinear warpings [22] significantly reduce these inaccuracies, brain features close to but outside of the regions of interest likely still influence the smoothed deformation fields.

The fundamental assumption that we make in this paper, and the assumption that constitutes the foundation of our algorithm for diagnostic classification, is that identifying patterns of variation in morphological features that extend over many sets of contiguous voxels and across numerous brain subregions will capture individual disturbances in the morphological features of the neural circuits that are unique to each neuropsychiatric disorder being classified. We also believe that basing the classifications on these patterns of regional variation, rather than on variation at individual voxels, will reduce the adverse effects of noise on the stability of the diagnostic algorithm, particularly when classifying brains that are independent of the datasets used to generate the algorithm. Herein we demonstrate that accurate classification is possible using anatomical MR images of the brain if sufficiently precise delineations of brain regions are available across sufficiently distributed neural systems in a sufficiently large number of individuals drawn from healthy and patient populations.

These measures of local variation in the morphological features of the brain have been used to build algorithms that classify the brains of participants according to diagnostic group. For several reasons, however, those diagnostic algorithms have met with limited success when attempting to classify brains that are independent of the imaging dataset used to generate the diagnostic algorithm. First, the morphological measures at each voxel of the brain generally have been assumed to be statistically independent variables when building the classifiers, even though they are not independent of one another. Second, measures at each voxel of the brain generally have been treated as equally informative when building the diagnostic classifiers, when in fact the imaging measures at some voxels are much more diagnostically informative than those at other voxels. More important, however, is that diagnostic information is more likely to be represented by the within-subject spatial variation in morphological measures across voxels and across brain subregions than by the between-subject variation within voxels, for it is the spatial variation across voxels and across brain subregions that is most likely to represent the circuit-based morphological abnormalities that produce and uniquely define a given neuropsychiatric illness. The value of identifying spatial patterns of variation in brain measures across voxels and across brain subregions is much like the value of identifying the spatial patterns of variation in the epidermal ridges of a fingerprint, which define the identity of an individual more efficiently and more accurately than do independent, point-wise measures of displacement of the dermal surface on the tip of a finger.
Third, voxel-wise measures of morphological variation yield a vastly larger feature space on which the classification algorithm must operate than does the pattern of spatial variation across voxels, either compromising the validity and reproducibility of the algorithm for a given number of observations (participants) or requiring a much larger number of participants to yield comparable validity and reproducibility.

Clinicians and researchers have long sought to use brain imaging measures as an aid in clinical diagnosis. In prior attempts to use magnetic resonance imaging (MRI) in the diagnosis of neuropsychiatric illnesses, conventional anatomical measures of putative pathological involvement, such as the overall volume of a brain region or a combination of brain regions, have not proven particularly useful. The likely reason is that brain regions identifiable on MRI scans are anatomically and functionally heterogeneous: for any given brain region, opposing measures of pathological involvement in its various subregions (such as volume loss in one subregion, compensatory hypertrophy in another, and normal volumes in yet another) are diluted when combined into an overall volume, producing highly variable measures whose distributions overlap substantially across diagnostic groups. That overlap, in turn, yields poor sensitivity and specificity when those measures are used for clinical diagnosis. [1] Recent methods in image processing, however, permit measurement of local variation in the morphological features of brain subregions that are thought to be more anatomically and functionally homogeneous than are conventional overall volumes, [1] and that are associated more significantly and more uniquely with various neuropsychiatric disorders than are the overall volumes of the brain or a brain subregion. [2]–[7]

Methods

We present a method to identify valid, naturalistic groupings of brains based on the spatial patterns of local variation in morphological features across the surfaces of numerous cortical and subcortical brain regions, with the aim of identifying neural circuit-based disturbances that are unique to specific neuropsychiatric disorders. We capture the patterns of spatial variation in these fine-grained, local morphological features across the surfaces of numerous brain regions in an attempt to represent neural circuit-based patterns of variation in local morphology. The patterns of spatial variation are analyzed using machine-based learning techniques to identify natural groupings of brains. We show that these naturalistic groupings map with high sensitivity and specificity to specific neuropsychiatric disorders. We use computer-generated datasets to validate this general approach, and human MRI datasets comprising many individuals across a variety of neuropsychiatric disorders to validate and reproduce the specific diagnostic algorithms that classify brains with high sensitivity and specificity as belonging to persons of one specific diagnostic group rather than another.

Isolating the Brain and Defining ROIs We isolated the brain from non-brain tissue and defined various brain regions using valid and highly reliable procedures designed to preclude any rater bias in region definitions. [23]–[25] We first removed inhomogeneities in image intensity [26] and then rotated the brains into a standard head orientation. Next, to ensure the absence of rater bias, we flipped images randomly in the left-right (ear-to-ear) direction prior to region definition and reversed the flips following region definition. We then isolated the brain from non-brain tissue using automated procedures [27] together with manual editing. Therefore, the imaging data were not influenced in any way by knowledge of illness labels prior to generation and testing of the classification algorithm. We defined cortical gray matter using a combination of automated thresholding of gray and white matter and manual editing in orthogonal views. The boundary of each brain region was manually delineated by an expert neuroanatomist using detailed, validated, and well-documented procedures published elsewhere, providing surface delineations that were independent of the influences of other regions. [2], [3], [25], [28], [29] The manual editing required 8 hours of rater time to isolate the brain from non-brain tissue, 6 hours to define the cortical mantle, 4 hours to delineate the amygdala and hippocampus, and 6 hours to define the basal ganglia. Therefore, up to 24 hours of a trained, bachelor's-level rater's time was needed to provide initial definitions for all of the brain regions used in these classification algorithms. An additional 8 hours was required for another trained rater to check all of these definitions in their entirety and to make any necessary revisions before a definition was considered final.
Although manual delineations were onerous and time-consuming for a trained expert, they provided valid, precise, and highly reliable region definitions that we believe were essential for identifying the individual variability in circuit-based morphological characteristics of each participant's brain, and thus for the success of our automated classification algorithm. The various brain regions that we used to train and validate the classification algorithm were defined by more than thirty trained raters over a period of fifteen years. During this 15-year period, several NIMH grants supported the costs of image acquisition and post-processing, with each grant providing support for a five-year period. Trained groups of raters delineated individual brain regions concurrently while blind to the clinical diagnoses and other demographic characteristics of all participants, and region definitions were interleaved randomly across diagnoses over time in our dataset (Fig. S1). Therefore, any potential rater bias is unlikely to have confounded the gold-standard clinical diagnoses used to train and validate the automated classification algorithms. Moreover, a senior, more experienced rater verified and, when necessary, corrected the definitions before they were made available for further processing, further reducing the likelihood of rater bias. In addition, we reestablished the reliability of each rater every four months on the same set of 10 standardized brains to ensure the absence of drift in the delineation of regions across time and raters. When a rater left the laboratory, we trained a new rater who continued to define brain regions on the remaining brains in the dataset.
Intraclass correlation coefficients calculated from 10 brains defined by 2 or more raters were greater than (1) 0.91 for the hippocampus [2], (2) 0.89 for the amygdala [2], (3) 0.95 for the caudate and putamen [29], (4) 0.90 for the globus pallidus [29], (5) 0.99 for the cerebral hemispheres [29], (6) 0.95 for the thalamus [30], and (7) 0.98 for cortical thickness [3].
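As an illustration of how such reliability coefficients are computed, the sketch below implements a one-way random-effects intraclass correlation, ICC(1,1), from a one-way ANOVA decomposition; the published coefficients may have been computed with a different ICC variant, so this is illustrative only:

```python
import numpy as np

def icc_oneway(ratings):
    """One-way random-effects intraclass correlation, ICC(1,1).
    `ratings` is an (n_targets, k_raters) array, e.g. one regional
    volume per brain (row) measured by each rater (column)."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    target_means = ratings.mean(axis=1)
    # Between-target and within-target mean squares from one-way ANOVA.
    ms_between = k * ((target_means - grand) ** 2).sum() / (n - 1)
    ms_within = ((ratings - target_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Perfectly agreeing raters yield an ICC of 1; rater disagreement large relative to between-brain variation drives the coefficient toward zero.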

Quantifying Local Variation in Surface Morphology We used previously described methods [1]–[3], [28], [31] to quantify precisely the local variations in morphological features across the surfaces of all of the brain regions that we defined. These methods permit a much finer-grained subdivision of cerebral regions than is possible using more conventional measures of overall volume, thereby providing much greater power to detect localized abnormalities within a region. We first coregistered the cerebrum to a template brain using a similarity transformation that maximized mutual information across the images. [32] We then used a rigid-body transformation to coregister individual brain regions to the corresponding template region. Next we warped each brain region nonlinearly to the corresponding template region using a high-dimensional warping algorithm. [33] Each brain region was thus warped to the exact size and shape of the template region, allowing us to identify points on the surface of each region that corresponded precisely with those of the template region. We then calculated across the entire surface of each region the Euclidean distance of each point from the corresponding point on the template. These distances were positive in sign for protrusions and negative for indentations relative to the template surface. For any given group of participants, this set of signed Euclidean distances constituted a smooth random field on the surface of the template region that quantified local variation in the surface morphological features of that region for each participant. A single representative brain was selected as a template, rather than an averaged brain, because tissue interfaces, such as the CSF-gray matter or gray-white matter interfaces, are well defined in a single brain. In contrast, in an average brain these interfaces are blurred, thereby increasing registration errors that are small but important when distinguishing subtle effects across populations.
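A minimal sketch of the signed-distance measure, assuming the surfaces have already been warped into point-to-point correspondence with the template; using outward template normals to assign the protrusion/indentation sign is our assumption about how the sign convention could be implemented:

```python
import numpy as np

def signed_surface_distances(points, template_points, template_normals):
    """Signed Euclidean distance of each warped surface point from its
    corresponding template point: positive for protrusions (displaced
    outward along the template normal), negative for indentations."""
    diff = points - template_points
    dist = np.linalg.norm(diff, axis=1)
    # Sign by projecting the displacement onto the outward template normal.
    sign = np.sign(np.einsum('ij,ij->i', diff, template_normals))
    sign[sign == 0] = 1.0  # points exactly on the template get distance 0
    return sign * dist
```

Applied to every vertex of a region's surface, this yields the smooth random field of signed distances described above.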
In addition, precise surface morphometry requires a smooth surface devoid of topological defects, which can be ensured only by using a single brain as a template. We measured the thickness of the cortical mantle in each brain using a 3-dimensional morphological operator to distance-transform the surface of the white matter to the surface of the cortex. [34], [35] This operation calculated cortical thickness as the smallest distance of each point on the external cortical surface from the outermost surface of the white matter. Because these thicknesses were measured in template space, their values inherently accounted for generalized scaling effects in the brain.
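The distance-transform idea can be sketched on voxel masks as follows; the published method operated on surfaces in template space, so `cortical_thickness` here is a simplified voxel-based analogue built on SciPy's Euclidean distance transform:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def cortical_thickness(cortex_mask, white_mask, spacing=(1.0, 1.0, 1.0)):
    """Thickness proxy: for every voxel in the cortical gray-matter
    mask, the smallest Euclidean distance to the white-matter
    compartment, computed with a 3-D distance transform."""
    # distance_transform_edt measures distance to the nearest zero voxel,
    # so pass the complement of the white-matter mask.
    dist_to_white = distance_transform_edt(~white_mask, sampling=spacing)
    return np.where(cortex_mask, dist_to_white, 0.0)
```

The `sampling` argument lets anisotropic voxel sizes (in mm) enter the distances directly.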

Conformal Mapping of Local Variations in Brain Measures We used conformal mapping to prepare the surface measures (Euclidean distances and cortical thickness) for spherical wavelet analyses. The purpose of conformal mapping was to transfer the surface measures from the template region onto the surface of a unit sphere while preserving the angles between vectors on the template and spherical surfaces. Because conformal mapping preserved angles, the shape of a small region on the brain surface was preserved when it was mapped onto the unit sphere. We first used a Marching Cubes algorithm to extract the surface of the template region as a triangular mesh. [36] The extracted surface was embedded in three-dimensional space and was assumed to be of genus 0, i.e., the surface was topologically equivalent to a sphere and did not intersect itself. By removing a single point p from the surface and a point p′ from the sphere, the conformal mapping was estimated by solving the partial differential equation \(\Delta z = \left(\frac{\partial}{\partial u} - i\,\frac{\partial}{\partial v}\right)\delta_{p}\), where \(\Delta\) is the Laplace-Beltrami operator, \(\delta_{p}\) is the Dirac delta function at the point p, and u and v are the conformal coordinates in the neighborhood of the point p. This equation was solved using a method for finite element analysis by first mapping the coordinates of the template surface to the coordinates of a plane while preserving the angles between all measures on that surface. [37] Next, a stereographic projection mapped the coordinates of the plane to the coordinates of the sphere. We then used those coordinates to transfer the brain measures from the template surface to the corresponding locations on the sphere's surface.
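The final stereographic step, mapping planar conformal coordinates onto the unit sphere, can be sketched directly. The formula below is the standard inverse stereographic projection with the projection point at the north pole, which is one conventional choice:

```python
import numpy as np

def stereographic_to_sphere(u, v):
    """Inverse stereographic projection: map planar conformal
    coordinates (u, v) onto the unit sphere, projecting from the north
    pole. The map is conformal, so local angles are preserved."""
    s = u ** 2 + v ** 2
    x = 2.0 * u / (s + 1.0)
    y = 2.0 * v / (s + 1.0)
    z = (s - 1.0) / (s + 1.0)
    return np.stack([x, y, z], axis=-1)
```

The origin of the plane maps to the south pole, and points far from the origin approach the north pole, so every finite planar coordinate lands on the sphere exactly once.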

Spherical Wavelet Representation Morphological features are typically measured at more than 10,000 voxels across the surface of a template region. Although a classification algorithm can in principle diagnose an individual in a high-dimensional feature space, an imaging dataset containing many thousands of brains would be required to generate valid classification boundaries that have sufficient reproducibility and generalizability to be potentially useful in a clinical setting. Because even the largest imaging datasets contain images from only a few hundred participants, the dimensionality of the feature space must be reduced to generate an algorithm that has valid classification boundaries. Therefore, we elected to use morphological measures defined on the two-dimensional manifold of regional surfaces rather than morphological measures for the entire three-dimensional volume of the corresponding regions. Moreover, our interest was not in classifying individual points on the surface, but instead in classifying the spatial patterns of variation in those measures across the surface of each region. Therefore, before applying clustering (i.e., classification) techniques to the two-dimensional surface dataset, we first captured that spatial pattern of variation in our measures in a manner that also further reduced the dimensionality of the imaging dataset. We accomplished this by applying a spherical wavelet transform to surface measures that had been conformally mapped onto a unit sphere. This wavelet transform generated scaling coefficients that encoded the spatial variation in our measures on the unit sphere at varying degrees of spatial resolution. Because wavelets have local support (i.e., they equal zero outside a small region) in both the spatial and frequency domains, they have been used to represent functions at multiple levels of spatial detail.
A wavelet transform, through dyadic translations and dilations of a mother wavelet and scaling function, generates coefficients for a function such that only a small subset of the coefficients can represent the function with a high degree of precision, thereby permitting compression and efficient processing of imaging data. The wavelet function \(\psi_{j,k}\) and the scaling function \(\varphi_{j,k}\), at a resolution \(j\) and an index \(k \in K(j)\) (where \(K(j)\) is a set of integers at resolution \(j\) indexing the positions of the wavelet and scaling functions), are linear combinations of the scaling functions \(\varphi_{j+1,k}\) at the higher resolution \(j+1\). Therefore, the wavelet and scaling functions are self-similar at differing spatial resolutions. If \(L^{2}(S^{2})\) is the space of all scalar functions having finite energy over the sphere \(S^{2}\), a multiresolution analysis generates a sequence of closed subspaces \(V_{j}\), such that: (1) the subspaces are nested, that is, \(V_{j} \subset V_{j+1}\); (2) \(\bigcup_{j} V_{j}\) is dense in \(L^{2}(S^{2})\); and (3) the set of scaling functions \(\{\varphi_{j,k} : k \in K(j)\}\) is a Riesz basis of \(V_{j}\). The wavelet functions form the bases of the difference space \(W_{j}\) between two successive levels of representation, that is, \(V_{j+1} = V_{j} \oplus W_{j}\). In addition to local support, the wavelet functions have vanishing moments: if the wavelets have \(N\) vanishing moments, then there exist \(N\) independent polynomials \(P_{i}\) such that \(\langle \psi_{j,k}, P_{i} \rangle = 0\) for all \(j\) and all \(k \in K(j)\). Historically, wavelet transforms were initially limited to infinite spaces [38], but they subsequently were extended to finite spaces. [39] More recently, they have been extended to analyze scalar functions defined on a sphere using lifting schemes. [40], [41] A lifting scheme builds biorthogonal wavelets that are smoother and that have more vanishing moments than do simple scaling and wavelet functions. A forward analysis of a spherical wavelet transform begins by computing coefficients for a function at the highest spatial resolution and continues until it reaches the lowest possible spatial resolution.
At each resolution, unlifted wavelet coefficients are computed first, and then these coefficients are lifted to compute the wavelet coefficients and the scaling coefficients. Synthesis, or the inverse transform, in contrast, begins by computing the coefficients at the lowest resolution and ends with the function at the highest resolution. We used two differing methods of wavelet analysis to compute scaling coefficients for our surface measures: (1) the linear lifted method, which used information from the two nearest neighbors at a location on the mesh of the unit sphere to interpolate the transformation, and (2) the lifted butterfly method, which used information from all eight neighbors on the mesh of the sphere to compute the scaling coefficients (Fig. 1).
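The predict-and-update structure of a lifting scheme can be illustrated on a 1-D periodic signal; the spherical linear and butterfly transforms generalize these same two steps to triangulated meshes, and the filter weights below are the standard linear-prediction choices, not necessarily those of the cited implementation:

```python
import numpy as np

def lifting_forward(signal):
    """One level of a linear lifting wavelet transform on a 1-D signal
    of even length: split into even/odd samples, predict each odd
    sample from its even neighbors (the detail/wavelet coefficients),
    then update the evens to preserve the mean (scaling coefficients)."""
    even, odd = signal[0::2].astype(float), signal[1::2].astype(float)
    # Predict step: linear interpolation from the two even neighbors
    # (np.roll makes the signal periodic at the boundary).
    detail = odd - 0.5 * (even + np.roll(even, -1))
    # Update step: lift the evens so their mean equals the signal mean.
    scaling = even + 0.25 * (detail + np.roll(detail, 1))
    return scaling, detail

def lifting_inverse(scaling, detail):
    """Synthesis: undo the update step, then the predict step, exactly."""
    even = scaling - 0.25 * (detail + np.roll(detail, 1))
    odd = detail + 0.5 * (even + np.roll(even, -1))
    out = np.empty(even.size + odd.size)
    out[0::2], out[1::2] = even, odd
    return out
```

Because each lifting step is trivially invertible, reconstruction is exact, and the scaling coefficients form the coarser approximation that the next level of the transform consumes.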


Figure 1. Scaling coefficients at decreasing spatial resolutions. The numbers of vertices in the triangulated mesh at each level of resolution are indicated at the bottom of the figure. The meshes with 12 and 162 vertices correspond to resolutions 0 and 2, respectively, of the spherical wavelet transformation. Top Row: Approximating a unit sphere at decreasing spatial resolutions. Middle and Bottom Rows: Examples of scaling coefficients at decreasing resolutions for local variations in the surface morphological features of the right hippocampus in a representative healthy participant (NC) and a person with Schizophrenia (SZ). The scaling coefficients are color encoded and displayed at the vertices of the sphere at the various resolutions. Protrusions with respect to a template surface are encoded in red and yellow, and indentations in the surface are encoded in violet and blue. Green indicates no difference from the template surface. The scaling coefficients were computed by first using conformal mapping to map local variations in surface morphological features onto a unit sphere and then applying the lifted interpolate transformation to the mapped variations. At the lowest resolution, i.e., Resolution 0 of the wavelet transformation, the scaling coefficients at the 12 vertices of the icosahedron very coarsely approximate the high-resolution variations in local morphological features. Using scaling coefficients at low spatial resolutions for classification greatly reduces the dimensionality of the feature space. https://doi.org/10.1371/journal.pone.0050698.g001

Controlling for Nuisance Variables Brain measures may differ because of differing age, sex, or whole brain volume (WBV) across individuals, in addition to differences caused by the pathological processes of neuropsychiatric illnesses. To ensure that our method generated naturalistic groupings that represent only the underlying pathological processes and not age or sex effects, we (1) matched our groups on age and sex, and (2) computed brain measures that were corrected for age, sex, and WBV. We also assessed higher-order age terms, but only the linear effects were significant; therefore, we used only linear terms to control for the effects of age. Because brain measures were computed on brains that were normalized into the coordinate space of a template brain of a healthy individual, they were inherently corrected for differing WBV across individuals. These brain measures were further corrected for age and sex effects by applying linear regression with age and sex as the independent variables and each brain measure as the dependent variable.
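The covariate correction can be sketched as an ordinary-least-squares residualization; keeping the fitted intercept (an implementation choice of ours, not stated in the text) leaves the corrected measures on their original scale:

```python
import numpy as np

def residualize(measures, age, sex):
    """Regress age and sex (coded 0/1) out of each brain measure with
    ordinary least squares, returning intercept + residuals so that
    corrected measures remain on their original scale.
    `measures` is an (n_participants, n_features) array."""
    n = measures.shape[0]
    design = np.column_stack([np.ones(n), age, sex])  # intercept, age, sex
    beta, *_ = np.linalg.lstsq(design, measures, rcond=None)
    fitted = design @ beta
    # Residuals are orthogonal to age and sex; add back the intercept term.
    return measures - fitted + design[:, :1] @ beta[:1]
```

The same design matrix could carry quadratic age terms, but as noted above only the linear effects were significant.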

Machine Learning We employed a semi-supervised method for machine learning to identify natural groupings of people in the spatial variation of fine-grained, local morphological features across their brains. Machine-based learning and pattern classification seek to construct algorithms that automatically learn decision rules for classification from experimental datasets and then apply the learned rules to classify individuals in other datasets. [42] These methods generally belong to either supervised or unsupervised classes of learning. Consider pairs of data points \((x_i, y_i)\), where \(x_i \in \mathbb{R}^{d}\) are \(d\)-dimensional feature vectors and \(y_i\) are scalar-valued labels. For our purposes, the vectors \(x_i\) are brain measures and the labels \(y_i\) are clinical diagnoses. Supervised learning uses a training sample to learn the mapping between \(x\) and \(y\) using a parametric or nonparametric function, \(y = f(x)\). This function encodes a decision rule, or boundary, that optimally separates the feature vectors \(x_i\) according to their labels \(y_i\). If the labels are missing, then methods for unsupervised learning (for example, hierarchical clustering procedures) can be used to discover natural groupings within the data. Our method is semi-supervised because we first applied leave-one-out cross-validation to select a set of features that differed significantly between groups of individuals who were already clinically diagnosed, and then we applied hierarchical clustering to the feature vectors to discover naturalistic groupings of individuals in the dataset. We then assessed the validity of these naturalistic groupings by applying leave-one-out and split-half cross-validation procedures to datasets that were independent of those that had been mined to generate the groupings. Discovering Natural Groupings Using Hierarchical Clustering. Hierarchical clustering is a powerful, unsupervised tool for discovering natural structure within a dataset, especially when the groupings of data are unknown a priori.
Groupings of brains that are generated from clinical diagnoses alone could frequently misclassify brains if their features do not map tightly to a clinical diagnosis, especially when disturbances affect neural pathways that are common across disorders. To minimize these classification errors, we used hierarchical clustering procedures to discover naturalistic groupings of brain features. We presumed that those common features supported similar computational and behavioral functions, and that the brains that shared them would map more accurately to biologically based groupings of illnesses than would groups of brains classified on the basis of morphological features associated with clinical symptoms as identified by prior classification schemes. A clinical diagnostic label was assigned by a simple majority rule to each of the two naturalistic groupings, so that the known clinical diagnosis affecting the majority of participants in each group provided the diagnostic label for that grouping. We then used those groupings and labels to classify new brains from a separate dataset. Thus, although the features that generated the classification algorithm were defined by the clinical diagnoses, hierarchical clustering generated groupings based solely on the morphological features, and therefore the brains from participants were clustered into groups that shared similar morphological features. Our method for hierarchical clustering generated a sequence of clusterings that partitioned an imaging dataset from \(n\) participants, using a measure of dissimilarity between any two groups of feature vectors. Starting with \(n\) clusters at level 1, with each cluster containing data from exactly one participant, the dataset was partitioned into \(n-1\) clusters at level 2, \(n-2\) clusters at level 3, and so on, until only one cluster was present at level \(n\).
In this sequence of clusterings, any two feature vectors x and x′ that were grouped into one cluster at some level remained together at all higher levels. Furthermore, the dissimilarity between any two clusters increased with increasing levels, and therefore the groupings of feature vectors that were generated by hierarchical clustering could be visualized at various levels using a nonintersecting binary tree called a dendrogram. A dendrogram was drawn to scale to show the dissimilarity between the groups of feature vectors and to help identify natural groupings within the dataset. An unusually large difference in the similarity values across levels may indicate a natural clustering at the lower level. If the similarity values for the various levels were evenly distributed across the range of possible values, however, then no particular clustering was more natural than any other. The use of hierarchical clustering to generate a correct natural grouping of participants depends heavily on the similarity measure used to compare feature vectors, as well as on the method that groups the features using those similarity measures. We defined the similarity between features (sets of scaling coefficients) as either the Euclidean distance for the feature vectors encoding local variations in the surface morphological measures of a single brain region, or the standardized Euclidean distance for the feature vectors encoding local variations in the surfaces of multiple brain regions. We then used either average linkage or Ward's linkage on the similarity measures to group the feature vectors. Average linkage calculated the average of the distances between all pairs of feature vectors in the two clusters and then merged at level k the two groups that had the smallest average distance.
In other words, the average distance between clusters $D_i$ and $D_j$ was computed as

$d_{\mathrm{avg}}(D_i, D_j) = \frac{1}{n_i n_j} \sum_{\mathbf{x} \in D_i} \sum_{\mathbf{x}' \in D_j} \lVert \mathbf{x} - \mathbf{x}' \rVert.$

Ward's linkage, in contrast, calculated the distance between two clusters $D_i$ and $D_j$ as the increase in the error sum of squares (ESS) of the new cluster $D_{ij} = D_i \cup D_j$, obtained as the union of the individual clusters. For a cluster $D$ with mean vector $\bar{\mathbf{x}}$, the ESS was computed as

$\mathrm{ESS}(D) = \sum_{\mathbf{x} \in D} \lVert \mathbf{x} - \bar{\mathbf{x}} \rVert^2,$

and therefore Ward's distance was computed as

$d_{\mathrm{Ward}}(D_i, D_j) = \mathrm{ESS}(D_{ij}) - \bigl[\mathrm{ESS}(D_i) + \mathrm{ESS}(D_j)\bigr].$

Ward's linkage therefore generated cohesive groupings of participants by minimizing the increase in within-group variance at each level. Hierarchical clustering as we implemented it may converge to local rather than global optima. Starting with n sets of features from n brains, hierarchical clustering iteratively combines two sets of brain features into a new set, thereby reducing the number of sets by one at each step of the iteration. At each step, among all possible combinations of two sets, only those two sets are combined whose union optimizes a pre-specified criterion. In our implementation of hierarchical clustering, this pre-specified criterion was Ward's distance, which measures the increase in within-set variance of the brain features. Therefore, two sets of features were combined such that the within-set variance (i.e., Ward's distance) of the new set was the smallest among all possible combinations of two feature sets. This strategy is greedy, in that the two sets to be combined at each step are selected only from the sets available at that step, and therefore the two groups generated in the last step may not have the least possible within-group variance among all possible two-group partitions of the n brains. Higher variances in the generated groupings would typically indicate that some brains were assigned to a group erroneously. Such groupings may limit the effectiveness of classification, because their use could lead to the erroneous assignment of a new participant.
Although this is a risk of our implementation of hierarchical clustering, the excellent performance and reliability of our classification algorithm suggest that local optima and their attendant errors in classification were rare.
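The greedy agglomerative procedure with Ward's criterion can be sketched in pure Python as follows. This is an illustrative toy, not the authors' implementation; the one-dimensional feature vectors are hypothetical stand-ins for the wavelet scaling coefficients.

```python
def ess(cluster):
    # Error sum of squares: squared distances to the cluster centroid.
    n, dim = len(cluster), len(cluster[0])
    mean = [sum(v[d] for v in cluster) / n for d in range(dim)]
    return sum((v[d] - mean[d]) ** 2 for v in cluster for d in range(dim))

def ward_distance(a, b):
    # Ward's distance: the increase in ESS caused by merging a and b.
    return ess(a + b) - ess(a) - ess(b)

def ward_clustering(vectors, k):
    """Greedy agglomerative clustering: start with singleton clusters and
    repeatedly merge the pair whose union least increases the ESS,
    stopping when k clusters remain."""
    clusters = [[v] for v in vectors]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = ward_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters.pop(j)
    return clusters

# Two well-separated groups of toy 1-D feature vectors.
data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
groups = ward_clustering(data, 2)
print(sorted(sorted(g) for g in groups))  # the two groups are recovered
```

Because each merge is chosen only among the clusters available at that step, the final two groups are not guaranteed to minimize within-group variance globally, which is the local-optimum risk discussed above.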

Independent Validation

We validated the naturalistic groupings using both leave-one-out (LOO) cross-validation analyses and 10 independent split-half replication analyses. In addition, we computed misclassification rates using LOO cross-validation applied to the random halves of data generated in the split-half analyses. In each of the 10 split-half analyses, we (a) partitioned the imaging datasets randomly into two halves, a training set and a test set, (b) generated the classification algorithm using the training set, and then (c) evaluated the performance of the classification algorithm using the second, independent test half of the dataset. Automated, computer-based procedures assigned each brain in the test dataset to one of the two putative diagnostic groups identified in the training dataset by assessing which of the two groups had morphological features most similar to its own. To assign a brain to one of the two previously defined groups, we first computed the distance between its scaling coefficients and the average coefficients of each group, and then assigned the brain to the diagnostic group having the smaller distance from its scaling coefficients. A brain was considered misclassified if its diagnosis differed from the group label assigned in the training dataset. Assigning each brain in the test dataset one at a time to a putative diagnostic label allowed us to compute misclassification rates in a dataset that was entirely independent of the training dataset that generated the classification algorithm. These split-half procedures were repeated independently for every pair of the 10 training and test datasets, thereby generating 10 independent estimates of the misclassification rates for each pair of clinical diagnoses that we tried to discriminate (e.g., TS vs. HC).
We computed the means and standard deviations of the misclassification rates across these multiple split-half and LOO cross-validations, which we then used to calculate the sensitivity and specificity of the diagnostic algorithms. It is important to note that, even though the features entering the classification algorithm and the labels assigned to the subsequent naturalistic groupings of brains were determined using previously known clinical diagnoses, it was the hierarchical clustering algorithm alone, operating without information about clinical diagnoses and based solely on the morphological features shared across brains within each dataset, that generated the naturalistic groupings of brains used in the following validation of MRI-based diagnoses.
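The assignment step described above, together with the majority-rule labeling, can be sketched as follows. The group contents, diagnostic labels, and two-dimensional "scaling coefficient" vectors are hypothetical; `math.dist` supplies the Euclidean distance.

```python
import math

def centroid(vectors):
    # Mean feature vector of a training group.
    n, dim = len(vectors), len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]

def majority_label(diagnoses):
    # Simple majority rule: label a naturalistic grouping with the
    # diagnosis held by most of its members.
    return max(set(diagnoses), key=diagnoses.count)

def classify(new_vector, group_a, group_b, label_a, label_b):
    """Assign a test brain to whichever training group's mean feature
    vector lies closer in Euclidean distance."""
    d_a = math.dist(new_vector, centroid(group_a))
    d_b = math.dist(new_vector, centroid(group_b))
    return label_a if d_a < d_b else label_b

# Hypothetical training groups of 2-D scaling-coefficient vectors.
group_a = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2)]
group_b = [(4.9, 5.0), (5.1, 5.2), (5.0, 4.8)]
label_a = majority_label(["TS", "TS", "HC"])   # -> "TS"
label_b = majority_label(["HC", "HC", "TS"])   # -> "HC"
print(classify((0.1, 0.1), group_a, group_b, label_a, label_b))  # -> TS
```

A test brain whose predicted label disagrees with its known clinical diagnosis would be counted as a misclassification when computing the split-half error rates.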

Validation using Computer-Generated Datasets

We generated two synthetic datasets of increasing complexity in their surface morphologies by superimposing spherical indentations or protrusions of known size, positioned precisely in the dorsolateral prefrontal cortex (DLPFC) or occipital cortex (OC). [31] These two datasets were used to assess the construct validity of our procedures. The deformation in the DLPFC was centered in the midportion of the pars triangularis over the inferior frontal gyrus. The deformation in the OC was placed immediately to the left of the interhemispheric fissure on the lowermost portion of the occipital lobe, such that the deformation was tangential both to the cistern immediately superior to the cerebellum and to the interhemispheric fissure (Figs. 2 & 3). We confirmed the accuracy of the placement of these deformations by viewing the deformed brains in the coronal, axial, and sagittal planes. The first dataset was created from copies of a single brain, and the second was created from the brains of 20 different individuals. The second dataset was used to determine the optimal resolution of the scaling coefficients that best discriminated the brains of healthy individuals from those of individuals with neuropsychiatric illnesses.


Figure 2. Warping a deformed brain to the template brain. We added deformations to copies of the template brain and then normalized those deformed brains to the undeformed template. The deformed brains were spatially normalized using a method that maximizes mutual information in the gray scale values across the images [32] and then warped the coregistered brain using a method based on fluid dynamics. [33] Because a deformed brain was identical to the template brain except for the added deformation, the deformation field only shows a large spatial deformation in the region of the added deformation. https://doi.org/10.1371/journal.pone.0050698.g002


Figure 3. Deformations at the dorsolateral prefrontal cortex (DLPFC) in the template brain. Top Row: In copies of the template brain we added a 15 mm wide protrusion or indentation in the DLPFC by centering the deformation over the midportion of the pars triangularis in the inferior frontal gyrus. We placed the deformed brains randomly in the coordinate space of the template brain. The deformed brains were coregistered to the template brain, and we then computed the signed Euclidean distances between the surface of the coregistered brain and the surface of the template brain. The signed Euclidean distances were color-encoded and displayed on the template brain, and were mapped onto a unit sphere using a conformal mapping. Red: protrusions on the surface; Violet: indentations on the surface. Bottom Row: The distances on the unit sphere were transformed by applying a spherical wavelet transformation to compute scaling coefficients at decreasing resolutions. https://doi.org/10.1371/journal.pone.0050698.g003

Synthetic Datasets Constructed from a Single Brain. To create the first set of 10 brains, we first generated a deformed brain by adding a 15 mm indentation in the DLPFC of a copy of the template brain, and then placed 10 copies of that brain, each with differing amounts of translation and rotation, in the coordinate space of the template brain (Figs. 2 & 3). Similarly, for the second set of 10 brains, we first generated a deformed brain by adding a 15 mm protrusion at the same location in the DLPFC of a copy of the template brain, and then placed 10 copies of that brain at 10 random locations in the coordinate space. Using identical procedures, we generated a third and a fourth set of 10 images by adding an indentation and a protrusion, respectively, in the OC region of copies of the template brain and then placing them at 10 random locations in the space.
From these four sets of deformed brains we generated three new sets: (1) a set of the 20 brains deformed in the DLPFC, (2) a set of the 20 brains deformed in the OC, and (3) a set of all 40 deformed brains. We then applied our wavelet transform and classification procedures to discover natural groupings within each of these three synthetic datasets. Because the brains in these datasets were identical to the template brain except for the added protrusions and indentations, we expected the brains in the combined set to be clustered into four groups: one with protrusions in the DLPFC, a second with indentations in the DLPFC, a third with protrusions in the OC, and a fourth with indentations in the OC.

Synthetic Datasets Constructed from Different Individuals. From the brains of 20 healthy individuals, we generated 4 sets of synthetic data: (1) the first set of 20 deformed brains contained a 15 mm indentation in the DLPFC, (2) the second set of 20 brains contained a 15 mm protrusion in the DLPFC, (3) the third set of 20 brains contained a 15 mm indentation in the OC, and (4) the fourth set of 20 brains contained a 15 mm protrusion in the OC. From these 4 sets of synthetic data we created 3 new subsets of brains: (1) the first subset contained 10 brains with indentations and another 10 brains with protrusions in the DLPFC, (2) the second subset contained 10 brains with indentations and another 10 brains with protrusions in the OC, and (3) the third subset contained 10 brains with indentations in the DLPFC, another 10 brains with protrusions in the DLPFC, 10 more brains with indentations in the OC, and yet another 10 brains with protrusions in the OC. We then applied our method to automatically discover the natural groupings within these three subsets of brains. For each of the first two subsets, we expected the method to cluster the brains into two groups: one with only indentations and the other with only protrusions.
We expected the method to cluster the brains in the third subset into four groups: one with only protrusions in the DLPFC, another with only indentations in the DLPFC, a third with only protrusions in the OC, and a last with only indentations in the OC.
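Why these four groupings should emerge can be illustrated with a toy example: each synthetic brain's largest signed deviation from the template identifies both the deformation site and its direction. Everything below (the region indices, the magnitudes, and the `deformation_signature` helper) is hypothetical and far simpler than the wavelet-based features we actually used.

```python
def deformation_signature(brain, template):
    """Locate the largest surface deviation from the undeformed template
    and report its site index and sign (+1 protrusion, -1 indentation)."""
    devs = [b - t for b, t in zip(brain, template)]
    site = max(range(len(devs)), key=lambda i: abs(devs[i]))
    sign = 1 if devs[site] > 0 else -1
    return site, sign

# Hypothetical signed-distance profiles: index 1 stands for the DLPFC
# and index 3 for the OC; the template surface is undeformed.
template = [0.0, 0.0, 0.0, 0.0, 0.0]
brains = [
    [0.0, -1.5, 0.0, 0.0, 0.0],   # DLPFC indentation
    [0.0, 1.5, 0.0, 0.0, 0.0],    # DLPFC protrusion
    [0.0, 0.0, 0.0, -1.5, 0.0],   # OC indentation
    [0.0, 0.0, 0.0, 1.5, 0.0],    # OC protrusion
]
groups = {}
for b in brains:
    groups.setdefault(deformation_signature(b, template), []).append(b)
print(len(groups))  # -> 4: the four expected natural groups
```

Because site and sign jointly determine group membership, a clustering driven by surface-distance features should separate these brains into exactly the four groups we anticipated.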

Validation using Real-World Data and Gold-Standard Clinical Diagnoses

We validated the natural groupings identified by our classification algorithm using surface morphological measures in a large group of healthy individuals and in groups of patients with known clinical diagnoses established by senior clinicians using research-based diagnostic interviews: either the Kiddie-Schedule for Affective Disorders and Schizophrenia (K-SADS) [43] in participants younger than 18 years or the Structured Clinical Interview for DSM Disorders (SCID) [44] in participants older than 18 years. Final clinical diagnoses were assigned following a best-estimate consensus procedure conducted by two board-certified psychiatrists using all available research records while blind to prior clinical diagnoses. [45], [46] All participants were free of a history of neurological illness, substance dependence, sustained loss of consciousness, and significant medical illness. All participants provided written informed consent for their participation in these studies. Healthy Participants. We acquired imaging data in 42 healthy children (18 males, age 10.5±2.43 years) and 40 healthy adults (22 males, age 32.42±10.7 years). [47] Additional exclusion criteria for healthy participants included a lifetime or current DSM-IV Axis I or II disorder and IQ<80 as measured by the WISC-III [48], WAIS [49], or Kaufman Brief Intelligence Test. [50] Tourette Syndrome (TS) Participants. Imaging data were acquired in 71 children with TS (59 males, age 11.19±2.2 years) and 36 adults with TS (21 males, age 37.34±10.9 years). [47] They were ascertained through the local chapter of the Tourette Syndrome Association and through the Tic Disorder Clinic of the Yale Child Study Center, New Haven, Connecticut. Diagnoses were supplemented using the Schedule for Tourette and Other Behavioral Syndromes [51], and ratings of current and worst-ever severity of tic symptoms were obtained using the Yale Global Tic Severity Scale.
[52] ADHD Participants. We imaged 41 children with ADHD (33 males, age 12.6±3.18 years) [28] who were recruited through the general outpatient clinic at the Yale Child Study Center or through advertisements with a local chapter of CHADD (Children and Adults with Attention-Deficit/Hyperactivity Disorder). Diagnostic assessments were supplemented using the Conners ADHD Parent and Teacher Rating Scales [53], [54] and the DuPaul-Barkley ADHD rating scale [55]. Participants with a lifetime history of Obsessive-Compulsive Disorder (OCD), Tourette Syndrome, or tic disorder, or with premature birth (gestation ≤36 weeks), were excluded from this group. Bipolar Disorder (BD) I Participants. This group comprised 26 adults with BD (11 males, age 37.65±10.35 years) who were identified from general psychiatry outpatient clinics [56] at the Yale University Medical Center or the Veterans Affairs Connecticut Healthcare System, or who were referred by their practitioners. [57] During the history of their illness, all participants had met DSM-IV criteria for a clearly defined manic episode lasting at least one week. Schizophrenia (SZ) Participants. We acquired imaging data in 65 adults with SZ (41 males, age 42.16±8.71 years). [58] They were identified from general psychiatry outpatient clinics at the Yale University Medical Center. All participants had been on their medication for at least 30 days and had not abused substances for at least 60 days. Diagnostic assessments were supplemented using the Positive and Negative Syndrome Scale for Schizophrenia. [58], [59] Participants at High or Low Risk for Major Depressive Disorder (MDD). These individuals belonged to a three-generation cohort in which the first two generations had been followed for more than 22 years.
[3] In the first generation (“G1”), one group of adults was clinically ascertained during treatment of moderate to severe, recurrent, and functionally debilitating MDD; the other group was a sample of matched control adults from the same community who had no discernible lifetime history of depression. The biological offspring of the first generation comprised the second generation (“G2”), and the offspring of the second generation comprised the third generation (“G3”). [3], [60] The participants identified at “high risk” for developing MDD were those members of G2 and G3 who were biological descendants of the MDD group in G1 and those identified at “low risk” were the G2 and G3 biological descendants of the unaffected control group in G1. We acquired imaging data for 131 individuals in G3: 66 (12 children, 54 adults) in the high risk group, and 65 in the low risk group (31 children, 34 adults).

MRI Pulse Sequence

T1-weighted MR images were acquired on a 1.5 Tesla GE scanner using a sagittal spoiled gradient recall sequence (TR = 24 msec, TE = 5 msec, 45° flip angle, frequency encoding S/I, no wrap, 256×192 matrix, FOV = 30 cm, 2 excitations, slice thickness = 1.2 mm, 124 contiguous slices). This sequence was selected to provide superior signal-to-noise and contrast-to-noise ratios in high-resolution images having nearly isotropic voxels (1.171×1.171×1.2 mm). For the participants who were at either high (HR) or low (LR) familial risk for depression, imaging data were acquired on a 1.5 T Siemens Sonata scanner using a 3D MPRAGE sequence with the same parameters [3]. We computed feature vectors from several brain regions, according to the availability of their manual tracings. Each feature vector was composed of the scaling coefficients derived from the measures of surface morphology. For discriminating the brains of individuals with either TS or ADHD from those of healthy comparison individuals, and for discriminating children with TS from children with ADHD, we used feature vectors computed from the morphology of the cortex, globus pallidus, putamen, caudate nucleus, thalamus, amygdala, and hippocampus. For discriminating the brains of persons with other clinical diagnoses, however, we used feature vectors computed from the cortex, amygdala, and hippocampus. For the brains of individuals at either high or low risk for depression, we used feature vectors computed from cortical thickness.
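As noted earlier, feature vectors that concatenate coefficients from multiple regions were compared using the standardized Euclidean distance, which scales each coefficient by its variance across participants so that no single region's numerical range dominates the comparison. A minimal sketch with hypothetical values:

```python
import math

def standardized_euclidean(x, y, variances):
    """Euclidean distance with each coordinate scaled by the inverse of
    that coefficient's variance across participants."""
    return math.sqrt(sum((a - b) ** 2 / v
                         for a, b, v in zip(x, y, variances)))

# Two concatenated feature vectors (e.g., a cortex coefficient followed
# by an amygdala coefficient); the first coordinate varies far more
# across participants than the second, so it is down-weighted.
x, y = (10.0, 0.1), (12.0, 0.3)
variances = (4.0, 0.01)
print(standardized_euclidean(x, y, variances))  # -> sqrt(5) ≈ 2.236
```

With plain Euclidean distance, the large-scale first coordinate would dominate; standardization makes the small but consistent amygdala difference count equally.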
We applied our classification procedures to discriminate (a) 41 children with ADHD from 42 healthy children, (b) 71 children with TS from 41 children with ADHD, (c) 26 adults with BD from 40 healthy adults, (d) 65 adults with schizophrenia (SZ) from 36 TS adults, (e) 65 adults with SZ from 26 BD adults, (f) 65 adults with SZ from 40 healthy adults, (g) 36 adults with TS from 40 healthy adults, (h) 71 children with TS from 42 healthy children, (i) 65 adults with SZ, 36 adults with TS, and 40 healthy adults, and (j) 65 adults with SZ, 26 adults with BD, and 40 healthy adults. We applied our classification scheme to the scaling coefficients that differed at high levels of statistical significance (P<10−7) between persons with a specific neuropsychiatric disorder and healthy comparison persons. The P-value threshold was determined empirically by first applying LOO analysis to scaling coefficients that differed at decreasing P-values and then selecting the P-value associated with the lowest misclassification rates (Fig. 4). Using feature vectors that differed at this stringent statistical threshold both reduced the dimensionality of the feature space and identified the features that best discriminated the feature vectors of the brains in the two groups. We then applied hierarchical clustering techniques to the selected scaling coefficients to identify two natural groupings of brains based solely on the similarity of their morphological features.


Figure 4. Optimal P-value threshold for scaling coefficients. Naturalistic groupings of the brains were generated using scaling coefficients that differed significantly, with at most a specified P-value, between groups of participants in our cohort. The optimal P-value for statistical significance was selected from plots of the sensitivity, the specificity, and the number of scaling coefficients across various P-value thresholds in our cohort of 42 healthy children (HC) and 71 children with Tourette Syndrome (TS). The scaling coefficients were computed for the right and left amygdalae, hippocampi, globi pallidi, putamina, caudate nuclei, thalami, and hemisphere surfaces. At each P-value threshold, we applied hierarchical clustering to all coefficients that differed with at most the specified P-value to generate groupings of the brains. These groupings were analyzed using leave-one-out (LOO) cross-validation to compute the sensitivity and specificity of our method for classifying an individual as a healthy child or a child with TS. We independently computed sensitivities and specificities for various P-value thresholds and plotted the sensitivity (SE, solid line) and specificity (SP, dashed line) (Left), and the number of coefficients (Right), as a function of the P-value threshold. For a P-value threshold < 10−7, the method classified individuals with both high sensitivity and high specificity. At this threshold, moreover, the number of coefficients was sufficiently reduced, thereby reducing the dimensionality of the feature space. We therefore applied P < 10−7 as the threshold for classifying an individual among the various neuropsychiatric illnesses. SE, sensitivity; SP, specificity. https://doi.org/10.1371/journal.pone.0050698.g004

Each group of brains that hierarchical clustering generated was assigned the diagnosis of the majority of participants contained in that group.
Furthermore, although we empirically selected the feature vectors on which hierarchical clustering was performed, that selection was subsequently validated and its reproducibility assessed using split-half cross-validation. The selected feature vectors therefore should generalize to other, similar datasets.
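The empirical feature-selection step can be sketched as follows. Because computing an exact P-value requires the t distribution, this toy thresholds the magnitude of a Welch two-sample t statistic instead, which is monotonically related to the P-value at fixed degrees of freedom; all data values and the threshold are hypothetical.

```python
import math

def welch_t(xs, ys):
    """Two-sample Welch t statistic for a single scaling coefficient."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)

def select_features(patients, controls, t_threshold):
    """Indices of coefficients whose between-group difference exceeds
    the threshold -- a stand-in for the paper's P-value cut-off."""
    keep = []
    for d in range(len(patients[0])):
        t = welch_t([p[d] for p in patients], [c[d] for c in controls])
        if abs(t) >= t_threshold:
            keep.append(d)
    return keep

# Coefficient 0 differs strongly between groups; coefficient 1 is noise.
patients = [(3.0, 0.1), (3.1, -0.2), (2.9, 0.0)]
controls = [(0.0, 0.0), (0.1, 0.1), (-0.1, -0.1)]
print(select_features(patients, controls, t_threshold=5.0))  # -> [0]
```

Only the retained coefficient indices would then be passed to hierarchical clustering, shrinking the feature space while keeping the most discriminative measures.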