Abstract
In this paper, we propose a novel framework for IQ estimation using Magnetic Resonance Imaging (MRI) data. In particular, we devise a new feature selection method based on an extended dirty model that jointly considers both element-wise sparsity and group-wise sparsity. Meanwhile, due to the absence of a large dataset with consistent scanning protocols for IQ estimation, we integrate multiple datasets scanned at different sites with different scanning parameters and protocols. As a result, there is large variability among these datasets. To address this issue, we design a two-step procedure that 1) first identifies the likely scanning site of each testing subject and 2) then estimates the testing subject’s IQ using a specific estimator designed for that scanning site. We perform two experiments to test the performance of our method using MRI data collected from 164 typically developing children between 6 and 15 years old. In the first experiment, we use a multi-kernel Support Vector Regression (SVR) to estimate IQ values, and obtain an average correlation coefficient of 0.718 and an average root mean square error of 8.695 between the true and the estimated IQs. In the second experiment, we use a single-kernel SVR for IQ estimation, and achieve an average correlation coefficient of 0.684 and an average root mean square error of 9.166. These results show the effectiveness of using imaging data for IQ prediction, which, to our knowledge, has rarely been done in the field.

Citation: Wang L, Wee C-Y, Suk H-I, Tang X, Shen D (2015) MRI-Based Intelligence Quotient (IQ) Estimation with Sparse Learning. PLoS ONE 10(3): e0117295. https://doi.org/10.1371/journal.pone.0117295

Academic Editor: Kewei Chen, Banner Alzheimer's Institute, UNITED STATES

Received: July 21, 2014; Accepted: December 19, 2014; Published: March 30, 2015

This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Data Availability: All data files are available from the ABIDE database http://fcon_1000.projects.nitrc.org/indi/abide/

Funding: This work was supported in part by NIH grants EB006733, EB008374, EB009634, AG041721, MH100217, and AG042599. Also, this work was partially funded by the National Natural Science Foundation of China (No. 81271568). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction
Intelligence Quotient (IQ) is a score, generally derived from a variety of tests, that is used to assess human intelligence. Although test-takers obtain varying scores when taking the same test on different occasions or different tests at the same age, clinical psychologists generally regard the IQ score as a statistically valid metric for clinical purposes [1,2]. However, the current standard IQ tests are not applicable to infants or young children because they are questionnaire-based. If we could develop a more systematic technique to estimate current IQ or to predict future IQ, it would hold great promise for identifying infants or young children who may undergo unusual intellectual development, thus providing an opportunity for early interventions such as specialized and tailored education.

Uncovering human intelligence has always been of major interest in cognitive neuroscience. With the advent of brain imaging, there have been efforts to investigate the relation between brain anatomy and intelligence [3,4], and substantial understanding has been achieved in the field. For example, Supekar et al. showed that the size and circuitry of certain parts of children’s brains could be a potential predictor of how well they would respond to intensive math tutoring [5]. Chen et al. [6] demonstrated that volumetric analysis of gray matter (GM) from structural Magnetic Resonance Imaging (MRI) could be used to predict a subsequent decline in IQ in children with sickle cell disease. McDaniel et al. [3] found that brain volume is positively correlated with IQ in MRI-based experiments. Frangou et al. [7] reported positive correlations between IQ score and GM density of the orbitofrontal cortex, cingulate gyrus, cerebellum, and thalamus, but a negative correlation for the caudate nucleus. On the other hand, Navas-Sanchez et al.
[8] investigated the relationship between IQ score and the microstructure of white matter (WM) tracts using diffusion tensor imaging (DTI), and found that IQ score is positively correlated with fractional anisotropy (FA). Kim et al. [9] found that lower verbal IQ scores are correlated with decreased FA values. In another DTI-based study, Welcome et al. [10] found that the volume of WM fiber tracts is correlated with nonverbal IQ score.

Inspired by these strong correlations between brain anatomy and IQ score, in this study we propose a novel framework to estimate IQ using GM and WM features extracted from structural MRI. In the proposed framework, a machine learning technique is specifically designed to better estimate the IQ score of a testing subject. We treat IQ estimation as a regression problem, taking the GM and WM features derived from MR images as predictors and the corresponding IQ scores as target responses. However, in neuroimaging data analysis, one of the most crucial and challenging issues is to build a model that generalizes well when the feature dimensionality is high and the sample size is small [11–13]. Dimensionality reduction and feature selection have been considered promising approaches to circumvent this limitation. While the former finds a new low-dimensional space onto which the features in the ambient space are projected, the latter selects task-related features in the original feature space. Therefore, the results of a feature selection approach are in general more natural and intuitive to interpret. Hence, we pursue the feature selection strategy in this work.

The existing feature selection methods can be broadly categorized into three types: filter-based, wrapper-based, and embedded-based approaches [11,14]. The filter-based approach selects subsets of features as a pre-processing step, but often ignores interactions among the selected features.
On the other hand, the wrapper-based approach uses a certain function to rank subsets of features according to their predictive power, but usually incurs a huge computational cost. The embedded-based approach performs feature selection during the optimization process, and is specific to the corresponding learning method. This approach usually proceeds more efficiently by directly optimizing a two-part objective function, consisting of a goodness-of-fit term and a penalty term, which makes it efficient for selecting among a large number of variables. This also means that we can develop feature selection methods simply by adjusting the penalty term in the objective function. Thus, in this paper, we focus on the embedded-based feature selection approach.

Recently, multi-task learning based feature selection methods have attracted increasing attention in machine learning, computer vision, and artificial intelligence [15–19]. A task usually refers to feature selection for a modality or for a type of target response. Multi-task learning exploits the intrinsic relationships among different tasks during the learning process [20,21], and has thus achieved better performance than its single-task counterpart, i.e., learning each task separately. Specifically, the recent emergence of sparse least squares regression penalized by an $L_{2,1}$-norm regularizer, called group sparse learning, allows us to select variables that are jointly used by multiple tasks [17,22]. Hereafter, we use the terms “variable” and “feature” interchangeably. The main limitation of group sparse learning is its strong assumption that different tasks share the same features, which ignores task-specific characteristics and often contradicts real situations [23]. To mitigate this limitation, Jalali et al.
[24] proposed a dirty model that adds an $L_1$-norm regularizer, so that different tasks can share the same features while still preserving their respective characteristics. Concretely, this model decomposes the weight coefficient matrix into two parts, one encouraging group-wise feature sparsity and the other encouraging element-wise feature sparsity. Note that $L_1$-norm based regularization tends to select only a single, essentially arbitrary, feature from a group of highly correlated features [25]. Since the dirty model uses $L_1$-norm based regularization, it has the same problem. In this paper, we propose a novel feature selection method by extending the dirty model. Specifically, we devise a new regularization term, the squared Frobenius norm of the element-wise sparsity matrix, to circumvent the problem of randomly selecting one feature from a group of highly correlated features. In this study, we treat feature selection for the GM features and for the WM features, which share the same target (the IQ score), as two different tasks. Thus, multi-task feature selection can be applied to select the WM and GM features used for IQ estimation [21]. According to Reiss’s report, age is correlated with brain tissue volumes [26]. Thus, we also study the effect of age on our estimators in a supplementary experiment.

The remainder of the paper is organized as follows. In the Materials and Preprocessing section, we provide information on the image data and the preprocessing pipeline. Then, the mathematical details of the proposed feature selection method are described in the Methods section. Next, in the Experiment and Results section, we demonstrate the validity of the proposed method in estimating IQs with MRI features by comparing it with state-of-the-art methods. Finally, we discuss our findings and conclude our work in the Discussion and Conclusion sections, respectively.
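The $L_1$ behaviour described above can be shown with a small NumPy sketch. This is our illustration, not the authors' code. The toy problem has two identical columns. With an $L_1$ penalty alone, the solution is not unique, and cyclic coordinate descent puts all of the weight on whichever copy it visits first. Adding a squared $L_2$ penalty (the elastic-net penalty, a single-task analogue of the squared Frobenius term proposed here) makes the solution unique and splits the weight equally. The data, the penalty values, and the solver are assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def coordinate_descent(X, y, lam1, lam2, n_iter=200):
    """Minimize 0.5*||y - X w||^2 + lam1*||w||_1 + lam2*||w||_2^2 by cyclic coordinate descent.

    lam2 = 0 gives the plain lasso; lam2 > 0 adds a squared L2 term (elastic-net style).
    """
    n_features = X.shape[1]
    w = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(n_features):
            # Residual with the contribution of feature j removed.
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j
            # Closed-form minimizer of the objective along coordinate j.
            w[j] = soft_threshold(rho, lam1) / (col_sq[j] + 2.0 * lam2)
    return w


# Toy problem: two perfectly correlated (identical) columns carrying the same signal.
x = np.linspace(-1.0, 1.0, 21)
X = np.column_stack([x, x])
y = 3.0 * x

w_lasso = coordinate_descent(X, y, lam1=1.0, lam2=0.0)
w_enet = coordinate_descent(X, y, lam1=1.0, lam2=1.0)
print("L1 only:        ", w_lasso)  # all weight lands on the first copy
print("L1 + squared L2:", w_enet)   # weight is shared equally by both copies
```

With these settings the $L_1$-only fit is approximately [2.870, 0], and the combined penalty gives approximately [1.270, 1.270].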

Materials and Preprocessing
Subjects
We downloaded the data from the Autism Brain Imaging Data Exchange (ABIDE) (available at http://fcon_1000.projects.nitrc.org/indi/abide/). Specifically, we used MRI samples of 164 (male/female: 130/34) typically developing children between 6 and 15 years old (mean ± SD age: 11.1 ± 2.1 years). The MR images were scanned at 5 different sites using different scanning parameters and protocols: New York University Langone Medical Center (NYU: 59 samples), Kennedy Krieger Institute (KKI: 31 samples), Stanford University (Stanford: 20 samples), Oregon Health and Science University (OHSU: 15 samples), and University of California at Los Angeles (UCLA: 39 samples). Note that two different datasets (with 26 and 13 samples, respectively) were scanned at UCLA; due to the limited number of samples, we treated them as coming from one site. Also, due to the relatively small numbers of samples from Stanford and OHSU, we combined them into a single ‘SOHSU’ dataset of 35 samples. Table 1 summarizes the demographic characteristics of the subjects used in this paper.
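The following Python sketch checks the site grouping described above by merging the reported cohort sizes into the four training sites. The cohort keys `UCLA_1` and `UCLA_2` are hypothetical names for the two UCLA datasets.

```python
# Per-cohort sample counts as reported in the Subjects section.
# "UCLA_1"/"UCLA_2" are hypothetical labels for the two UCLA datasets (26 and 13 samples).
raw_counts = {"NYU": 59, "KKI": 31, "Stanford": 20, "OHSU": 15, "UCLA_1": 26, "UCLA_2": 13}

# Grouping used for training: both UCLA cohorts form one site; Stanford + OHSU form "SOHSU".
site_of = {"NYU": "NYU", "KKI": "KKI", "Stanford": "SOHSU", "OHSU": "SOHSU",
           "UCLA_1": "UCLA", "UCLA_2": "UCLA"}


def merge_sites(counts, mapping):
    """Sum the sample counts of cohorts that are mapped to the same training site."""
    merged = {}
    for cohort, n in counts.items():
        site = mapping[cohort]
        merged[site] = merged.get(site, 0) + n
    return merged


site_counts = merge_sites(raw_counts, site_of)
print(site_counts)                # {'NYU': 59, 'KKI': 31, 'SOHSU': 35, 'UCLA': 39}
print(sum(site_counts.values()))  # 164
```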


Table 1. Demographic characteristics of the subjects used. For age and IQ scores, we show the mean and corresponding standard deviation (SD). https://doi.org/10.1371/journal.pone.0117295.t001

Data Acquisition and Preprocessing
For details of the data protocols and scanning parameters, please refer to http://fcon_1000.projects.nitrc.org/indi/abide/. Since the data used in this paper are publicly available, no ethics statement is required. We preprocessed the MR images following a common pipeline of skull stripping [27], cerebellum removal, tissue segmentation (into GM, WM, and cerebrospinal fluid (CSF)), and registration to a template. For the registration, we used HAMMER [27,28], which has been successfully applied to a variety of datasets. We used the automated anatomical labeling (AAL) atlas with 90 predefined regions. We then computed the GM and WM tissue volumes of each of the 90 regions and used them as features, i.e., 90 GM features and 90 WM features.
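The final feature-extraction step (one GM and one WM volume per AAL region) can be sketched in NumPy as below. This is a minimal sketch under stated assumptions. It assumes the atlas and a hard tissue segmentation are already on the same voxel grid. The tissue codes 1 (GM) and 2 (WM) and the voxel volume are hypothetical. It does not reproduce HAMMER or the paper's exact volume computation; for instance, this section does not say whether volumes are measured in native or template space.

```python
import numpy as np


def regional_tissue_volumes(atlas, tissue, n_regions=90, voxel_volume_mm3=1.0,
                            gm_label=1, wm_label=2):
    """Count GM and WM voxels inside each atlas region and convert the counts to volumes.

    atlas  : integer array; region index 1..n_regions for each voxel, 0 for background
    tissue : integer array of the same shape with hard tissue labels (codes are assumptions)
    Returns two arrays of length n_regions: GM volumes and WM volumes.
    """
    atlas = np.asarray(atlas)
    tissue = np.asarray(tissue)
    if atlas.shape != tissue.shape:
        raise ValueError("atlas and tissue maps must be on the same voxel grid")
    regions = atlas.ravel().astype(np.int64)
    labels = tissue.ravel()
    # Entry r of each bincount is the number of voxels of that tissue in region r (0 = background).
    gm_counts = np.bincount(regions[labels == gm_label], minlength=n_regions + 1)
    wm_counts = np.bincount(regions[labels == wm_label], minlength=n_regions + 1)
    gm = gm_counts[1:n_regions + 1] * voxel_volume_mm3
    wm = wm_counts[1:n_regions + 1] * voxel_volume_mm3
    return gm, wm


# Tiny 2x3 "image" with two regions instead of 90.
atlas = np.array([[1, 1, 2],
                  [2, 0, 1]])
tissue = np.array([[1, 2, 1],
                   [1, 1, 2]])
gm, wm = regional_tissue_volumes(atlas, tissue, n_regions=2)
print(gm, wm)  # [1. 2.] [2. 0.]
```

For real data, the two returned vectors would be the 90 GM and 90 WM features of one subject.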

Methods
In this section, we propose a novel framework for IQ estimation using structural MRI features. As explained in the Materials and Preprocessing section, the MRI datasets used in this paper were obtained from multiple imaging centers with different scanning parameters and protocols. Hence, there is inevitably high inter-dataset variability. For this reason, we use a two-step procedure in our framework, as shown in Fig. 1. Specifically, given the MR images scanned at multiple scanning sites and their respective IQs, we first extract two types of imaging features, i.e., GM volumes and WM volumes, through the image preprocessing procedure described above. We then select informative features with the proposed extended dirty model (described below) and build an IQ estimator using a Support Vector Regression (SVR) model [20]. Note that feature selection and SVR model learning are performed independently for each dataset. That is, each of our four datasets has its own selected feature set and SVR. Besides the feature selection models and estimators, we also construct a classifier to identify the site at which an MR image was scanned. In the testing phase, given a testing MR image, we first perform the same image preprocessing and feature extraction procedures, and then feed the extracted features to the site classifier to identify the scanning site. It is worth noting that the testing samples are not restricted to the predefined sites. In fact, a sample from an unknown site is assigned by the site classifier to the site whose data are most similar to it. Based on the identified site (labeled as l in Fig. 1), we finally estimate the testing subject’s IQ score using the corresponding selected feature set and SVR estimator (SVR-l).
It should be noted that, due to the lack of available longitudinal data, in this work we focus only on estimating the current IQ score, not on predicting the future IQ score; however, the proposed framework can be extended to predict a subject’s future IQ score.
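The test-time logic of the two-step procedure can be summarized in a few lines of Python. This is a sketch, not the authors' implementation. The site classifier and the per-site estimators are treated as opaque callables; in the paper, each per-site estimator applies its own selected feature subset and SVR-l. The toy stand-ins below are hypothetical.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]


def estimate_iq(features: Features,
                site_classifier: Callable[[Features], str],
                site_estimators: Dict[str, Callable[[Features], float]]) -> float:
    """Two-step estimation: (1) predict the scanning site, (2) apply that site's estimator.

    Each site estimator is assumed to apply its own selected feature subset before its
    SVR; here both components are treated as opaque callables.
    """
    site = site_classifier(features)          # step 1: site identification
    if site not in site_estimators:
        raise KeyError(f"no estimator trained for site {site!r}")
    return site_estimators[site](features)    # step 2: site-specific estimation (SVR-l)


# Hypothetical stand-ins for a trained site classifier and two site-specific estimators.
def toy_classifier(f: Features) -> str:
    return "NYU" if f[0] > 0 else "KKI"


toy_estimators = {
    "NYU": lambda f: 100.0 + f[1],   # pretend SVR trained on NYU data
    "KKI": lambda f: 90.0 + f[1],    # pretend SVR trained on KKI data
}
print(estimate_iq([1.0, 5.0], toy_classifier, toy_estimators))   # 105.0
print(estimate_iq([-1.0, 5.0], toy_classifier, toy_estimators))  # 95.0
```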


Fig 1. A schematic diagram of the proposed IQ estimation framework using structural MRI data. https://doi.org/10.1371/journal.pone.0117295.g001

In the following, we first describe the proposed feature selection method along with the training of an IQ score estimator, followed by a classifier that identifies the MRI scanning site. Throughout the paper, we denote matrices, vectors, and scalars as boldface uppercase, boldface lowercase, and normal italic letters, respectively, and use a superscript T for a vector/matrix transpose.

Feature Selection via Extended Dirty Model
Due to the relatively small number of samples compared to the feature dimensionality, it is important to reduce the dimensionality to avoid over-fitting. Among various dimensionality reduction methods, in this paper we focus on the popular sparse least squares regression method, which has been successfully applied to diverse applications [20,29,30]. For clarity and simplicity, we omit the scanning-site index from the notation. Note, however, that the feature selection method described below is applied independently to the dataset of each scanning site. Hereafter, we use G and W to denote GM and WM, respectively. Let $\mathbf{X}^{(G)} = [\mathbf{x}_1^{(G)}, \ldots, \mathbf{x}_N^{(G)}]^T \in \mathbb{R}^{N \times D}$, $\mathbf{X}^{(W)} = [\mathbf{x}_1^{(W)}, \ldots, \mathbf{x}_N^{(W)}]^T \in \mathbb{R}^{N \times D}$, and $\mathbf{y} = [y_1, \ldots, y_N]^T \in \mathbb{R}^N$ denote, respectively, a set of D-dimensional feature vectors from GM, a set of D-dimensional feature vectors from WM, and the respective IQ scores of N subjects. In this paper, we assume that the target IQ scores $\mathbf{y}$ can be represented by a linear combination of the features, i.e., the GM features $\mathbf{X}^{(G)}$ and the WM features $\mathbf{X}^{(W)}$, as follows:

$$\mathbf{y} = \mathbf{X}^{(G)}\mathbf{w}^{(G)} + \mathbf{e}^{(G)} \tag{1}$$

$$\mathbf{y} = \mathbf{X}^{(W)}\mathbf{w}^{(W)} + \mathbf{e}^{(W)} \tag{2}$$

where $\mathbf{w}^{(G)} \in \mathbb{R}^D$ and $\mathbf{w}^{(W)} \in \mathbb{R}^D$ denote the weight coefficient vectors of the respective feature types, and $\mathbf{e}^{(G)} \in \mathbb{R}^N$ and $\mathbf{e}^{(W)} \in \mathbb{R}^N$ are noise vectors drawn independently from a standard Gaussian distribution.
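Since we parcellate the brain into multiple regions and extract regional GM/WM tissue volume features, it is natural to assume a shared structure between the two feature types, and thus group lasso [22] can be used:

$$\min_{\mathbf{W}} \sum_{i \in \{G, W\}} \frac{1}{2}\left\|\mathbf{y} - \mathbf{X}^{(i)}\mathbf{w}^{(i)}\right\|_2^2 + \lambda \|\mathbf{W}\|_{2,1} \tag{3}$$

where $\mathbf{W} = [\mathbf{w}^{(G)}\ \mathbf{w}^{(W)}] \in \mathbb{R}^{D \times 2}$, $\|\mathbf{W}\|_{2,1} = \sum_{d=1}^{D} \|\mathbf{w}^{d}\|_2$ is the sum of the $\ell_2$-norms of the rows $\mathbf{w}^{d}$ of $\mathbf{W}$, and λ is a regularization parameter. However, group lasso enforces parameter overlap across all the features, which is too strong an assumption [24,31]. Instead, we believe it is reasonable to use a dirty model [24], which can 1) leverage parameter overlap when it exists and 2) avoid paying a penalty when it does not. It does so with two separate parameter matrices, as follows:

$$\min_{\mathbf{P}, \mathbf{Q}} \sum_{i \in \{G, W\}} \frac{1}{2}\left\|\mathbf{y} - \mathbf{X}^{(i)}\left(\mathbf{p}^{(i)} + \mathbf{q}^{(i)}\right)\right\|_2^2 + \lambda_1 \|\mathbf{P}\|_1 + \lambda_2 \|\mathbf{Q}\|_{2,1} \tag{4}$$

where $\mathbf{P} = [\mathbf{p}^{(G)}\ \mathbf{p}^{(W)}] \in \mathbb{R}^{D \times 2}$ and $\mathbf{Q} = [\mathbf{q}^{(G)}\ \mathbf{q}^{(W)}] \in \mathbb{R}^{D \times 2}$ are two parameter matrices that encourage element-wise sparsity and group-wise sparsity, respectively. Here $\|\mathbf{P}\|_1$ is the sum of the absolute values of the entries of $\mathbf{P}$, and the overall weight matrix is $\mathbf{W} = \mathbf{P} + \mathbf{Q}$. However, the element-wise sparse solution $\mathbf{P}$ is known to select one feature, essentially at random, from a group of highly correlated features. To address this, we extend the original dirty model by further regularizing the parameter matrix $\mathbf{P}$ with a squared Frobenius norm, as follows:

$$\min_{\mathbf{P}, \mathbf{Q}} \sum_{i \in \{G, W\}} \frac{1}{2}\left\|\mathbf{y} - \mathbf{X}^{(i)}\left(\mathbf{p}^{(i)} + \mathbf{q}^{(i)}\right)\right\|_2^2 + \lambda_1 \|\mathbf{P}\|_1 + \lambda_2 \|\mathbf{Q}\|_{2,1} + \lambda_3 \|\mathbf{P}\|_F^2 \tag{5}$$

where $\|\cdot\|_F$ denotes the Frobenius norm and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are regularization parameters. We call this new model the ‘extended dirty model’. Combining $\|\mathbf{P}\|_1$ and $\|\mathbf{P}\|_F^2$ in the objective function allows us to jointly select highly correlated features. At the same time, the $L_{2,1}$-norm penalty on $\mathbf{Q}$, i.e., $\|\mathbf{Q}\|_{2,1}$, still encourages group-wise feature selection, i.e., jointly selecting or discarding the GM/WM features of a region. In this way, we can efficiently handle both the shared inter-feature-type structure and the pairwise intra-feature-type correlations (as shown in Fig. 2).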
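The following NumPy sketch shows one way to minimize an objective like Eq. (5) with an accelerated proximal gradient method (FISTA), the family of solvers the paper cites [32–34]. It assumes Eq. (5) consists of two squared-loss terms on the shared IQ target, plus $\lambda_1\|\mathbf{P}\|_1 + \lambda_2\|\mathbf{Q}\|_{2,1} + \lambda_3\|\mathbf{P}\|_F^2$. The function names, step-size rule, iteration count, and synthetic data are our own choices, not the authors' code. The squared Frobenius term is treated as part of the smooth part. As a result, the proximal step splits into element-wise soft-thresholding for P and row-wise group shrinkage for Q.

```python
import numpy as np


def soft_threshold(Z, t):
    """Element-wise soft-thresholding: proximal operator of t * ||Z||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)


def row_shrink(Z, t):
    """Row-wise group soft-thresholding: proximal operator of t * ||Z||_{2,1}."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z * np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)


def objective(Xs, y, P, Q, lam1, lam2, lam3):
    """Value of the assumed Eq. (5) objective; Xs = [X_G, X_W], P and Q are D x 2."""
    fit = sum(0.5 * np.sum((y - X @ (P[:, i] + Q[:, i])) ** 2) for i, X in enumerate(Xs))
    return (fit + lam1 * np.abs(P).sum() + lam2 * np.linalg.norm(Q, axis=1).sum()
            + lam3 * np.sum(P ** 2))


def fit_extended_dirty(Xs, y, lam1, lam2, lam3, n_iter=500):
    """FISTA. Smooth part: squared losses + lam3*||P||_F^2.
    Non-smooth part: lam1*||P||_1 + lam2*||Q||_{2,1}, handled by its proximal operator."""
    D = Xs[0].shape[1]
    P, Q = np.zeros((D, 2)), np.zeros((D, 2))
    P_acc, Q_acc = P.copy(), Q.copy()  # extrapolated (momentum) point
    # 1 / Lipschitz constant of the smooth part's gradient with respect to (P, Q).
    step = 1.0 / (2.0 * max(np.linalg.norm(X, 2) ** 2 for X in Xs) + 2.0 * lam3)
    t = 1.0
    for _ in range(n_iter):
        gP, gQ = np.empty_like(P), np.empty_like(Q)
        for i, X in enumerate(Xs):
            g = X.T @ (X @ (P_acc[:, i] + Q_acc[:, i]) - y)  # gradient of the squared loss
            gP[:, i] = g + 2.0 * lam3 * P_acc[:, i]
            gQ[:, i] = g
        P_new = soft_threshold(P_acc - step * gP, step * lam1)
        Q_new = row_shrink(Q_acc - step * gQ, step * lam2)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        P_acc = P_new + ((t - 1.0) / t_new) * (P_new - P)
        Q_acc = Q_new + ((t - 1.0) / t_new) * (Q_new - Q)
        P, Q, t = P_new, Q_new, t_new
    return P, Q


# Synthetic example: D = 10 regions; the GM and WM volumes of a region are correlated.
rng = np.random.default_rng(0)
N, D = 40, 10
X_G = rng.standard_normal((N, D))
X_W = X_G + 0.1 * rng.standard_normal((N, D))
w_true = np.zeros(D)
w_true[:3] = [2.0, -1.5, 1.0]
y = X_G @ w_true + 0.1 * rng.standard_normal(N)

P, Q = fit_extended_dirty([X_G, X_W], y, lam1=1.0, lam2=1.0, lam3=0.1)
W = P + Q  # overall weights; column 0 = GM, column 1 = WM
selected_gm = np.flatnonzero(np.abs(W[:, 0]) > 1e-8)
selected_wm = np.flatnonzero(np.abs(W[:, 1]) > 1e-8)
```

The selected GM and WM features are read from the non-zero entries of each column of `W = P + Q`, here with a small numerical tolerance.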


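Fig 2. Comparison of weight coefficient matrices for three different feature selection methods. Each colored square corresponds to a non-zero element after feature selection. Squares circled by yellow ellipses correspond to the selected group-wise features, and squares enclosed by black rectangles correspond to the selected pairwise-correlated features. (A) Group lasso. (B) Traditional dirty model. (C) The proposed extended dirty model. https://doi.org/10.1371/journal.pone.0117295.g002

After solving the optimization problem in Eq. (5) via an accelerated proximal gradient method [32–34], we select the informative GM and WM features based on the non-zero entries of the resulting weight coefficient matrix $\mathbf{W} = \mathbf{P} + \mathbf{Q} = [\mathbf{w}^{(G)}\ \mathbf{w}^{(W)}]$.

Multi-Kernel Support Vector Regression
The selected features are then fed into a multi-kernel support vector regression (SVR) model [20], which fuses the complementary information of the two feature types, i.e., GM and WM volumes. After feature selection, we are given N dimension-reduced training samples $\{\mathbf{x}_n^{(G)}, \mathbf{x}_n^{(W)}\}_{n=1}^{N}$ with corresponding target responses $\{y_n\}_{n=1}^{N}$. The multi-kernel SVR then solves the following primal problem with the ε-insensitive loss function:

$$\begin{aligned}
\min_{\mathbf{w}^{(G)}, \mathbf{w}^{(W)}, b, \boldsymbol{\xi}, \boldsymbol{\xi}^*} \quad & \frac{1}{2} \sum_{i \in \{G, W\}} \beta_i \left\|\mathbf{w}^{(i)}\right\|_2^2 + C \sum_{n=1}^{N} \left(\xi_n + \xi_n^*\right) \\
\text{s.t.} \quad & y_n - \sum_{i \in \{G, W\}} \beta_i \left(\mathbf{w}^{(i)}\right)^T \phi^{(i)}\left(\mathbf{x}_n^{(i)}\right) - b \le \varepsilon + \xi_n, \\
& \sum_{i \in \{G, W\}} \beta_i \left(\mathbf{w}^{(i)}\right)^T \phi^{(i)}\left(\mathbf{x}_n^{(i)}\right) + b - y_n \le \varepsilon + \xi_n^*, \\
& \xi_n \ge 0, \ \xi_n^* \ge 0, \quad n = 1, \ldots, N
\end{aligned} \tag{6}$$

where $\mathbf{w}^{(G)}$ and $\mathbf{w}^{(W)}$ are the weight vectors and $\phi^{(G)}$ and $\phi^{(W)}$ are the kernel-induced mapping functions of the two feature types (GM and WM). $\beta_i$ is a mixing coefficient with $\beta_i \ge 0$ and $\sum_{i \in \{G, W\}} \beta_i = 1$. $\xi_n$ and $\xi_n^*$ are the two sets of slack variables, $C > 0$ trades off model flatness against training error, and $b$ is a bias. We then derive the dual form of the multi-kernel SVR as follows:

$$\begin{aligned}
\max_{\boldsymbol{\alpha}, \boldsymbol{\alpha}^*} \quad & -\frac{1}{2} \sum_{n, m = 1}^{N} \left(\alpha_n - \alpha_n^*\right)\left(\alpha_m - \alpha_m^*\right) \sum_{i \in \{G, W\}} \beta_i k^{(i)}\left(\mathbf{x}_n^{(i)}, \mathbf{x}_m^{(i)}\right) - \varepsilon \sum_{n=1}^{N} \left(\alpha_n + \alpha_n^*\right) + \sum_{n=1}^{N} y_n \left(\alpha_n - \alpha_n^*\right) \\
\text{s.t.} \quad & \sum_{n=1}^{N} \left(\alpha_n - \alpha_n^*\right) = 0, \quad 0 \le \alpha_n, \alpha_n^* \le C
\end{aligned} \tag{7}$$

where $k^{(i)}(\mathbf{x}_n^{(i)}, \mathbf{x}_m^{(i)}) = \phi^{(i)}(\mathbf{x}_n^{(i)})^T \phi^{(i)}(\mathbf{x}_m^{(i)})$ is the kernel function of two training subjects for feature type i, and $\alpha_n$ and $\alpha_n^*$ are Lagrange multipliers. We use a weighted linear combination of the kernels as follows:

$$k(\mathbf{x}_n, \mathbf{x}) = \sum_{i \in \{G, W\}} \beta_i k^{(i)}\left(\mathbf{x}_n^{(i)}, \mathbf{x}^{(i)}\right) \tag{8}$$

where $\mathbf{x} = \{\mathbf{x}^{(G)}, \mathbf{x}^{(W)}\}$ is a new dimension-reduced testing subject. In this paper, we use a polynomial kernel for $k^{(i)}(\cdot, \cdot)$.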
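The kernel combination of Eq. (8) and the prediction rule of a trained multi-kernel SVR, $f(\mathbf{x}) = \sum_n (\alpha_n - \alpha_n^*)\,k(\mathbf{x}_n, \mathbf{x}) + b$, can be sketched in NumPy as follows. Several assumptions apply. The polynomial degree and offset (`degree=2`, `coef0=1.0`) are placeholders, because the paper does not state them here. The mixing weights $\beta_i$ are taken as given, since this section does not say how they are chosen. The dual coefficients and bias come from an already-trained SVR; solving the dual itself is not shown.

```python
import numpy as np


def poly_kernel(A, B, degree=2, coef0=1.0):
    """Polynomial kernel (a . b + coef0) ** degree between the rows of A and B."""
    return (A @ B.T + coef0) ** degree


def combined_kernel(train, test, betas, degree=2, coef0=1.0):
    """Eq. (8)-style kernel: sum over feature types i of beta_i * k_i(x_n^(i), x^(i)).

    train, test: dicts mapping a feature type ("G", "W") to (n_samples, d_i) arrays
    betas      : dict of mixing weights, non-negative and summing to one
    """
    if min(betas.values()) < 0 or not np.isclose(sum(betas.values()), 1.0):
        raise ValueError("mixing weights must be non-negative and sum to 1")
    return sum(b * poly_kernel(train[i], test[i], degree, coef0) for i, b in betas.items())


def predict(dual_coef, bias, train, test, betas, degree=2, coef0=1.0):
    """f(x) = sum_n (alpha_n - alpha_n^*) k(x_n, x) + b for every row x of the test set."""
    K = combined_kernel(train, test, betas, degree, coef0)  # shape (N_train, N_test)
    return dual_coef @ K + bias


# Toy check: two training subjects, one test subject, one feature per type.
train = {"G": np.array([[1.0], [0.0]]), "W": np.array([[0.0], [1.0]])}
test = {"G": np.array([[1.0]]), "W": np.array([[1.0]])}
betas = {"G": 0.5, "W": 0.5}
K = combined_kernel(train, test, betas)                    # [[2.5], [2.5]]
f = predict(np.array([1.0, 0.5]), 3.0, train, test, betas)  # [6.75]
```

To train the model, one option is a solver that accepts precomputed kernels. For example, scikit-learn's `SVR(kernel="precomputed")` takes the N_train × N_train combined Gram matrix in `fit` and an N_test × N_train matrix in `predict`, i.e., the transpose of `combined_kernel(train, test, ...)` above.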
After training a multi-kernel SVR, we can estimate a testing subject’s IQ as follows:

$$f(\mathbf{x}) = \sum_{n=1}^{N} \left(\alpha_n - \alpha_n^*\right) k(\mathbf{x}_n, \mathbf{x}) + b \tag{9}$$

Construction of Site Classifier
Scanning parameters and protocols vary across sites, which causes inevitable inter-dataset variability. We therefore construct a site classifier to identify the site at which a testing MR image was scanned. Specifically, we use a sparse multinomial logistic regression (SMLR) model formulated as follows:

$$\hat{\mathbf{Z}} = \arg\max_{\mathbf{Z}} \left\{ \sum_{n=1}^{N} \left[ \sum_{l=1}^{L} t_n^{(l)} \mathbf{z}^{(l)T} \mathbf{x}_n - \log \sum_{l=1}^{L} \exp\left(\mathbf{z}^{(l)T} \mathbf{x}_n\right) \right] + \log p(\mathbf{Z}) \right\} \tag{10}$$

where $\mathbf{x}_n = [\mathbf{x}_n^{(G)T}\ \mathbf{x}_n^{(W)T}]^T$ is an augmented feature vector that concatenates the original GM and WM feature vectors of the n-th training sample. $\mathbf{z}^{(l)}$ is the weight vector for scanning site l, and $p(\mathbf{Z})$ is a prior on the parameter matrix $\mathbf{Z} = [\mathbf{z}^{(1)}, \ldots, \mathbf{z}^{(L)}]$. L is the total number of scanning sites. $\mathbf{t}_n = [t_n^{(1)}, \ldots, t_n^{(L)}]^T$ is the site label of the n-th sample, represented by a “1-of-L” encoding vector such that $t_n^{(l)} = 1$ if $\mathbf{x}_n$ belongs to scanning site l and $t_n^{(l)} = 0$ otherwise. In this paper, l ∊ {NYU, KKI, UCLA, SOHSU} and L = 4. For the prior $p(\mathbf{Z})$, we use a Laplacian prior, $p(\mathbf{Z}) \propto \exp(-\gamma \|\mathbf{Z}\|_1)$, where γ is a sparsity control parameter; this is the most widely used choice in the literature. We chose SMLR as our classifier because, unlike many other classifiers, it simultaneously selects class-discriminative features and learns the separating hyperplanes. Please refer to [35] for a detailed explanation of SMLR.
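A minimal sketch of the site classifier follows: an $L_1$-penalized multinomial logistic regression. This is the MAP problem of Eq. (10) under a Laplacian prior, solved here with plain proximal gradient descent (ISTA). The solver is our substitution for illustration; the SMLR reference [35] describes its own optimization procedure. No intercept is used, and the three synthetic "sites" are made up.

```python
import numpy as np


def softmax(A):
    """Row-wise softmax with max-subtraction for numerical stability."""
    A = A - A.max(axis=1, keepdims=True)
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)


def fit_smlr(X, site_idx, n_sites, gamma=0.1, n_iter=2000):
    """MAP estimate of an L1-penalized (Laplacian-prior) multinomial logistic regression.

    X        : (N, D) augmented GM+WM feature matrix, one row per training subject
    site_idx : (N,) integer site indices in 0..n_sites-1
    Solved with plain proximal gradient descent (ISTA), not the solver of [35].
    """
    T = np.eye(n_sites)[site_idx]           # 1-of-L encoding of the site labels
    Z = np.zeros((X.shape[1], n_sites))     # column l is the weight vector z^(l)
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # conservative 1/Lipschitz step size
    for _ in range(n_iter):
        grad = X.T @ (softmax(X @ Z) - T)   # gradient of the negative log-likelihood
        V = Z - step * grad
        Z = np.sign(V) * np.maximum(np.abs(V) - step * gamma, 0.0)  # prox of gamma*||Z||_1
    return Z


def predict_site(X, Z):
    """Assign each row of X to the site with the largest linear score z^(l)T x."""
    return np.argmax(X @ Z, axis=1)


# Synthetic example: 3 "sites", each shifting a different feature; features 3-4 are noise.
rng = np.random.default_rng(1)
site_idx = np.repeat(np.arange(3), 20)
X = 3.0 * np.eye(3, 5)[site_idx] + 0.5 * rng.standard_normal((60, 5))
Z = fit_smlr(X, site_idx, n_sites=3)
accuracy = np.mean(predict_site(X, Z) == site_idx)
```

The proximal step is where the Laplacian prior acts: it sets small weights exactly to zero, which is how SMLR performs feature selection while learning the classifier.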

Discussion
Because current questionnaire-based IQ tests are not applicable to infants or young children, in this paper we proposed a novel framework to estimate children’s IQ scores using structural MR images. To the best of our knowledge, this is a pioneering work on estimating a subject’s IQ score from neuroimaging data. In neuroimaging data analysis, the feature dimensionality generally far exceeds the number of available samples. Hence, dimensionality reduction and feature selection have been of great interest and importance. In this paper, we used two types of features, i.e., GM and WM volumes, and proposed a feature selection method for IQ score estimation. Each GM feature and its corresponding WM feature are extracted from the same ROI. It is therefore reasonable to assume that they are highly correlated, and also reasonable to use multi-task learning to incorporate the complementary information among different types of features [15,36]. Accordingly, we designed a new feature selection method based on a dirty model [24] with a newly devised regularization term. This term preserves the advantages of the conventional dirty model while efficiently tackling its main disadvantage, i.e., the random selection of one feature from a group of highly correlated features. To validate the proposed method, we performed two sets of experiments with MRI data obtained from 164 typically developing children. The first experiment validated the efficacy of the proposed feature selection method against state-of-the-art feature selection methods. In it, our method achieved the best performance, with a CC of 0.718 and an RMSE of 8.695, outperforming all the comparison methods. We believe this favorable performance results from the well-designed regularization terms, which allow both group-wise and element-wise feature selection, as well as the joint selection of groups of features that are highly correlated in a pairwise manner.
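The two evaluation measures quoted above can be computed as below: the correlation coefficient (CC) and the root mean square error (RMSE) between true and estimated IQ. This sketch assumes CC means the Pearson correlation, which the text does not state explicitly, and it does not show how results are averaged across experiments. The made-up values also illustrate why both measures are reported: CC ignores a constant offset in the estimates, whereas RMSE measures the error in IQ points.

```python
import numpy as np


def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient between true and estimated scores."""
    return float(np.corrcoef(np.asarray(y_true, float), np.asarray(y_pred, float))[0, 1])


def rmse(y_true, y_pred):
    """Root mean square error between true and estimated scores."""
    diff = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Made-up IQ values, purely to exercise the functions.
true_iq = [100, 110, 95, 120]
est_iq = [102, 108, 97, 115]
cc = pearson_cc(true_iq, est_iq)  # about 0.997
err = rmse(true_iq, est_iq)       # sqrt(37 / 4), about 3.041

# A constant offset leaves CC at 1 but shows up fully in RMSE.
shifted = [v + 10 for v in true_iq]
cc_shift, err_shift = pearson_cc(true_iq, shifted), rmse(true_iq, shifted)  # 1.0, 10.0
```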
Most of the regions selected by our method have been reported in previous studies and are highly associated with cognitive ability and memory. The selected regions include the right opercular part of the inferior frontal gyrus, the left hippocampus, the bilateral thalamus, and the bilateral transverse temporal gyri (Heschl’s gyri). The hippocampus, an important component of the limbic system, has been found to play an important role in memory and spatial navigation [37,38], and the thalamus is regarded as a switchboard that processes and relays sensory information [39]. The inferior frontal gyrus has also been found to be related to semantic task processing [40]. Heschl’s gyri are related to auditory processing and semantic tasks [41], and their abnormalities have been shown to be one of the main causes of impaired cognitive abilities [42,43]. Since memory and cognitive abilities are two important components commonly assessed in IQ tests [44], changes in GM/WM tissue in these ROIs may affect the quantification of human intelligence.

In a supplementary experiment, we treated age as an independent feature type and combined it with the GM and WM features using multi-kernel SVR, to investigate whether it would affect the performance of our estimators. However, we did not observe any significant improvement over the counterparts using only GM and WM features. In the second experimental paradigm, we demonstrated the validity of assigning different weights to different feature types by comparing estimators trained with a single-kernel SVR and with a multi-kernel SVR. Again, the proposed method achieved better performance (CC of 0.684 and RMSE of 9.166) than the competing methods. However, performance degraded for all the methods compared to the multi-kernel SVR case.
Datasets scanned at different sites with different protocols and scanning parameters unavoidably vary. We therefore designed a site classifier to identify the likely scanning site of a test image; it achieved an average classification accuracy of 98.5%. We also built a separate site-specific IQ score estimator for each site. In our experiment with a single general estimator built on the pooled data from all sites, i.e., without considering scanning sites, the performance was 0.511 for CC and 10.873 for RMSE. This is much worse than any of the methods using our site-specific estimators after identifying the scanning site. It should be emphasized that our framework is not limited to test images scanned at one of the predefined sites. In real applications, the site classifier identifies the predefined site whose scanning parameters or protocols are most similar to those of the actual scanning site of the test image.
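A toy simulation can illustrate why site-specific estimators outperform one pooled estimator. This is a hypothetical setup and does not reproduce the reported CC/RMSE values. Both simulated sites share the same relation between a latent anatomical quantity and IQ, but the second site's scanner reports the feature with a different scale and offset. A single line fitted to the pooled data cannot capture both mappings, whereas one line per site can.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least squares fit of y ≈ a * x + b; returns (a, b)."""
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


# Hypothetical two-site setup: the latent anatomy relates to IQ identically at both sites,
# but site B's scanner reports the feature with a different scale and offset.
rng = np.random.default_rng(0)
n = 100
latent = rng.standard_normal((2, n))
iq = 100.0 + 10.0 * latent + rng.normal(0.0, 1.0, size=(2, n))
measured = np.vstack([latent[0], 0.5 * latent[1] + 3.0])  # row 0: site A, row 1: site B

# (a) a single estimator fitted on the pooled data of both sites
a, b = fit_line(measured.ravel(), iq.ravel())
pooled_rmse = rmse(iq.ravel(), a * measured.ravel() + b)

# (b) one estimator per site, applied with the correct site label
site_errors = []
for s in range(2):
    a_s, b_s = fit_line(measured[s], iq[s])
    site_errors.append(rmse(iq[s], a_s * measured[s] + b_s))
per_site_rmse = float(np.sqrt(np.mean(np.square(site_errors))))
print(round(pooled_rmse, 2), round(per_site_rmse, 2))
```

In this setup the pooled RMSE is several times larger than the per-site RMSE, which stays close to the simulated noise level of 1 IQ point. In the paper's framework, the site label used in step (b) is not known in advance and is supplied by the site classifier.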

Conclusion
In this paper, we proposed a novel framework for estimating a subject’s IQ score from neuroimaging features. Methodologically, since the number of features in neuroimaging data usually far exceeds the number of available samples, feature selection has always played an important role in the field. To this end, considering the strong relationship between GM and WM features in MR images, we devised a feature selection method based on a dirty model [24]. The method efficiently accounts for the coupling of different feature types while relaxing the strong assumption of parameter overlap across features. Specifically, we penalized the objective function with a squared Frobenius norm of the element-wise sparsity matrix. Because the MR images were acquired at different scanning sites with their own scanning parameters and protocols, we designed a two-step procedure: it first identifies the scanning site of a test image and then estimates the test subject’s IQ using the corresponding site-specific estimator. We also compared multi-kernel SVR and single-kernel SVR in two sets of experiments. From a practical point of view, the current framework is not limited to MR images obtained at the predefined sites. Even so, our future research will focus on a more generalized method that efficiently handles inter-site variability, so that a single estimator can serve all subjects without the scanning site identification step. Furthermore, given the availability of various imaging modalities, it would be beneficial to integrate their complementary information for more precise IQ score estimation. We emphasize again that our work paves the way for research on predicting an infant’s future IQ score from neuroimaging data. Such a prediction could serve as a potential indicator for parents preparing for their child’s education if needed.

Author Contributions
Conceived and designed the experiments: LW CYW DS. Performed the experiments: LW. Analyzed the data: LW CYW DS. Contributed reagents/materials/analysis tools: LW CYW. Wrote the paper: LW CYW HIS XT DS.