This mapping process is intentionally exploratory, and given this novel application of the analytical approach alongside a unique sample, it is difficult to make clear predictions about what the algorithm will learn. The children attending the clinic completed assessments of the cognitive skills known to be impaired in children with learning‐related problems including measures of phonological processing, short‐term and working memory, attention and fluid reasoning (nonverbal IQ). Children with deficits in reading or language, or associated diagnoses of dyslexia or SLI often have phonological processing problems (Bishop & Snowling, 2004 ; Joanisse, Manis, Keating, & Seidenberg, 2000 ; Ramus et al., 2010; Vellutino, Fletcher, Snowling, & Scanlon, 2004 ). In contrast, those with specific problems in maths or diagnosed dyscalculia are typically characterized by more severe deficits in spatial short‐term and working memory (Geary, 2004 ; Holmes, Adams, & Hamilton, 2008 ; McKenzie, Bull, & Gray, 2003 ; McLean & Hitch, 1999 ; Rasmussen & Bisanz, 2005 ; Simmons, Singleton, & Horne, 2008 ; Swanson & Sachse‐Lee, 2001 ) and broader executive functions (Bull, Espy, & Wiebe, 2008 ; Bull, Espy, Wiebe, Sheffield, & Nelson, 2011 ; Szucs, Devine, Soltesz, Nobes, & Gabriel, 2013 ; Van der Ven, Kroesbergen, Boom, & Leseman, 2012 ). So, a reasonable prediction is that our large sample of struggling learners will include subgroups of children with either phonological problems or spatial short‐term/working memory difficulties, and that these children will predominantly struggle with reading or maths respectively. Below average nonverbal reasoning is common among individuals with reading (Duranovic, Tinjak, & Turbic‐Hadzagic, 2014 ; Gathercole et al., 2016 ; Pointus, 1981 ; Winner et al., 2001 ) and maths problems (Gathercole et al., 2016 ; Swanson & Beebe‐Frankenberger, 2004 ; Fuchs et al., 2005 , 2006 , Cirino et al., 2015 ), as well as those with ADHD (Holmes et al., 2013). So, another reasonable prediction is that our sample of struggling learners will include a subgroup of children with low fluid reasoning skills, and this will be associated with problems in both reading and maths.

A second aim of the current study was to use the information from the SOM to identify data‐driven groups within the sample. Even though it is likely that the dimensions that distinguish children are continuous, there may be important reasons to need to group children according to their shared cognitive profile: (a) to identify shared etiological mechanisms, which will be easier with data‐driven homogenous groups; and (b) to identify groups for a particular intervention. To do this the SOM was combined with another form of machine learning, k‐means clustering (Lloyd, 1982 ). This combination identified groups of children with similar cognitive profiles. Having grouped the children with the cognitive data, we then explored the learning and behavioral profiles of these groups. We also explored differences in white‐matter connectivity between the data‐driven groups. White matter maturation is a crucial process of brain development that extends into the third decade of life (Lebel, Treit, & Beaulieu, 2017 ) and relates closely to cognitive development (Clayden et al., 2012 ; Stevens, Skudlarski, Pearlson, & Calhoun, 2009 ). The brain can be modeled as a network of brain regions connected by white matter, commonly referred to as a connectome (Hagmann et al., 2008 ). We derived whole‐brain connectomes and compared them across the groups produced by the machine learning. In short, our second aim was to use machine learning to identify groups of children with shared cognitive profiles, and then test whether these groups differ on learning and behavioral measures, and in terms of brain organization.

We applied this technique to a large heterogeneous sample of struggling learners. Children were referred to a research clinic, the Centre for Attention Learning and Memory (CALM), by health and education professionals for ongoing problems in attention, memory, language, or poor school progress in reading and/or maths. Recruitment was deliberately broad to capture the wide range of poor learners in the school population. Children were accepted into the study irrespective of diagnosis or comorbidity: only non‐native English speakers and those with uncorrected sight or hearing problems were excluded. Our first aim was to test whether the multidimensional structure learnt by the map reflects in different sample characteristics, such as the primary reason for referral to the research clinic (e.g., problems in attention, learning, memory, or language).

In this study, we use a different data‐driven approach—machine learning. Machine learning methods have rarely been applied to understanding developmental disorders (e.g., Fair, Bathula, Nikolas, & Nigg, 2012 ). Typical applications use supervised machine learning (Peng, Lin, Zhang, & Wang, 2013 ) in which the algorithm attempts to learn about predefined categories of children. Here, we use an unsupervised learning approach whereby the algorithm attempts to learn about the structure of the data itself rather than which data correspond to predefined groups. Specifically, we used Self Organising Maps (SOMs; Kohonen, 1989 ), a type of artificial neural network. Due to their efficacy in visualizing multidimensional data, SOMs have been successfully applied to a variety of tasks including textual information retrieval (Lin, Soergel, & Marchionini, 1991 ), the interpretation of gene expression data (Tamayo et al., 1999 ), and ecological community modeling (Giraudel & Lek, 2001 ). SOMs use an algorithm that projects the original data from a multidimensional input space onto a two‐dimensional grid of nodes called a “map”, while preserving topographical information. This produces an intervariable representational space, wherein the geometric distance between nodes corresponds to the degree of similarity in the input data. Within the current context, input data are individual children from our sample. The map will represent the cognitive profiles of the children; the closer the children are represented within the map, the more similar their cognitive profiles. In this way, SOMs enable us to map the multidimensional space of our sample—the map will represent how different children group together because of their similar profiles, and in doing so it also learns about the dimensions that most reliably distinguish children.

For these reasons a number of researchers have advocated empirically based quantitative classification systems (Archibald, Cardy, Joanisse, & Ansari, 2013 ; Coghill & Sonuga‐Barke, 2012 ; Ramus, Marshall, Rosen, & van der Lely, 2013 ; Sonuga‐Barke & Coghill, 2014 ), although few studies have done this. The aim of this approach is to move away from identifying highly selective discrete groups and instead focus on identifying continuous dimensions that distinguish individuals and can be used as potential targets for intervention. Dimensions are derived through data‐driven explorations of the data, with no a priori assumptions about group membership. For example, factor analysis, a statistical method that groups variables based on shared variance, is used most commonly to derive underlying dimensions from sets of symptoms or measures (e.g., Kotov et al., 2017 ). This technique has been used to identify dimensions of phonological and nonphonological skills in children with diagnosed SLI and dyslexia (Ramus et al., 2013 ) and separate latent constructs for inattention and hyperactivity in children with ADHD (Martel, Von Eye, & Nigg, 2010 ). An alternative approach, as yet rarely used, is to cluster children together according to shared profiles based on empirical data. In turn this can be used to inform classification systems, and consequently treatment approaches. Clustering algorithms have been used to identify groups of children with distinct learning (Archibald et al., 2013 ) and behavioral profiles (Bathelt, Holmes, the CALM Team, & Astle, in press ).

Using strict exclusionary criteria also overemphasizes similarities within groups, and the distinctiveness between groups (Coghill & Sonuga‐Barke, 2012 ; Kotov et al., 2017 ). It is widely documented that symptoms vary between children with the same diagnosis. For example, performance on cognitive tasks within ADHD groups is notoriously variable (Castellanos et al., 2005 ; Nigg, Willcutt, Doyle, & Sonuga‐Barke, 2005 ). Symptoms also co‐occur across groups. For example, symptoms of inattention are common in children with poor literacy and maths skills (Hart et al., 2010 ; Loe & Feldman, 2007 ; Zentall, 2007 ), ADHD, autism spectrum disorder (ASD; Rommelse, Geurts, Franke, Buitelaar & Hartman, 2011), SLI (Duinmeijer, Jong & de Scheper, 2012 ), and dyslexia (Germano, Gagliano, & Curatolo, 2010 ; Willcutt & Pennington, 2000 ). Finally, this approach of selectively grouping children does not capture the majority of struggling learners—they often do not receive a diagnosis or are characterized by complex and comorbid difficulties that would rule them out of studies with strict inclusion criteria.

However, this approach can fail to accommodate the high rates of comorbidity within developmental disorders (Coghill & Sonuga‐Barke, 2012 ; Kotov et al., 2017 ) and learning‐related difficulties (e.g., Angold, Costello, & Erkanli, 1999 ). Over 80% of children with ADHD meet criteria for at least one additional diagnosis (e.g., Faraone & Biederman, 1998 ; Willcutt & Pennington, 2000 ) and 15%–45% have co‐occurring reading difficulties (e.g., Biederman et al., 1991 ; Faraone et al., 1993 ; Semrud‐Clikeman et al., 1992). Reading difficulties also co‐occur 50% of the time with maths (Moll et al., 2014 ) or language problems (McArthur et al., 2000 ).

Our understanding of the causes of learning difficulties comes largely from studying children with a specific diagnosis (e.g., ADHD or SLI) or those selected from community or clinical samples on the basis of strict inclusion criteria (e.g., children with poor reading skills, but age‐typical IQ and maths abilities). Most studies recruit children with “pure” problems (e.g., children with ADHD without comorbid dyslexia, or children with maths problems in the absence of reading problems or low IQ). There are practical advantages to this approach: it outlines clear criteria to inform practitioner decision‐making about primary areas of weakness that can be used to identify intervention options.

Prevalence rates of developmental disorders linked with learning difficulties, including attention deficit hyperactivity disorder (ADHD), dyslexia, dyscalculia, and specific language impairment (SLI), range from 3% to 8% (American Psychiatric Association, 2013 ; Norbury et al., 2016 ; Polanczyk, Willcutt, Salum, Kieling, & Rohde, 2014 ; Shalev & Gross‐Tsur, 2001 ). However, the number of children who struggle at school is far higher. In the UK for example, around 30% of the school population fail to meet expected targets in reading or maths at age 11 (Department for Education United Kingdom, 2017 ). The long‐term outcomes for children who struggle at school include continued educational underachievement, poor mental health (Roeser & Eccles, 2000 ), and underemployment (Bynner & Parsons, 2005 ; de Beer, Engels, Heerkens, & van der Klink, 2014 ).

For ROI definition, T1‐weighted images were submitted to nonlocal means denoizing in DiPy, robust brain extraction using ANTs v1.9 (Avants et al., 2011 ), and reconstruction in FreeSurfer v5.3 ( http://surfer.nmr.mgh.harvard.edu ). Regions of interests (ROIs) were based on the Desikan‐Killiany parcellation of the MNI template (Desikan et al., 2006 ) with 34 cortical ROIs per hemisphere and 17 subcortical ROIs. The cortical parcellation was expanded by 2 mm into the subcortical white matter. The parcellation was moved to diffusion space using FreeSurfer tools.

First, MRI scans were converted from the native DICOM to compressed NIfTI‐1 format. Next, correction for motion, eddy currents, and field inhomogeneities was applied using FSL eddy (see Figure 2 for an overview of processing steps). Furthermore, we submitted the images to nonlocal means de‐noising (Manjon, Coupe, Marti‐Bonmati, Collins, & Robles, 2009 ) using DiPy v0.11 (Garyfallidis et al., 2014 ) to boost signal‐to‐noise ratio. The diffusion tensor model was fitted to derive maps of fractional anisotropy (FA) using dtifit in FSL v.5.0.6 (Behrens et al., 2003 ). A constant solid angle (CSA) model was fitted to the 60‐gradient‐direction diffusion‐weighted images using a maximum harmonic order of 8 using DiPy. Whole‐brain probabilistic tractography was performed with 8 seeds in any voxel with a General FA value higher than 0.1. The step size was set to 0.5 and the maximum number of crossing fibers per voxel to 2.

Magnetic resonance imaging data were acquired at the MRC Cognition and Brain Sciences Unit in Cambridge, on the Siemens 3 T Tim Trio system (Siemens Healthcare) using a 32‐channel quadrature head coil. T1‐weighted volume scans were acquired using a whole brain coverage 3D Magnetization Prepared Rapid Acquisition Gradient Echo (MP RAGE) sequence acquired using 1 mm isometric image resolution. Echo time was 2.98 ms, and repetition time was 2,250 ms. Diffusion scans were acquired using echo‐planar diffusion‐weighted images with an isotropic set of 60 noncollinear directions, using a weighting factor of b = 1,000s × mm −2 , interleaved with a T2‐weighted ( b = 0) volume. Whole brain coverage was obtained with 60 contiguous axial slices and isometric image resolution of 2 mm. Echo time was 90 ms and repetition time was 8,400 ms.

254 children participated in the MRI part of the study. 64 scans were not useable due to excessive motion (>3 mm movement during the diffusion sequence estimated through FSL eddy or visual inspection of T1‐weighted images). The finally sample for MRI analysis consisted of 184 children (123 male, Age [months]: M = 117.62, SE = 1.938). The ratios of SOM‐defined groups did not differ from the behavioral sample (Cluster 1: n = 48, Cluster 2: n = 44, Cluster 3: n = 51, Cluster 4: n = 41, χ 2 = 0.01, p > 0.999). There were no significant differences between the groups in residual movement (see Table 1 ). For an additional comparison with a typically developing sample, we selected children from a concurrent study about risk and resilience in education that shared many of the same cognitive assessments and used an identical neuroimaging protocol (Ethical approval number: Pre.2015.11). This sample included children attending mainstream school in the UK with normal or corrected‐to‐normal vision or hearing and no history of brain injury who were recruited via local schools and through advertisements in public places (childcare and community centers, libraries). Children were selected to use in the current analysis if they fell within the right age‐bracket, had useable MRI data (i.e., good quality T1, movement during the diffusion sequence <3 mm and 69 diffusion‐weighted volumes) and had cognitive scores within the normal range—scores above the 40th percentile for their age on assessments of fluid reasoning, vocabulary, verbal, and visuospatial short‐term and working memory were selected. This additional comparison sample consisted of 36 children (18 male, Age [months]: M = 117.79, SE = 3.129, range: 83.02–150.05, mean fluid IQ = 53 [age expected = 50 ± 10 SD ]).

The cognitive profiles of the clusters were compared to identify the ways in which they differ (it is necessarily the case that they will differ). Importantly the groups were then compared on other measures not included in the machine learning, namely learning and behavioral assessments and in terms of brain organization. For all of our assessments we corrected for multiple comparisons using a Bonferroni Correction within each data type (i.e., cognition, learning, and behavioral measures).

To identify data‐driven clusters the node weight values from the SOM were submitted to k‐means clustering. Once the nodes were grouped according to the similarity of their weights, we identified children assigned to each group of nodes. This provided us with clusters of children based on nodes they were assigned to in the original mapping. This process was repeated 1,000 times, with the map retrained on every iteration and the k‐means clustering recalculated, to check that the clusters were robust. Inevitably some children sit on the arbitrary cluster boundary within the map and thus fall inconsistently into multiple different clusters on each iteration. Across the 1,000 iterations we were able to identify the children's modal cluster, which was used for subsequent analyses. There was a clear modal cluster for 529 children (chi‐squared test, p s < 0.05). To check the clustering, each cluster distribution was plotted on the original map. If the process had worked then all cluster members ought to sit on neighboring nodes within the original map.

There is no clear theoretical rationale for how many clusters the map should be carved into. By definition, the map is fully continuous without clear boundaries. One way to validate the clusters is to test whether they generalize to data not included in the initial machine learning—this could be other cognitive data, learning measures, behavioral questionnaires, or brain data. For example, if clusters cannot be distinguished with unseen data then it suggests that the machine learning is over‐fitting the data and/or the number of clusters is too high. In this case, the maps would need to be trained with fewer repetitions, a reduced set of nodes, or most likely a reduced number of clusters. To foreshadow our results, in the current sample we can identify four clusters of children. This is the maximum number of clusters that yield generalizable unique profiles. The Supplementary Materials includes a five cluster solution, which replicates the clusters from the four cluster solution, and a statistical comparison between the two. The Supplementary Materials also includes an alternative means of grouping children that is not reliant on machine learning—community detection via a network analysis (e.g., Bathelt et al. 2018 ).

The artificial neural network maps cognitive profiles in a continuous 2D plane of nodes, where space indicates similarity. We carved our map into sections and grouped the children who fell within that section, thereby clustering children with similar cognitive profiles. Clustering children who sit close together ought to yield groups with relatively homogenous cognitive profiles that are necessarily distinct from children in other clusters.

To do this, the BMU was tested for each different group. The topographical distribution of this was tested statistically using a version of the Kolmogorov–Smirnov test adapted for 2‐dimensional data from two samples (Peacock, 1983 ). The statistic ( D ) tests whether the two samples are drawn from the same or different 2‐dimensional distributions. In each case we compared the distribution of members of a particular category (e.g., referred for language problems) with that of nonmembers (e.g., those not referred for language problems). A significant statistic indicates that the two distributions are not drawn from the same underlying population—i.e., that this particular way of categorizing children is significantly predictive of the cognitive profile that they have. Conversely a nonsignificant result indicates that the category's members are equally likely to appear anywhere within the map.

Once the map had been trained we tested whether different groups of children cluster together. For example, if a child's diagnosis predicts their cognitive profile, then children with the same diagnosis ought to cluster together. That is, they ought to sit on nodes that are near one another. However, if there is no systematic relationship between this characteristic and a child's cognitive profile then this group will be randomly scattered across the map. We tested this both for diagnosis (ASD, dyslexia and ADHD) and the referrer's primary reason for sending the child to the CALM clinic (problems with attention, language, memory, or poor school progress).

At the end of the training process: (a) the weight vector for each individual node reflects the scores of the children for whom that node was the BMU; (b) neighboring nodes have similar weights, such that children with similar cognitive profiles are allocated to nodes that are near each other. In essence, the machine learning process generates a model of the multidimensional cognitive data set on which the SOM was trained.

SOMs were trained using the neural network toolbox in MATLAB v2017a (MathWorks Inc., Natick, MA). SOMs consist of a predefined number of nodes laid out on a two‐dimensional grid plane. Each node corresponds to a weight vector with the same dimensionality as the input data. We initialized the node weight vectors using linear combinations of the first two principal components of the input data. SOMs were then trained using a batch implementation, in which each node i is associated with a model m i and a “buffer memory”. One cycle of the batch algorithm can be broken down into the following: Each input vector x ( t ) is mapped onto the node with which it shares the least Euclidean distance at time t . This node is known as its Best Matching Unit (BMU). Each buffer sums the values of all input vectors x ( t ) in the neighborhood set belonging to node i and divides this by the total number of these input vectors to derive a mean value. All m i are then updated concurrently according to these values. In this way, neighboring nodes become more similar to one another. This cycle is repeated, clearing all the buffers on each cycle and distributing new copies of the input vectors into them. The neighborhood size ( ND ) decreases as a function of t over n steps in an “ordering” phase, from the initial neighborhood size ( INS ) down to 1 (Equation 1 ). In the “fine tuning” phase the neighborhood size is fixed at <1, meaning that the node weights are updated according only to the input vectors for which they are the BMU. This node adjustment process is the mechanism by which the SOM learns about the input data. In the current training process, we used 5 “ordering” runs and a single final fine tuning run.

A SOM consists of a predefined number of nodes laid out on a two‐dimensional grid plane; each node corresponds to a “node‐weight vector” with the same dimensionality as the input data. In our case, each node will have seven weights associated with it (one for each cognitive task). A rule of thumb for determining map size, is to use a number of nodes equal to around 5 times the square root of the number of observations (Tian, Azarian, & Pecht, 2014 ). In this case, we used a 10 by 10 grid of nodes.

Spelling, reading (Word Reading), and maths (Numerical Operations) measures were taken from the Wechsler Individual Achievement Test (WIAT; Wechsler, 2001 ). Educational assessments were available for 98% of the sample. The maths fluency subtest from Woodcock Johnson III Test of Achievement (WJ; Woodcock, Schrank, & Mather, 2007 ) was administered to the first 68 children attending the CALM clinic, instead of the numerical operations measure. We had initially chosen this over the numerical operations subtest from the WIAT because the fluency measure is timed, and therefore quicker to administer. However, when the maths fluency scores were very low we switched to the WIAT—we wondered whether the timed nature of the maths fluency measure was underestimating children's maths ability in this sample. Scores were slightly better with the WIAT, but not significantly so. A two sample Kolmogorov–Smirnoff test indicated that the maths fluency scores from the first 68 participants and the numerical operations scores from the next 68 participants are drawn from the same continuous distribution ([ D = 0.2144, p = 0.0874]). Age standardized scores were converted to z scores using the normative sample mean and standard deviation for all learning measures.

A large battery of cognitive, learning, and behavioral measures are administered in the CALM clinic (full protocol: http://calm.mrc-cbu.cam.ac.uk/protocol/ ). Seven cognitive tasks meeting the following criteria were used for the machine learning: (a) data were available for all 530 children; (b) accuracy was the outcome variable; and (c) age standardized norms were available. For all measures, age standardized scores were converted Z scores using the mean and standard deviation from the respective normative samples to put all measures on a common scale (original age norms were a mix of scaled, t, and standard scores). The following measures of fluid and crystallized reasoning were included: Matrix Reasoning, a measure of fluid intelligence (Wechsler Abbreviated Scale of Intelligence [WASI]; Wechsler, 2011 ); Peabody Picture Vocabulary Test (PPVT; Dunn & Dunn, 2007 ). Phonological processing was assessed using the Alliteration subtest of the Phonological Awareness Battery (PhAB; Frederickson, Reason, & Firth, 1997 ). Verbal and visuo‐spatial short‐term and working memory were measured using Digit Recall, Dot Matrix, Backward Digit Recall, and Mr X subtests from the Automated Working Memory Assessment (AWMA; Alloway, 2007 ).

Children were recruited with single, multiple or no diagnosis. The majority did not have a diagnosis (340, 64%). The prevalence of diagnoses were: ASD = 6%; dyslexia = 6%; obsessive compulsive disorder (OCD) = 2%. Twenty‐two percent of the sample had a diagnosis of ADD or ADHD, and further 11% were under assessment for ADHD (on an ADHD clinic waiting list for a likely diagnosis of ADD or ADHD). Finally, 19% of the sample had received support from a Speech and Language Therapist (SLT) within the past 2 years, but did not typically have a diagnosis of SLI.

The initial sample consisted of 550 children. Twenty children (3.6%) were subsequently removed because of missing data on any one of the seven tasks used for the machine learning. All subsequent details refer to the remaining 530 children (see Figure 1 for recruitment). Thirty three percent were referred for problems with attention, 11% for language difficulties, 10% for memory problems, and 43% for problems with poor school progress (for 3% of children referrers did not provide a primary referral reason). The final sample (mean age = 111 months, range = 65–215 months) contained 366 boys (69%). A high proportion of boys is consistent with prevalence estimates for different developmental disorders within cohort studies (e.g., Russell, Rodgers, Ukomunne, & Ford, 2014 ).

Children were referred by practitioners working in educational or clinical services to the Centre for Attention Learning and Memory (CALM), a research clinic at the MRC Cognition and Brain Sciences Unit, University of Cambridge. Referrers were asked to identify the primary reason for referral, which could include ongoing problems in “attention”, “learning”, “memory”, or “poor school progress”. The only exclusion criteria were uncorrected problems in vision or hearing and English as a second language.

Regional comparison indicated a significant reduction for Cluster 1 (Broad Deficits) compared to both comparison groups for the right inferior frontal gyrus (see Figure 6 , C1: M = 0.59, SE = 0.027; C3: M = 0.65, SE = 0.025; C0: M = 0.72, SE = 0.028; t (82) = −3.19, p corrected = 0.018), the right lateral orbitofrontal gyrus (C1: M = 0.66, SE = 0.021; C3: M = 0.72, SE = 0.021; C0: M = 0.75, SE = 0.025; t (82) = −2.91, p corrected = 0.032), the left fusiform gyrus (C1: M = 0.69, SE = 0.017; C3: M = 0.76, SE = 0.017; C0: M = 0.79, SE = 0.021; t (82) = −3.69, p corrected = 0.011), and the left precentral gyrus (C1: M = 0.95, SE = 0.019; C3: M = 1.01, SE = 0.019; C0: M = 1.04, SE = 0.016; t (82) = −3.42, p corrected = 0.013). The comparison between Cluster 4 (Phonological Deficits) and both comparison groups indicated significantly lower connection strength in the left precentral gyrus (C4: M = 0.97, SE = 0.016; C3: M = 1.01, SE = 0.019; C0: M = 1.04, SE = 0.016; C4 vs. C0: t (75) = −3.03, p corrected = 0.013) and left rostral anterior cingulate gyrus (C4: M = 0.30, SE = 0.009; C3: M = 0.33, SE = 0.01; C0: M = 0.35, SE = 0.009; C4 vs. C0: t (75) = −3.51, p corrected = 0.006). There were no significant differences between Cluster 2 (Working Memory Deficits) and the comparison groups, once controlling for multiple comparisons.

Differences in white matter connections between the SOM‐defined groups were investigated to uncover the neurobiological correlates of the grouping. Each of the deficit groups (Clusters 1, 2 and 4) was compared to the Age Appropriate group (Cluster 3) and an independent sample of typically developing children (TD). Statistical comparison of connection strengths by region indicate significantly lower connection strengths for frontal, temporal, parietal, and subcortical connections in Cluster 1 compared to Cluster 3 and TD (see Table 3 and Figure 6 ). There was no significant difference in regional connection strength between Cluster 2 and Cluster 3 or between Cluster 2 and TD. The comparison of Cluster 4 and Cluster 3 indicated significantly lower strength of parietal connections and the comparison with TD indicated significantly different frontal connections.

There were also two factors within the CCC‐2, explaining 74% of the variance. The rotated factor solution can be found in Table S4 . Subscales tapping pragmatic aspects of communication load most highly on Factor 1: coherence, inappropriate initiation, stereotyped language, context, nonverbal social skills, and interests subscales. Factor 2 was comprised of scales measuring structural language skills: speech, syntax, semantics, and coherence. These factors were labeled “Pragmatic Communication” and “Structural Language” respectively. There were no significant group differences in Pragmatic Communication factor scores. The groups did, however, differ significantly on Structural Language factor scores ( p < 0.001). Post hoc tests revealed children in the Broad Cognitive Deficits or Phonological Deficits groups were rated as having significantly greater structural language problems than either of the other two groups (all p s < 0.001). The respective Structural Language difference between the Age Appropriate and Working Memory Deficit groups was marginal ( p = 0.043), as was that between the Broad Cognitive Deficits and Phonological Deficits groups ( p = 0.043).

The subscale scores for both BRIEF and CCC‐2 questionnaires, split by group, can be seen in Table 2 . Correlation matrices for both the BRIEF and CCC‐2 can be found in Tables S1 and S2 . Before comparing the groups a PCA was conducted separately for the subscales of each questionnaire to reduce the number of comparisons. These analyses identified two factors in the BRIEF, which together explained 76.1% of the variance. The rotated factor solution and scale loadings can be found in Table S3 . The first factor captured the working memory, initiate, planning, organization, and monitor subscales. The emotional control, shift, inhibit, and monitor subscales loaded most highly on the second factor. The first factor therefore corresponds to “Cold” executive functions associated with behavioral regulation, while the second corresponds more closely to “Hot” cognitive aspects of executive function. Factor scores were saved and compared across groups: there were no significant differences in behavior across the clusters (all p s > 0.05).

The four clusters also have important differences on other measures not included in the machine learning, which are also shown in Table 2 . Age Appropriate children had age appropriate learning skills across spelling, reading, and maths. Children in the Broad Cognitive Deficits group had severe problems on all learning outcomes, being more than 1.5 standard deviations below the age expected levels. The other two groups, despite their highly contrasting cognitive profiles did not differ in their learning profiles—moderate phonological problems or working memory difficulties were associated with very similar learning profiles. This is reflected in the statistics—all measures show a significant group difference (all p s < 0.001), and all the post hoc tests are significant at p < 0.001, except between the Phonological and Working Memory deficit groups (spelling, p = 0.22; reading, p = 0.41; maths, p = 0.79).

Children referred primarily for problems with attention, poor learning, or memory were equally likely to be assigned to each group. Similarly, a diagnosis did not predict group membership. The only category predictive of group membership was whether the child was under the care of an SLT. These children were disproportionately likely to be members of either the Broad Cognitive Deficits or Phonological Deficits groups. All of these statistics can be found in Table 2 .

The four groups are roughly equivalent in size, and although the children in the Phonological Deficits group tend to be younger, there are no significant age differences. The Broad Cognitive Deficit group contains a disproportionate number of girls, relative to the rest of the sample (χ 2 = 6.12, p = 0.0133). Conversely the Age Appropriate group contains more boys than expected (χ 2 = 6.80, p = 0.009). The Working Memory deficit and Phonological Deficit groups contain the proportions of boys and girls expected (χ 2 = 0.01, p = 0.91; and χ 2 = 0.01, p = 0.92 respectively).

The profiles of the four clusters can be seen in Figure 5 , with scores and group comparisons presented in Table 2 . All measures differed significantly across groups (all p s < 0.001). Post hoc Tukey tests were used to identify the underlying pairwise comparisons that produce these significant effects. For Matrix Reasoning all post hoc tests were significant at p < 0.001, except between clusters 2 and 4 (Working Memory vs. Phonological Deficits groups). The Working Memory Deficits and Phonological Deficits groups have equivalent Matrix Reasoning scores ( p = 0.86). For Vocabulary, all post hocs were significant at p < 0.001. For the Phonological Awareness task, all post hocs were significant at p < 0.007. For Verbal STM, all post hocs were significant at p < 0.001. For Spatial STM, all post hocs were significant at p < 0.001, except between clusters 1 and 2 (Broad Deficits vs. Working Memory deficits groups). The Broad Cognitive Deficits and Working Memory Deficits groups have equivalent Spatial STM scores ( p = 0.51). For Verbal WM, all post hocs were significant at p < 0.001, except for between cluster 2 and 4 (Working Memory vs. Phonological Deficits groups). The Working Memory Deficits and Phonological Deficits groups had equivalent Verbal WM scores ( p = 0.68). And finally, for Spatial WM all post hocs were significant at p < 0.001, except for between clusters 3 and 4 (Age Appropriate vs. Phonological Deficits groups). The Age Appropriate and Phonological Deficits group had equivalent Spatial WM performance ( p = 0.13).

Each group necessarily has a distinct cognitive profile. The first cluster includes children with broad and severe cognitive difficulties—these children are around a standard deviation or more below the age‐expected level on all cognitive measures. The third cluster includes children with age‐typical cognitive abilities, performing close to age‐expected levels on all tasks. These two clusters are subsequently referred to as the “Broad Cognitive Deficits” and the “Age Appropriate” groups respectively. The remaining two clusters have intermediate profiles. They have similar moderate difficulties with Matrix Reasoning, but distinct profiles on the remaining measures. The second cluster has difficulties on the spatial STM, and verbal and spatial WM measures. This group is called the “Working Memory Deficits” group. The fourth cluster has difficulties tasks with a verbal component: vocabulary, phonological awareness, verbal STM, and verbal WM. This cluster is called the “Phonological Deficits” group.

The top panel shows the distributions of children assigned to each of the four clusters. Beneath each map the statistic indicates that all four clusters occupy a nonrandom set of nodes within the map. Beneath the maps the cognitive profile of each cluster is shown, ordered by cluster number. The scale indicates performance as a z score relative to age expected levels. The dots indicate individual children with the shade indicating the child's consistency within that cluster over the 1,000 iterations—the darker the shade the more consistent the child

To explore whether different sample characteristics (diagnostic status, referral reason) are reflected within the map, the best matching node for different groups of children was selected. If category membership significantly predicts a child's cognitive profile then these children should sit together in the map. Conversely, if membership is not predictive then the distribution of members should not differ significantly from that of nonmembers. Figure 4 shows the distribution of all children within our network, then for each category of primary referral reason and then each of the major diagnoses. The statistics are shown under each topography. None are significant. That is, children are evenly scattered regardless of the primary reason for referral or diagnosis; each of these characteristics provides no information about a child's cognitive profile on our measures.

If tasks discriminate children in similar ways they should have similar node weight topographies. This was quantified by correlating the weight vectors. The resulting correlation matrix can be seen in the bottom right corner of Figure 2 . There are some noteworthy relationships. For example, the two measures traditionally combined to produce a full‐scale IQ score, the Matrix Reasoning and PPVT vocabulary measure, have very highly correlated weights. Tasks that share a phonological component have highly correlated weight matrices: alliteration, verbal STM, and verbal WM measures. Finally, spatial STM and WM measures are somewhat distinct from other measures, with weight matrices that are only moderately correlated with the other tasks.

4 DISCUSSION

We used machine learning to identify the cognitive profiles within a large heterogeneous sample of children with learning‐related problems. These profiles were represented as topographical maps. None of the known characteristics of the children (e.g., diagnosis or referral route) were predictive of the cognitive profiles identified by the machine learning. To highlight the cognitive profiles that exist within the dataset, we subsequently carved the topographical maps into four sections. The children that correspond to these four sections will necessarily have distinct cognitive profiles, but they could also be distinguished in terms of learning and behavioral scores, and patterns of brain organization. The four groups cut across any traditional diagnostic groups that existed within the data.

More than half of the sample fell into two extreme groups, one with age‐appropriate cognitive abilities and the other with widespread cognitive deficits that were at least one standard deviation below age‐typical levels across all tasks. There was no evidence that children with age‐expected scores on the cognitive measures had learning difficulties. Their performance was in the age‐typical range across all measures of learning and their structural communication skills were rated as normal for their age. But we should be very cautious in regarding these children as typically developing; they have been referred by professionals in children's services, and as a group they have elevated behavioral difficulties. For this reason in our neuroimaging analysis we used an additional external comparison group.

The learning scores of the broad deficit group place them within the bottom 5% of the population on measures of spelling, reading, and maths, and they were rated as having difficulties in both structural and pragmatic aspects of communication. Generalized cognitive deficits therefore appear to constrain multiple aspects of learning. They also had behavioral problems related to executive function, although this was true for all four groups. Relative to both comparison groups, this group also had reduced structural connectivity in the left precentral gyrus, right inferior frontal gyrus, right lateral occipital cortex, and the left fusiform. These areas have been previously identified as playing a key role in multiple higher order cognitive skills. For example, the right inferior frontal gyrus is implicated in multiple different executive functions, most commonly measures of inhibitory control (Aron, Robbins, & Poldrack, 2014); the lateral occipital cortex has been found to be modulated by visual attention (Sprague & Serences, 2013); left premotor areas have been linked to language‐related difficulties in both children and adults (Mayes, Reilly, & Morgan, 2015; Scott, McGettigan, & Eisner, 2009); and the fusiform gyrus has been suggested as a locus of immature processing of word forms in dyslexia (Tamboer, Vorst, Ghebreab, & Scholte, 2016). These general struggling learners are rarely studied, but our data suggest that they are common amongst those coming to the attention of children's specialist services. Their relative under‐representation in studies of learning‐related problems means that we have little understanding of the key underlying deficits, mechanisms or potential routes to effective intervention. It is also interesting to note that girls were disproportionately common in this group, relative to the sample as a whole or indeed relative to most studies of learning difficulties. Conversely very few girls appeared in the age‐appropriate cognitive profile group. In short, the girls referred to the study tended to have more severe cognitive and learning difficulties. One possibility is that there is a gender bias in the reason for children coming to the attention of children's specialist services, with boys being identified more commonly for behavioral difficulties (which may be less closely tied to cognitive and learning profiles), whereas more severe cognitive or learning difficulties are needed for girls to come to the attention of specialists.

Two intermediate groups, both with fluid reasoning scores in the low‐average range, were also identified. One intermediate group was characterized by problems on tasks requiring phonological processing, with performance around three quarters of a standard deviation below age‐expected levels on measures of phonological awareness, and verbal short‐term and working memory. These children had significant problems with structural aspects of communication, mirroring the well‐documented link between phonological processing difficulties and specific difficulties with language (Bishop & Norbury, 2002; Bishop & Snowling, 2004; Ramus et al., 2013). However, the learning profile demonstrates equivalent and large deficits across measures of reading, spelling, and mathematics. Poor phonological processing is associated with both poor reading (Carroll & Snowling, 2004; Snowling, 1995; Wagner & Torgesen, 1987) and mathematical development (De Smedt, Taylor, Archibald, & Ansari, 2010; Hecht, Torgesen, Wagner, & Rashotte, 2001; Swanson & Sachse‐Lee, 2001). A consistent finding within the field of learning difficulties is that phonological problems are linked selectively with reading. The majority of these findings come from studies that select poor readers, but this is not the same as demonstrating that phonological impairments will always result in selective reading difficulties. Our data suggest that children selected on the basis of phonological difficulties will actually have more widespread learning problems. Membership of the phonological deficit group was associated with reduced structural connectivity in the left precentral gyrus and rostral anterior cingulate, relative to both comparison groups. The precentral gyrus has been implicated in language processing and is thought to be involved in speech production and also decoding via articulatory simulation (Scott et al., 2009). This area has also been implicated in selective language impairment (Mayes et al., 2015). Furthermore, tracts of the perisylvian language network that connect temporal and frontal language areas deficits are passing the precentral gyrus and may be substantially contributing the connectomics differences. Differences in white matter properties of these tracts have been repeatedly implicated in language deficits (Rimrodt, Peterson, Denckla, Kaufmann, & Cutting, 2010; Roberts et al., 2014). This would also mirror the structural communication difficulties that these children demonstrate. Indeed, this is the only behavioral measure that aligns well with the cognitive profiles—children who perform poorly on phonological tasks are also rated as having significant structural language problems by their parents. Other behavioral measures of executive control do not align well with cognitive profiles.

The fourth group had a somewhat contrasting profile of cognitive deficit to the phonological deficit group. They were characterized by similar fluid IQ scores but had more pronounced difficulties in working memory. Their spatial short‐term memory scores were over a standard deviation below age‐expected levels, and half a standard deviation down on the verbal and spatial working memory measures. Their phonological abilities were less impaired, they were not rated as having the structural language difficulties reported for the phonological deficit group, and their neural profile was less homogenous. One possibility is that multiple different etiological routes can result in this profile of difficulties.

Despite contrasting cognitive and neural profiles, the learning profiles of the working memory and phonological deficit groups were nearly identical. This diverges strongly from a preceding literature that emphasizes a marked association between phonological difficulties and problems with literacy (Lyytinen et al., 2004; Snowling, Bishop, & Stothard, 2000; Tanaka et al., 2011), and an emerging literature that suggests strong associations between spatial short‐term and working memory problems and numeracy difficulties (Bull et al., 2008; Raghubar, Barnes, & Hecht, 2010; Szucs et al., 2013). These previous studies all recruit on the basis of highly selective learning profiles (e.g., maths problems in the absence of reading difficulties) or diagnostic group, which will have overestimated the distinctiveness of these impairments within the general population of struggling learners.

Despite their utility, machine learning approaches to exploring cognitive profiles have limitations. The current combination of a multidimensional mapping method with a data‐driven clustering algorithm suffers from the drawback that the number of groups within the data is underspecified. The mapping process is continuous, with no obvious boundaries, which makes it difficult to have a clear rationale about the formation of groups. Inevitably some children will sit close to a group boundary within the map. Our approach was to add clusters until the clusters did not differ on measures not included in the machine learning. This is how we arrived at four clusters. This is a relatively conservative approach, since different cognitive profiles could exist that genuinely have identical learning, behavioral, and neural correlates. Furthermore, we suspect that datasets with higher dimensionality, stemming from a more widespread battery of measures, could have greater success in identifying more widely differing cognitive profiles.

An alternative to machine learning is to use a network analysis with a community detection algorithm (e.g., Bathelt et al., 2018; Fair et al., 2012). An example of this approach applied to our data can be found in our Supplementary Materials section. This represents the children as nodes and the correlation between their profiles as edges. It is possible to use this approach to identify communities of clusters that maximize the correlation within cluster and the distinctiveness across clusters. This iterative process includes a quality of separation metric (Q) which the clustering algorithm is designed maximize. A major advantage of this approach is that no a priori assumptions about the number of clusters need to be made. However, there are also drawbacks to this alternative. The primary limitation is that a network analysis clusters children on the basis of a correlation matrix. As such it is blind to overall severity. The current sample contains a large number of children with relatively consistent poor scores across all cognitive measures and many children with stable age‐appropriate scores. A network analysis would not be able to distinguish these two groups because the two profiles are highly correlated (this is indeed the case, see Supplementary Materials). The SOM uses Euclidean Distance as its primary means of locating children within the 2D topographical space, and as such is able represent both selective cognitive impairments and overall differences in severity. A further limitation is sample size. Whilst we included 530 children in the topographical mapping process, only 220 children were used in the structural neuroimaging comparison. This likely means that we only have sufficient power to detect the largest and most consistent group differences. More diffuse but equally important differences in whole brain connectome organization might exist, but a larger sample would be needed to identify them.

In summary, we used a machine learning approach that represents high‐dimensional data as a 2D topography, to map the profiles of struggling learners. We combined this with a clustering algorithm to identify particular cognitive profiles represented within the map. Specifically, four profiles could be identified that comprise children with: (a) general and severe deficits, (b) age‐appropriate performance, (c) working memory deficits, (d) phonological deficits. Furthermore, these data‐driven groups are likely to align closely with underlying etiological mechanisms, as evidenced by differences in brain organization across two of the deficit groups, and provide the opportunity to devise interventions that more specifically target the cognitive difficulties faced by individuals with particular profiles.