A positive relationship between brain volume and intelligence has been suspected since the 19th century, and empirical studies seem to support this hypothesis. However, this claim is controversial because of concerns about publication bias and the lack of systematic control for critical confounding factors (e.g., height, population structure). We conducted a preregistered study of the relationship between brain volume and cognitive performance using a new sample of adults from the United Kingdom that is about 70% larger than the combined samples of all previous investigations on this subject (N = 13,608). Our analyses systematically controlled for sex, age, height, socioeconomic status, and population structure, and our analyses were free of publication bias. We found a robust association between total brain volume and fluid intelligence (r = .19), which is consistent with previous findings in the literature after controlling for measurement quality of intelligence in our data. We also found a positive relationship between total brain volume and educational attainment (r = .12). These relationships were mainly driven by gray matter (rather than white matter or fluid volume), and effect sizes were similar for both sexes and across age groups.

From logical reasoning to grasping new concepts, humans differ in cognitive capacities. A substantial part of this variance is captured by psychometric measures such as fluid-intelligence tests or the general intelligence factor (g), which aggregates test results across various domains of cognitive performance. These measures are reliable, are stable across the life span (Deary, Whalley, Lemmon, Crawford, & Starr, 2000), and are associated with important life outcomes, including educational attainment (Deary, Strand, Smith, & Fernandes, 2007), job performance, and health (Batty et al., 2009).

Much research has been devoted to understanding how individual differences in cognitive performance arise and whether they can be accounted for by environmental, developmental, genetic, and neuroanatomical factors. A classic hypothesis proposes a positive association between intelligence and total brain volume (TBV; e.g., Galton, 1889). For decades, the only way to test this hypothesis was to conduct empirical studies using proxies of TBV such as head circumference. However, this work was controversial because of methodological issues (Stott, 1983) and concerns about racial and cultural bias.

The introduction of MRI in the late 1980s led to a burst of studies that directly examined the relationship between TBV and intelligence. The first published study reported a correlation (r) of .51 in a sample of 40 college students (Willerman, Schultz, Neal Rutledge, & Bigler, 1991). However, the reported association has declined as sample sizes have grown: The first meta-analysis of the literature (k = 14, N = 858) estimated an average correlation of .37 (Gignac, Vernon, & Wickett, 2003). A later, more comprehensive meta-analysis (k = 37, N = 1,530) estimated a smaller correlation of .29 (McDaniel, 2005). The largest meta-analysis to date, which included unpublished data, reported an even smaller correlation of .24 (k = 88, N = 8,036; Pietschnig, Penke, Wicherts, Zeiler, & Voracek, 2015).

Scholars have been debating the reliability, size, and meaning of a relationship between TBV and cognitive ability for many years (e.g., Stott, 1983). Finding consensus is impeded by three main limitations. First, only a few studies systematically controlled for confounding factors such as height, age, and socioeconomic status. A second concern is population stratification, that is, systematic biological differences across groups that might correlate with environmental and cultural factors.1 If not properly controlled for, population stratification can induce a spurious relationship between biomarkers and phenotypes (Cardon & Palmer, 2003). For example, individuals of northwest European descent may be slightly taller, have slightly larger brains, and perform slightly better on intelligence tests. But such a pattern could be driven primarily by more favorable environments (e.g., better schools, better health care), which would confound the relationship between TBV and intelligence. Genetic-association studies have shown that self-reported ethnicity is often not sufficient to correct for such confounds. However, controlling for the first few principal components of the study participants’ genetic data has proven to be an effective strategy that is now standard in genetic-association studies (Price et al., 2006; Rietveld, Conley, et al., 2014).
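To make the principal-component strategy concrete, here is a minimal NumPy sketch on simulated data. Everything in it is an assumption for illustration: the helper names (`top_genetic_pcs`, `ols_slope`, `simulate`), the two hypothetical subpopulations with drifted allele frequencies, and all effect sizes. It is not the pipeline used in this study. By construction, brain volume has no effect on the test score; both traits merely differ between the subpopulations, so the unadjusted slope is spurious, and adding the top genetic principal components as covariates should shrink it toward zero.

```python
import numpy as np


def top_genetic_pcs(genotypes: np.ndarray, k: int) -> np.ndarray:
    """Scores on the first k principal components of a genotype matrix.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1, 2).
    Each SNP is centered and scaled to unit variance before the SVD.
    """
    g = genotypes.astype(float)
    sd = g.std(axis=0)
    keep = sd > 0  # monomorphic SNPs carry no information and would divide by zero
    z = (g[:, keep] - g[:, keep].mean(axis=0)) / sd[keep]
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]


def ols_slope(y: np.ndarray, x: np.ndarray, covariates=None) -> float:
    """OLS coefficient on x (with intercept), optionally adjusting for covariate columns."""
    cols = [np.ones_like(x), x]
    if covariates is not None:
        cols.append(covariates)
    design = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(beta[1])


def simulate(n_per_group: int = 2000, n_snps: int = 500, seed: int = 0):
    """Hypothetical two-subpopulation data with a confounded TBV-score association."""
    rng = np.random.default_rng(seed)
    # Allele frequencies drift between the two subpopulations.
    p0 = rng.uniform(0.1, 0.9, n_snps)
    p1 = np.clip(p0 + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
    geno = np.vstack([
        rng.binomial(2, p0, size=(n_per_group, n_snps)),
        rng.binomial(2, p1, size=(n_per_group, n_snps)),
    ])
    group = np.repeat([0.0, 1.0], n_per_group)
    # Group membership shifts both traits (e.g., via environment);
    # TBV has NO direct effect on the score in this simulation.
    tbv = group + rng.normal(size=2 * n_per_group)
    score = group + rng.normal(size=2 * n_per_group)
    return geno, tbv, score


geno, tbv, score = simulate()
naive = ols_slope(score, tbv)
adjusted = ols_slope(score, tbv, covariates=top_genetic_pcs(geno, k=5))
print(f"naive slope: {naive:.3f}, PC-adjusted slope: {adjusted:.3f}")
```

Standardizing each SNP before the SVD keeps high-variance SNPs from dominating the components; real analyses typically also prune SNPs in linkage disequilibrium before computing principal components, a step this sketch skips.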

A third issue is publication bias: a tendency to publish positive, statistically significant results, which yields published effect sizes that overestimate the true values. The most recent meta-analysis on intelligence and TBV by Pietschnig et al. (2015) found evidence for publication bias and showed that the correlation in published reports was .30 (k = 53, N = 3,956) but was only .17 in a larger set of unpublished studies (k = 67, N = 2,822). In contrast, Gignac and Bates (2017) did not find evidence for publication bias; however, their analysis was restricted to published studies of healthy participants. Although several analytical techniques have been proposed to detect such bias, their capacity to estimate the true effect size is controversial, and their power to reject the null hypothesis of no publication bias is low in small samples (Ioannidis, Munafò, Fusar-Poli, Nosek, & David, 2014). A clean approach to avoiding publication bias is to conduct a well-powered study that follows a preregistered analysis plan (Gonzales & Cunningham, 2015).

Here, we addressed these three shortcomings of the current literature. Specifically, we conducted a preregistered analysis of the relationship between measures of cognitive performance and TBV using data from the UK Biobank (UKB; Miller et al., 2016; Sudlow et al., 2015). The UKB is a data collection of unprecedented richness and scale that was not part of any previous study on the relationship between TBV and cognitive performance. Our final sample contained 13,608 genotyped individuals with anatomical MRI brain scans. The sample was an adult population (> 40 years old) of European descent, all of whom completed at least one test of cognitive performance. This sample is approximately 70% larger than the combined samples of all previous studies associating in vivo TBV and intelligence (Pietschnig et al., 2015); it permits novel ways of controlling for confounds and allows effect sizes to be compared across various demographic groups.

Our investigation provided the opportunity for two additional contributions. First, we investigated the differential contributions of gray matter (neuronal cell bodies, dendrites, unmyelinated axons, glial cells, synapses, and capillaries), white matter (myelinated axons, or tracts), and cerebrospinal fluid to the association between TBV and intelligence. Both gray- and white-matter volumes are genetically correlated with general intelligence (Sniekers et al., 2017), and small-sample studies suggest that both contribute to the association (e.g., Haier, Jung, Yeo, Head, & Alkire, 2004); understanding their differential contributions is essential for further theoretical development of accounts of the relationship between TBV and intelligence.

Second, we examined the association between TBV and educational attainment, an important real-life outcome that crucially impacts individuals’ income, health, and longevity (Lager & Torssander, 2012). To date, this association has been investigated in only a few small-sample studies of elderly or clinical populations (e.g., Coffey, Saxton, Ratcliff, Bryan, & Lucke, 1999).

Discussion

Our results indicate that there is a robust positive relationship between TBV and intelligence that is similar across sex and various age strata. When we accounted for the relatively low reliability of the cognitive measures in the UKB, the estimated effect sizes were comparable with those reported in recent meta-analyses on this topic. Yet TBV accounts for a relatively small share of the overall variation in cognitive performance (ΔR² ≈ 2%). Importantly, our results are free of publication bias and come from a sample that is approximately 70% larger than the combined samples of all previous investigations on this topic, and our analyses systematically controlled for important potential confounds. Our analysis shows that the lion’s share of the association between TBV and intelligence is explained by individual differences in gray-matter volume. Furthermore, we document that TBV is also positively associated with educational attainment, although the association is substantially smaller than for intelligence (ΔR² ≈ 0.9%).

Although our study demonstrates that the association between TBV and cognitive performance is solid, our work and the literature as a whole have limitations that provide avenues for further research. First, our results are based on a large population sample of adults and the elderly that overrepresented individuals of higher socioeconomic status, and the sample consists almost entirely of individuals of European descent from the United Kingdom. The positive, linear relationship between TBV and fluid intelligence that we observed was driven by the large majority of individuals in that sample who had brain volumes and measures of fluid intelligence in the normal range. At the extreme ends of the distributions, the relationship between TBV and fluid intelligence seems to be weaker or even nonexistent (see Fig. 1).
It is reasonable to expect that the positive relationship we observed would not hold for people affected by chronic or degenerative neurological problems (e.g., dementia, Alzheimer’s disease, Parkinson’s disease) or other medical conditions that are known to be linked to abnormal brain development or physiology. Furthermore, the results may not generalize to children. Although we have no reason to believe that the results depend on other characteristics of the participants, materials, or context, continued exploration of the generalizability of the results to other populations is worthwhile.

A second important limitation concerns causal inference. The empirical work on the relationship between TBV and intelligence and between TBV and educational attainment, including our study, is based on nonexperimental data, so we cannot rule out reverse causation or the influence of unobserved confounds. Although it may be most intuitive that brain anatomy causes cognitive performance and educational attainment, a reverse relationship may also exist (e.g., via brain plasticity that adapts the brain to how it is used; May, 2011). Furthermore, although we controlled for more potential confounding factors than did authors of earlier studies, the identifying assumption of regression analysis, that the error term is independent of the regressors, may still be violated. For example, people with larger brains may have access to better schools and health-care systems in a manner that is not captured by our genetic and demographic controls. In addition, brain anatomy and cognitive performance are both highly heritable (h² ≈ .8; Posthuma et al., 2002), and the genetic correlation between the two (rg ≈ .3; Sniekers et al., 2017) suggests that both are partially influenced by the same genetic factors (Okbay, Beauchamp, et al., 2016; Posthuma et al., 2002). Investigating these relationships further would be of interest.
Third, the low measurement quality of behavioral phenotypes in large data sets is a limitation that results from a trade-off between sample size and measurement accuracy, both of which are costly. Although using a crude measure of a construct in a very large sample often yields greater statistical power than using a perfect measure in a small sample (Okbay, Baselmans, et al., 2016), measurement error attenuates (standardized) effect-size estimates. We addressed this challenge by reporting disattenuated effects, obtained by dividing sample estimates by the square root of the retest reliability of the cognitive measures.

Fourth, it is likely that structural differences in specific brain regions contribute differentially to individual differences in cognitive performance, over and above what is captured by TBV. Of note, despite a strong correlation between sex and TBV in our sample (r = .62), all of the cognitive measures in our sample showed only meager sex differences (see Table S1), suggesting the possibility that sex differences in other brain characteristics compensate for the discrepancy in TBV (e.g., women have greater cortical thickness; Ritchie et al., 2018).

Fifth, the relationship between anatomical brain features and cognitive performance is likely mediated by neural processes that are better captured by measures of functional brain activity than by volumetric measurements. Furthermore, many distinct mental processes (e.g., attention and memory) contribute to performance on intelligence tests. Therefore, our understanding of how individual differences in cognition arise may benefit greatly from more detailed, possibly nonlinear, mappings between anatomical and functional brain measures and individual differences in distinct mental capacities. Finally, further theoretical accounts of what the association between TBV and intelligence might imply about the evolution of human intelligence are needed (e.g., González-Forero & Gardner, 2018).
Many previous investigations have been motivated by an implicit assumption that humans have particularly large brains and are also exceptionally cognitively flexible, relative to other species (Gonda, Herczeg, & Merilä, 2013). However, there is no agreed-upon means of quantifying intelligence across species, and although some recent efforts reported cross-species correlations between TBV and cognitive traits such as self-control (MacLean et al., 2014) and problem solving (Benson-Amram, Dantzer, Stricker, Swanson, & Holekamp, 2016), this emerging literature is in its early days and is not without controversy (Kabadayi, Taylor, von Bayern, & Osvath, 2016). Furthermore, humans are by no means the species with the largest brain size (cetaceans and elephants have much larger brains), ratio of brain to body size, or relative number of neurons, and empirical evidence suggests that our species is also not superior when it comes to various cognitive phenotypes, including working memory (Inoue & Matsuzawa, 2007). We hope that future studies will shed further light on how individual differences in cognitive capacities arise by exploring the associations between cognitive abilities and additional biomarkers (such as functional brain measures) as well as their interactions with environmental conditions.
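The ΔR² figures reported at the start of the Discussion are increments in explained variance; ΔR² conventionally denotes the R² of a model containing the controls plus TBV minus the R² of a model containing the controls alone. The NumPy sketch below computes that increment on synthetic data. The helper names (`r_squared`, `delta_r2`), the simulated relationships among age, sex, height, TBV, and score, and all coefficients are hypothetical; they are not the UKB variables, estimates, or this study’s exact model specification.

```python
import numpy as np


def r_squared(y: np.ndarray, predictors: np.ndarray) -> float:
    """R^2 of an OLS regression of y on the given predictors plus an intercept."""
    design = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return 1.0 - residuals.var() / y.var()


def delta_r2(y: np.ndarray, controls: np.ndarray, predictor: np.ndarray) -> float:
    """Increase in R^2 when `predictor` is added to a model that already has `controls`."""
    base = r_squared(y, controls)
    full = r_squared(y, np.column_stack([controls, predictor]))
    return full - base


# Hypothetical synthetic data: variables and effect sizes are invented for
# illustration and are NOT the UK Biobank variables or estimates.
rng = np.random.default_rng(1)
n = 5000
age = rng.normal(size=n)
sex = rng.integers(0, 2, size=n).astype(float)
height = 0.7 * sex + rng.normal(size=n)
tbv = 0.6 * sex + 0.3 * height - 0.2 * age + rng.normal(size=n)
score = 0.15 * tbv - 0.3 * age + rng.normal(size=n)

controls = np.column_stack([age, sex, height])
print(f"Delta R^2 for TBV: {delta_r2(score, controls, tbv):.3f}")
```

Because TBV is correlated with the controls, its ΔR² is smaller than the squared zero-order correlation between TBV and the score; only the variance that TBV explains beyond the controls counts toward the increment.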

Acknowledgements

The research was conducted using the UK Biobank resource under Application No. 11425.

Action Editor

Ralph Adolphs served as action editor for this article.

Author Contributions

G. Nave and P. D. Koellinger developed the study concept and design, analyzed and interpreted the data, and wrote the manuscript. W. H. Jung and R. Karlsson Linnér preprocessed the brain-imaging data. All the authors provided comments and approved the final manuscript for submission.

Declaration of Conflicting Interests

The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.

Funding

P. D. Koellinger acknowledges financial support from a European Research Council Consolidator Grant (647648 EdGe). G. Nave acknowledges the financial support of the Wharton Neuroscience Initiative and The Wharton School’s Dean Research Fund.

Supplemental Material

Additional supporting information can be found at http://journals.sagepub.com/doi/suppl/10.1177/0956797618808470

Open Practices

All data and materials are available via UK Biobank at http://www.ukbiobank.ac.uk/. Data scripts for the present analyses can be found on the Open Science Framework (OSF) at https://osf.io/x8rnq/. The design and analysis plans were preregistered on the OSF at https://osf.io/fvm7p/register/565fb3678c5e4a66b5582f67. The complete Open Practices Disclosure for this article can be found at http://journals.sagepub.com/doi/suppl/10.1177/0956797618808470. This article has received the badges for Open Data, Open Materials, and Preregistration. More information about the Open Practices badges can be found at http://www.psychologicalscience.org/publications/badges.

Notes

1. Population stratification is a well-known concern in genetic-association studies. For example, a spurious relationship between educational attainment and the lactase gene (which codes for the enzyme lactase) emerges if analyses in genetic-association studies do not properly control for population stratification (Rietveld, Conley, et al., 2014). Lactose intolerance is unrelated to cognitive ability but is much more frequent in southeastern parts of Europe than in northwestern parts.

2. Similar results (i.e., significant coefficients for TBV and substantial overlap in the 95% CIs) were obtained when we repeated the analyses for each of the test-taking instances in isolation, in the subsample that took all four tests (N = 708; see Table S4 in the Supplemental Material).

3. We report regression results with dummy variables for east and north coordinates; the results held when dummy variables for all interactions of east and north coordinates were used instead.

4. This approach assumes that the measurement noise of TBV is negligible.
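The disattenuation described under the third limitation, together with the assumption in note 4, corresponds to Spearman’s classical correction for attenuation with the reliability of TBV set to 1. A minimal Python sketch follows; the function name `disattenuate` and the input values are hypothetical, chosen for round arithmetic, and are not estimates from this study.

```python
import math


def disattenuate(r_observed: float, reliability_y: float, reliability_x: float = 1.0) -> float:
    """Spearman's correction for attenuation.

    Divides an observed correlation by the square root of the product of the
    two measures' reliabilities. The default reliability_x = 1.0 encodes the
    assumption that the other measure (here, TBV) is measured without noise.
    """
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical numbers, not the estimates reported in this article.
r_obs = 0.20       # observed TBV-score correlation
retest_rel = 0.64  # test-retest reliability of the cognitive measure
print(f"{disattenuate(r_obs, retest_rel):.2f}")  # prints 0.25
```

Keeping `reliability_x` as an explicit parameter makes the note 4 assumption visible: if TBV were measured with nonnegligible noise, setting its reliability below 1 would inflate the corrected estimate further.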