Study population

For the discovery analysis, we adopted a two-stage developmental design including cohorts with HC scores during childhood (Pediatric HC, mean age 6−9 years of age, N = 10,600), during adulthood (Adult HC, mean age 44−61 years of age, N = 8281) and a combination thereof (N = 18,881) including individuals of European descent from 11 population-based cohorts (Supplementary Tables 1 and 2, Fig. 1). Cohorts include The Avon Longitudinal Study of Parents and Children (ALSPAC), the Generation R Study (GenR), the Western Australia Pregnancy Cohort Study (RAINE), the Copenhagen Prospective Study on Asthma in Children (COPSAC2000 and COPSAC2010), the Infancia y Medio Ambiente cohort (INMA), the Hellenic Isolated Cohorts HELIC-Pomak and HELIC-MANOLIS, the Orkney Complex Disease Study (ORCADES), the Croatian Biobank Korčula (CROATIA-KORCULA), and the Viking Health Study-Shetland (VIKING). Within ALSPAC analysis was performed separately in individuals with whole-genome sequence data (ALSPAC WGS) and chip-based genotyping (ALSPAC GWA). For follow-up, we studied 973 individuals from the Croatian Biobank, Split (CROATIA-SPLIT) (Supplementary Table 2). Institutional and/or local ethics committee approval was obtained for each study. Written informed consent was received from every participant within each cohort, and this study has complied with all ethical regulations. An overview of each cohort can be found in Supplementary Tables 1 and 2 with more detailed information in Supplementary Note 1.

Genotyping

Within ALSPAC, we obtained low read depth (average × 7) whole-genome sequencing data (ALSPAC WGS)55. Chip-based genotyping was performed on various commercial genotyping platforms, depending on the cohort (Supplementary Table 1). Prior to the imputation, all cohorts had similar quality control; variants were excluded because of high levels of missingness (SNP call rate < 98%), strong departures from Hardy−Weinberg equilibrium (p < 1.0 × 10−6), or low MAF (<1%). Individuals were removed if there were sex discordance, high heterozygosity, low call rate (<97.5%) or duplicates. For imputation, the reference panel was either joint UK10K/1000 Genomes55 or the Haplotype Reference Consortium15. Additional details can be found in Supplementary Table 1 and Supplementary Note 1.

In addition to study-specific quality control measures, central quality control was performed using the EasyQC R package56. First, variants were filtered for imputation quality score (imputed studies only, INFO > 0.6), minor allele count (MAC; ALSPAC WGS MAC > 4, all imputed studies MAC > 10) and a minimum MAF of 0.0025. SNPs with MAF discrepancies (>0.30) compared to the HRC panel were also excluded. Marker names were harmonized and reported effect and noneffect alleles were compared against reference data (Build 37). Variants with missing or mismatched alleles were dropped, in addition all insertion/deletions (INDELs), duplicate SNPs and multiallelic SNPs were excluded. The reported EAF for each study was plotted against the frequency in the HRC reference data to identify possible strand alignment issues (Supplementary Figures 1, 3). The final number of variants passing all quality control tests and the per-study genomic inflation factor (λ) are reported in Supplementary Tables 1 and 2.

Phenotype preparation

Pertinent to this study, HC measures in all individual cohorts were transformed into Z-scores using a unified protocol. After the removal of outliers (±4 SD within each sample), HC was adjusted for age within males and females separately. Residuals for each sex were subsequently transformed into Z-scores and eventually combined (thus removing inherent sex-specific effects). Note that the phenotype transformation within ALSPAC was jointly carried out for both sequenced and genome-wide imputed samples.

Genetic-relationship structural equation modeling

Developmental changes in the genetic architecture of HC scores between the ages of 1.5 and 15 years were modeled using genetic-relationship structural equation modeling (GSEM, R gsem library, v0.1.2)24. This multivariate analysis of genetic variance combines whole-genome genotyping information with structural equation modeling techniques using a full information maximum likelihood approach24. Changes in genetic variance composition were assessed with longitudinal HC scores in ALSPAC participants (7924 individuals with up to three measures; 1.5 years, N = 3945; 7 years N = 5819; 15 years, N = 3406). HC scores were Z-standardized at each age, as described above. Genetic-relationship matrices were constructed based on directly genotyped variants in unrelated individuals, using GCTA software57, and the phenotypic variance dissected into genetic and residual influences using a full Cholesky decomposition model24.

Multiple testing correction

Using Matrix Spectral Decomposition (matSpD)58, we estimated that we analyzed 1.52 effective independent phenotypes within this study (Pediatric, Adult and Pediatric + adult HC scores and ICV12 scores) according to the LDSC-based genetic correlations22.

Single variant association analysis

Single variant genome-wide association analysis, assuming an additive genetic model, was carried out independently within each cohort using standard software (Supplementary Table 1, Supplementary Note 1). Residualized HC scores (Z-scores) were regressed on genotype dosage using a linear regression framework. For cohorts with unrelated subjects (Supplementary Table 1) association analysis was carried out using SNPTEST v2.5.0 (-method expected, -frequentist)59. Note that HC scores in GenR were, in addition, adjusted for four principal components. Cohorts with related participants (HELIC cohorts) utilized a linear mixed model to control for family and cryptic relatedness, implemented in GEMMA60.

Individual cohort level summary statistics for HC were combined genome-wide with standard error-weighted fixed effects meta-analysis, allowing for the existence of age-specific effects through an age-stratified design (Fig. 1a). We restricted each HC meta-analysis (Pediatric, Adult, Pediatric + adult) to variants with a minimum sample size of N > 5000. Genomic control correction was applied at the individual cohort level and heterogeneity between effects estimates was quantified using the I-squared statistic as implemented in METAL61. Accounting for the effective number of independent phenotypes studied, the threshold for genome-wide significance was fixed at 3.3 × 10−8 and the threshold for suggestive evidence at 6.6 × 10−6.

We contacted all studies (known to us) with (a) HC information available in later childhood or adult samples, (b) participants of European ancestry and (c) genotype data. Studies with whole-genome sequencing or densely imputed genotype data (HRC or UK10K/1KG combined templates) were included in the HC meta-analysis, while studies with imputation to other templates were reserved for follow-up. Following this strategy, the majority of studies were included in the meta-analysis, with follow-up in a single study.

Identification of known variants and conditional analysis

Known GWAS signals (p ≤ 5.0 × 10−8) were identified from previous studies on HC in infancy13, ICV12,62,63,64 and brain volume65 using either published or publicly available data (Supplementary Table 4). Conditional analysis was performed with GCTA software using summary statistics from HC (Pediatric) and HC (Pediatric + adult) meta-analyses (Supplementary Figure 4). In addition, we carried out an LD clustering of independent signals from the HC (Pediatric + adult) meta-analysis with respect to all known loci. Briefly, LD clustering is an iterative process that starts with the most significant SNP, which is clumped with variants that have pairwise LD of r2 ≥ 0.2 within 500 kb using PLINK v1.90b3w, and all variants in LD are removed. Then, the same clumping procedure is repeated for the next top SNP and the iteration continues until there are no more top variants with p < 1.0 × 10−4. For details, see Supplementary Note 2. For sensitivity analysis, we repeated the LD clustering with known loci for height as identified through the GIANT consortium (697 known independent height GWAS signals19, r2 = 0.2, ±500 kb).

Combined meta-analysis of HC and ICV

We carried out a weighted Z-score meta-analysis of the combined HC (Pediatric + adult) meta-analysis and the largest publicly available genome-wide summary statistics on intracranial volume (ICV; N = 26,577) based on data from Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) and the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA) consortium12. A weighted Z-score meta-analysis was carried out using METAL61 using standardized regression coefficients and 12,124,458 imputed or genotyped variants, assuming a genome-wide threshold of significance at p ≤ 3.3 × 10−8.

We used the Z-scores (Z) from the METAL output to calculate the standardized regression coefficient (β) for each SNP and trait66

$$\widehat {\beta _j} \approx Z_j\frac{{\widehat {\sigma _y}}}{{\sqrt {N_j \times 2(1 - {\mathrm {EAF}}_j){\mathrm {EAF}}_j} }},$$ (1)

where SNP j has an effect allele frequency (EAF j ) and \(\widehat {\sigma _y}\) is standard deviation of the phenotype, which is assumed to equal one for standardized traits. The standard error (SE) is calculated as

$$Z_j = \frac{{\widehat {\beta _j}}}{{{\mathrm {SE}}(\widehat {\beta _j})}}.$$ (2)

To disentangle lead signals observed in both the HC (Pediatric + adult) and the combined ICV + HC (Pediatric + adult) meta-analysis, variants were conditioned on each other using GCTA software and summary statistics.

Gene-based analysis

Gene-based tests for association were performed using MAGMA20, which calculates gene-based test statistics from SNP-based test statistics, position-based gene annotations and a linkage disequilibrium reference panel of UK10K haplotypes using an adaptive permutation procedure. SNP-based test statistics were annotated using mapping files with a 50 kb symmetrical window around genes. For gene definition, we used all 19,151 protein-coding gene annotations from NCBI 37.3 and corrected for the number of genes and effective phenotypes tested, using an adjusted Bonferroni threshold of 1.7 × 10−6.

S-PrediXcan

We used the S-PrediXcan method21 as a summary-statistic-based implementation of PrediXcan to test for association between tissue-specific imputed gene expression levels and HC, implemented in the MetaXcan standalone software (v0.3.5). This approach first predicts the transcriptome level using publicly available transcriptome datasets. Then, it infers the association between gene and phenotype of interest, by using the SNP-based prediction of gene expression as weights (predicted from the previous step) and combines it with evidence for SNP association based on phenotype-specific GWAS summary statistics. We predicted gene expression levels for cerebellum (4778 genes; GTEx v6p; Supplementary Note 3), cortex (3177 genes; GTEx v6p; Supplementary Note 3), and whole blood (6669 genes; DGN; Supplementary Note 3) using an adjusted Bonferroni threshold of p < 2.3 × 10−6 across all tissues tested.

Estimation of heritability and genetic correlation

Linkage-disequilibrium score regression (LDSC)22 was carried out to estimate the joint contribution of genetic variants as tagged by common variants (SNP-h2) to phenotypic variation in HC. The method is based on GWAS summary statistics and exploits LD patterns in the genome and can distinguish confounding from polygenic influences22. To estimate LDSC-h2, genome-wide χ2-statistics are regressed on the extent of genetic variation tagged by each SNP (LD-score). The intercept of this regression minus one estimates the contribution of confounding bias to inflation in the mean χ2-statistic. LD score regression was performed with LDSC software (v1.0.0) and based on the set of well-imputed HapMap3 SNPs (~1,145,000 SNPs with MAF > 5% and high imputation quality such as an INFO score of 0.9 or higher) and a European reference panel of LD-scores. LD-score correlation analysis can be used to estimate the genetic correlation (r g ) between distinct samples by regressing the product of test statistics against the same LD-score23. Bivariate LD score correlation was performed with the LDHub platform67 v1.9.0 (Supplementary Note 5). We assessed the genetic correlation between HC scores and a series of 235 phenotypes (excluding UK Biobank) comprising anthropometric, cognitive, structural neuroimaging and other traits as described in Zheng et al.67, with an adjusted Bonferroni threshold of p < 1.4 × 10−4.

Stratified LD score regression

Stratified LD score regression68 is a method for partitioning heritability from GWAS summary statistics with respect to genes that are expressed in specific tissue/cell types. We applied this method to HC summary statistics to evaluate whether the heritability of HC is enriched for genes that are highly expressed in brain tissues. GTEx v6p (Supplementary Note 3) provided gene expression data from 13 brain tissue/cell types. Each of these tissue annotations was added to the baseline model and enrichment was calculated with respect to 53 functional categories. This is for each functional category the proportion of SNP-h2 divided by the proportion of SNPs in that category. We performed stratified LD score regression with independent data from the Roadmap Epigenomics consortium and ENCODE project (Supplementary Note 3), where we restricted the analysis to 55 chromatin marks identified in neural and bone tissue/cell types. Similar to the deriving enrichment in gene expression, each annotation was added to the baseline model. Chromatin analysis includes the union and the average of cell-type-specific annotations within each mark. In the joint gene expression and chromatin enrichment analysis, we applied a multiple testing of p < 4.8 × 10−4 accounting for 68 neural and bone tissues/cell types tested (data from GTEx v6p, ENCODE and Roadmap; Supplementary Note 3).

Functional annotation of novel signals

Functional consequences of novel variants were explored using two web-based tools: Brain xQTL28 and FUMA (v1.3.1)25. The threshold for multiple testing for eQTL was adjusted according to the number of genes near the studied novel signals and their proxy SNPs (r2 = 0.2 and ±500 kb). For Brain xQTL, we corrected for multiple testing based on a threshold of p < 7.4 × 10−4 to account for 68 genes tested. For FUMA eQTL analysis, a multiple testing threshold of p < 7.6 × 10−4 was applied to adjust for 42 genes and, 24 blood and brain tissues/cell types (Supplementary Note 3).

UK Biobank phenome scan

To characterize the phenotypic spectrum of identified HC signals, we conducted a phenome scan on 2143 phenotypes in the UK Biobank cohort69, using PHESANT29 software (v0.13). Analyses were restricted to participants of UK ancestry (UK Biobank specified variable). One from each pair of related individuals, individuals with high missingness, heterozygosity, gender mismatch and putative aneuploidies were excluded. Genotype dosage at lead single variants identified with GWAS was converted into best-guess genotypes using PLINK v1.90b3w. Linear, ordinal logistic, multinomial logistic and logistic regressions were fitted to test the association between genotype and continuous, ordered categorical, unordered categorical and binary outcomes respectively. Analyses were adjusted for age, sex and genotyping chip, and, for sensitivity analysis, 10 principal components. A conservative Bonferroni threshold was applied accounting for a total of 11,056 tests performed and two genotypes tested (p < 2.26 × 10−6).

HC growth curve modeling

Trajectories of untransformed HC (cm) spanning birth to 15 years were modeled in 6225 ALSPAC participants with up to 13 repeat measures (17,269 observations) using a mixed effect SITAR model18 (R sitar library v1.0.11). SITAR comprises a shape invariant mixed model with a single fitted curve, where individual curves are matched to the mean curve by modeling differences in mean HC, differences in timing of the pubertal growth spurt and differences in growth velocity18. Individuals with large measurement errors, i.e. with HC scores at younger ages exceeding scores at later ages (by more than 0.5 SD of the grand mean) as well as outliers (with residuals outside the 99.9% confidence interval) were excluded. The best fitting model was identified using likelihood ratio tests and the Bayesian Information Criterion and included four fixed effects for splines, a fixed effect for differences in mean HC and a fixed effect for sex, in addition to two random effects for differences in mean HC and growth velocity. Stratified models were fitted for carriers and noncarriers of increaser-alleles at candidate loci. To examine the relationship between genotype dosage and differences in HC and growth velocity, these random effects were regressed on genotype dosage using a linear model.