Phenotype selection and study design

Consistent with the developmental pattern of language acquisition, the analysis of children’s expressive vocabulary in infancy was divided between an early phase (15–18 months of age, Fig. 1) and a later phase (24–30 months of age, Fig. 2) and conducted using independent individuals of up to four population-based European studies with both quantitative expressive vocabulary scores and genotypes available (early phase: total N=8,889; later phase: total N=10,819).

Expressive vocabulary scores were measured with age-specific-defined word lists and either ascertained with adaptations of the MacArthur CDI13,14,15,16,17 or the LDS18 and based on parent-report. The CDIs were developed to assess the typical course and variability in communicative development in children of the normal population (8–30 months of age)13. The LDS was designed as a screening tool for the identification of language delay in 2-year-old children18. Both measures have sufficient internal consistency, test-retest reliability and validity18,52,53.

Expressive vocabulary during the early phase was captured by an abbreviated version of the MacArthur CDI (Infant Version13, 8–16 months of age, Supplementary Data 1) within the discovery cohort (ALSPAC, N=6,851, Supplementary Fig. 1a). Note, the Infant CDI has recently become also known as CDI Words and Gestures54. A Dutch adaptation of the short-form version of the MacArthur CDI (N-CDI 2A)14,16 was used within the follow-up cohort (GenR, N=2,038). Scores in both cohorts comprised both expressive and receptive language aspects (‘says and understands’) and showed a positively skewed data distribution (1.95<skewness≤2.39; Supplementary Data 1).

Vocabulary production during the later phase was measured with an abbreviated version of the MacArthur CDI (Toddler version, 16–30 months of age)13,15 in the discovery cohort (ALSPAC, N=6,299, Supplementary Fig. 1b). Note, the Toddler CDI has recently become also known as CDI Words and Sentences54. Within the follow-up cohorts, expressive vocabulary was either assessed with the LDS18 (GenR N=1,812; the Raine study N=981) or an adapted short form of the MacArthur CDI (MCDI)14,17 (Twins Early Development Study, TEDS, N=1,727, independent individuals (one twin per pair), N=5,733 twin pairs (not all of them have genotype information available)). Later-phase expressive vocabulary scores measured expressive language only (‘says’) and were either symmetrically distributed or negatively skewed (−1.68<skewness≤0.24; Supplementary Data 1).

In total, three different languages were included in our analyses: English (three samples: ALSPAC; TEDS; Raine), Dutch (one sample: GenR) and Finnish (sensitivity analysis: Northern Finnish Birth Cohort (NFBC) 1966). The cross-cultural comparability of the CDI has been explored, and the measures in many languages, including Dutch and English, show minimal differences in vocabulary production scores in the early years55. In addition, the standardization within each sample (see below) would have removed any minor differences between instruments.

Basic study characteristics, details on phenotype acquisition and psychological instruments as well as summary phenotype characteristics (including mean, s.d., kurtosis, skewness and age at measurement) are presented for each cohort and developmental phase in Supplementary Data 1.

For each participating study, ethical approval of the study was obtained by the local research ethics committee, and written informed consent was provided by all parents and legal guardians. Detailed information on sample-specific ethical approval and participant recruitment is provided in Supplementary Note 1.

Genotyping and imputation

Genotypes within each cohort were obtained using high-density SNP arrays (Supplementary Data 1). Cohort-specific genotyping information including genotyping platform, quality control (QC) for individuals and SNPs, the final sample size, the number of SNPs before and after imputation as well as the imputation procedures are detailed in Supplementary Data 1. Briefly, for individual sample QC, this included filtering according to call rate, heterozygosity and ethnic/other outliers, and for SNP QC (prior to imputation) filtering according to minor allele frequency, call rate and SNPs with deviations from Hardy–Weinberg equilibrium (detailed exclusion criteria are listed in Supplementary Data 1). Genotypes were subsequently imputed to HapMap CEU (phase II and/or III) and/or Wellcome Trust Controls (Supplementary Data 1). For sensitivity analysis, ALSPAC genotypes on chromosome 3 were also locally imputed to 1,000 Genomes (v3.20101123, Supplementary Data 1).

Single-variant association analysis

Within each cohort, expressive vocabulary scores were adjusted for age, age-squared, sex and the most significant ancestry-informative principal components56 and subsequently rank-transformed to normality to facilitate comparison of the data across studies and instruments. The association between SNP and the expressive vocabulary score was assessed within each cohort using linear regression of the rank-transformed expressive vocabulary score against allele dosage, assuming an additive genetic model.

In the discovery cohort, the genome-wide association analysis for each phase was carried out using MACH2QTL57 using 2,449,665 imputed or genotyped SNPs. SNPs with a minor allele frequency of <0.01 and SNPs with poor imputation accuracy (MACH R2≤0.3) were excluded prior to the analysis, and all statistics were subjected to genomic control correction58 (Supplementary Data 1). All independent SNPs from the early- and later-phase GWAS below the threshold of P<10−4 (85 and 50 SNPs, respectively) were selected for subsequent follow-up analysis in additional cohorts. Independent SNPs were identified by linkage disequilibrium-based clumping using PLINK59) Proxy SNPs within ±500 kb, linkage disequilibrium r2>0.3 (Hapmap II CEU, Rel 22) were removed). All analyses within the follow-up samples were carried out in silico using MACH2QTL or SNPTEST60 software (Supplementary Data 1). For the selected SNPs, estimates from the discovery (genomic-control corrected) and follow-up cohort(s) were combined using fixed-effects inverse-variance meta-analysis (R ‘rmeta’ package), while testing for overall heterogeneity using Cochran’s Q-test. Signals below a genome-wide significance threshold of P<2.5 × 10−8 (accounting for two GWAS analyses) were considered to represent robust evidence for association.

An empirical approach (Bootstrapping with 10,000 replicates) was selected to obtain meaningful genetic effects (basic 95% bootstrap confidence interval) of the reported SNPs in the discovery cohort. For this, we utlilized a linear model of z-standardized expressive vocabulary scores against allele dosage, adjusted for age, age-squared, sex and the most significant ancestry-informative principal components. The local departmental server of the School of Social and Community Medicine at the University of Bristol was used for data exchange and storage.

Sensitivity analysis in ALSPAC using locally imputed genotypes on chromosome 3 (based on 1,000 Genomes) was performed as linear regression of the rank-transformed expressive vocabulary score against allele dosage, assuming an additive genetic model, using MACH2QTL (Supplementary Data 1).

Direct genotyping of reported SNPs

Reported SNPs with a medium imputation accuracy (MACH R2<0.8) were re-genotyped in the discovery cohort (ALSPAC) to confirm the validity of the observed association signal (rs10734234, MACH R2=0.76). Genotyping was undertaken by LGC Genomic Ltd ( http://www.lgcgenomics.com/) using a form of competitive allele-specific PCR system (KASPar) for SNP analysis.

Variance explained

To estimate the variation in expressive vocabulary scores explained by each reported SNP and jointly by all reported SNPs together, we calculated the adjusted regression R2 values from (i) univariate linear regression of the rank-transformed expressive vocabulary score (see above) against allele dosage and (ii) multivariate linear regression of the rank-transformed expressive vocabulary score (see above) against the allele dosage from all reported SNPs. All analyses were performed using R, SPSS or STATA software.

Phenotypic characterization of association signals

To investigate whether there is an association between the first single-word utterances at ~12 months of age and the reported SNPs, we conducted an association analysis in the NFBC 1966. The number of spoken words in the NFBC 1966 (word-list free assessment, ‘words’ are undefined) were based on parental response to a questionnaire administered at 12 months of age (Supplementary Data 1). Given the scarcity of categories referring to three or more spoken words, word numbers were dichotomized into ‘1+ words’ (one or more words, 1) versus ‘no words’ (0). The association between early word-production scores and allele dosage of the reported SNPs was studied using logistic regression models, adjusted for sex and the most significant principal components (as exact age at measurement was not available) using SNPTEST.

Pre-school language deficits have been repeatedly associated with later problems in language development, especially reading skills61. To assess whether genetic effects affecting expressive language skills early in life also influence language competencies during later development, we investigated the association between reported SNP signals and a series of language-related cognitive measurements in the ALSPAC cohort (Supplementary Table 8). All outcomes were z-standardized prior to analysis. The association between the transformed outcome and SNP allele dosage was investigated using linear regression adjusted for sex, the most significant principal components and age (except for age-normalized intelligence quotient scores, Supplementary Table 9).

To assess whether gestational age and maternal education influence the association between the reported signals and early expressive vocabulary scores, we (i) investigated the association between these potential covariates and the SNPs directly and (ii) adjusted the association between genotypes and language measures for potential covariate effects. Gestational age in the relevant cohorts was either estimated from medical records or obtained from midwife and hospital registries at birth (Supplementary Data 1), and measured in completed weeks of gestation. Information on maternal education was obtained from antenatal questionnaire data, and dichotomized into lower (1) and higher (0) maternal education (Supplementary Data 1). The association between gestational age and allele dosage for reported SNPs was investigated with linear regression models and adjusted for sex and the most significant principal components in each cohort. The link between maternal education and these SNPs was studied using logistic regression models adjusted for the most significant principal components in each cohort.

We furthermore created new transformations of expressive vocabulary scores, that is, the reported number of words were in addition to the previously described variables (see above) adjusted for gestational age and maternal education, respectively, before they were rank-transformed. Association analysis for reported SNPs was then carried out as described for discovery, follow-up and combined analysis before. All analyses were carried out using R, SPSS or STATA software.

GCTA

The proportion of additive phenotypic variation jointly explained by all genome-wide SNPs together (GCTA heritability) was estimated for all cohorts and analyses windows using GCTA32. In brief, using a sample of independent individuals, the method is based on the comparison of a matrix of pairwise genomic similarity with a matrix of pairwise phenotypic similarity using a random-effects mixed linear model32. Pertinent to this study, GCTA (Supplementary Data 1) was carried out using rank-transformed expressive vocabulary scores (previously adjusted for age, sex and the most significant ancestry-informative principal components in each cohort, see above) and directly genotyped SNPs (ALSPAC, GenR, Raine) or most likely imputed genotypes (TEDS). GCTA estimates from different cohorts were combined using fixed-effects inverse-variance meta-analysis assuming symmetrically distributed s.e., while testing for overall heterogeneity using Cochran’s Q-test.

The extent to which the same genes contribute to the observed phenotypic correlation between two variables can be furthermore estimated through genetic correlations62. For all cohorts with expressive vocabulary measures at two time points (ALSPAC and GenR), the genetic correlation (r g ) between the rank-transformed scores was estimated using bivariate GCTA analysis33 (based on the genetic covariance between two traits).

Twin analysis

Twin analyses allow the estimation of the relative contributions of genes and environments to individual differences in measured traits. Twin intraclass correlations were calculated63, providing an initial indication of the relative contributions of additive genetic (A), shared environmental (C) and non-shared environmental (E) factors. Additive genetic influence, also commonly known as heritability, is estimated as twice the difference between the identical and fraternal twin correlations. The contribution of the shared environment, which makes members of a family similar, is estimated as the difference between the identical twin correlation and heritability. Non-shared environments, that is, environments specific to individuals, are estimated by the difference between the identical twin correlation and 1, because they are the only source of variance making identical twins different. Estimates of the non-shared environment also include measurement error.

Maximum likelihood structural equation model-fitting analyses allow more complex analyses and formal tests of significance64. Standard twin model-fitting analyses were conducted using Mx65. The model fit is summarized by minus two times the log likelihood (−2LL). Differences in −2LL between models distributes as χ2, which provides a goodness of fit statistic. A change in χ2 of 3.84 is significant for a 1 degree of freedom test. Model fit was compared between the full ACE model and the saturated model (where variances are not decomposed into genetic and environmental sources). Reduced models testing CE, AE and E models were compared with the full ACE model and the saturated model. A significant P value indicates a significantly worse fit.

Twin analysis was carried out on rank-transformed expressive vocabulary scores at 24 months (adjusted for age, age-squared and sex), which were assessed in 5,733 twin pairs (monozygotic twins N=1,969; dizygotic twins (male, female and opposite sex) N=3,764) from the TEDS66.

The URLs for all utilized web pages are given in Supplementary Note 2.