Participants

The present study used the TEDS sample. TEDS is a large twin study that recruited over 16,000 twin pairs born between 1994 and 1996 in England and Wales; more than 10,000 twin pairs are still actively involved in the study. Rich cognitive and behavioral data, including educational achievement, have been collected from the twins, their parents, and their teachers throughout compulsory education and beyond. Importantly, the TEDS sample was representative of the UK population at first contact and remains representative in terms of family socioeconomic status and ethnicity.26,52 Ethical approval for this study was received from the King’s College London Ethics Committee.

The sample for the present study included all twins with available academic achievement measures from the school years. Participants with major medical or psychiatric conditions, or with severe perinatal complications, were excluded from the analyses. Zygosity was assessed using a parent-reported questionnaire of physical similarity, which has been shown to be highly reliable;53 DNA testing was conducted when zygosity was unclear from the questionnaire. The sample size for each academic achievement measure is shown in Supplementary Table S1.

Genotype data are available for a subsample of unrelated individuals from TEDS (one twin per pair). We processed genotypes for 6710 individuals using standard quality control procedures, followed by imputation of genetic variants to the Haplotype Reference Consortium54 reference panel (see Supplementary Methods). We then matched the genotyped individuals to the participants with available academic achievement data.

Measures

Measures of educational achievement obtained by TEDS

TEDS obtained assessments of academic achievement directly from the twins’ teachers, who reported grades following the UK National Curriculum guidelines, a standardized core academic curriculum formulated by the National Foundation for Educational Research (NFER) and the Qualifications and Curriculum Authority (QCA) (NFER: http://www.nfer.ac.uk/index.cfm; QCA: http://www.qca.org.uk). At age 7, data are available for English and Mathematics; at ages 12 and 14, data are available for English, Mathematics, and Science. The English teacher rating combined ratings of students’ reading, writing, and speaking and listening; the Mathematics rating combined knowledge of numbers, shapes, space, measures, and using and applying mathematics; and the Science rating combined life processes, scientific enquiry, and physical processes. These teacher ratings were found to be highly reliable when compared with the achievement measures collected by the UK National Pupil Database (NPD), as described below.

GCSE exam results were obtained from the twins themselves or from their parents via questionnaires sent by mail or administered by telephone. GCSEs are UK-wide standardized examinations taken at age 16, at the end of compulsory education. Students choose from a variety of subjects, but English, Mathematics, and Science are compulsory. We used exam grades in English, Mathematics, and Science for the current analyses. Composite measures were created for English (mean of English Language and English Literature grades), Science (mean of single- or double-weighted Science grades or, when taken separately, of Chemistry, Physics, and Biology grades), and Mathematics.

Measures of educational achievement obtained from the NPD

The TEDS dataset was linked to the NPD for every participant for whom we received written informed consent from either the twin or the parent. The NPD is a rich UK database of students’ academic achievement across the school years (https://www.gov.uk/government/collections/national-pupil-database). Data are available for each Key Stage (KS) completed in the UK following the National Curriculum (NC). Teachers provide NC ratings for every student at the end of each KS (similar to the NC ratings in English, Mathematics, and Science collected by TEDS). Both exam scores and teacher ratings are available for KS1–KS3, whereas only exam scores are available for KS4 and KS5. Children are about 7, 11, and 14 years old at the end of KS1, KS2, and KS3, respectively; KS4 marks the end of compulsory education, with GCSE examinations at about age 16. Sample sizes and descriptive statistics for each measure are provided in Supplementary Table S1.

Composite scores of educational achievement

At each KS, composite scores were calculated separately for English, Mathematics, and Science by taking the mean of three scores: the TEDS teacher rating, the NPD teacher rating, and the exam score. The average correlation between NPD and TEDS teacher ratings was 0.70 (see Supplementary Table S5), and the average correlation between teacher ratings and exam scores was 0.80 (see Supplementary Table S6). For GCSE performance at the end of compulsory education, GCSE grades collected by TEDS and by the NPD correlated 0.98 for English, 0.99 for Mathematics, and >0.95 for all Sciences. Combining NPD and TEDS data in a mean score increased the sample size: when fewer measures were available for a participant, we calculated the composite score of educational achievement from whichever measures were available.

The overall achievement measure (core achievement) was calculated at each KS as the mean of the English and Mathematics NC teacher ratings (from both NPD and TEDS), the English exam score, and the Mathematics exam score. Science grades were not included in the overall achievement scores, to allow a more direct comparison across ages, because Science is not part of the National Curriculum at KS1.
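The composite rule (mean of whichever measures are available) can be sketched as a row-wise mean that skips missing values. This is a minimal illustration, assuming that missing measures are coded as NaN and that ratings and exam scores are already on a common scale; the text does not state how they were scaled before averaging. The function name `composite` is hypothetical.

```python
import numpy as np


def composite(scores):
    """Row-wise mean of the available scores (NaN marks a missing measure).

    Each row is one participant and each column one measure, e.g. for a
    subject composite: TEDS teacher rating, NPD teacher rating, exam score.
    Rows with no available measure at all are returned as NaN.
    """
    scores = np.asarray(scores, dtype=float)
    n_available = np.sum(~np.isnan(scores), axis=1)
    totals = np.nansum(scores, axis=1)
    out = np.full(scores.shape[0], np.nan)
    has_data = n_available > 0
    out[has_data] = totals[has_data] / n_available[has_data]
    return out
```

For the overall (core) achievement score, the same function would be applied to six columns: the English and Mathematics teacher ratings from TEDS and the NPD, plus the English and Mathematics exam scores.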

Measures of general cognitive ability (g)

General cognitive ability (g; intelligence) was assessed in TEDS at ages 7, 9, 10, 12, 14, and 16. For the present analyses we created a longitudinal composite measure of g as a mean of these six assessments. See Supplementary Methods for a more detailed description of g measures.

Analyses

Phenotypic analyses

The measures were described in terms of means and variances for males and females and for identical (MZ) and non-identical (DZ) twins; mean differences for age and sex, and their interaction, were tested using univariate ANOVA. Phenotypic correlations were calculated between academic achievement measures across development. The academic achievement measures were corrected for the small mean effects of age and sex (Supplementary Table S1) by converting each variable to a standardized residual after regressing out age and sex. This correction is necessary because members of a twin pair are identical in age, and MZ twins are also identical in sex; uncorrected age and sex effects would therefore increase twin similarity and inflate twin estimates of shared environment.55 Full sex-limitation genetic modeling of academic achievement has previously found only very minor sex differences in genetic and environmental estimates.6,9,12 For these reasons, and to increase power, the present analyses used the full sample, combining males and females and including opposite-sex pairs.
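The age and sex correction can be sketched as an ordinary least squares regression followed by standardization of the residuals. This assumes main effects of age and sex only, complete data, and sex coded 0/1; the function name `standardized_residual` is hypothetical, and the original analyses may have been run in other software.

```python
import numpy as np


def standardized_residual(y, age, sex):
    """Regress y on age and sex (with intercept) and return z-scored residuals."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y),
                         np.asarray(age, dtype=float),
                         np.asarray(sex, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficients
    resid = y - X @ beta                          # part of y not explained by age/sex
    return (resid - resid.mean()) / resid.std(ddof=1)
```

Because OLS residuals are orthogonal to the predictors, the corrected scores are uncorrelated with age and sex, so these variables can no longer make co-twins look more similar.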

Finally, because the achievement measures were slightly negatively skewed, before conducting twin analyses we mapped them onto a standard normal distribution using the rank-based van der Waerden transformation.56
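The van der Waerden transformation replaces each value with the standard normal quantile of its rank, Φ⁻¹(rᵢ/(n+1)). A minimal sketch, assuming no missing values and assigning tied values the mean of their ranks (a common convention that the text does not specify); the function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def average_ranks(x):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def van_der_waerden(x):
    """Normal scores Phi^-1(rank / (n + 1)) for a 1-D array without NaNs."""
    r = average_ranks(x)
    n = len(r)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(ri / (n + 1)) for ri in r])
```

Only the rank order of the original scores is kept, so a negatively skewed distribution becomes symmetric around zero.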

Twin design

The twin design was used for univariate and multivariate genetic analyses. The twin method is a natural experiment that capitalizes on the known genetic relatedness of monozygotic (MZ) and dizygotic (DZ) twin pairs. MZ twins are genetically identical, sharing 100% of their segregating genes, whereas DZ twins share on average 50%. Both MZ and DZ twins are assumed to share 100% of their shared environmental influences, because they grow up in the same family. Non-shared environmental influences are unique to each individual and do not contribute to similarity between twins. Using these known relatedness coefficients, the relative contributions of additive genetic (A), shared environmental (C), and non-shared environmental (E) effects to the variance and covariance of the phenotypes can be estimated by comparing MZ and DZ correlations. Following Falconer’s formula,47 heritability can be roughly estimated by doubling the difference between the MZ and DZ correlations, C by subtracting heritability from the MZ correlation, and E by subtracting the MZ correlation from 1. These parameters can be estimated more accurately using structural equation modeling, which also provides 95% confidence intervals and estimates of model fit. The structural equation modeling program OpenMx was used for all model-fitting analyses.57
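Falconer’s formula can be written directly from the twin correlations. The sketch below is only the rough hand calculation described above, not the OpenMx model fitting; the function name and the example correlations are hypothetical and are not results from this study.

```python
def falconer(r_mz: float, r_dz: float) -> dict:
    """Rough ACE estimates from MZ and DZ twin correlations (Falconer's formula)."""
    a2 = 2 * (r_mz - r_dz)  # heritability: twice the MZ-DZ difference
    c2 = r_mz - a2          # shared environment: MZ correlation minus heritability
    e2 = 1 - r_mz           # non-shared environment (also absorbs measurement error)
    return {"A": a2, "C": c2, "E": e2}


# Hypothetical correlations: r_MZ = 0.8, r_DZ = 0.5 give A = 0.6, C = 0.2, E = 0.2
estimates = falconer(0.8, 0.5)
```

By construction the three components sum to 1; with sampling error the hand calculation can produce negative components, which is one reason constrained model fitting is preferred.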

These univariate analyses can be extended to multivariate analyses of the etiology of covariance between multiple traits. The multivariate genetic method decomposes the covariance between traits into additive genetic (A), shared environmental (C), and non-shared environmental (E) components by comparing the cross-trait cross-twin correlations of MZ and DZ twin pairs. This method also enables estimation of the genetic correlation (rG), an index of pleiotropy indicating the extent to which the same genetic variants influence two traits, or the same trait measured at two ages. The shared environmental correlation (rC) and non-shared environmental correlation (rE) are estimated in a similar manner.43,47
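One common way to obtain rG is from a Cholesky parameterization of the genetic covariance matrix: the lower-triangular path matrix L gives the covariance Σ_A = LLᵀ, and rG is that covariance rescaled to a correlation. The sketch assumes this parameterization (the text does not state which one was used), and the path values are hypothetical. The same function yields rC and rE from the C and E path matrices.

```python
import numpy as np


def correlations_from_cholesky(L):
    """Correlation matrix implied by a lower-triangular path matrix L (cov = L L^T)."""
    L = np.asarray(L, dtype=float)
    cov = L @ L.T
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


# Hypothetical additive genetic paths for two traits
a_paths = [[0.7, 0.0],
           [0.5, 0.3]]
r_g = correlations_from_cholesky(a_paths)[0, 1]  # genetic correlation between traits
```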

We used two longitudinal models to study age-to-age stability in educational achievement.

The simplex model is a multivariate genetic model that estimates the extent to which genetic and environmental influences on a trait are transmitted from age to age, and the extent to which new (innovative) and age-specific influences emerge.58 The covariance or correlation matrix for such data is called a simplex because the strength of the associations depends on the interval between ages: associations are typically highest near the diagonal (between adjacent ages) and decline systematically as the interval between ages increases.58 The simplex model is illustrated in Supplementary Figure S2.
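The simplex pattern follows from a first-order autoregressive latent process: the latent score at each age is β times the score at the previous age plus a new innovation. The sketch computes the covariance matrix implied by one such process. In a twin simplex model, separate processes of this kind are typically specified for A, C, and E and fitted to MZ and DZ covariances; that full model is not reproduced here, and the function name and parameter values are hypothetical.

```python
import numpy as np


def simplex_covariance(betas, innovations):
    """Covariance matrix of a first-order autoregressive (simplex) latent process.

    betas[k] is the transmission path from occasion k to occasion k + 1;
    innovations[t] is the new variance entering at occasion t.
    """
    T = len(innovations)
    if len(betas) != T - 1:
        raise ValueError("need exactly one transmission path between adjacent occasions")
    var = np.zeros(T)
    var[0] = innovations[0]
    for t in range(1, T):
        var[t] = betas[t - 1] ** 2 * var[t - 1] + innovations[t]
    cov = np.diag(var)
    for s in range(T):
        for t in range(s + 1, T):
            # Variance at occasion s carried forward through paths s -> ... -> t
            cov[s, t] = cov[t, s] = np.prod(betas[s:t]) * var[s]
    return cov
```

With transmission paths below 1, covariances shrink multiplicatively with each additional step, which produces correlations that decline as the interval between ages grows.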

The common pathway model is a multivariate genetic model in which the variance shared by all measures in the analysis is captured by a common latent factor, for which the A, C, and E components are estimated. In addition to the etiology of the common latent factor, the model estimates the A, C, and E components of the residual variance in each measure that is not captured by the latent factor.59 Here, the common pathway model estimates the extent to which the stable variance in educational achievement across compulsory education (the latent factor of achievement) is explained by A, C, and E. The common pathway model is illustrated in Supplementary Figure S6.
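The structure of the common pathway model is easiest to see in the covariance matrices it implies. In the sketch below, the latent factor (with loadings λ) has components a², c², e², and each measure has its own residual components. MZ co-twins share all A and C variance and DZ co-twins share half of A and all of C. This is an illustration of the model's expected covariances under these standard assumptions, not the OpenMx specification used in the study; the function name and parameter values are hypothetical.

```python
import numpy as np


def common_pathway_covs(lam, a2, c2, e2, a_s, c_s, e_s):
    """Expected covariance matrices under a common pathway ACE model.

    lam: loadings of each measure on the latent factor.
    a2, c2, e2: A, C, E variance of the latent factor (often constrained to sum to 1).
    a_s, c_s, e_s: measure-specific (residual) A, C, E variances.
    Returns (within-person, MZ cross-twin, DZ cross-twin) covariance matrices.
    """
    lam = np.asarray(lam, dtype=float).reshape(-1, 1)
    LL = lam @ lam.T                       # covariance pattern induced by the factor
    A_s = np.diag(np.asarray(a_s, dtype=float))
    C_s = np.diag(np.asarray(c_s, dtype=float))
    E_s = np.diag(np.asarray(e_s, dtype=float))
    within = LL * (a2 + c2 + e2) + A_s + C_s + E_s
    mz = LL * (a2 + c2) + A_s + C_s              # MZ: all A and C shared
    dz = LL * (0.5 * a2 + c2) + 0.5 * A_s + C_s  # DZ: half of A, all of C shared
    return within, mz, dz


within, mz, dz = common_pathway_covs(
    lam=[0.8, 0.7, 0.6], a2=0.6, c2=0.2, e2=0.2,
    a_s=[0.1, 0.1, 0.1], c_s=[0.0, 0.0, 0.0], e_s=[0.26, 0.39, 0.54],
)
```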

SNP heritability

The genome-wide complex trait analysis (GCTA) software package estimates the proportion of phenotypic variance or covariance explained by all SNPs available on genotyping arrays, without testing the association of any single SNP individually.17,49,60 This estimate is often called SNP heritability. Rather than relying on known genetic relatedness coefficients, this method estimates heritability from DNA using only unrelated individuals. SNP heritability is estimated with linear mixed models fitted by restricted maximum likelihood (REML), which decompose the phenotypic variance and covariance.

First, the genetic relatedness matrix is calculated from the genetic similarity between all possible pairs of individuals across all SNPs on the DNA array, weighted by allele frequencies. Individuals found to be even remotely related (more closely related than fifth cousins) are removed from the analyses, because their inclusion would bias the results, which rely on chance genetic similarity between pairs of unrelated individuals.17,18,61 The matrix of pair-by-pair genetic similarity is then compared with the matrix of pair-by-pair phenotypic similarity using REML. SNP heritabilities were calculated for overall achievement across compulsory education, as well as for specific subjects.
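The relatedness matrix can be sketched with the standardized-genotype estimator used by GCTA for off-diagonal elements, Aⱼₖ = (1/N) Σᵢ (xᵢⱼ − 2pᵢ)(xᵢₖ − 2pᵢ) / (2pᵢ(1 − pᵢ)), where xᵢⱼ is the allele count (0, 1, 2) of individual j at SNP i and pᵢ is the allele frequency. GCTA uses a slightly different formula on the diagonal, which is not reproduced here. The relatedness filter is a simple greedy stand-in for GCTA’s own procedure, and the 0.025 cutoff is an assumed threshold commonly used in GCTA analyses, not one stated in the text. Both function names are hypothetical, and the REML step itself is not shown.

```python
import numpy as np


def genetic_relatedness_matrix(genotypes):
    """GRM from an individuals x SNPs matrix of 0/1/2 allele counts.

    Allele frequencies are estimated from the sample itself. Off-diagonal
    elements follow the standardized GCTA-style estimator; diagonal elements
    here are simply the mean squared standardized genotype.
    """
    X = np.asarray(genotypes, dtype=float)
    p = X.mean(axis=0) / 2
    keep = (p > 0) & (p < 1)  # drop monomorphic SNPs (zero variance)
    X, p = X[:, keep], p[keep]
    Z = (X - 2 * p) / np.sqrt(2 * p * (1 - p))
    return Z @ Z.T / Z.shape[1]


def unrelated_subset(grm, cutoff=0.025):
    """Indices kept after dropping one member of each pair with relatedness > cutoff.

    A simple greedy rule (the later individual of each pair is dropped); the
    cutoff value is an assumption, not taken from the study.
    """
    n = grm.shape[0]
    dropped = set()
    for j in range(n):
        if j in dropped:
            continue
        for k in range(j + 1, n):
            if k not in dropped and grm[j, k] > cutoff:
                dropped.add(k)
    return [i for i in range(n) if i not in dropped]
```

With real array data (hundreds of thousands of SNPs), off-diagonal values for unrelated pairs cluster tightly around zero, so only genuinely related pairs exceed a small cutoff.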

Genome-wide polygenic scores

GPSs aggregate the effects of individual SNPs associated with the trait in a GWA study.62 GPSs were calculated for 6710 participants using summary statistics from the Okbay et al.28 GWA analysis of years of education (EduYears). For legal reasons, the summary statistics excluded the 23andMe participants from the 293,723-participant EduYears GWA discovery sample. Polygenic scores were constructed as the weighted sum of each individual’s genotypes across all SNPs using the LDpred method63 (see Supplementary Methods for details). Delta R2 is reported as the estimate of variance explained by the GPS: the increase in R2 when the GPS is added to a regression model that already includes 10 genetic principal components to control for population stratification. See Supplementary Methods for genetic quality control and further information about GPS calculation.
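The two computational steps, a weighted sum of genotypes and the incremental R2 over a covariate-only model, can be sketched as follows. The per-SNP weights are assumed to be given (for example, LDpred-adjusted effect sizes); LDpred itself is not implemented here. The function names are hypothetical, and the regression is plain OLS with an intercept.

```python
import numpy as np


def polygenic_score(genotypes, weights):
    """Weighted sum of allele counts (individuals x SNPs) using per-SNP weights."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)


def r_squared(y, X):
    """R^2 of an OLS regression of y on X, with an intercept column added."""
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)


def delta_r2(y, gps, covariates):
    """Increase in R^2 when the GPS is added to a covariate-only model."""
    base = r_squared(y, covariates)
    full = r_squared(y, np.column_stack([covariates, gps]))
    return full - base
```

Here `covariates` would hold the 10 principal components. The same `delta_r2` function also applies when the covariates include achievement at earlier ages.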

We correlated the EduYears GPS with the overall achievement composites, as well as with performance in specific subjects at each age, to estimate EduYears GPS heritability. To assess the extent to which the EduYears GPS contributes to age-to-age stability, we report delta R2 as the variance explained by adding the GPS to a regression model that included academic achievement at all earlier ages.