Participants

The Twin Early Development Study (TEDS) sample was used. TEDS is a longitudinal study that has recruited over 16,000 twin pairs born in England and Wales between 1994–1996. Currently, more than 10,000 twin pairs remain actively involved in the study. The sample is a representative sample of the UK when compared to the UK National Statistics. Rich behavioural and cognitive data have been collected over many years from the participants, including measures of academic achievement. The twins have now completed compulsory education and are moving on to further education or the workforce13,30.

We included all twins with available GCSE achievement measures in the analyses. Participants with major medical or psychiatric conditions were removed from the analyses, as were those individuals with severe perinatal complications. Additionally, we excluded all twins who did not have English as their first language. Zygosity was assessed by the parent-questionnaire of physical similarity. This measure has previously been shown to be highly reliable (95%)31. DNA testing was conducted where zygosity was unclear from this questionnaire. The present study employed all individuals with available GCSE grades comprising an overall sample of 6,316 twin pairs (12,632 individuals): 2,245 monozygotic (MZ) twin pairs, 2,069 dizygotic (DZ) same-sex twin pairs and 2,002 opposite-sex twin pairs.

DNA has been genotyped for subsample of TEDS twins (one twin per pair), see Genotyping. Genome-wide genotypes were available for 3,152 individuals, which were matched to those participants with available grades for GCSE mathematics, GCSE English and GCSE science. This led to a sample of 2,572 unrelated individuals with GCSE mathematics grades, 2,601 individuals with GCSE English grades and 2,381 individuals with GCSE science grades; this sample was used to conduct the GCTA analyses. As intelligence scores were not available for all individuals, after correcting the GCSE grades for intelligence, the sample comprised 2,526, 2,561 and 2,345 individuals respectively.

Measures

General Certificate of Secondary Education (GCSE) grades were used. GCSE is a standardized examination taken at the end of compulsory education in the UK. Children typically start GCSE courses at the age of 14 and the exams are taken at the age of 16. Students can choose from a variety of different courses such as mathematics, science, history, music, physical education and modern foreign languages. English, mathematics and science are compulsory subjects. Some schools also require students to take one GCSE in a second language. Importantly, the subjects that students choose and their performance in the GCSE exams have profound impact on their further education and employment. The exams are graded from A* to G, with a U grade given for failed exams. Grades were coded from 11(A*) to 4(G) to have equivalent numerical comparisons. No information about the failed courses was available. Most pupils receive 5 or more grades between A* and C, which is the requirement for further education in the UK. GCSE grades were collected from parents or the twins themselves via questionnaires sent by mail or over the telephone. For 7,367 twins the grades were verified using the National Pupil Database (NPD; https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/251184/SFR40_2013_FINALv2.pdf), yielding a correlation of 0.99 for mathematics, 0.98 for English and >0.95 for all the sciences.

We created composite measures for English (mean of English language and English literature grades), science (mean of single or double-weighted science, or, when taken separately chemistry, physics and biology grade), second language achievement (mean of any second language grade available), humanities (mean of history, religious studies, geography and media studies), business informatics (mean of statistics, business studies and ICT - information and communication technology) and art (mean of art, drama and music grades). Additionally, we used the GCSE grade for mathematics.

Intelligence was assessed at age 16 using verbal and non-verbal abilities administered online. Verbal ability was measured by Mill Hill Vocabulary test, a multiple-choice test32. Twins were presented with the target word on a computer screen and they had to choose a word that was closest in meaning to the target words. Non-verbal ability was measured by Ravens Progressive Matrices33, collected from the twins via internet at the age of 16. Twins were presented with incomplete patterns (‘matrices’) and were asked to identify the missing part to complete the pattern. Intelligence (general cognitive ability) was indexed by taking the mean of verbal and non-verbal abilities.

Because not all the TEDS birth cohorts participated in the intelligence assessment at 16 and not all of those had been genotyped, the sample with all relevant measures would not have been sufficient for the GCTA analysis of grades corrected for intelligence. For that reason, we chose to construct a composite measure of ‘g‘ using all earlier measures from the longitudinal study. A composite was used to assess intelligence in order to control for intelligence in GCSE mathematics, GCSE English and GCSE science: a robust measure of ‘g’ derived from intelligence data collected longitudinally across nine ages from early childhood to age 16. At age 2, mean ‘g’ measure was calculated as a mean of a parent-administered design drawing task34, a matching task (match to design)35, a brick building task, a folding task and a copy task36,37,38,39,40; at age 3, mean ‘g’ was calculated as a mean of a parent-administered oddity task (odd-one-out)35, a design drawing task34 , a matching task35 and a parent-reported conceptual knowledge task36,37,39; at age 4, ‘g’ was calculated as a mean of parent- administered oddity task (an odd one out task)35, a design drawing task, a draw a man task34 and a puzzle task33,38 ; at age 7, ‘g’ was calculated as a mean of conceptual grouping34 , a WISC similarities test41, a WISC vocabulary test41 and a WISC picture completion test41 all collected over telephone testing; age 9, ‘g’ was calculated as a mean of a shapes test42, a WISC vocabulary test43, a WISC general knowledge task43 and a puzzle test42 all collected by the booklets sent to the twins by post; age 10, ‘g’ measure was calculated as a mean of the Ravens standard Progressive Matrices33, a WISC vocabulary43, WISC picture completion41 and a WISC general knowledge test43 all collected via internet testing; at age 12, ‘g’ was calculated as a mean of the Ravens Progressive Matrices33, a WISC picture completion41, a WISC vocabulary43 and a WISC general knowledge test43 all collected via internet testing; at age 14, ‘g’ was computed as a mean of the Raven’s Progressive Matrices33 and a WISC vocabulary43; and age 16, ‘g’ was measured as described above. The mean score of intelligence was calculated across the nine ages.

Prior to any genetic analyses all measures were corrected for small age and sex differences (see Table 1), using the regression method, which is a standard practice in twin analyses. Standardized residuals of the variables were used in all further analyses. This method avoids overestimation of shared environmental influences, as twins are identical for age and MZ twins are also identical for sex44. All outliers beyond three standard deviations from the mean were also removed from the analyses. The GCSE grades also showed negative skew, indicating a ceiling effect. Similar ceiling effect is observed in the UK population as demonstrated in the data from the National Statistics (NPD; https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/251184/SFR40_2013_FINALv2.pdf). To correct for the ceiling effect, all measures were transformed to the standard normal distribution using the rank-based van der Waerden transformation45,46.

Analyses

The measures were described in terms of means and variance, comparing boys and girls and identical and non-identical twins; mean differences for age and sex and their interaction were tested using univariate analysis of variance (ANOVA).

Twin Design

The twin method was used to conduct univariate and multivariate genetic analyses. The twin method can be used to estimate the relative contribution of additive genetic (A), shared environmental (C) and non-shared environmental (E) effects on the variance and covariance of academic achievement measures and intelligence, by comparing monozygotic (MZ) correlations to dizygotic (DZ) correlations. MZ twins share 100% of their segregating genes, while DZ twins share around 50% of the segregating genes, just like any other siblings. Both MZ and DZ twin pairs are assumed to share 100% of their shared environmental influences, when growing up in the same family. Non-shared environmental influences are assumed to be unique to individuals, that is, uncorrelated between twins and not contributing to similarities between them. Heritability can be roughly calculated by doubling the difference between MZ and DZ correlations; shared environmental influences can be calculated by deducting the heritability estimate from the MZ correlations; and non-shared environmental influences can be calculated by deducting the MZ correlation from unity (following the Falconer’s formula)28. These parameters can be estimated more accurately, including calculating the confidence intervals, using structural equation modeling with maximum likelihood estimation, which also provides estimates for the model fit. Structural equation modeling program OpenMx was used for all model fitting analyses47.

Multivariate genetic analysis is the extension of univariate genetic analysis. While univariate twin analysis investigates the variance of one trait, multivariate genetic analysis investigates the genetic and environmental nature of covariance between multiple traits. Multivariate genetic analyses is a method that compares the MZ and DZ cross-twin cross-trait correlations to decompose the covariance between two or more traits of interest into additive genetic (A), shared environmental (C) and non-shared environmental (C) components3,28. As shown in Supplementary Fig. S1 for the correlated factor solution, the genetic correlation (r G ) assesses the extent to which the same genes influence two traits; the shared environmental correlation (r C ) indicates the extent to which the same shared environmental influences that make twins more similar on trait one, also make the twins more similar on trait two; and non-shared environmental correlations (r E ), indicating the extent to which the same non-shared factors influence two traits. Importantly, genetic correlation is different from bivariate heritability estimate, as it does not take into account the heritability of two traits, which means that trait one and trait two can have low heritabilities, but the genetic correlation could be high, implying that if a gene were found for one trait, there would be a good chance that this gene would also be associated with trait two3. Alternative representation of the same analyses is the Cholesky decomposition (see Supplementary Fig. S1). The central question of Cholesky decomposition is the extent to which the heritability of trait one can be accounted for by the heritability of the other trait, thus answering the question ‘to what extent does the heritability of one variable explain the heritability of the other variables’. In the multivariate model, when studying how much variance in trait three is accounted for by trait two, the model controls for the variance of trait one. The Cholesky decomposition is conceptually similar to hierarchical regression, therefore, the order of the variables entered influences the results. Each variable in the model controls for the variance in the previous variable, as illustrated in Supplementary Fig. S1.

Genome-Wide Complex Trait Analysis (GCTA)

GCTA is a technique that estimates genetic and residual components of variance directly from the DNA of unrelated individuals, unlike twin analysis that relies on family resemblance data3. In order to create a sample of unrelated individuals, we randomly selected one twin per pair for GCTA analyses. Studies to date have shown that the heritability of complex behaviours, such as academic achievement, is highly polygenic, influenced by a large number of genes, each having only a small effect25. This explains the relatively slow progress in identifying the specific genes involved in educational achievement, as well as most other traits in the life sciences. GCTA can estimate heritability directly from DNA while not identifying the specific genes involved. The GCTA method uses hundreds of thousands of SNPs (single nucleotide polymorphisms) from thousands of individuals to calculate the proportion of phenotypic variance due to the additive effects of common SNPs3. First, the genetic relatedness matrix (GRM) is calculated by weighing the pairwise genetic similarities with the allele frequencies across all SNPs on the array. All participants who are found to be even remotely related (genetic relatedness 0.025 or greater) are removed from the analyses, as this would otherwise bias the results48. The matrix of pair-by-pair genetic similarity of all the participants is compared to the matrix of their phenotypic similarity using residual maximum likelihood estimation, without testing the association of any single SNP individually25,49,50. The advantage of this method is that heritability estimates can be calculated using a sample of unrelated individuals; the disadvantage is that very large pools of participants are needed to detect overall genetic similarity from the matrix of hundreds of thousands of SNPs. Notably, the heritability estimate calculated using the GCTA method only assesses additive genetic effects, not gene-gene or gene-environment interactions and only common SNPs are analysed50. Univariate GCTA analyses can be extended to bivariate analyses by comparing the phenotypic covariance matrix to the GRM51,52. Prior to the GCTA analyses we adjusted the GCSE English, GCSE mathematics and GCSE science grades for age and sex, using the regression method. Additionally, to control for population stratification, the principal component analysis was conducted for 100,000 quality-controlled SNPs and eight axes were identified with p < 0.05 using the Tracy Wisdom test; these eight principal components were added as covariates in the bivariate GCTA analyses53.

Power was calculated using an online tool for calculating power for GCTA heritability and genetic correlation in both univariate and bivariate GCTA analyses (http://spark.rstudio.com/ctgg/gctaPower/)54.

Genotyping

The analysis is based on the genotypic data generated for 3,665 TEDS unrelated individuals by Wellcome Trust Sanger Institute, Hinxton, UK as part of the Wellcome Trust Case Consortium. Briefly, DNA was collected from 3,665 individuals using buccal swabs, which was thereafter genotyped using AffyrmetrixGeneChip 6.0 genotyping array. This yielded to genome-wide genotype calls for all individuals for around 600,000 SNPs (See Trzaskowski et al., 2013 for full details)55. The data were thereafter imputed to 1000 genomes reference data using Impute 2 software and the standard quality control was applied. This left over 5.2 million SNPs available for molecular and GCTA analyses.

All analyses were carried out in accordance with the approved guidelines.

Ethical approval was received from King’s College London Ethics Committee: PNM/09/10-104 Twin Early Development Study; and informed consent was obtained from all subjects.