Twins Early Development Study

TEDS recruited over 15,000 families of twins born in England and Wales30 in 1994, 1995 and 1996 and the sample remains representative of the UK population2 (Supplementary Note 2). Ethical approval for TEDS has been provided by the Institute of Psychiatry ethics committee, reference number 05/Q0706/228. We excluded from the analyses children with severe current medical problems and children who had suffered severe problems at birth or whose mothers had suffered severe problems during pregnancy. We also excluded twins whose zygosity was unknown or uncertain, whose first language was other than English, and included only twins whose parents reported their ethnicity as ‘white’, which is 93% of this UK sample.

At age 12, the TEDS twins participated in web- and telephone-based testing, as described previously31. Four measures of reading ability were used: two measures of reading comprehension and a measure of reading fluency presented on the web, and a fourth measure (TOWRE) administered over the telephone. Mathematics ability was assessed using a web-based battery of tests that included questions from three components of mathematics, based on the UK national curriculum. Both the reading and mathematics phenotypes comprised an equally weighted combination of the quantile-normalized scales. For each phenotype, we regressed out the effect of age before further analyses. Further details are provided in Supplementary Note 2.

Phenotypic measurements were available for 2,243 (reading) and 2,772 (maths) of these, with 2,794 samples having at least one measurement and 2,221 samples having both. The sample genotyped on the Immunochip (N=2,432) included N=2,153 individuals with at least one phenotypic measurement, of which N=1,388 were DZ co-twins of individuals in the discovery sample. For analyses, taking into account family structure, we additionally included untyped MZ and DZ co-twins of individuals typed in the discovery or Immunochip phases (N=1,737) for a combined sample of N=7,323. For twin analyses, to parallel the population-based variance estimates as closely as possible, we used a sub-sample of the full TEDS cohort: the 2,794 informative samples with genome-wide data plus their co-twins, for a sample size of N=2,794 twin pairs.

Avon Longitudinal Study of Parents and Children

ALSPAC recruited more than 14,000 pregnant women in the former Avon area of the UK (around Bristol and Bath), with estimated dates of delivery between April 1991 and December 1992 (ref. 32). Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. The ALSPAC study website contains details of all the data that are available through a fully searchable data dictionary ( http://www.bris.ac.uk/alspac/researchers/data-access/data-dictionary/). The sample used for replication here is a population-representative group of participants who were tested for word-reading efficiency (TOWRE) at the age of 12.5 years. After combining TOWRE scores across two subtests (see Supplementary Note 2) a total of N=2,140 samples were available for analysis.

GWAS genotyping and imputation

Samples were genotyped at the Affymetrix’s service laboratory on the Genome-Wide Human SNP Array 6.0. For all samples passing Affymetrix’s laboratory quality control, raw intensities (from the.CEL files) were renormalized within collections using CelQuantileNorm ( http://sourceforge.net/projects/outmodedbonsai/files/CelQuantileNorm/). These normalized intensities were used to call genotypes with an updated version of the Chiamo software33 adapted for Affymetrix 6.0 SNP data.

As is the standard practice for GWAS studies, we excluded sets of individuals whose genome-wide patterns of diversity are outliers compared with the majority of those in the study34, and we excluded SNPs for which there is evidence that genotype calls do not provide precise estimates of genotype frequencies. Details of the quality control methods used are published elsewhere34,35. In total, 465 of 3,665 samples were excluded from the analyses by these criteria (see Supplementary Table 5). Genotypes and phenotypes from the discovery sample will be made available through the European Genome-Phenome Archive ( https://www.ebi.ac.uk/ega/).

To assess relatedness among study individuals, we compared each individual with the 100 individuals they were most closely related to (on the basis of genome-wide levels of allele sharing) and used a hidden Markov model (HMM) to decide, at each position in their genome, whether the two individuals shared 0, 1 or 2 chromosomes identical by descent. We obtained a set of ‘unrelated’ individuals with identity by descent <5% by iteratively removing the member of each pair of putatively related individuals with more missing genotypes. A total of 3,154 individuals were included in subsequent analyses.

In addition to standard SNP filters (Supplementary Table 6), we considered a measure of the statistical information (the IMPUTE info measure) carried by the genotype calls for the underlying allele frequency36. SNPs were removed prior to imputation if this information measure was below 0.98 or if the estimated minor allele frequency was below 1%. In total 84,029 (9%) of SNPs were removed by these criteria.

We imputed additional genotypes from a combined reference panel of the 120 CEU trios in HapMap2 and HapMap3 and the common control group of WTCCC2. As an additional quality control step, prior to imputation we re-imputed each typed SNP using IMPUTE version 1 and removed any SNP where the concordance between typed and imputed genotypes was <0.965. We used this high-confidence subset of 736,939 SNPs from the array to impute additional genotypes using IMPUTE2 (refs 36, 37).

IMPUTE2 adopts a two-stage approach using both a haploid reference panel and a diploid reference panel. For the haploid reference panel, we used HapMap2 and HapMap3 SNP data on the 120 unrelated CEU trios; and for the diploid reference, we used a merged set of genotype calls from Affymetrix 6.0 and Illumina 1.2 M genotyping chip typed on 5000 1958 Birth Cohort (58C) and National Blood Service (NBS) individuals forming the common control group of WTCCC2. For association testing we included SNPs with info measure of at least 0.98 (if imputed from HapMap) or 0.9 (if imputed from the WTCCC2 controls) and having an estimated minor allele frequency of at least 1%.

Immunochip genotyping

Replication samples were typed on the Illumina ‘Immunochip’, a custom chip designed by the Immunochip Consortium and WTCCC2, at the Wellcome Trust Sanger Institute. Bead intensity data were processed and normalized for each sample in BeadStudio. Data for successfully genotyped samples were extracted and genotypes were called using the Illuminus algorithm38. Samples and SNPs were subject to similar quality control procedures as described above.

ALSPAC genotyping

We obtained data for 194 SNPs from the region 48891732-49091732 (NCBI build 36) on chromosome 19 for ALSPAC participants by application to the ALSPAC executive. Details on ALSPAC genotyping and imputation are described elsewhere39. Samples were included in the analysis if they had attended the TOWRE test session and completed both parts of the test. N=63 individuals who were recorded as having scored zero on either part of the test were removed, leaving a total of N=2,077 individuals for analysis.

Genome-wide association analysis

After quality control, 1,588,650 SNPs were analyzed in the GWAS using SNPTEST ( https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html), fitting an additive linear model to the data, with sex as a covariate. We used the missing data likelihood score test implemented in SNPTEST to compute P-values, and refer to these as P GWAS above. SNPs with a P-value<5 × 10−5 were further analyzed in a sample combining the GWAS and Immunochip participants, along with informative co-twins with phenotype data but no genotypes available (N=1,737). To take account of the relatedness in the combined sample, we fitted the association model using Merlin software40, again with sex as a covariate. The Merlin software also infers the posterior probabilities of missing genotype information from available pedigree information to increase power. For the region on chromosome 19 we analyzed the ALSPAC data using SNPTEST again with sex as a covariate.

Variance component analysis

For twin and population-level analyses, we consider a general partitioning of a quantitative phenotype Y (either reading or mathematics ability in our study) into five components Y=A+D+I+C+E, where A, D and I correspond to additive, dominance and interaction genetic effects over the whole genome, respectively, and C and E are within-family and individual environmental effects, respectively. We assume that these components are defined to be uncorrelated with each other and thus the phenotypic variance is also partitioned into five components V Y =V A +V D +V I +V C +V E .

Bivariate twin analysis

We consider the traditional ACE twin model (see Supplementary Methods) assuming that dominance and interaction effects are zero (D=I=0). To extend the model to bivariate phenotype, we introduce three parameters ρ A , ρ C and ρ E to describe the correlation between additive genetic, shared environmental and individual environmental effects, respectively, between reading and mathematics abilities (see Supplementary Methods). We use the model to estimate the variance components V A , V C , V E , and the three correlation parameters. The narrow-sense heritability is then defined as the ratio—V A /V Y.

For the twin analyses, standardized residuals correcting for age and sex were used because the age of twins is perfectly correlated across pairs, which means that, unless corrected, variation within each age group at the time of testing would contribute to the correlation between twins and be misrepresented as shared environmental influence. The same applies to the sex of the twins, since MZ twins are always of the same sex. The model was fitted using full information maximum likelihood analysis of raw data in the structural equation modelling R package OpenMx41, estimating the variance parameters for both phenotypes together with the correlation parameters.

Bivariate population-level analysis

As opposed to the twin model, the population-level model considers only individuals who are not closely related. The univariate version of this model was recently introduced to study human height22,25 and subsequently further assessed24. The bivariate extension was also recently considered20,42.

This model decomposes the variance into an additive genetic component (G) that is due to the available panel of SNPs, and the residual component which in principle includes D, I, C and E as defined previously, together with the part of the additive component A that is not captured by G. Thus it can be used for estimating a lower bound for the variance and covariance (or correlation) between the additive genetic components of the two phenotypes (Supplementary Methods). A caveat of this model is that environmental effects that correlate with genetics can act as potential confounders.

As for the twin model, we extend the population-level model to the bivariate case by introducing parameters ρ G and ρ ε to describe the correlation between genetic and residual effects between phenotypes (see Supplementary Methods). This model was previously used elsewhere20.

For population-level analysis, we included one member of each of those 2,221 twin pairs for which both the reading and the mathematics ability was measured. After quality control 686,458 autosomal SNPs from the Affymetrix array were included in the analysis. We used linear regression to adjust the phenotypes for age, sex and population structure (using 10 PCs) before the variance component analysis. We checked that the programme GCTA22 gave very similar results for the variance parameters as our own implementation (see Supplementary Methods).

We implemented a Metropolis-Hastings random walk algorithm to explore the posterior distribution on the parameters (here proportional to the likelihood due to the use of uniform priors, see Supplementary Figs 4 and 5 and Supplementary Methods), and compute credible intervals. Finally, we computed a Bayes factor to quantify the evidence for this model relative to the model where ρ G =0, and used both permutations and simulations to obtain a P-value for this model comparison (Supplementary Fig. 5 and Supplementary Methods).

Interpretation of twin and population-level estimates

Although the twin and population models are similar in spirit, they differ in modelling assumptions and parameter interpretation. Here we list factors that could potentially lead to differences between the model estimates.

Population-level estimates of heritability and genetic correlation take into account only those genetic factors which are tagged by variants present on the genotyping platform. Consequently the population-level model gives a lower-bound estimate of heritability. In particular, we chose not to include genotypes of SNPs on the sex chromosomes in order to help simplify the model and its interpretation. In principle, twin model estimates capture all genetic factors which produce differences between trait values for siblings.

Twin model estimates of narrow-sense heritability may be biased upwards by the presence of dominance or interaction effects23. By contrast, because the population-level model employs only distantly related individuals, dominance and interaction effects are expected to be much less important effects.

Twin models take into account only those genetic factors which lead to differences in trait values between siblings: consider an unmeasured environmental variable S that depends only on the family (socio-economic status may be one such variable). If S is correlated with genetics—for example, through parental genotypes—its effect in the twin model, where S is unmodelled, will be to increase the estimated proportion of C43 and so to act to deflate estimates of genetic contributions. However, since shared environment is not modelled in the population-level model, S potentially contributes to the estimated proportion of G. In principle, including PCs in the model may help control for this effect. We find little difference in the estimates between the model including and not including PCs (Supplementary Table 4).

More complex effects, including interaction between genetic and environmental influences, could potentially have different effects on the two models. For example, twin model estimates of heritability may be affected by within-family (that is, shared) environmental effects that are more similar between MZ twins than DZ twins44,45. More generally, environmental effects that correlate or interact with genetics and other factors are not modelled directly43,46. However, several studies have shown that these assumptions of the twin model are usually reasonable in practice44,46. Population-level estimates appear to be remarkably robust to deviations from modelling assumptions24.

The population-level model assumes a particular dependency between the minor allele frequency at a SNP and the size of the effect of that SNP on the phenotype (see Supplementary Methods). Simulation studies24 suggest the model is fairly robust to deviations from this assumption.