23andMe study

Genome-wide SNP data were available in up to 55,871 men aged 18 or older of European ancestry from the 23andMe study. Age at voice breaking was determined by response to the question ‘How old were you when your voice began to crack/deepen?’ in an online questionnaire. Participants chose from one of seven pre-defined age bins (under 9, 9–10 years old, 11–12 years old, 13–14 years old, 15–16 years old, 17–18 years old, 19 years old or older). These were then re-scaled to one year age bins post-analysis by a previously validated method27. SNPs were excluded prior to imputation based on the following criteria: Hardy-Weinberg equilibrium P < 10−20; call rate < 95%; or a large frequency discrepancy when compared with European 1000 Genomes reference data6. Imputation of genotypes was performed using the March 2012 ‘v3’ release of 1000 Genomes reference haplotype panel. Genetic associations with puberty timing were obtained by linear regression models using age and five genetically determined principal components to account for population structure as covariates, with additive allelic effects assumed. P values for SNP associations were computed using likelihood ratio tests. Participants provided informed consent to take part in this research under a protocol approved by Ethical and Independent Review Services, an institutional review board accredited by the Association for the Accreditation of Human Research Protection Programs.

UK Biobank

Genotyping for UK Biobank participants has been described in detail previously9. For this analysis, we limited our sample to individuals of white European ancestry. Age at voice breaking for male participants from the UK Biobank cohort study was obtained by responses to the touchscreen question ‘When did your voice break?’ Participants were required to choose from one of five possible options (younger than average, about average, older than average, do not know, prefer not to answer). For age at first facial hair, respondents were asked to choose from one of the same five options in response to the touchscreen question ‘When did you start to grow facial hair?’ In total, data were available in up to 191,270 individuals for voice breaking and 198,731 individuals for facial hair. UK Biobank received ethical approval from the NHS National Research Ethics Service North West (11/NW/0382).

Respondents who answered either ‘older than average’ or ‘younger than average’ were compared separately using the ‘about average’ group as a reference in a case-control design for both phenotypes. For both voice breaking and facial hair, relatively early and relatively late effect estimates were obtained using linear mixed models which were applied using BOLT-LMM software, which accounts for cryptic population structure and relatedness. Covariates included age, genotyping chip and 10 principle components.

Meta-analysis of voice breaking results

GWAS summary results for each of the five strata (23andMe age at voice breaking; UK Biobank relatively early and late voice breaking; and UK Biobank relatively early and late facial hair) all aligned to later timing of voice breaking were meta-analysed using MTAG10. MTAG uses GWAS summary statistics from multiple correlated traits to effectively increase sample size and statistical power to detect genetic associations. Full details on the methodology have been described previously10. In brief, MTAG estimates a variance-covariance matrix to correlate the effect sizes of each trait using a moments-based method, with each trait and genotype standardised to have a mean of zero and variance of one. In addition, MTAG calculates a variance-covariance matrix for the GWAS estimation error using LD score regressions. The effect estimate for the association of each SNP on the trait of interest is then derived using a moments-based function, in a generalisation of standard inverse-variance weighted meta-analysis.

Prior to meta-analysis, we removed extremely rare variants (MAF < 0.01). In addition, for the four UK Biobank strata we calculated the effective sample size using

$$N_{{\rm{eff}}} = \frac{2}{{\frac{1}{{N_{{\rm{cases}}}}} + \frac{1}{{N_{{\rm{controls}}}}}}}.$$ (1)

Effective sample sizes for early and late voice breaking were 15,711 and 21,217, respectively, and 17,391 and 23,011 for early and late facial hair, respectively. We used the genome-significant P-value threshold of P < 5 × 10−8 to determine significant SNP associations. Independent signals were identified using distance-based clumping, with the SNP with the lowest P-value within a 1 MB window being considered the association signal at that locus.

Gene annotation and identification of loci

For each independent signal identified in the voice breaking meta-analysis, we identified all previously reported age at menarche (AAM) and voice breaking (VB) loci within 1MB of that SNP. We assessed if each locus had previously been associated with puberty timing (AAM or VB) within 1MB, and if any such loci within 1MB were in LD (r2 < 0.05). Heterogeneity between AAM and VB for each SNP was determined by the I2 statistic and P-value generated by the METAL software.

Gene annotation was performed using a combination of methods. Information on the nearest annotated gene was obtained from HaploReg v4.1. In addition, other genes in the region were identified using plots produced from LocusZoom. The most likely causal variant was determined by combining this information with identification of any non-synonymous variants within the region as well as application of existing knowledge.

Replication in ALSPAC

The Avon Longitudinal Study of Parents and Children (ALSPAC) recruited pregnant women resident in the Avon area of the UK with an expected delivery date between 1 April 1991 and 31 December 1992. Since then, mothers, partners, and offspring have been followed up regularly through questionnaires and clinical assessments11. The offspring cohort consists of 14,775 live-born children (75.7% of the eligible live births). Full details of recruitment, follow-up and data collection have been reported previously11. Ethical approval for the study was obtained from the ALSPAC Ethics and Law committee and the Local Research Ethics Committees. A series of nine postal questionnaires regarding pubertal development was administered approximately annually from the time the participant was aged 8 until he was aged 17. The questionnaires, which were responded by either the parents or the participant, had schematic drawings and verbal descriptions of secondary sexual characteristics (genitalia and pubic hair development) based on the Tanner staging system, as well as information on armpit hair growth and voice change. Age at voice change was considered the age at which the adolescent reported his voice to be occasionally a lot lower or to have changed completely. Weight and height were measured annually up to age 13 years, then at ages 15 and 17 years by a trained research team. Age at peak height velocity (PHV) was estimated using Superimposition by Translation And Rotation (SITAR) mixed effects growth curve analysis28. The sample size available varied according to the phenotype, from 1126 (for genital development) to 2403 (for age at which armpit hair started to grow).

The genetic risk score (GRS) was calculated based on 73 SNPs (genotypes at 3 SNPs were unavailable) weighted by the effect size reported for that SNP in the ReproGen Consortium. The GRS was standardised, and results are presented as increase in the phenotype per standard-deviation increase in the GRS. Linear (continuous phenotype) and logistic (binary phenotype) regression analyses were performed unadjusted and adjusted for age (except for age at PHV, age at voice change and age at which armpit hair started to grow) and controlled for population stratification using the first 10 principal components.

Gene expression and pathway analysis

We used MAGENTA to investigate whether genetic associations in the meta-analysed dataset showed enrichment in any known biological pathways. MAGENTA has previously been described in detail29. In brief, genes are mapped to an index SNP based on a 150 kb window, with a regression model applied to correct the P-value (gene score) for gene size, SNP density and LD-related properties. Gene scores are ranked, and the numbers of gene scores observed in a given pathway in the 75th and 95th percentiles are calculated. A P-value for gene-set enrichment analysis (GSEA) is calculated by comparing these values to one million randomly generated gene sets. Testing was completed on 3216 pathways from four databases (PANTHER, KEGG, Gene Ontology and Ingenuity). Significance was determined based on an FDR < 0.05 for genes in the 75th or 95th percentile.

To determine tissue-specific expression of genes, we used information from the GTEx project. GTEx characterises transcription levels of RNA in a variety of tissue and cell types, using sample from over 1000 deceased individuals of European, African-American and Asian descent. We investigated transcription levels of significant genes identified in our meta-analysis of voice breaking in 53 different tissue types. We used a conservative Bonferroni-corrected P-value of 9.4 × 10−4 (=0.05/53) to determine significance.

Association between hair colour and puberty timing

Information on natural hair colour for UK Biobank participants was collected via touchscreen questionnaire, in response to the question ‘What best describes your natural hair colour? (If your hair colour is grey, the colour before you went grey)’. Participants chose from one of 6 possible colours: blond, red, light brown, dark brown, black or other. For our analyses, we restricted this to include only non-related individuals of white European ancestry, totalling 190,845 men and 238,179 women. Hair colours were assigned numerical values from lightest (blond) to darkest (black) in order to perform ordered logistic regression of hair colour for both relative age at voice breaking in men and AAM in women. In both cases, blond hair was used as the reference group and models were adjusted for the top 10 principle components to account for population structure. In men this produces an effect estimate as an odds ratio for early puberty (relative to blond-haired individuals), while in women the effect estimate is on a continuous scale for AAM (in years) relative to the mean AAM for those with blond hair.

Genetic correlations

Genetic correlations (r g ) were calculated between age of puberty in males and 751 health-related traits which were publically available from the LD Hub database using LD Score Regression21,30.

Mendelian randomisation analyses

GWAS summary statistics for longevity were obtained from Timmers et al.31. Briefly, Timmers et al. performed a GWAS of parent survivorship under the Cox proportional hazard model in 1,012,050 parent lifespans of unrelated subjects using methods of Joshi et al.32, but extending UK Biobank data to that of second release. Data were meta-analysed using inverse-variance meta-analysis results from UK Biobank genomically British, Lifegen, UK Biobank self-reported British (but not identified as genomically British), UK Biobank Irish, and UK Biobank other white European descent. The resultant hazard ratios and their standard errors were then taken forward for two-sample MR using the 74 male puberty loci which were available in both datasets. We then performed a sensitivity analysis after removing outlier SNPs on the basis of heterogeneity using MR-PRESSO33.

Summary statistics for the association between the genetic variants and risk of prostate cancer were obtained from the PRACTICAL/ELLIPSE consortium, based on GWAS analyses of 65,044 prostate cancer cases and 48,344 controls (all of European ancestry) genotyped using the iCOGS or OncoArray chips34. The analyses were repeated using summary statistics from a comparison of the subset of 9,640 cases with advanced disease versus 45,704 controls, where advanced cases were defined as those with at least one of: Gleason score 8+, prostate cancer death, metastatic disease or PSA > 100. Two-sample MR analyses were conducted using weighted linear regressions of the SNP-prostate cancer log odds ratios (logOR) on the SNP-puberty beta coefficients, using the variance of the logORs as weights. This is equivalent to an inverse-variance weighted meta-analysis of the variant-specific causal estimates. Because of evidence of over-dispersion (i.e. heterogeneity in the variant-specific causal estimates), the residual standard error was estimated, making this equivalent to a random-effects meta-analysis. Unbalanced horizontal pleiotropy was tested based on the significance of the intercept term in MR-Egger regression. The total effect of puberty timing on prostate cancer risk was separated into a direct effect (independent of BMI, a potential mediator) and an indirect effect (operating via BMI), as described in Burgess et al.19.

Genetic associations between hair colour and puberty timing were assessed in two ways. First, we performed a two-sample MR analysis based on summary statistics (as described in Burgess et al.19) using the most recently reported GWAS for hair colour18 and 23andMe data for SNP effect estimates on age at voice breaking in men, and published ReproGen consortium data on AAM in women. However, there is partial overlap (between samples used for the discovery phase GWAS of hair colour variants (n = 290,891 from 23andMe plus UK Biobank) and the samples used for puberty timing (n = 55,871 men from 23andMe; maximum potential overlap = 55,871/290,891 = 19%). Therefore, we also performed a sensitivity analysis in non-overlapping samples; we used a more limited 5-SNP instrument for darker hair colour (identified in an earlier hair colour GWAS that did not include UK Biobank35), and assessed its effects on puberty timing in UK Biobank men and women in an individual-level MR analysis, controlling for geographical (assessment centre) and genetic ancestry factors (40 principal components).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.