UK Biobank

UKB is a prospective study of 500,000 volunteers, comprising relatively even numbers of men and women aged 40–69 years old at recruitment, with extensive baseline and follow-up clinical, biochemical, genetic and outcome measures. The UKB study has approval from the North West Multi-Centre Research Ethics Committee, and all participants provided informed consent.

Genotyping was performed by UKB using the Applied Biosystems UK BiLEVE Axiom Array or the UKB AxiomTM Array41. SNPs were imputed centrally by UKB using the Haplotype Reference Consortium (HRC) panel. Information on UKB array design and protocols is available on the UKB website (URLs).

Exercise test protocol and data acquisition

The exercise test started with 15 s of rest (pre-test), followed by 6 min of exercise (cycling) initially at constant load (2 min), then at increasing workload (4 min), and a 1 min recovery period without pedalling. Automatic HR measurements were taken throughout the protocol and were provided by UKB, together with the raw ECG recordings42.

Approximately 95,000 individuals participated in the exercise test using a stationary bicycle in conjunction with a 4-lead electrocardiograph device at the initial assessment (2006–2008), from which ~20,000 individuals were invited to the first repeat assessment in 2011–2013.

Selection of individuals for analysis

An overview of the process used to select individuals who participated in the exercise test for subsequent genetic analysis is presented in Supplementary Fig. 6. From the ~95,000 individuals invited, 79,745 completed the test. Automated HR measurements provided by UKB (UKB HR measurements) were available in 78,655 individuals. If available, data from the test performed at initial assessment (63,972 individuals) were analysed; otherwise, data from the first repeat assessment (14,683 individuals) were included in the analysis.

Validation of UKB HR measurements was performed by comparing them with traits derived from available raw ECG recordings (N = 62,272; ECG HR measurements) using our bespoke algorithms43. ECGs were selected for visual inspection when ECG and UKB HR measurements did not match for resting, peak HR or recovery HR. The visual inspection was to ensure whether the UKB HR values could be trusted. UKB HR measurements were rejected if the ECG recordings contained episodes without clear identifiable QRS complexes. In other cases the mismatch in HR values was caused by failure of the algorithm to detect QRS complexes. Detection of QRS complexes was then corrected by hand and the UKB HR values were accepted for analysis. Correlation for resting HR, maximum exercise and recovery was >0.9 (Supplementary Fig. 7). After visual inspection, 1,482 individuals were excluded. In 16,383 individuals’ validation was not possible due to the absence of raw ECG recordings. However, the statistical distribution of UKB HR measurements at rest, maximum exercise and recovery for these individuals closely matched that of the validated measurements (Supplementary Table 11) and traits from these individuals were included in the analysis. In total 77,173 individuals with reliable UKB HR measurements at three different phases of the exercise test were used to derive our phenotypes of interest.

Genetic and phenotypic QC

Genetic quality control (QC) was performed on the set of individuals who participated in the exercise test protocol (N = 95,216). Individuals with bad genotype quality, provided by UKB, i.e. high missingness or heterozygosity (N = 1472) and discordance between the self-reported sex and the sex inferred from the genotypes (N = 982) were excluded41. We then restricted our data set to individuals of European ancestry only (N = 85,522).

We used the k-means function in R as a clustering algorithm, to objectively and statistically select the clusters according to information from PC1 and PC2. The k-means algorithm ‘partitions the points into k groups such that the sum of squares from points to the assigned cluster centres is minimised.’ Then, we applied k-means separately to cluster according to each of PC1 and PC2, and initially only with k = 4, for a 4-way clustering, to correspond to the 4 main ethnic clusters within UKB: White, African, Asian and Chinese.

We then created an overall clustering, according to the intersections of the PC1-4means-clustering and the PC2-4means-clustering, so that participants were only categorised as ‘White’ overall, if they were contained in the ‘White’ cluster for both PC1 and PC2. Next, we created an overall ‘Mixed/Other’ cluster, for any participants, whose clustering differed between PC1 and PC2. Finally, we combined the PCA ancestry clusters with the self-reported ethnicity. Individuals were only included if the results PCA-clustering results matched the self-reported ancestry. However, we count ‘mixed’, ‘other’ and ‘missing’ as being broad/uncertain self-reported ethnicity, which have now been validated more objectively from the genetic PCA data. Over the 77,173 individuals for whom it was possible to derive the phenotypes, 69,353 complied with genetic QC and were of European ancestry (Supplementary Fig. 8).

Before genetic analysis, we further excluded individuals (N = 2,317) based on existing medical conditions known to affect HR (namely atrial fibrillation, history of myocardial infarction or heart failure, (supra)-ventricular tachycardia, atrio-ventricular nodal re-entrant tachycardia, second or third degree atrioventricular block, and use of a pacemaker) and/or individuals on HR altering medications (non-dihydropyridine calcium antagonists; Anatomic Therapeutic Chemical (ATC) code C08D; digoxin (ATC code C01AA5), and amiodarone (ATC code C01BD01; Supplementary Fig. 8).

Individuals with extreme UKB HR measurements at rest (<40 or >120 bpm), peak exercise (>200 bpm) or at 1 min post-exercise (<40 or >200 bpm) were also excluded. This led to N = 66,800 individuals with ΔHRex measurements and N = 66,665 individuals with ΔHRrec measurements for analysis. A total of N = 66,844 individuals with resting HR measurements were also available for analysis (Supplementary Fig. 8).

Derivation of ΔHRex and ΔHRrec

HR response to exercise (ΔHRex) and HR response to recovery (ΔHRrec) were computed from the resting HR, peak HR and recovery HR, where resting HR was defined as the mean HR during pre-test period, peak HR as the maximum HR during exercise, and recovery HR as the minimum HR 1 min after the peak exercise. The phenotypic traits ΔHRex and ΔHRrec were then computed as:

$$\Delta \mathrm{HR}^{\mathrm{ex}} = {\mathrm{Peak}}\,{\mathrm{HR}} - \,{\mathrm{Resting}}\,{\mathrm{HR}}$$

$$\Delta \mathrm{HR}^{\mathrm{rec}} = {\mathrm{Peak}}\,{\mathrm{HR}} - {\mathrm{Recovery}}\,{\mathrm{HR}}$$

Inverse-normal transformation of the traits was not performed since their distribution approximated a normal distribution (Supplementary Fig. 9).

Genetic analyses

As we did not have access to an independent study with raw ECG recordings during exercise and genetic data that could serve as a replication study, and with the limited sample size we randomly divided our cleaned data set into discovery (N ~ 40,000) and replication (N ~ 27,000) datasets (Fig. 1). For the discovery data set, we selected the model SNPs from the genotyped SNPs, required for the subsequent GWASs using PLINK 1.944. This selection was based on the following criteria: a minor allele frequency (MAF) > 1%, a Hardy–Weinberg equilibrium (HWE) with a threshold of P-value = 1 × 10−6, and missingness < 0.0015.

Then, we estimated the proportion of ΔHRex and ΔHRrec explained by additive genetic variation (heritability) using a variance components method (BOLT-REML)45, with the model SNPs and ~ 7.8 million imputed variants with MAF ≥ 1%, imputation quality (INFO) > 0.3 using the full data set.

Next, we performed a GWAS for each trait using a linear mixed model method (BOLT-LMM)46 under the additive genetic model including ~ 7.8 million imputed SNPs with MAF ≥ 1% and INFO > 0.3. We used the heritability estimation obtained from BOLT-REML.

For ΔHRex and ΔHRex traits we included the following covariates: sex, age, body mass index (BMI), resting HR, resting HR2 and a binary indicator variable for the genotyping array (UK Biobank versus UK BiLEVE).

Although BOLT-LMM software accounted for genetic relatedness within the analysed cohort, individuals across discovery and replication cohorts could be related, making both datasets not completely independent from each other. Therefore we removed a total of N = 818 first and second degree related individuals (kinship coefficient > 0.88) from the replication cohort as indicated from UK Biobank41.

We also performed a GWAS for both traits in the replication data set following removal of related individuals using the heritability parameter estimated in the discovery data set.

All of our genetic analysis in UKB were restricted to variants imputed using the HRC panel41.

Replication analyses

All SNPs with P < 1 × 10−6 from the discovery analysis for both traits were compiled and SNPs were mapped to individual loci based on genomic distance of >500 Kb to each side of another SNP. If multiple SNPs fit the selection criteria for a single region, only the SNP with the smallest P value was considered for follow-up. As a QC step, we reviewed each selected SNP to check for unrealistically high beta values, large standard errors, and none were observed. Locus Zoom plots were produced for all selected SNPs and these were carefully reviewed. Fourteen variants for ΔHRex and 17 variants for ΔHRrec were taken forward into replication. Replication was confirmed if P (one-tailed) ≤0.05/14 = 2.60 × 10−3 for ΔHRex and ≤0.05/16 = 3.10 × 10−3 for ΔHRrec and the effect (β) was in the direction observed in discovery analyses for each trait in the replication cohort.

Full data set analyses

We also performed a full data set GWAS for each trait using BOLT-LMM. Additional loci for each trait reaching a genome-wide significance threshold (P ≤ 5 × 10−8) from the full data set GWAS are reported.

We additionally performed a GWAS for resting HR in the full data set, N = 67,257 to serve as a reference for interpreting results from our traits. The regression model included the following covariates: sex, age, BMI and a binary indicator variable for UKB versus UK BiLEVE genotyping arrays.

Conditional analyses

To examine the existence of independent SNPs to lead SNPs, we applied genome-wide complex trait analysis (GCTA)47 for all validated and genome-wide significant loci from the full data set GWAS. A secondary signal would be declared if: (i) the newly identified SNP original P value was lower than 1 × 10−6; (ii) there was less than a 1.5-fold difference between the lead SNP and secondary association P values on a –log 10 scale, i.e., if –log 10 (P lead )/-log 10 (P sec ) < 1.5; and (iii) if there was less than a 1.5-fold difference between the main association and conditional association P values on a –log 10 scale, i.e., if –log 10 (P sec )/-log 10 (P cond ) < 1.5.

Percent variance

The percent of variance explained in ΔHRex and ΔHRrec by all genome-wide significant variants and the secondary signal for ΔHRrec (N = 14 and N = 18, respectively) was calculated using standard regression models including analysis covariates (see above) and each SNP separately for each trait. Each trait, ΔHRex and ΔHRrec, was regressed initially only on analysis covariates and then on covariates and the respective SNPs. Both r2 values obtained from these two regressions were used as estimations of the percent variance explained by the respective models. Through subtraction of the both r2 values, we determined the percent variance explained by all newly discovered SNPs.

Sensitivity analyses

Previous GWAS and Exome-chip analyses for resting HR have shown that use of beta-blockers do not significantly affect the effect sizes of associated variants9. However, since the traits analysed in study might not only depend on resting HR, we undertook additional association analyses excluding individuals taking beta-blockers for our reported SNPs using the full data set. We then calculated the Spearman’s correlation coefficient between the β estimates and P values for the results including and excluding individuals under beta-blockers for each trait.

Sex-stratified analyses

For each trait, we performed a GWAS for men and women separately in the full cohort including the same covariates in the regression model as specified above, but excluding sex.

Bioinformatics analyses

We performed several analyses to annotate the HR response to exercise and recovery associated SNPs, at the variant and gene level (all SNPs in LD r2≥0.8 with the HR response to exercise and to recovery associated SNPs were considered). LD was calculated using genetic data from UKB if the lead SNP was imputed using the HRC reference panel in order to calculate pairwise-LD for all associated SNPs. For SNPs not available in UKB, we used the 1000 Genomes Project phase 3 (1000 G) reference panel. The r2 of pairwise SNPs (minimum r2 = 0.8 and maximum distance between a pair of SNPs is 1 Mb) were computed using PLINK44.

Using University of California, Santa Cruz (UCSC) known genes, we annotated each lead SNP with the nearest genes and those found within 5 kb. We used VEP48 to characterise the variants, including the impact of amino acid substitutions based on a range of prediction tools including SIFT and PolyPhen-2 and we also assessed the conservation scores. Missense variants were annotated to be damaging if indicated by two or more methods.

We evaluated all SNPs in LD (r2≥0.8) with our validated lead SNPs to explore if there was support for mediation of eQTLs in 44 tissues in the GTEx database. We also sought to identify if the validated variant at each locus was coincident with the the strongest evidence of eQTL association for that gene, and we focussed on reporting results from the brain, heart and adrenal tissue.

We identify potential target genes of regulatory SNPs using long-range chromatin interaction (Hi–C) data from left and right ventricles, adrenal glands, neural progenitor stem cells, hippocampus and cortex49 which tissues and cell types are considered relevant for regulating heart rate. Hi–C data is corrected for genomic biases and distance using the Hi–C Pro and Fit-Hi-C pipelines according to Schmitt et al.49 (40 kb resolution—correction applied to interactions with 50 kb-5 Mb span). We find the most significant promoter interactions for all potential regulatory SNPs (RegulomeDB score ≤5) in LD (r2 ≥ 0.8) with our sentinel SNPs and choose the interactors with the SNPs of highest regulatory potential to annotate the loci.

We also performed enrichment testing across all loci. We used DEPICT50 (Data-driven Expression-Prioritised Integration for Complex Traits) to identify cells and tissues in which the HR response to exercise and to recovery loci were highly expressed. Subsequently, for the best candidate genes per locus (Supplementary Table 10), we used g:profiler that performs functional profiling of gene lists using various kinds of biological evidence (including GO, HPO annotation). Enrichments with FDR < 0.05 were deemed significant.

Furthermore, to investigate regulatory regions, we used FORGE to investigate cell-type-specific enrichment within DNase I-hypersensitive sites in 123 cell types from ENCODE and the Epigenomics Roadmap Project51. Validated lead SNPs from our study were analysed along with independent secondary signals (with P < 10−6) to evaluate the overall tissue-specific enrichment of the HR response to exercise and to recovery variants. Our directly our curated candidate regulatory SNPs were checked for overlap with cell-type-specific DNase I-hypersensitive signals in a second analysis which did not include an LD filter.

We queried associated SNPs against PhenoScanner52 to investigate trait pleiotropy, extracting all association results with a nominal significance of P < 0.05 for full reporting, and then extracted genome-wide significant results to highlight loci with the strongest evidence of association with other traits and we also indicate results with P < 1 × 10−4. We also accessed results from GeneAtlas to determine further phenotypic associations of our associated variants. We next performed an extensive review of all highlighted candidate genes from bioinformatics analyses at the 31 loci. The National Center for Biotechnology Information (NCBI) Gene database and GeneCards®: The Human Gene Database were used to obtain official full names and, where relevant, common aliases for each candidate gene product. NCBI’s PubMed was used to interrogate primary literature pertaining to gene function. We also reviewed gene-specific animal models using International Mouse Phenotyping Consortium and the Mouse Genome Informatics database.

Genetic risk scores

We constructed GRSs in the full data set by aggregating all lead and secondary SNPs separately for each trait using the beta estimates from the replication GWASs as independent, unbiased weights, to assess the combined effect of the identified variants on each trait, respectively, and cardiovascular mortality risk, while avoiding bias from ‘winner’s curse’.

The GRS for each trait was standardised to have a mean of 0 and a standard deviation of 1. We then assessed the association of the continuous GRS variable with each HR response trait by simple linear regression adjusting for the same covariates used in the genetic analyses and for the ten first principal components. Related individuals were also excluded. We also ran logistic regression to examine the association of the GRSs with cardiovascular mortality risk. We then compared HR response levels and cardiovascular mortality risk for individuals in the top (highest quintile) and bottom (lowest quintile) 20% of the GRSs distributions.

URLs

For UK Biobank, www.ukbiobank.ac.uk; for Haplotype Reference Consortium panel, http://www.haplotype-reference-consortium.org/site; for National Center for Biotechnology Information (NCBI) Gene database, see http://www.ncbi.nlm.nih.gov/gene/; for NCBI’s PubMed, http://www.ncbi.nlm.nih.gov/pubmed/; for International Mouse Phenotyping Consortium, http://www.mousephenotype.org/; for Mouse Genome Informatics, http://www.informatics.jax.org; for Gene Atlas PheWas, http://geneatlas.roslin.ed.ac.uk; for The Human Gene Database, http://v4.genecards.org/. For FORGE, http://browser.1000genomes.org/Homo_sapiens/UserData/Forge?db=core. For GTEx, www.gtexportal.org; for Phenoscanner, http://www.phenoscanner.medschl.cam.ac.uk; for RegulomeDB, http://www.regulomedb.org.

Data availability

The HR data generated during the current study are available from the UK Biobank data repository, which can be accessed by researchers upon application. These data include the GWAS summary data for the HR responses to exercise traits and resting HR. The genetic and phenotypic UK Biobank data are also available upon application to the UK Biobank. All replication data generated during this study are included in the published article and are contained within the supplementary tables provided.