Measurement of the HR profile and quality control

The UK Biobank is a cohort of individuals with an age range of 40–69 registered with a general practitioner of the UK National Health Service. In total 503,325 individuals were included and provided informed consent between 2006 and 2010. The UK Biobank cohort study was approved by the North West Multi-centre Research Ethics Committee (reference number 06/MRE08/65). Detailed methods used by UK Biobank have been described elsewhere45.

In total, 99,539 ECG exercise records were taken for 96,567 participants who underwent a cardio assessment; 79,217 were performed during the baseline visit (2006–2010), and 20,322 were performed at the second follow-up visit (2012–2013). The participants were asked to sit on a stationary bike, start cycling after 15 s of rest, and then perform six minutes of physical activity, after which exercise was terminated and participants sat down for about one minute without cycling. The exercise protocol was adapted according to participants’ risk factors; details can be found elsewhere46. Participants were only included in the study if they were allowed to cycle at 50% or 30% of their maximum workload (no risk to minimum risk), as described further in the 'Statistical analyses (exclusions)' section. The exercise was ended after participants reached a pre-set maximum HR level of 75% of their age-predicted maximum HR. The cardio assessment involved a 3 lead (lead I, II, and III) ECG recording (AM-USB 6.5, Cardiosoft v6.51) at a frequency of 500 Hz. The ECG was recorded using four electrodes placed on the right and left antecubital fossa and wrist and stored in an xml-file of Cardiosoft.

Of all available ECG records, 77,190 contained full disclosure data that could be used to detect R waves; other records contained an error relating to the ECG device used ('Error reading file C:/DOCUME~1/UKBBUser/LOCALS~1/Temp/ONL2F.tmp'). R waves were detected with the gqrs algorithm47 and further processed using Construe48 (https://github.com/citiususc/construe) to detect individual Q-R-S waves. Following international recommendations to obtain reliable RR intervals49, abnormal values (0.286–2 s) were removed. Additional outliers were removed using the tsclean function, a part of R-package forecast v7.3 that incorporates the method described by Chen and Liu50 for automatic detection of outliers in time series. A total of 2,804 ECGs were excluded due to excess noise (identified by determining the standard deviation over a rolling standard deviation with a window length of three beats over RR intervals per ECG per phase and removing the 98th percentile of this distribution). In total we inspected about 10,000 RR interval profiles or ECGs to evaluate the RR-interval detection and ensure quality control. For each ECG, we estimated the mean resting HR, standard deviation of RR intervals (SDNN, log2 transformed), and root mean square of successive differences between RR intervals (RMSSD, log2 transformed) from the RR intervals before exercise started. HR increase was determined as the difference between peak HR during exercise and resting HR. HR recovery was defined as the difference between maximum HR during exercise and mean HR at 10 ± 3, 20 ± 3, 30 ± 3, 40 ± , and 50 ± 3 s after exercise cessation (HRR10-HRR50). HR recovery at exactly one minute was not available; only nine participants recovered after a duration ≥60 s. Observations of the second follow-up visits were used when no baseline observation was available. Variables were inspected for normality, and participants with extreme ECG exercise measurements (more than ±5 standard deviations from mean) were excluded on a per-phenotype basis.

By means of external validation, we estimated that resting HR, SDNN, and RMSSD were highly consistent with previous GWAS estimates21,22. To this end, we performed linear regressions between the HR traits and their polygenic score (please also see the 'polygenic score' method section). The beta coefficients (β) of resting HR (β = 1.085, se = 0.029, p = 3 × 10−309), SDNN (β = 1.145, se = 0.051, p = 1 × 10−108), and RMSSD (β = 1.0816, se = 0.043, p = 2 × 10−139) was close to 1 and highly significant.

For the current analyses, HR phenotypes were rank-based inverse normal transformed to increase the power to detect low-frequency variants and allow for comparisons of beta coefficients between traits. Source code, example data, and further descriptions of the methods are available at https://github.com/niekverw/E-ECG.

Individual data on disease prevalence and incidence were obtained from the Assessment Centre in-patient health episode statistics (HES) and self-reports during any of the visits obtained through questionnaires and nurse-interviews, as described previously51. Mothers, fathers, and parental age of death were defined according to Pilling et al.’s44 study; in short, participants aged between 55–70 years were included, only if fathers died at ≥46 years of age or mothers died at ≥55 years of age. If an age of death was missing, questionnaires of follow-up visits were used where available. The lifespan of mothers and fathers were combined into a single normalized parental lifespan. Parental lifespan, as a proxy for mortality, was defined as the primary outcome variable.

Genotyping and imputation

Genotyping, quality control, and imputation to three reference panels (HRC v1.1,1000 genome and UK10K) was performed by The Wellcome Trust Centre for Human Genetics, as described in detail elsewhere52. Sample outliers (based on heterozygosity or missingness) were excluded, and 373 participants were excluded on the basis of gender mismatches. The analyses were restricted to SNPs of the HRC v1.1 imputation panel. Post-GWAS analyses were conducted using SNPs with a minor allele frequency greater than 1% and an imputation quality score of more than 0.3. Summary statistics deposited online will include all SNPs.

Statistical analysis

Regression analyses of resting HR, SDNN, and RMSSD were adjusted for gender, age, gender-age interaction, body mass index (BMI), BMI*BMI, the first 30 principal components, and genotyping chip (Affymetrix UK Biobank Axiom or Affymetrix UK BiLEVE Axiom array). To fully account for aerobic exercise capacity in HR increase and HR recovery, the model also included exercise duration, exercise program (30% or 50% maximum load), maximum workload achieved, and the interaction between exercise program and maximum workload achieved.

Participants were excluded if they stopped exercising earlier than planned, experienced chest-pain or other discomfort, were at medium-to-high cardiovascular risk46 at the time of the test, or terminated exercise for unknown reasons. In a post-hoc analysis, the population was stratified by participants that reported taking sotalol medication, beta-blockers, anti-depressants, atropine, glycosides or other anti-cholinergic drugs, or were previously diagnosed with myocardial infarction, supraventricular tachycardia, bundle branch block, heart failure, cardiomyopathy, or previously had a pacemaker or ICD implant. In a post-hoc sensitivity analysis, the differences in beta estimates in participants with and without cardiovascular disease or HR-altering medication were assessed using a Chow test.

In total, 58,818 participants were included in the GWAS. The genome-wide association study and heritability analyses were performed using BOLT-LMM53 and BOLT-REML54, respectively. A conjugate gradient-based iterative framework for fast mixed-model computations was employed to accurately account for population structure and relatedness; additive effects were assumed. The BOLT software was used with 509,255 genotyped SNPs that were extracted from the final imputation set (to ensure a 100% call rate per SNP). After pruning (R2 > 0.5, using plink ‘--indep-pairwise 50 5 0.5), LD scores also used by BOLT, were estimated from 2,000 randomly selected UK Biobank participants (who were selected after sample exclusions based on relatedness, missingness, and heterozygosity). To control for relatedness among participants in linear logistic, or cox regression analyses, we used cluster-robust standard errors with genetic family IDs as clusters. A family ID was given to individuals belonging together based on 3rd-degree or closer as indicated by the kinship matrix, which was provided by UK Biobank (kinship coefficient > 0.0442). All statistical analyses other than the genome-wide analysis were carried out using R v3.3.2 or STATA/SE release 13.

Since the current study is by far the largest population-based study of electrocardiographic exercise tests, independent cohorts that matched this study in size and availability of variables (specific HR response variables and genetics) were unavailable for replication purposes. Therefore, a conservative genome-wide significant threshold of p< 8.3 × 10−9 was adopted to account for six independent traits, in accordance with similar multi-phenotype studies of this scale55,56,57,58,59.

Variants were considered to be independent if the pairwise LD (R2) was less than 0.01. A locus was defined as the highest associated independent SNP +/− 1MB. The strongest associated variant within a locus was assigned the sentinel SNP. If there was evidence for multiple independent SNPs in one locus based on LD, it was confirmed by using linear regression and adjusting for the sentinel SNP.

Pleiotropy analyses

The GWAS catalog database (https://www.ebi.ac.uk/gwas/) was queried by searching for SNPs in a 1MB distance of the SNPs found in this study. LD was determined by calculating the R2 and D′ in the UK Biobank between the GWAS catalog SNPs and the SNPs found in this study. In addition we examined genome-wide summary statistics for 699 traits using PhenoScanner23 (v1.1, http://www.ner.medschl.cam.ac.uk/phenoscanner). PhenoScanner was used to cross-reference HRR associated SNPs for their association with a broad range of phenotypes regardless of genome-wide significance.

To gain insights into pleiotropy among HR variables, we performed linear regression analyses for all significantly associated SNPs with resting HR, HR variability (SDNN and RMSSD), HR increase, and HR recovery. The Z-scores, which were aligned to the HR recovery increasing allele, were visualized in a heat plot.

Polygenic score

Polygenic scores of HR increase and HR recovery were constructed by calculating the sum of the number of alleles that increased HR increase or HR recovery weighted by the corresponding beta coefficients. The primary polygenic score was based on all primary and secondary SNPs that were genome-wide significantly associated. The relationships between the polygenic score and clinical phenotypes were tested in 422,947 individuals who were not part of the discovery GWAS, using linear, logistic, and cox regression analyses. The discovery sample was excluded from this analysis to avoid any potential bias or reverse confounding. The statistical power for a case-control Mendelian randomization in this study (N = 422,334) was calculated at α = 0.05 using the sample size, proportion of cases, strength of the polygenic risk score, and the expected causal hazards ratio60.

Functional variants and candidate causal genes

To search for evidence of the functional effects of HR profile associated SNPs, we used multiple QTL databases including the following: Stockholm–Tartu Atherosclerosis Reverse Network Engineering Task (STARNET)61, GTEX, version 662, cis-eQTL datasets of Blood63,64,65, and cis-meQTLs66. Only eQTLS/meQTLs that achieved p < 1 × 10−6 and were in LD (R2 > 0.8) with the queried SNP were considered significant.

Long-range chromatin interactions with the 1MB region at either side of a SNP were examined and visualized using HUGin67. Only genes that achieved a Bonferonni significant association and demonstrated a clear pattern of interaction between the queried SNP and the promoter region were prioritized.

For all primary and secondary SNPs that were genome-wide significantly associated, candidate causal genes were prioritized as follows: a) by proximity, the nearest gene or any gene within 10 kb; b) by protein-coding gene variants in LD (R2 > 0.8); c) by eQTL analysis (described above); and d) by long-range chromatin interaction analysis (described above).

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request. The de novo GWAS analysis (complete summary statistics of all genetic variants and traits) have been deposited in Mendeley with the identifier 'doi:10.17632/tg5tvgm436.1'.