Derivation of a metaGRS for ischaemic stroke

To create the GRSs, we randomly split the UK Biobank (UKB) British white dataset (n = 407,388) into a derivation set (n = 11,995) and a validation set (n = 395,393; “Methods” section, Fig. 1, Table 1). To increase statistical power in the derivation phase, we enriched the derivation set with ischaemic stroke (IS) events (n = 888, 7.4%). A schematic of the overall study design is given in Fig. 1.

Fig. 1: Study design. a Individual GRSs were derived in the UK Biobank training set (n = 11,995) using GWAS summary statistics for individual traits. b The metaGRS for ischaemic stroke was then derived by integrating individual GRSs using elastic-net cross-validation. c Validation of the metaGRS for ischaemic stroke was performed in the UK Biobank validation set (n = 395,393). UKB UK Biobank, GWAS genome-wide association study, GRS genomic risk score.

Table 1 Study characteristics of the UK Biobank validation dataset.

We used GWAS summary statistics that did not include the UKB for five stroke outcomes and 14 stroke-related phenotypes (Supplementary Table 1) to generate 19 GRSs associated with IS (Fig. 1). As expected, the 19 individual GRSs were correlated with each other in several distinct clusters: (i) any stroke (AS), IS, cardioembolic stroke (CES), large artery stroke (LAS), and small vessel stroke (SVS); (ii) the three CAD scores (1KGCAD, 46K, and FDR202); (iii) total cholesterol (TC), triglycerides (TG), low-density lipoprotein cholesterol (LDL), and high-density lipoprotein cholesterol (HDL); (iv) systolic BP (SBP) and diastolic BP (DBP); and (v) body mass index (BMI) and type 2 diabetes (T2D) (Fig. 2). From the 19 distinct GRSs, we constructed the metaGRS using elastic-net logistic regression with 10-fold cross-validation on the derivation set (Fig. 1; metaGRS; model weights are shown in Supplementary Fig. 1), and subsequently converted the model to a set of 3.2 million SNP weights, which are freely available (https://doi.org/10.6084/m9.figshare.8202233).
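The conversion of the fitted model into per-SNP weights can be illustrated with a minimal sketch. It assumes that each component GRS is a weighted sum of allele dosages and is divided by its standard deviation before entering the model, so that the metaGRS is itself linear in the dosages. The function names, SNP identifiers, and all numbers below are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def collapse_to_snp_weights(grs_snp_weights, grs_sd, model_coef):
    """Collapse a linear model over standardised GRSs into one weight per SNP.

    Assumes metaGRS = sum_k coef_k * (GRS_k / sd_k), where
    GRS_k = sum_j w_kj * dosage_j. Centring each GRS only shifts the
    final score by a constant, so it is ignored here.
    """
    snp_weights = defaultdict(float)
    for grs_name, weights in grs_snp_weights.items():
        scale = model_coef.get(grs_name, 0.0) / grs_sd[grs_name]
        if scale == 0.0:
            continue  # elastic-net can shrink a component's coefficient to zero
        for snp, w in weights.items():
            snp_weights[snp] += scale * w
    return dict(snp_weights)


def score(dosages, weights):
    """Weighted sum of allele dosages; SNPs without a weight contribute zero."""
    return sum(weights.get(snp, 0.0) * d for snp, d in dosages.items())


# Hypothetical toy inputs: two component GRSs sharing one SNP (rs2).
grs_snp_weights = {
    "IS": {"rs1": 0.10, "rs2": -0.05},
    "SBP": {"rs2": 0.20, "rs3": 0.30},
}
grs_sd = {"IS": 0.5, "SBP": 2.0}
model_coef = {"IS": 0.2, "SBP": 0.1}

meta_weights = collapse_to_snp_weights(grs_snp_weights, grs_sd, model_coef)
person = {"rs1": 1, "rs2": 2, "rs3": 0}
meta_score = score(person, meta_weights)
```

Under these assumptions, scoring an individual with the collapsed weights gives the same value as computing each GRS, scaling it, and applying the model coefficients, which is what allows a single set of SNP weights to be distributed.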

Fig. 2: Individual GRSs for stroke-related phenotypes and stroke outcomes correlate in several distinct clusters. Shown is the partial Pearson correlation plot of individual GRSs in a random sample of 20,000 UK Biobank individuals. Estimates are from linear regression of each pair of standardised GRSs, adjusting for genotyping chip (UKB/BiLEVE) and 10 PCs. Stars indicate Benjamini–Hochberg false discovery rate < 0.05 (adjusting for 171 tests). GRSs were ordered via hierarchical clustering of the absolute correlation. Anthrop anthropometric, cardio cardiovascular (other than CAD), SBP systolic blood pressure, DBP diastolic blood pressure, Height measured height, BMI body mass index, T2D type 2 diabetes, 1KGCAD coronary artery disease from 1000 Genomes, 46K coronary artery disease from Metabochip, FDR202 coronary artery disease from 1000 Genomes (top SNPs), CES cardioembolic stroke, AS any stroke, IS ischaemic stroke, LAS large artery stroke, SVS small vessel stroke, TC total cholesterol, LDL low-density lipoprotein cholesterol, HDL high-density lipoprotein cholesterol, TG triglycerides, AF atrial fibrillation, Smoking cigarettes per day.
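The multiple-testing adjustment used for the correlation plot can be sketched as follows. The 171 tests correspond to the number of unordered pairs among 19 GRSs. The function below is a generic implementation of the Benjamini–Hochberg step-up adjustment, not the authors' code, and the example p-values are hypothetical.

```python
from math import comb


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values are monotone non-decreasing in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


n_pairs = comb(19, 2)  # number of pairwise correlations among 19 GRSs

# Hypothetical p-values for four of the pairwise tests.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
significant = [q < 0.05 for q in example_q]
```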

We performed a sensitivity analysis to assess whether the estimation of the metaGRS weights on the UKB derivation set led to overfitting (upward bias in apparent performance) of the score in the validation set. We developed a metaGRS based on four component GRSs (AS, IS, BMI, and SBP) in cross-validation on the derivation set. We compared this metaGRS with a score derived using smtPred28, which relies on the chip heritabilities and genetic correlations estimated from the GWAS summary statistics via LD score regression31,32, independently of the UKB (Supplementary Fig. 2). Overall, the two scores were highly correlated (Pearson r = 0.98) and had indistinguishable associations with IS in the UKB validation set, indicating that our metaGRS procedure did not lead to overfitting in the validation set.

The metaGRS improves risk prediction of ischaemic stroke compared with other genetic scores

Using the independent UKB validation set, we next quantified the risk prediction performance of the metaGRS and evaluated its association with IS via survival analysis. The metaGRS was associated with IS with a hazard ratio (HR) of 1.26 (95% CI 1.22–1.31) per standard deviation of metaGRS, which was stronger than any individual GRS comprising the metaGRS (including the IS GRS [HR = 1.18, 95% CI 1.15–1.22]) and was approximately twice the effect size of the previously published 90-SNP IS score24 (HR = 1.13 [95% CI 1.10–1.17]; Supplementary Fig. 3a). The metaGRS also increased the C-index by 0.029 over the 90-SNP GRS (Supplementary Fig. 3b). We also assessed the performance of the IS metaGRS for predicting the AS outcome. The associations were consistently weaker for AS than for IS; however, as with IS, the metaGRS was a stronger predictor of AS than the 90-SNP GRS (Supplementary Fig. 3).
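The comparison of effect sizes can be checked with a short calculation. This sketch assumes that "effect size" refers to the log hazard ratio per standard deviation, which is the scale on which Cox model coefficients are estimated; that interpretation is an assumption and is not stated explicitly in the text.

```python
from math import log


def log_hr_ratio(hr_a, hr_b):
    """Ratio of two effect sizes measured as log hazard ratios."""
    return log(hr_a) / log(hr_b)


# Hazard ratios per standard deviation reported above.
metagrs_vs_90snp = log_hr_ratio(1.26, 1.13)
metagrs_vs_is_grs = log_hr_ratio(1.26, 1.18)
```

On this scale the metaGRS effect is a little under twice that of the 90-SNP score, and larger than that of the IS GRS alone.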

In a Kaplan–Meier analysis of IS, the top and bottom 10% of the metaGRS showed substantial differences in cumulative incidence of IS (Supplementary Fig. 4; log-rank test between the top decile and the 45–55% decile: P = 3 × 10⁻⁶); these results were consistent with a Cox proportional hazards model of the metaGRS assessing the HRs for the top decile vs. the middle 45–55% decile (Supplementary Fig. 5). The top 0.25% of the population were at a threefold increased risk of IS vs. the middle decile (45–55%), with HR = 3.0 (95% CI 1.96–4.59) (Fig. 3).
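For readers unfamiliar with the estimator, the Kaplan–Meier cumulative incidence within a metaGRS stratum can be sketched as below. This is a generic product-limit implementation rather than the authors' code, and the follow-up times and event indicators are hypothetical toy values.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    times: follow-up time per individual.
    events: 1 if the event (here, IS) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all individuals sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_removed
    return curve


def cumulative_incidence(curve):
    """Convert a survival curve to cumulative incidence (1 - survival)."""
    return [(t, 1.0 - s) for t, s in curve]


# Hypothetical toy stratum of five individuals.
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 0, 1])
toy_incidence = cumulative_incidence(toy_curve)
```

Computing such a curve separately for each metaGRS bin gives the stratified cumulative incidence that the log-rank test then compares.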

Fig. 3: The metaGRS identifies individuals at increased risk of ischaemic stroke. Shown is the distribution of the metaGRS for ischaemic stroke in the UK Biobank validation set (n = 395,393), and corresponding hazard ratios. Hazard ratios are for the top metaGRS bins (stratified by percentiles) vs. the middle metaGRS bin (45–55%).

There was no evidence for a statistical interaction of the metaGRS with sex on IS hazard (Wald test in Cox proportional hazards model, P = 0.614), indicating that the substantial differences in cumulative incidence between the sexes were driven by differences in baseline hazards rather than by any sex-specific effects of the metaGRS itself.

A small number of individuals (n = 45) had a recorded haemorrhagic stroke before their primary IS event. We conducted two sensitivity analyses to assess the impact of this on our results: (i) excluding these 45 individuals from the analysis; (ii) adjusting for haemorrhagic stroke status in the analysis. In both cases, there was essentially no difference in the association of the metaGRS with IS compared with the original analysis (HR = 1.27 per standard deviation of the metaGRS across the two analyses).

To further assess the contribution of different classes of GRS in the final score, we constructed two metaGRSs: (i) a score based on stroke-related GRSs (AS, IS, CES, LAS, SVS) and CAD-related GRSs (46K, FDR202, 1KGCAD) but no other risk factors; and (ii) a metaGRS based on stroke-related GRSs (AS, IS, CES, LAS, SVS) and risk-factor-related GRSs (SBP, DBP, TC, LDL, HDL, TG, AF, BMI, Height, T2D, Smoking) but no CAD-related GRSs (Supplementary Fig. 6). Addition of either risk-factor GRSs or CAD-related GRSs each led to more powerful metaGRSs compared with the IS-only GRS, but the best score was achieved when combining both types of GRS into the 19-component metaGRS, indicating that both types of GRS had independent information about stroke risk. Note that due to pleiotropy there is some overlap between the genetic signal for CAD and risk factors such as BP and cholesterol.

The ischaemic stroke metaGRS has comparable or higher predictive power than established risk factors

We next compared the performance of the metaGRS with established risk factors33 for predicting IS. We examined seven risk factors at the first UKB assessment: LDL cholesterol, SBP, family history of stroke, BMI, diabetes diagnosed by a doctor, current smoking, and hypertension (an expanded definition based on SBP/DBP measurements, BP medication usage, self-reporting, and hospital records; “Methods” section).

As expected, established risk factors were positively associated with incident IS, with hypertension being the strongest risk factor (Supplementary Fig. 7). Notably, the HR of the metaGRS (incident IS HR = 1.25 per s.d.) was similar to that of SBP (incident IS HR = 1.28 per s.d., where the s.d. of SBP was 21.7 mm Hg) and current smoking (incident IS HR = 1.25, s.d. = 0.3) (Supplementary Fig. 7).

Comparison of the C-index for time to incident IS revealed that BP phenotypes, hypertension and SBP (C = 0.590 [95% CI 0.577–0.603]; C = 0.584 [95% CI 0.570–0.598], respectively), had the largest C-indices, followed by the metaGRS (C = 0.580 [95% CI 0.566–0.593]) and the remaining established risk factors (Fig. 4). Notably, the metaGRS had a greater C-index than family history of stroke (C = 0.558, 95% CI 0.544–0.572; Fig. 4). The metaGRS and hypertension contained similar additional information on top of the other risk factors; adding either the metaGRS or hypertension to the six other risk factors yielded similar predictive power, C = 0.629 (95% CI 0.615–0.643) and C = 0.628 (95% CI 0.614–0.641), respectively. Finally, adding both the metaGRS and hypertension to the six risk factors yielded the model with the highest C-index, C = 0.637 (95% CI 0.623–0.650) (Fig. 4). Note that LDL-cholesterol was not included in this analysis as it had only weak associations with stroke and is not considered a major stroke risk factor.
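To make the C-index concrete, the following minimal sketch computes a concordance index for right-censored data. It assumes Harrell's definition, since the exact estimator is not specified in this section, and it ignores pairs with tied follow-up times for simplicity. The toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Concordance index for right-censored survival data.

    A pair (i, j) is comparable when i has an observed event and j was
    followed for longer than i. The pair is concordant when i, who had
    the event earlier, also has the higher risk score. Tied risk scores
    count as half. Pairs with tied follow-up times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored individuals cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four individuals, one censored.
toy_c = harrell_c_index(
    times=[1, 2, 3, 4],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.3, 0.5, 0.1],
)
```

A value of 0.5 corresponds to a score that ranks pairs no better than chance and 1.0 to perfect ranking, which puts differences such as 0.580 vs. 0.558 into context.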

Fig. 4: The metaGRS for ischaemic stroke has comparable or higher predictive power than established risk factors. Shown are the C-indices for incident stroke in the UKB validation set comparing the metaGRS with established risk factors. The reference model included the genotyping chip and 10 genetic PCs. Results are for the UKB validation set, excluding prevalent stroke events (n = 390,849). Red circles represent genetic/genomic scores; black circles represent non-genetic scores. Error bars represent 95% confidence intervals.

The metaGRS contributes to ischaemic stroke risk independent of established risk factors

Given that the metaGRS is composed of GRSs for stroke and stroke risk factors, we conducted several complementary analyses to assess the association of the metaGRS with these risk factors, and whether the metaGRS was associated with IS risk independently of these risk factors. As expected, the IS metaGRS was positively and significantly associated with all seven risk factors (Supplementary Table 2). Adjusting for these risk factors as well as BP-lowering and/or lipid-lowering medication status only modestly attenuated the association of the metaGRS with incident IS (Supplementary Fig. 8), indicating that the information contained in the metaGRS was only partially explained by these factors. On the other hand, adjusting for the metaGRS modestly but consistently attenuated the association of each risk factor itself with IS risk (Supplementary Fig. 7). There was no evidence for statistical interaction of the metaGRS effects on incident IS with medication status at assessment (Wald test in logistic regression, P = 0.23 and P = 0.82 for interaction of the metaGRS with BP medication and cholesterol-lowering medication, respectively).

Predicting ischaemic stroke risk with established risk factors and the metaGRS

The clinical utility of a GRS depends on its performance in combination with established risk factors and risk models. To examine this, we conducted analyses integrating information on risk factor levels based on (i) recent ACC/AHA/AAPA/ABC/ACPM/AGS/APhA/ASH/ASPC/NMA/PCNA guidelines34 (SBP < 120 mm Hg); (ii) AHA/ASA guidelines for primary prevention of stroke33 (BMI < 25 kg m⁻²); (iii) smoking status and diabetes status. We used Cox models of these established risk factors and the metaGRS together with the estimated baseline cumulative hazards to predict cumulative incidence of IS for individuals with a high metaGRS (top 1%), average metaGRS (50%), and low metaGRS (bottom 1%) along with two levels of risk factors: (i) meeting guideline targets for the above risk factors34 and (ii) the following combination of risk factors representative of an individual at typical stroke risk: SBP = 140 mm Hg, BMI = 30 kg m⁻², current smoking, and no diagnosed diabetes.

The predicted risk of IS for individuals with a high metaGRS (top 1%) and high levels of risk factors was maximal by age 75, reaching a cumulative incidence of 8.5% (95% CI 5.2–11.6%) for males and 5.1% (95% CI 3.1–7.1%) for females (Fig. 5a). Effective reduction in the levels of the modifiable risk factors (SBP, BMI, and smoking) to match guideline targets was predicted to result in a substantial reduction in risk, down to 2.8% (95% CI 1.7–3.9%) for males and 1.7% (95% CI 1.0–2.4%) for females by age 75, thus substantially compensating for the high genomic risk.
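The way a Cox model and a baseline cumulative hazard combine into a predicted cumulative incidence can be sketched as follows. Under proportional hazards, the cumulative incidence at time t for covariates x is 1 − exp(−H0(t) · exp(β·x)), where H0(t) is the baseline cumulative hazard. Every coefficient, covariate coding, and baseline value below is hypothetical and chosen only for illustration; none is an estimate from this study.

```python
from math import exp


def predicted_cumulative_incidence(baseline_cum_hazard, coefs, covariates):
    """Cumulative incidence under a Cox model: 1 - exp(-H0(t) * exp(beta . x)).

    baseline_cum_hazard: H0(t) at the time of interest, for an individual
        whose covariates all equal the reference value (zero).
    coefs: log hazard ratios per unit of each covariate.
    covariates: covariate values expressed relative to the reference.
    """
    linear_predictor = sum(coefs[name] * value for name, value in covariates.items())
    return 1.0 - exp(-baseline_cum_hazard * exp(linear_predictor))


# HYPOTHETICAL log hazard ratios, not estimates from the study.
coefs = {"metagrs_sd": 0.22, "sbp_per_10_mm_hg": 0.10, "bmi_per_5_kg_m2": 0.08, "smoker": 0.50}
h0_at_75 = 0.01  # HYPOTHETICAL baseline cumulative hazard by age 75

# High metaGRS with risk factors above the guideline targets.
high_risk = predicted_cumulative_incidence(
    h0_at_75, coefs,
    {"metagrs_sd": 2.3, "sbp_per_10_mm_hg": 2.0, "bmi_per_5_kg_m2": 1.0, "smoker": 1},
)
# Same metaGRS with risk factors at the reference (guideline) levels.
at_target = predicted_cumulative_incidence(
    h0_at_75, coefs,
    {"metagrs_sd": 2.3, "sbp_per_10_mm_hg": 0.0, "bmi_per_5_kg_m2": 0.0, "smoker": 0},
)
```

Holding the metaGRS fixed and moving the modifiable covariates to their reference values lowers the linear predictor, and hence the predicted incidence, which is the comparison shown in Fig. 5a.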

Fig. 5: Predicted cumulative incidence of ischaemic stroke. Shown is the predicted cumulative incidence of IS in subjects with either (a) high levels of the metaGRS along with different risk factor levels (red: outside the guidelines; cyan: within the guidelines); or (b) risk factors within accepted guidelines along with different levels of the metaGRS (cyan: top 1% of the metaGRS; grey: middle 50% of the metaGRS; dark blue: bottom 1% of the metaGRS). Results are based on the UKB validation set, excluding prevalent stroke events (n = 390,849). Error bars represent 95% confidence intervals.

Conversely, for individuals matching the guidelines for established risk factors (Fig. 5b), there were notable differences in IS incidence by age 75 between individuals in the top 1% and the bottom 1% of the metaGRS: 2.8% (95% CI 1.7–3.9%) vs. 1.2% (95% CI 0.7–1.7%) in males and 1.7% (95% CI 1.0–2.4%) vs. 0.7% (95% CI 0.4–1.0%) in females. These results further indicate that the metaGRS captures residual risk of stroke not quantified by existing risk factors.