Data resources

We included data from the Global Lipid Genetics Consortium (GLGC; European ancestry samples only), the UK Household Longitudinal Study (UKHLS), two isolated populations from the Hellenic Isolated Cohorts (HELIC) in Greece, a rural West Ugandan population from the African Partnership for Chronic Disease Research (APCDR-Uganda) study, China Kadoorie Biobank (CKB), and Biobank Japan (BBJ). Raw genotype and phenotype data were available for UKHLS, APCDR-Uganda, CKB, HELIC-MANOLIS, and HELIC-Pomak. Our analyses were based on summary statistics for BBJ and GLGC. All participants provided written informed consent and each study obtained approval from ethical review boards. The APCDR-Uganda study was approved by the Uganda Virus Research Institute, Science and Ethics Committee (Ref. GC/127/10/10/25), the Uganda National Council for Science and Technology (Ref. HS 870), and the U.K. National Research Ethics Service, Research Ethics Committee (Ref. 11/H0305/5). The HELIC study was approved by the Harokopio University Bioethics Committee. The UKHLS study was approved by the University of Essex Ethics Committee and the nurse data collection by the National Research Ethics Service (10/H0604/2). For CKB, central ethics approvals were obtained from Oxford University and the China National CDC. In addition, approvals were obtained from institutional research boards at the local CDCs in the 10 regions. BBJ was approved by the ethics committees of RIKEN Center for Integrative Medical Sciences and the Institute of Medical Sciences, the University of Tokyo. The details of genotyping, QC and imputation for all studies are summarised in Supplementary Table 5. Descriptive information about the sample sets is provided in Supplementary Table 6. Details of the quality control, imputation, genome-wide association analyses and ethical approval have also been previously described for GLGC14, BBJ13, HELIC10, APCDR-Uganda8 and UKHLS12.
Each study confirmed sample ethnicity through principal component analysis (PCA), which rules out sample overlap between studies.

For CKB, 102,783 participants were genotyped using two custom-designed Affymetrix Axiom® arrays including up to 803 K variants, optimised for genome-wide coverage in Chinese populations. Stringent quality control included SNP call rate > 0.98, plate effect P > \(10^{-6}\), batch effect P > \(10^{-6}\), HWE P > \(10^{-6}\) (combined 10 df \(\chi^2\) test from 10 regions), biallelic variants only, MAF difference from 1KGP EAS < 0.2, sample call rate > 0.95, heterozygosity < mean + 3 SD, no chrXY aneuploidy, and genetically determined sex concordant with the database. This resulted in genotypes for 532,415 variants present on both array versions. Imputation into the 1000 Genomes Phase 3 reference (EAS MAF > 0) using SHAPEIT version 3 and IMPUTE version 4 yielded genotypes for 10,276,633 variants with MAF > 0.005 and info > 0.3.

In CKB, lipid levels were regressed against eight principal components (PCs), region, age, age\(^2\), sex, and, for LDL and TG, fasting time\(^2\) for the single SNP association analysis. LDL levels were derived using the Friedewald formula. After rank-based inverse normal transformation, the residuals were used as the outcomes in the genetic association analyses using linear regression. Associations were carried out within a mixed model framework using BOLT-LMM38. For CKB, PCs were included in both single SNP and genetic risk score association analyses to reduce inflation, because recruitment occurred at 10 different rural and urban locations across China, leading to somewhat increased population structure. The resulting genomic inflation estimates (λ) after PC adjustment were 1.063, 1.050, and 1.053 for HDL, LDL, and TG, respectively.
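The LDL derivation and the rank-based inverse normal transformation can be sketched as follows. This is a minimal illustration, not the study pipeline: the function names are hypothetical, the Friedewald formula is written for concentrations in mmol/L (the units are not stated here), and the Blom offset c = 3/8 is an assumed convention because the text does not specify one.

```python
from statistics import NormalDist

def friedewald_ldl(total_chol, hdl, tg):
    """Friedewald estimate of LDL cholesterol; inputs in mmol/L (assumed units)."""
    return total_chol - hdl - tg / 2.2

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = shared
        start = end + 1
    return ranks

def rank_inverse_normal(values, c=0.375):
    """Order the values, then map each rank to its expected normal quantile."""
    n = len(values)
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in average_ranks(values)]
```

In this sketch the transformation would be applied to the regression residuals, so that the outcome passed to the association model is approximately standard normal regardless of the original lipid distribution.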

The single SNP association analysis for APCDR-Uganda was carried out within a mixed model framework using GEMMA39. Rank-based inverse normal transformation was applied to the lipid biomarkers after adjusting for age and sex. For APCDR-Uganda, the genomic inflation estimates (λ) were 1.000, 1.004, and 1.005 for HDL, LDL, and TG, respectively.
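For readers unfamiliar with the inflation estimate, the sketch below shows one common way to compute λ from a list of association p-values: the median observed 1-df chi-square statistic divided by its expected median under the null. The function name is hypothetical, and whether the studies used exactly this median-based definition is an assumption.

```python
from statistics import NormalDist, median

def genomic_inflation(pvalues):
    """Median observed 1-df chi-square divided by its null expectation."""
    nd = NormalDist()
    # A two-sided p-value p corresponds to chi-square = z**2 with z = Phi^-1(p / 2).
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in pvalues]  # requires 0 < p <= 1
    expected_median = nd.inv_cdf(0.75) ** 2  # about 0.455
    return median(chi2) / expected_median
```

Values close to 1, such as those reported above, indicate that the bulk of the test statistics follows the null distribution.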

Established lipid loci

A list of established lipid-associated loci was extracted from the latest Global Lipid Genetics Consortium (GLGC2017) publication15 reporting 444 independent variants in 250 loci associated at genome-wide significance with HDL, LDL, and triglyceride levels. We excluded three LDL variants where the association was not primarily driven by the samples with European ancestry. We assessed evidence for transferability of the loci, applied trans-ethnic colocalization and used them to construct genetic risk scores.

Reproducibility of established lipid loci

We assessed evidence that these established lipid signals are reproducible in other populations. For loci harbouring multiple signals, we only kept the most strongly associated variant. Out of the 444 variants, this left 170 HDL, 135 LDL and 136 TG variants. We distinguished major loci, i.e. those with p < \(10^{-100}\) based on a score test in GLGC2017. For each lead SNP we identified all variants in LD (\(r^2\) > 0.6) based on the European ancestry 1000 Genomes data. We assessed whether the lead or any of the correlated variants, henceforth called the credible set, displayed evidence of association in the target study. If this was not the case, we tested whether there was any other variant with evidence of association within a 50 Kb window. We used a p-value threshold of p < \(10^{-3}\) based on a score test. This threshold was derived by computing the minimum p-value in 1000 random windows of 50 Kb for each study. Fewer than 5% of random windows had a minimum p < \(10^{-3}\) for the non-European ancestry studies. While this p-value threshold might not be appropriate to provide conclusive evidence of reproducibility for individual loci, we used it to test evidence of reproducibility across sets of loci. These analyses excluded the HELIC studies because the smaller sample size makes it difficult to differentiate between lack of power and lack of reproducibility.
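The two-step check described above can be sketched as follows. This is an illustrative sketch only: the function and argument names are hypothetical, and the 50 Kb window is assumed to be centred on the lead SNP, which the text does not specify.

```python
def classify_locus(lead, credible_set, pvals, positions,
                   window_bp=50_000, alpha=1e-3):
    """Return 'credible_set', 'window' or 'none' for one established locus.

    pvals:     variant id -> p-value in the target study
    positions: variant id -> base-pair position (same chromosome assumed)
    """
    # Step 1: the lead variant or any variant in LD with it is associated.
    members = {lead, *credible_set}
    if any(pvals.get(v, 1.0) < alpha for v in members):
        return "credible_set"
    # Step 2: any other variant inside the window is associated.
    half = window_bp // 2
    centre = positions[lead]
    for variant, p in pvals.items():
        if variant in positions and abs(positions[variant] - centre) <= half and p < alpha:
            return "window"
    return "none"

def fraction_below(window_min_pvals, alpha=1e-3):
    """Share of random windows whose minimum p-value falls below alpha."""
    return sum(p < alpha for p in window_min_pvals) / len(window_min_pvals)
```

Applying `fraction_below` to the minimum p-values of randomly placed windows gives the empirical false-positive rate of the window-level test, which is the quantity used above to justify the threshold.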

Trans-ethnic genetic correlations

We used the Popcorn software30 to estimate trans-ethnic genetic correlations between studies while accounting for differences in LD structure. This provides an indication of the correlation of causal-variant effect sizes across the genome at SNPs common in both populations. Variant LD scores were estimated from ancestry-matched 1000 Genomes v3 data for each study combination. The estimation of LD scores failed for chromosome 6 for some groups. We therefore left out the major histocompatibility complex (MHC) region (chromosome 6, positions 28,477,797 to 33,448,354) from all comparisons. Variants with imputation accuracy \(r^2\) < 0.8 or MAF < 0.01 were excluded. Popcorn did not converge for any of the studies with fewer than 20,000 samples. Therefore, results are presented for comparisons between GLGC2013, CKB and BBJ. We estimated effect rather than impact correlations. We used a Bonferroni correction to adjust for multiple testing of three traits with each other (p < 0.05/9 = 0.0056).

Genetic risk scores

As it was not possible to compute trans-ethnic genetic correlations for UKHLS, the HELIC cohorts, and APCDR-Uganda, we created genetic risk scores (GRS) based on the established lipid loci and assessed their associations with serum lipid levels in these studies. We also tested the associations of GRS in CKB, as raw data were available for this study as well. Age and sex were adjusted for by regressing the lipid biomarker values on them and using the residuals as outcomes for subsequent analyses. For CKB, we additionally adjusted for 20 PCs and region covariates to ensure that population structure was accounted for. To ensure values were normally distributed, we used rank-based inverse normal transformation for all biomarkers and data sets, which involves ordering values first and then assigning them to expected normal values. To make sure GRS were comparable across studies, we excluded variants that were absent, rare (MAF < 0.01) or badly imputed (\(r^2\) < 0.8) in any of the studies, and variants that had different alleles from those in the GLGC. The variant with the larger discovery p-value from each correlated pair of SNPs (\(r^2\) > 0.1) was also removed. These filters were applied separately to UKHLS, HELIC, and APCDR-Uganda, and the intersection of the remaining variants was carried forward to generate GRS. Out of the 444 variants, this left 120, 103, and 101 variants for HDL, LDL and TG, respectively (Supplementary Table 7). We created trait-specific weighted GRS. The β-regression coefficients from SNP-trait associations in GLGC201715 were used as weights. All lipid biomarkers and scores were scaled to mean = 0 and standard deviation = 1 for each study, so that the regression coefficients represent estimates of the correlation between scores and lipid biomarkers.
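A weighted score of this kind is the sum, over variants, of the discovery effect size multiplied by the number of effect alleles carried. The sketch below uses hypothetical function names and toy inputs, and it fits an ordinary least squares slope rather than the mixed model used in the study; it also shows why standardising both score and biomarker makes the slope equal to their correlation.

```python
from statistics import mean, stdev

def weighted_grs(dosages, weights):
    """dosages: one list of effect-allele counts per individual; weights: betas."""
    return [sum(w * d for w, d in zip(weights, row)) for row in dosages]

def standardise(values):
    """Scale to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def ols_slope(x, y):
    """Slope of a simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Toy example: five individuals, three variants (values are made up).
dosages = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [2, 2, 2], [0, 0, 1]]
weights = [0.10, -0.05, 0.20]
scores = weighted_grs(dosages, weights)
```

After `standardise` is applied to both the scores and the transformed biomarker values, `ols_slope` returns their Pearson correlation, which is the interpretation given to the regression coefficients above.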

We carried out association analyses between each genetic risk score and each lipid biomarker using a linear mixed model with a random polygenic effect implemented in GEMMA39 in order to account for relatedness and population structure. For CKB, we used BOLT-LMM because it is efficient for large samples. We used a Bonferroni correction to adjust for multiple testing of three GRS with three different lipid biomarker outcomes (p < 0.05/9 = 0.0056 for the score test).

Trans-ethnic colocalization

Differences in allele frequency, LD structure and sample size make it difficult to assess whether a given GWAS hit is transferable to samples with different ancestries. Therefore, we applied trans-ethnic colocalization. Colocalization methods test whether the associations in two studies can be explained by the same underlying signal even if the specific causal variant is unknown. The joint likelihood mapping (JLIM) statistic was developed by Chun and colleagues to estimate the posterior probabilities for colocalization between GWAS and eQTL signals and compare them to probabilities of distinct causal variants16:

$$\Lambda = \mathop {\sum}\limits_{i \in N_{\theta} ^{1}\left( {m^\ast} \right)} L_{1}\left( i \right) \times {\mathrm{log}}\frac{{L_{1}\left( i \right)L_{2}\left( i \right)}}{{\mathop {\mathrm{max}}\limits_{j \notin N_{\theta} ^{2}\left( i \right)} L_{1}\left( i \right)L_{2}\left( j \right)}}$$ (1)

where \(i\) is a SNP; \(m^\ast\) is the lead SNP; \(L_1\left( i \right)\) is the likelihood of SNP \(i\) being causal for trait 1; \(L_2\left( i \right)\) is the likelihood of SNP \(i\) being causal for trait 2; \(N_\theta ^1\left( i \right)\) and \(N_\theta ^2\left( i \right)\) are the sets of SNPs in LD with \(i\); and \(\theta\) is the LD threshold.
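To make the structure of Eq. (1) concrete, the sketch below evaluates the statistic for per-SNP likelihoods that are assumed to be given. It is not the JLIM software: the function names, the nested-dictionary LD representation and the toy numbers are hypothetical, each SNP is assumed to belong to its own LD set, and the permutation procedure used to obtain p-values is not shown.

```python
import math

def ld_set(r2, snp, theta):
    """SNPs with r2 >= theta to `snp`, including `snp` itself."""
    return {j for j, value in r2[snp].items() if value >= theta} | {snp}

def jlim_lambda(L1, L2, r2_1, r2_2, lead, theta=0.8):
    """Evaluate Eq. (1) from per-SNP likelihoods L1, L2 and LD in each study."""
    total = 0.0
    for i in ld_set(r2_1, lead, theta):
        near_i = ld_set(r2_2, i, theta)
        outside = [j for j in L2 if j not in near_i]
        if not outside:
            continue  # no distinct-variant alternative exists for this SNP
        best_distinct = max(L1[i] * L2[j] for j in outside)
        total += L1[i] * math.log(L1[i] * L2[i] / best_distinct)
    return total

# Toy region: a and b are in high LD, c is independent (made-up values).
r2 = {"a": {"b": 0.9, "c": 0.1}, "b": {"a": 0.9, "c": 0.1}, "c": {"a": 0.1, "b": 0.1}}
L1 = {"a": 0.6, "b": 0.3, "c": 0.1}
shared = jlim_lambda(L1, {"a": 0.7, "b": 0.2, "c": 0.1}, r2, r2, lead="a")
distinct = jlim_lambda(L1, {"a": 0.1, "b": 0.1, "c": 0.8}, r2, r2, lead="a")
```

In the toy region the statistic is positive when the second trait's likelihood peaks at the same SNPs as the first and negative when it peaks at the independent SNP, which is the contrast the test is built on.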

JLIM explicitly accounts for LD structure. Therefore, we assessed whether it is suitable for trans-ethnic colocalization. For the reference sample set, it was possible to use genome-wide summary statistics for the analysis. For this set, LD scores were estimated using a subset of samples from the 1000 Genomes Project v3 that had matching ancestry to that study. The second sample set needed raw genotype data and LD was estimated directly for these samples. JLIM assumes only one causal variant within a region in each study. We therefore used small windows of 50 Kb for each known locus to minimise the risk of interference from additional association signals. Distinct causal variants were defined by separation in LD space by \(r^2\) ≥ 0.8 from each other. We excluded loci within the MHC region due to its complex LD structure. We used a significance threshold of p < 0.05 given the evidence of association of the established lipid loci in Europeans and the overall evidence for shared causal genetic architecture across populations for most lipid traits from our other analyses. We compared each target study to UKHLS because of the study’s matched ancestry with the discovery study and its high level of homogeneity in terms of ancestry, biomarker quantification and study design.

Simulation

To test the power of trans-ethnic colocalization to detect associations shared between pairs of populations with different ancestry, we ran JLIM on two sets of simulated traits with realistic effect size and environmental noise level. The first set of simulations used the same causal variant in both populations, whereas the second set of simulations used discordant causal variants. Causal variants were selected using the sample function in R, corresponding to a uniform random draw from the entire chromosome. We sampled 10,000 randomly chosen biallelic variants with MAF > 0.05 and simulated random phenotypes in UKHLS, CKB, APCDR-Uganda and 50,000 individuals with British ancestry from UK Biobank as the reference set. For UK Biobank we applied the QC and used the ancestry assignment provided by Bycroft et al.40. UKHLS was included as an ancestry-matched set in order to derive an upper limit estimate of the power. For each data set relatives were excluded. We also sub-sampled CKB to match the number of individuals in APCDR-Uganda in order to test whether the difference in performance was due to ancestry or sample size. We used a simple linear model to generate the phenotype for each individual i:

$$y_i = \beta \ast \left( {x_i - 1} \right) + \eta _i$$ (2)

where \(y_i\) is the phenotype value, \(\beta\) is the effect size, \(x_i\) is the number of alternate alleles carried at the locus and \(\eta_i \sim N(0, \sigma^2)\), where \(\sigma^2\) is the variance of the environmental noise and \(\mathrm{Cov}(\eta_i, \eta_j) = 0\) for \(i \neq j\). We tested effect sizes \(\beta\) of 0.10, 0.15, 0.20, and 0.25 in order to represent a range similar to that observed for the major lipid loci15. We used \(\sigma^2 = 1\) to match the trait variances of the standardised phenotypes.
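The phenotype model in Eq. (2) can be sketched as below. The function names are hypothetical. In the study the genotypes came from real samples; here they are drawn under Hardy-Weinberg proportions purely so that the example is self-contained, which is an assumption of the sketch.

```python
import random

def simulate_genotypes(n, maf, rng):
    """Alternate-allele counts (0, 1, 2) under Hardy-Weinberg proportions (assumed)."""
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]

def simulate_phenotypes(genotypes, beta, sigma2=1.0, rng=None):
    """y_i = beta * (x_i - 1) + eta_i with independent eta_i ~ N(0, sigma2)."""
    rng = rng or random.Random()
    sd = sigma2 ** 0.5
    return [beta * (x - 1) + rng.gauss(0.0, sd) for x in genotypes]

def regression_slope(x, y):
    """Slope of a simple linear regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx
```

Regressing the simulated phenotype on the genotype should recover an estimate close to the chosen \(\beta\), with precision that depends on sample size and allele frequency; this is why sub-sampling CKB to the size of APCDR-Uganda helps to separate ancestry effects from sample-size effects.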

Comparison of transferable loci with non-transferable loci

We assessed whether there are any systematic differences between loci that are shared between European ancestry samples and APCDR-Uganda and loci that are not. We identified all loci with evidence of reproducibility based on the above definition that also had significant (p < 0.05) colocalization based on a permutation test. We only kept one variant per region. We contrasted them with loci where none of the evidence suggested generalisation: p > 0.05 for colocalization or a missing result due to failed convergence, no variant with a lipid association at p < \(10^{-3}\) in the region, and a lead variant from the discovery study that was not rare in APCDR-Uganda. We identified the nearest protein-coding gene for each locus and carried out pathway analyses for the two sets using FUMA41. We also assessed the associations of the lead variants with body mass index (BMI) in European ancestry samples using results from a meta-analysis between the GIANT consortium and UK Biobank17. We used a Bonferroni-adjusted p-value threshold.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.