UK Biobank participants

The first release of imputed genotype data from UK Biobank (N = 152,249) was used as a discovery cohort, and an unrelated set of participants from the second release (N = 326,565) as a replication cohort. Imputed genotype data from the third UK Biobank genoype data release were used for replication. Participants who self-reported as being of British descent (data field 21000) and were classified as Caucasian by principal component analysis (data field 22006) were included in the analysis. Genetic relatedness pairing was provided by the UK Biobank (Data field 22011). Participants were removed due to relatedness based on kinship data (estimated genetic relationship > 0.044), poor genotyping call rate (<95%), high heterozygosity (Data field 22010), or sex-errors (Data field 22001). After filtering, 116,138 participants remained in the discovery cohort and 246,361 in the replication cohort.

Ethics

Ethical approval to collect participant data was given by the North West Multicentre Research Ethics Committee, the National Information Governance Board for Health & Social Care, and the Community Health Index Advisory Group. UK Biobank possesses a generic Research Tissue Bank approval granted by the National Research Ethics Service (http://www.hra.nhs.uk/), which lets applicants conduct research on UK Biobank data without obtaining separate ethical approvals. All participants provided signed consent to participate in UK Biobank48.

Genotyping, imputations, and QC

Genotyping in the discovery cohort had been performed on two custom-designed microarrays: referred to as UK BiLEVE and Axiom arrays, which genotyped 807,411 and 820,967 SNPs, respectively. Imputation had been performed using UK10K49 and 1000 genomes phase 350 as reference panels. Prior to analysis, we filtered SNPs based on call rate (--geno 0.05), HWE (P > 10−20 (Fischer’s exact test), MAF (--maf 0.0001), and imputation quality (Info > 0.3) resulting in 25,472,837 SNPs in the discovery cohort. The third release of data from the UK Biobank contained genotyped and imputed data for 488,366 participants (partly overlapping with the first release). For our replication analyses, we included an independent subset that did not overlap with the discovery cohort. Genotyping in this subset was performed exclusively on the UK Biobank Axiom Array. This dataset included 47,512,111 SNPs that were filtered based on HWE (P < 10−20, Fischer’s exact test), call rate > 95% (--geno 0.05), Info > 0.3, and MAF > 0.0001. All genomic positions are in reference to hg19/build 37.

Phenotypic measurements

The phenotypes used in this study derive from impedance measurements produced by the Tanita BC-418MA body composition analyzer. Participants were barefoot, wearing light indoor clothing, and measurements were taken with participants in the standing position. Height and weight were entered manually into the analyzer before measurement. The Tanita BC-418MA uses eight electrodes: two for each foot and two for each hand. This allows for five impedance measurements: whole body, right leg, left leg, right arm, and left arm (Fig. 1a). Body fat for the whole body and individual body parts had been calculated using a regression formula, that was derived from reference measurements of body composition by DXA (Fig. 1b) in Japanese and Western subjects. This formula uses weight, age, height, and impedance measurements51 as input data. Arm, and leg fat masses were averaged over both limbs. Arm, leg, and trunk fat masses were then divided by the total body fat mass to obtain the ratios of fat mass for the arms, legs and trunk, i.e., what proportion of the total fat in the body is distributed to each of these compartments. These variables were analyzed in this study and were named: AFR, LFR, and TFR.

Correlations between fat ratios and anthropometric traits

Phenotypic correlations between fat distribution ratios and anthropometric traits were estimated by calculating squared semi-partial correlation coefficients for males and females separately, using anova.glm in R. Adipose tissue ratios (AFR, LFR or TFR) were set as the response variable. BMI, waist circumference, waist circumference adjusted for BMI, waist-to-hip ratio, height, or one of the other ratios were included as the last term in a linear model with age and principal components as covariates. The reduction in residual deviance, i.e., the reductions in the residual sum of squares as BMI, waist circumference, waist circumference adjusted for BMI, waist-to-hip ratio, height, or one of the other ratios was added to the model, is presented as percentages of the total deviance of the null model in supplementary Table 8.

Genome-wide association studies for body fat ratios

A two-stage GWAS was performed using a discovery and a replication cohort. Sex-stratified GWAS were performed in the discovery cohort for each trait. A flowchart that describes the steps taken for the genetic analyses is included as supplementary Fig. 6. Prior to running the GWAS, body fat ratios were adjusted for age, age squared and normalized by rank-transformation separately in males and females using the rntransform function included in the GenABEL library52. GWAS was performed in PLINK v1.90b3n18 using linear regression models with the age-adjusted and rank-transformed AFR, LFR, and TFR as the response variables and the SNPs as explanatory variables. 50,000 participants of UK Biobank were genotyped on a separate array as part of the UK BiLEVE project. A batch variable was used as covariate in the GWAS for the discovery analyses to adjust for genotyping array (UKB Axiom and UK BiLEVE) as well as for other differences between UK BiLEVE and UKB Axiom-genotyped participants. We also included the first 15 principal components and sex (in the sex-combined analyses) as covariates in the GWAS. LD score regression intercepts (see further information below), calculated using ldsc17, were used to adjust for genomic inflation, by dividing the square of the t-statistic for each tested SNP with the LD-score regression intercept for that GWAS, and then calculating new P-values based on the adjusted t-statistic. We used a threshold of P < 10−7, after adjusting for LD score intercept, as a threshold for significance in the discovery cohort.

The --clump function in PLINK was used to identify the number of independent signals in each GWAS. This function groups associated SNPs based on the linkage disequilibrium (LD) pattern. The parameters for clumps were set to: --clump-p1 1*10−7, --clump-p2 1*10−7, --clump-r2 0.10, and --clump-kb 1000. This function groups SNPs within one million base pairs that were associated with the trait at P < 1*10−7. After running --clump in PLINK, conditional analyses were also performed for each locus conditioning for the lead SNP, but no further signals were identified. Several associations were found in more than one of the three body fat ratios (AFR, TFR, or LFR) or strata (males, females, or sex-combined) and different lead SNPs were observed for different traits and strata at several loci. To assess whether these represented the same signal, we assessed the LD between overlapping lead SNPs in PLINK. SNPs in low LD (R2-value < 0.05) were considered to represent independent signals. We then performed conditional analysis in PLINK, conditioning on the most significant SNPs across all phenotypes and strata. Lead SNPs with a P < 1*10−7 after conditioning on other potentially linked R2-value ≥ 0.05 lead SNPs, were considered as being independent signals. For each independent signal, the lead SNP (lowest P-value) was taken forward for replication. Bonferroni correction was used to correct for multiple testing during replication and p-values < 0.05/135 were considered to be statistically significant. Meta analyses of results from the discovery and replication cohorts was performed with the METAL software53 for all independent associations that were taken forward for replication.

SNP heritability and genetic correlations

We estimated SNP heritability and genetic correlations using LD score regression (LDSC), implemented in the ldsc software package17. Only SNPs that were included in HapMap3 were included in these analyses. LDSC uses LD patterns and summary stats from GWAS as input. For genetic correlations, we performed additional sex-stratified GWAS in the UK biobank (using the same covariates as for the ratios) for standard anthropometric traits, BMI, height, WC, WHR, WCadjBMI, and WHRadjBMI, in the discovery cohort. GWAS summary stats were filtered for SNPs included in HapMap3 to reduce likelihood of bias induced by poor imputation quality. After this filtering, 1,164,192 SNPs remained for LDSC analyses. LD scores from the European data of the 1000 Genomes project (including LD patterns for all the HapMap3 SNPs) for use with LDSC were downloaded from the Broad institute at: https://data.broadinstitute.org/alkesgroup/LDSCORE/eur_w_ld_chr.tar.bz2. Genetic correlations between the three body fat ratios and anthropometric traits were assessed by cross-trait LD score regression.

Overlap with findings from previous GWAS

Lead SNPs from all independent signals in our analyses were cross-referenced with the NHGRI-EBI catalog of published genome-wide association studies (GWAS Catalog—data downloaded on 23 April 2018)19 to determine whether body fat ratio-associated signals overlapped with previously identified anthropometric associations from previous GWAS. We used a cut-off of R2 < 0.1 between SNPs from our analyses and anthropometric trait-associated SNPs (P < 5*10−8) from GWAS catalog to determine any overlap with findings from previous GWAS. LD between data in the GWAS catalog and our lead SNPs were calculated using PLINK v1.90b3n18. In addition, lead SNPs at Body fat ratio-associated loci that potentially overlapped (R2 > 0.1) with signals from previous GWAS were tested for association with standard anthropometric traits (BMI, height, WC, WHR, WCadjBMI, and WHRadjBMI) in the UK biobank discovery cohort using PLINK v1.90b3n18 through linear regression modeling and including sex, age a batch variable and 15 principal components as covariates. Here, a P < 1e−7 was considered significant.

Functional annotation of associated loci

Associated loci were investigated for overlap with eQTLs from the GTEx project33. The threshold for significance for the eQTLs was set to 2.3*10−9 in agreement with previous studies54. The strongest associated SNP for each tissue and gene in the GTEx dataset was identified. We then estimated the LD between the top eQTL SNPs and the lead SNP for each independent association from our analysis. If a SNPs from our analyses were in LD (R2 > 0.8) with a lead eQTL SNP the two signals were considered overlapping.

Lead SNPs, and all SNPs in LD (R2 > 0.8) with a lead SNP from our analyses (LD determined in the UK biobank cohort in PLINK) were cross-referenced with dbSNP (human 9606 b150) in order to identify potentially deleterious intragenic variants in LD (R2 > 0.8) with the body fat ratio-associated variants. Polyphen and SIFT-scores for the missense variants (extracted from Ensembl—www.ensembl.org) were used to assess the deleteriousness of the body fat ratio-associated variants.

Enrichment analysis

To identify the functional roles and tissue specificity of associated variants, we performed tissue and gene-set enrichment analyses using DEPICT39. For the gene-set enrichment in DEPICT, gene expression data from 77,840 samples have been used to predict gene function for all genes in the genome based on similarities in gene expression. In comparison to standard enrichment tools that apply a binary definition to define membership in a set of genes that have been associated with a biological pathway or functional category (genes are either included or not included), in DEPICT, the probability of a gene being a member of a gene set has instead been estimated based on correlation in gene expression. This membership probability to each gene set has been estimated for all genes in the human genome and the membership probabilities for each gene have been designated reconstituted gene sets. A total of 14,461 reconstituted gene sets have been generated which represent a wide set of biological annotations (Gene Ontology [GO], KEGG, REACTOME, Mammalian Phenotype [MP], etc.). For tissue enrichment in DEPICT, microarray data from 37,427 human tissues have been used to identify genes with high expression in different cells and tissues.

For the enrichment analyses, we performed sex-stratified GWAS for AFR, LFR and TFR on the combined cohort, i.e., the discovery and replication cohorts, including 195,043 females and 167,408 males, in order to achieve higher power. The clump functionality in PLINK is used to determine associated loci. The P-value cut-off for association for clump was set at P < 10−7. In the enrichment analyses, DEPICT assesses whether the reconstituted gene sets are enriched for genes within trait-associated loci39. The false discovery rate (FDR)55 was used to adjust for multiple testing. Twelve analyses were run in total (tissue enrichment and gene-set enrichment) and FDR < 0.05/12 was considered significant.

Interaction between SNPs and sex

We used the GWAMA software31 to test for heterogenous effects of associated SNPs between sexes. In GWAMA, fixed-effect estimates of sex-specific and sex-combined beta coefficients and standard errors are calculated from GWAS summary statistics to test for heterogeneous allelic effects between females and males. GWAMA obtains a test-statistic by subtracting the sex-combined squared t-statistic from the sum of the two sex-specific squared t-statistics. This test statistic is asymptotically χ2-distributed and equivalent to a normal z-test of the difference in allelic effects between sexes. Lead SNPs that replicated were tested for heterogeneity between sexes for the trait that they were associated with. This corresponds to 30 tests for AFR, 44 for LFR, and 66 for TFR. Bonferroni correction was used to correct for multiple testing and P-values < 0.05/140 = 3.57*10−4 (χ2 test) were considered to be significant. Summary statistics from the replication cohort were used in order to maximize statistical power.