Overview of analyses

The objective of this work was to disentangle the genetic and environmental components of health-related traits linked to geographic variation. We explored a Scottish population of ~11,000 individuals with different degrees of kinship, genotyped for ~500K markers and phenotyped for 11 traits (8 anthropometric and 3 metabolic). The data also included geographic covariates (principal components) reflecting the regional genetic structure of the data (gPCs) and a large set of environmental covariates (socioeconomic and lifestyle, SELS). We jointly fitted genetic and environmental information in a range of statistical models, an innovative approach to disentangling the causes of regional variation. For more information, see Supplementary Tables 1, 2 and 3 and Supplementary Methods. An overview of analyses and models is shown in Fig. 1.

Fig. 1 Overview of the models and analyses performed. G, Genomic Relationship matrix; K, Kinship matrix; C, Couples matrix; S, Siblings matrix; gPCs, Geographic Principal Components; SELS, Socioeconomic and lifestyle covariates

Regional differences in traits within Scotland

To illustrate the geographic differences within Scotland, in the Basal Model (Fig. 1) we adjusted each trait for sex, age and clinic and tested the traits for differences between the 32 regions (council areas, defined from each individual’s postcode of residence; Supplementary Table 3). For 9 of the 11 traits studied, differences between regions (i.e., council areas) were significant at the 0.05 level (see Table 1, first column).
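The adjust-then-test logic of the Basal Model can be sketched in a few lines. This is only an illustration under stated assumptions: the text does not say which test was used for region, so a one-way ANOVA F statistic with a permutation p-value is assumed here, and all data, names and effect sizes are simulated and hypothetical.

```python
import random

def ols_residuals(y, covariates):
    """Residuals of y after least-squares regression on covariates (intercept added)."""
    n = len(y)
    A = [[1.0] + list(row) for row in covariates]
    p = len(A[0])
    # Augmented normal equations [A'A | A'y], solved by Gauss-Jordan elimination.
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(p)]
         + [sum(A[i][r] * y[i] for i in range(n))] for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    beta = [M[r][p] / M[r][r] for r in range(p)]
    return [y[i] - sum(b * a for b, a in zip(beta, A[i])) for i in range(n)]

def anova_f(values, groups):
    """One-way ANOVA F statistic of values across group labels."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    n, k = len(values), len(by_group)
    grand = sum(values) / n
    ss_between = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in by_group.values())
    ss_within = sum((x - sum(v) / len(v)) ** 2 for v in by_group.values() for x in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def region_permutation_p(residuals, regions, n_perm=499, seed=1):
    """Permutation p-value for the region effect on the residuals."""
    rng = random.Random(seed)
    observed = anova_f(residuals, regions)
    shuffled = list(regions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if anova_f(residuals, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Simulated toy data (hypothetical): 3 regions with a built-in regional shift.
rng = random.Random(0)
n = 120
regions = [i % 3 for i in range(n)]
sex = [rng.randint(0, 1) for _ in range(n)]
age = [rng.uniform(18, 90) for _ in range(n)]
clinic = [rng.randint(0, 1) for _ in range(n)]
trait = [0.5 * s + 0.02 * a + 0.3 * c + 2.0 * r + rng.gauss(0, 1)
         for s, a, c, r in zip(sex, age, clinic, regions)]

resid = ols_residuals(trait, list(zip(sex, age, clinic)))
p_value = region_permutation_p(resid, regions)
print(f"permutation p-value for region: {p_value:.3f}")
```

The same two-step pattern (fit a model, then test the residuals for region) applies to the later models; only the set of terms in the first step changes.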

Table 1 Significance of region on phenotypes in the benchmark framework

To test whether the regional differences detected were due to the genetic relatedness of the sample, we adjusted for kinship by fitting a genomic relationship matrix (G) together with sex, age and clinic in a mixed model analysis (Family Model). We tested the residuals from this model for remaining regional differences. When the genomic relationship matrix was included in the model, the differences between regions disappeared for two traits, height and body fat measured by bioelectric impedance analysis (BIA fat) (Table 1, second column), suggesting that the regional variation detected in the Basal Model for these two traits was due to the genetic relatedness of the sample. Nonetheless, for waist circumference, hips circumference, waist-to-hips ratio (WHR), body mass index (BMI), a body shape index (ABSI), creatinine and high density lipoprotein (HDL) levels, regional differences remained (α = 0.05) after adjusting for the genetic relatedness and family structure in the sample.

We then explored whether the regional differences could be explained by the population genetic structure of the sample, i.e., the genetic differences between the regions. To do so, we adjusted for ten geographic principal components (gPCs) that represent geographical population genetic structure in the cohort. The gPCs were calculated using a subset of unrelated individuals and unlinked markers and then extrapolated to the rest of the population. They reflect the genetic differences between regions as shown in Amador et al.14 (for more information see Methods). We adjusted for the gPCs (together with a genomic relationship matrix, sex, age and clinic in the Structure Model) and used the residuals of the model to test whether the regional differences remained significant (Table 1, third column). For all six traits with significant regional differences after the previous analyses, these differences remained significant (α = 0.05) after adjusting for the gPCs, i.e., the genetic differences between regions do not explain the regional differences in the studied traits.
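The "calculate on a reference subset, then extrapolate" step for the gPCs can be illustrated with a small projection sketch. It is not the pipeline used in the study: it computes a single component by power iteration, all genotypes are simulated, and the allele frequencies, sample sizes and function names are hypothetical.

```python
import math
import random

def column_stats(rows):
    """Per-marker mean and standard deviation (sd of 0 is replaced by 1)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(col) / n for col in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in col) / n) or 1.0
           for col, m in zip(cols, means)]
    return means, sds

def standardise(rows, means, sds):
    """Centre and scale genotypes using statistics from the reference subset."""
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def leading_loadings(Z, n_iter=100):
    """Leading eigenvector of Z'Z (marker loadings of PC1) by power iteration."""
    m = len(Z[0])
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(n_iter):
        scores = [sum(z * w for z, w in zip(row, v)) for row in Z]
        new = [sum(row[j] * s for row, s in zip(Z, scores)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in new))
        v = [x / norm for x in new]
    return v

def project(Z, loadings):
    """PC scores of standardised genotypes on fixed loadings."""
    return [sum(z * w for z, w in zip(row, loadings)) for row in Z]

rng = random.Random(0)

def simulate(freq, n, m=20):
    """Independent (unlinked) biallelic markers coded 0/1/2."""
    return [[(rng.random() < freq) + (rng.random() < freq) for _ in range(m)]
            for _ in range(n)]

# Reference subset: unrelated individuals from two hypothetical regions.
reference = simulate(0.2, 20) + simulate(0.8, 20)
means, sds = column_stats(reference)
loadings = leading_loadings(standardise(reference, means, sds))

# Remaining individuals are projected onto the loadings learned from the reference.
pc1_a = project(standardise(simulate(0.2, 10), means, sds), loadings)
pc1_b = project(standardise(simulate(0.8, 10), means, sds), loadings)
print(sum(pc1_a) / len(pc1_a), sum(pc1_b) / len(pc1_b))
```

Estimating the loadings only on unrelated individuals keeps family clusters from dominating the components; relatives are then placed in the same coordinate system by projection.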

Next, we examined whether the regional differences could be explained by the environmental differences measured in the cohort by adjusting for the SELS covariates. We fitted a model adjusting for a genomic relationship matrix and the SELS covariates, representing this environmental information, together with sex, age and clinic (Environment Model). When we tested the significance of region in the residuals of this model (Table 1, fourth column), only ABSI and creatinine showed significant differences (α = 0.05) between regions. The differences had become non-significant for waist circumference, BMI, WHR and HDL, indicating that the regional differences in these traits are explained by the measured SELS variables. We fitted a final model including both the gPCs and the SELS covariates (Table 1, Structure and Environment Model) to corroborate the results. The results obtained for this model were very similar to those from the Environment Model, reinforcing the conclusion that the SELS covariates account for the regional differences observed.
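The benchmark models described so far differ only in which terms are fitted before region is tested on the residuals. One way to summarise them is given below. The notation is an assumption introduced here for illustration, not taken from the source: y is a trait, X holds sex, age and clinic, P holds the ten gPCs, E holds the SELS covariates, and g is a random effect whose covariance is proportional to the genomic relationship matrix G. Treating the gPCs and SELS covariates as fixed effects is also an assumption.

```latex
\begin{aligned}
\text{Basal:}\quad & y = X\beta + \varepsilon \\
\text{Family:}\quad & y = X\beta + g + \varepsilon, \qquad g \sim N(0,\, G\sigma^2_g) \\
\text{Structure:}\quad & y = X\beta + P\gamma + g + \varepsilon \\
\text{Environment:}\quad & y = X\beta + E\delta + g + \varepsilon \\
\text{Structure and Environment:}\quad & y = X\beta + P\gamma + E\delta + g + \varepsilon
\end{aligned}
\qquad \varepsilon \sim N(0,\, I\sigma^2_e)
```

Read this way, a regional difference that disappears between the Family and Environment Models but persists in the Structure Model is attributed to the terms in E rather than those in P.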

A visualisation of the changes in the standardised residual means for each trait per region before and after adjusting for the SELS variables was created using latitude and longitude of Scottish postcodes in R16. This is shown for BMI in Fig. 2 and for all traits in Supplementary Fig. 1. The only remaining regional differences were for creatinine and ABSI. Since our results suggest that those were not due to the geographical population genetic structure (Table 1, Structure Model), these remaining differences are likely to be caused by other environmental variables not measured in our data and not associated with family genetic structure or family environment.
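The quantity being mapped can be sketched as follows. This is a minimal illustration under stated assumptions: residuals are standardised across all individuals before averaging by region (the text does not specify the exact standardisation), regions with fewer than 20 individuals are excluded as in the Fig. 2 caption, and the region labels and values are hypothetical.

```python
from statistics import mean, pstdev

def regional_standardised_means(residuals, regions, min_n=20):
    """Mean standardised residual per region; None for regions below min_n."""
    mu, sd = mean(residuals), pstdev(residuals)
    z = [(r - mu) / sd for r in residuals]
    by_region = {}
    for value, region in zip(z, regions):
        by_region.setdefault(region, []).append(value)
    return {region: (mean(vals) if len(vals) >= min_n else None)
            for region, vals in by_region.items()}

# Hypothetical residuals for three regions; region "C" is too small to report.
residuals = [-1.0] * 25 + [1.0] * 25 + [0.0] * 5
regions = ["A"] * 25 + ["B"] * 25 + ["C"] * 5

out = regional_standardised_means(residuals, regions)
for region, value in sorted(out.items()):
    print(region, "not considered" if value is None else round(value, 3))
```

Computing this once on residuals from a model without the SELS covariates and once on residuals from a model with them gives the "before" and "after" values compared in the maps.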

Fig. 2 Regional values of BMI before and after adjusting for the environmental variables. Changes in the standardised means of BMI per region before (left panel) and after (right panel) adjusting for all the lifestyle and socioeconomic covariates. Yellow: regions with fewer than 20 individuals, not considered

We repeated the whole set of analyses including a larger set of genetic and environmental matrices (G: genomic relationship matrix, K: kinship matrix, C: couples matrix, S: siblings matrix; see Fig. 1: Full models F, S, E and S+E) combined with the different sets of covariates. The results were similar to those of the Benchmark models described above: most regional differences were removed when fitting the SELS variables (Supplementary Table 4).

Heritability estimates and covariate effects

To further explore genetic and environmental variation in the 11 traits studied, we evaluated the proportion of the variance explained by each of the components fitted in several mixed models, following Xia et al.17. Using mixed-model analysis18, 19 we partitioned the phenotypic variance into components representing genetic or environmental effects. We used two genetic relationship matrices (G and K) to account simultaneously for the genetic sharing among distant and closely related individuals7, and two environmental relationship matrices that represented shared environments between members of a couple (C) and siblings (S)17 (Fig. 1, Full models).
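The variance partition described above can be written out explicitly. The notation below is an assumption introduced for illustration rather than the source's own: g, k, c and s are random effects with covariance structures given by the matrices G, K, C and S, and X holds the fixed-effect covariates. The expression for the phenotypic variance additionally assumes that each matrix has diagonal elements equal to 1.

```latex
\begin{aligned}
y &= X\beta + g + k + c + s + \varepsilon \\
\operatorname{Var}(y) &= G\sigma^2_g + K\sigma^2_{kin} + C\sigma^2_c + S\sigma^2_s + I\sigma^2_e \\
\sigma^2_p &= \sigma^2_g + \sigma^2_{kin} + \sigma^2_c + \sigma^2_s + \sigma^2_e \\
h^2_g &= \sigma^2_g / \sigma^2_p, \qquad h^2_{kin} = \sigma^2_{kin} / \sigma^2_p
\end{aligned}
```

Under this reading, the proportions attributed to the couple and sibling environments are the analogous ratios $\sigma^2_c / \sigma^2_p$ and $\sigma^2_s / \sigma^2_p$.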

The proportion of the phenotypic variance explained by the components in a Full Model is shown in Table 2. The table includes the results for two types of analyses: the Family Model including only the matrices, sex, age and clinic, or the Structure and Environment Model (S+E) including the matrices and gPCs and SELS covariates together with sex, age and clinic.

Table 2 Proportion of the phenotypic variance explained by genomic (G: genomic relationship matrix, K: kinship matrix) and environmental matrices (C: couple matrix, S: sibling matrix)

The estimates of the genotyped single-nucleotide polymorphism (SNP) heritability ($h^2_g$, the proportion of the phenotypic variance captured by matrix G) and of the pedigree heritability ($h^2_{kin}$, captured by matrix K) did not change significantly when including the extended set of covariates in the model, even for those traits where the environmental covariates contributed to regional differences. Furthermore, for most of the traits the estimates of variance due to the shared environments of couples and siblings (C and S) were robust to the inclusion of the extended set of SELS variables (Table 2). This is illustrated for two traits in Fig. 3.

Fig. 3 Heritability estimates from models with different covariates. Proportion of the variance in two different traits captured by each of the genetic or environmental matrices fitted: Model F: including four matrices and sex, age, clinic as covariates (blue bars); Model S+E: including four matrices, gPCs, SELS and sex, age, clinic as covariates (green bars). Error bars show the standard errors of the estimates

The proportion of the variance captured by the couple environment (matrix C) was significant for eight traits, although for HDL the significance disappeared after including the full set of environmental covariates. This suggests that, for HDL, some of the phenotypic similarities observed in couples can be accounted for by the recorded lifestyle or socioeconomic variables. In addition, the variance captured by the sibling environment (matrix S) was detectable for only two traits (BIA fat and total cholesterol, TC). For creatinine and HDL, the variance captured by the sibling environment was not different from zero in the Family Model, but became significant after including the whole set of covariates. In all cases, the differences in the proportion of variance captured between the Structure and Environment Model (including the whole set of covariates) and the Family Model were subtle.

Table 3 shows the variance explained by the SELS covariates together with the gPCs in the models including a G matrix (details of each individual covariate are shown in Supplementary Table 5). The variance explained by the SELS covariates ranged from 0.64% to 35.57%, while the gPCs always explained < 0.5% of the variance for all traits. The Scottish index of multiple deprivation (SIMD) was the covariate affecting most traits (all except creatinine), and years of education also explained substantial variance for several traits, with effects on most of the body measurements. Activity level explained a large amount of variance (up to 18.9%) for traits such as HDL, BMI, weight and BIA fat. The dietary variables showed effects on many traits but overall explained little variance. For all traits the SELS covariates explained more variance than the geographical population genetic structure, which is consistent with the results showing that the regional differences in the obesity-related traits are associated with environmental rather than genetic variation between the regions.