Abstract The variation in weight within a shared environment is largely attributable to genetic factors. Whilst many genes/loci confer susceptibility to obesity, little is known about the genetic architecture of healthy thinness. Here, we characterise the heritability of thinness, which we found was comparable to that of severe obesity (h² = 28.07% vs 32.33%, respectively), although with incomplete genetic overlap (r = -0.49, 95% CI [-0.82, -0.17], p = 0.003). In a genome-wide association analysis of thinness (n = 1,471) vs severe obesity (n = 1,456), we identified 10 loci previously associated with obesity and demonstrate enrichment for established BMI-associated loci (binomial p = 3.05×10⁻⁵). Simulation analyses showed that the differing association results at the two extremes were likely consistent with additive effects across the BMI distribution, suggesting that different effects on thinness and obesity could be due to their different degrees of extremeness. In further analyses, we detected a novel obesity- and BMI-associated locus at PKHD1 (rs2784243, obese vs thin p = 5.99×10⁻⁶, obese vs controls p = 2.13×10⁻⁶, BMI p = 2.3×10⁻¹³), associations at loci recently discovered with much larger sample sizes (e.g. FAM150B and PRDM6-CEP120), and novel variants driving associations at previously established signals (e.g. rs205262 at the SNRPC/C6orf106 locus and rs112446794 at the PRDM6-CEP120 locus). Our ability to replicate loci found with much larger sample sizes demonstrates the value of clinical extremes and suggests that characterising the genetics of thinness may provide a more nuanced understanding of the genetic architecture of body weight regulation and may inform the identification of potential anti-obesity targets.

Author summary Obesity-associated disorders are amongst the leading causes of morbidity and mortality worldwide. Most genome-wide association studies (GWAS) have focused on body mass index (BMI = weight in kg divided by the square of height in metres, kg/m²) and obesity, but to date no genetic association study of thin and healthy individuals has been performed. In this study, we recruited a first-of-its-kind cohort of 1,471 clinically ascertained thin and healthy individuals and contrasted the genetic architecture of thinness with that of severe early onset obesity. We show that thinness, like obesity, is a heritable trait with a polygenic component. In a GWAS of persistent healthy thinness vs severe obesity with a total sample size of 2,927, we find evidence of association at loci that have only recently been discovered using cohorts of >40,000 individuals. We also find a novel BMI-associated locus at PKHD1 in UK Biobank, highlighted by our association study. This work illustrates the value and increased power gained by using clinically ascertained extremes to study complex traits, and provides a valuable resource for studying resistance to obesity in an increasingly obesogenic environment.

Citation: Riveros-McKay F, Mistry V, Bounds R, Hendricks A, Keogh JM, Thomas H, et al. (2019) Genetic architecture of human thinness compared to severe obesity. PLoS Genet 15(1): e1007603. https://doi.org/10.1371/journal.pgen.1007603
Editor: Adam E. Locke, Washington University in Saint Louis School of Medicine, UNITED STATES
Received: March 23, 2018; Accepted: August 2, 2018; Published: January 24, 2019
Copyright: © 2019 Riveros-McKay et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data from the STILTS and SCOOP cohorts are available through the EGA using a Data Access Agreement (accession codes EGAD00010001622 and EGAD00010001623). Summary statistics of the STILTS vs SCOOP cohorts are available from the NHGRI-EBI GWAS catalog (https://www.ebi.ac.uk/gwas/summary-statistics). UKHLS data is available for download via EGA with accession code EGAS00001001232. The analyses presented in this study were based on data accessed through the UK Biobank. UK Biobank data is available to all researchers that submit a formal application.
Funding: This work was supported by the European Research Council (ISF), Wellcome Trust (ISF, IB, EZ) (098497/Z/12/Z; WT098051, WT206194), Medical Research Council (ISF, SOR) (MRC_MC_UU_12012/5), NIHR Cambridge Biomedical Research Centre (ISF, IB, SOR), Bernard Wolfe Health Neuroscience Endowment (ISF), and the European Community’s Seventh Framework Programme (FP7/2007-2013) project Beta-JUDO n°279153 (ISF). Understanding Society: The UK Household Longitudinal Study, is led by the Institute for Social and Economic Research at the University of Essex and funded by the Economic and Social Research Council. The data were collected by NatCen and the genome wide scan data were analysed by the Wellcome Sanger Institute.
This research was specifically funded by Wellcome Trust and MRC (Grant ref: 076467/Z/05/Z). GWAS data was generated by Sample Logistics and Genotyping Facilities at Wellcome Sanger Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist.

Introduction The rising prevalence of obesity is driven by changes in the environment, including the consumption of high-calorie foods and reduced levels of physical activity [1]. However, within a given environment, there is considerable variation in body weight; some people are particularly susceptible to severe obesity, whilst others remain thin [2,3]. Family, twin and adoption studies have consistently demonstrated that 40–70% of the variation in body weight can be attributed to heritable factors [4]. As a result, many studies have focused on the genetic basis of body mass index (BMI) and/or obesity. To date, >250 common and low-frequency obesity-susceptibility loci have been identified [5–10]. Additionally, studies of people at one extreme of the distribution (severe obesity) have led to the identification of rare, penetrant genetic variants that affect key molecular and neural pathways involved in human energy homeostasis [11–14]. These findings have provided a rationale for targeting these pathways for therapeutic benefit. In contrast, little is known about the specific genetic characteristics of persistently thin individuals (thinness defined using WHO criteria, BMI ≤ 18 kg/m²). Understanding the mechanisms underlying thinness/resistance to obesity may highlight novel anti-obesity targets for future drug development. A small number of previous studies have found that thinness appears to be a trait that is at least as stable and heritable as obesity [15–18]. A large study of 7,078 UK children and adolescents found that the strongest predictor of child/adolescent thinness was parental weight status: the prevalence of thinness was highest (16.2%) when both parents were thin and progressively lower when both parents were normal weight, overweight or obese [19]. One approach to studying thinness is to draw individuals from a population-based cohort in which BMI is measured as a quantitative (continuous) trait.
For example, it is possible to generate a “case-control” study by taking the extremes of the population distribution for a continuous trait such as BMI, an approach used effectively by Berndt et al. 2013 [20], who analysed the top and bottom 5% of the BMI distribution in cohorts participating in the GIANT Consortium. However, by their very nature, such population-based cohorts often contain a limited number of people at the “extremes” (i.e. severe obesity and thinness) [20]. To date, other GWAS approaches that included thin individuals have either used them exclusively as controls to contrast with extreme obesity [21], or have not ascertained for healthy thinness [22]. Here, we use a different study design, one that has been used to increase power to detect genetic association, particularly for disorders with a large environmental component (e.g. asthma, type 2 diabetes and obesity): enriching the case series with affected individuals who may carry a greater genetic load. This is usually done by selecting individuals who have a more extreme form of the disease, are younger (leaving less time for the environment to influence their disease) and perhaps have family members affected by the same condition. To complement this approach to case selection, controls are also selected to increase the chance that they do not have the disease and are unlikely to develop it later in life [21]. This is normally done by selecting contrasting controls, or “super-controls”. However, the low prevalence of thinness in countries such as the UK, and the fact that people who are well but constitutionally thin do not routinely come to medical attention, pose challenges to recruiting a cohort of healthy thin individuals.
We were able to take advantage of the UK National Health Service (NHS) research infrastructure to recruit from primary care (Methods), using body mass index (BMI: weight in kg/height in m²) criteria and personal review of individual case files, to identify a cohort of approximately 2,000 thin adults of UK European descent (Study Into Lean and Thin Subjects, STILTS cohort; mean BMI = 17.6 kg/m²) who are well, without medical conditions or eating disorders (Methods). Of the STILTS cohort, 74% have a family history of persistent thinness throughout life, suggesting we have enriched for genetically driven thinness. Here, we present a new GWAS, the largest to date, focused on persistent healthy thinness, and contrast the genetic architecture of this trait with that of severe early onset obesity ascertained in the clinic. We explored whether the genetic loci influencing thinness are the same as those influencing obesity, i.e. whether these two clinically ascertained traits are opposite sides of the same “coin”, or whether there are important genetic differences between them. We show that persistent thinness and severe early onset obesity are both heritable traits (h² = 28.07% and h² = 32.33%, respectively) that share a number of associated loci, and both are enriched for established BMI-associated loci (binomial p = 3.05×10⁻⁵ and 9.09×10⁻¹³, respectively). Nonetheless, we also detected important differences, with some loci more strongly associated at the upper clinical end of the BMI distribution (e.g. FTO), some at the lower end (e.g. CADM2), whilst others are equivalently associated with both clinical ends of the BMI spectrum (e.g. MC4R).
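The binomial enrichment p-values quoted above can be illustrated with a one-sided binomial tail probability. This is a minimal sketch, not the authors' code: we assume the test asks how many of the n established loci reaching nominal significance have the same direction of effect as in GIANT, with a null probability of 0.5 for a matching direction. The function name and the example counts are hypothetical; we note only that 0.5¹⁵ and 0.5⁴⁰ coincide numerically with the two quoted values, which suggests but does not confirm this formulation.

```python
from math import comb


def binomial_enrichment_p(k: int, n: int, p0: float = 0.5) -> float:
    """One-sided binomial P(X >= k) for X ~ Binomial(n, p0).

    n:  established loci reaching nominal significance (p < 0.05)
    k:  how many of those share the GIANT direction of effect
    p0: null probability of a consistent direction (assumed 0.5)
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical counts in which every nominally significant locus is consistent.
print(f"{binomial_enrichment_p(15, 15):.3g}")  # 3.05e-05
print(f"{binomial_enrichment_p(40, 40):.3g}")  # 9.09e-13
```

Summing the upper tail directly keeps the sketch dependency-free; with SciPy available, `scipy.stats.binomtest(k, n, 0.5, alternative="greater")` would give the same one-sided value.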
Simulation tests showed that these results did not significantly deviate from additive effects and most likely reflect the different degrees of extremeness in our clinically ascertained cohorts: severely obese individuals represent a larger deviation from the mean than healthy thin individuals do (an equivalent degree of thinness may not be compatible with healthy human life). These data support the expansion of genetic studies of persistent thinness as a way to gain further insight into the biology underlying human energy homeostasis, and as an alternative approach to uncovering potential anti-obesity targets for drug development.

Discussion Here we present results from the largest GWAS to date performed on healthy individuals with persistent thinness and provide the first insights into the genetic architecture of this trait. To our knowledge, there are only two other studies of thin individuals with comparable mean BMIs [21,22]. The study by Hinney et al. [21] (N = 442) detected only FTO at genome-wide significance, with rs1121980 having an effect similar to the one we report (OR = 1.66 vs OR = 1.69 in our data). In the Scannell Bryan et al. [22] study, the Bangladeshi participants were reportedly thin and malnourished, and a single suggestive association was found with an intronic variant in NRXN3 (rs12882679, p = 9.57×10⁻⁷), which is not significant in our study (p = 0.77). Using genome-wide genotype data, we show that persistent healthy thinness, like severe obesity (h² = 32.33%), is a heritable trait (h² = 28.07%). Persistent healthy thinness and severe childhood obesity are negatively genetically correlated (r = -0.49, 95% CI [-0.82, -0.17], p = 0.003) and share a number of genetic risk loci. Nonetheless, the genetic overlap between the two clinically ascertained traits appears to be incomplete, as highlighted by some loci that were more strongly associated at one end of the BMI distribution (e.g. CADM2), while others appeared to exert effects across the entire BMI spectrum (e.g. MC4R [9,33,34]). Further exploration by simulation showed that these differences are likely to be due to the different degrees of extremeness of the two clinical cohorts (i.e. a degree of thinness comparable to the degree of obesity in the obese cohort may not be compatible with healthy human life), and not to a deviation from additive effects of the tested loci on BMI, with the possible exception of CADM2, which deviated from expectation with nominal significance in both the obese and the thin analyses (S3 Table).
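The simulation-based check for departures from additivity can be sketched for a single locus in pure Python. This is a scaled-down illustration under stated assumptions, not the authors' pipeline, which used an R script, DISSECT and all 97 GIANT loci jointly in populations of 1 million. Here one biallelic SNP with a hypothetical allele frequency and per-allele effect (in phenotypic SD units) acts additively on a standardised phenotype; individuals more than 3 SD below the mean are dropped; tail "cases" and mid-distribution "controls" are sampled; and an allelic OR is computed. The empirical p-value follows the text: the share of replicates whose OR deviates from the simulated mean at least as much as the observed OR. The population size, tail fraction, control percentile window and all function names are placeholders.

```python
import random
import statistics


def allelic_or(case_g, ctrl_g):
    """Allelic odds ratio from lists of 0/1/2 effect-allele counts (0.5 added to each cell)."""
    a = sum(case_g) + 0.5                    # effect alleles in cases
    b = 2 * len(case_g) - sum(case_g) + 0.5  # other alleles in cases
    c = sum(ctrl_g) + 0.5
    d = 2 * len(ctrl_g) - sum(ctrl_g) + 0.5
    return (a * d) / (b * c)


def simulate_tail_or(freq, beta, rng, n_pop=20_000, tail_frac=0.05, upper=True,
                     ctrl_window=(0.10, 0.80), n_case=300, n_ctrl=600):
    """One replicate: additive single-SNP phenotype, tail cases vs mid-range controls."""
    geno = [(rng.random() < freq) + (rng.random() < freq) for _ in range(n_pop)]
    resid_sd = (1 - beta**2 * 2 * freq * (1 - freq)) ** 0.5  # total variance ~1
    pheno = [beta * (g - 2 * freq) + rng.gauss(0, resid_sd) for g in geno]
    # Drop the lower tail below -3 SD, then rank everyone by phenotype.
    ranked = sorted((p, g) for p, g in zip(pheno, geno) if p >= -3)
    n = len(ranked)
    n_tail = int(tail_frac * n)
    tail = ranked[n - n_tail:] if upper else ranked[:n_tail]
    lo, hi = ctrl_window
    middle = ranked[int(lo * n):int(hi * n)]
    cases = [g for _, g in rng.sample(tail, n_case)]
    ctrls = [g for _, g in rng.sample(middle, n_ctrl)]
    return allelic_or(cases, ctrls)


def empirical_p(observed_or, simulated_ors):
    """Share of replicates deviating from the simulated mean at least as much as observed."""
    centre = statistics.fmean(simulated_ors)
    obs_dev = abs(observed_or - centre)
    hits = sum(abs(s - centre) >= obs_dev for s in simulated_ors)
    return hits / len(simulated_ors)


if __name__ == "__main__":
    rng = random.Random(42)
    sims = [simulate_tail_or(0.4, 0.05, rng) for _ in range(20)]
    print(empirical_p(1.15, sims))  # 1.15 is a hypothetical observed OR
```

Reproducing the real percentiles (top 0.15%, bottom 2.8%) with 1,456 and 1,471 sampled cases needs the much larger simulated populations described in Methods; with the small defaults above, `rng.sample` would raise `ValueError` because the tail is too small.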
This contrasts with earlier studies, which suggested larger effects at the higher end of the BMI distribution [35,36], but agrees with more recent observations contrasting the bottom 5% and top 5% of the BMI distribution, where associated loci were also consistent with additive effects [20]. It also contrasts with a previous study of height, in which a deviation from additivity was found, but only for short individuals in the bottom 1.5% of the distribution [37]; this suggests that analyses focused on only the most extreme individuals may be warranted. Focusing on the 97 previously established BMI-associated loci [24], we show that the percentage of phenotypic variance explained by these loci is lower in persistently thin (4.33%) than in obese individuals (10.67%), and that the effect of a change in the BMI genetic risk score was, on average, much larger for obese individuals than for thin individuals (OR per one standard deviation increase in the standardised BMI genetic risk score: 1.94, 95% CI (1.83, 2.07) and 1.50, 95% CI (1.42, 1.59), respectively), which is consistent with the difference in BMI units amongst categories. Although our analysis using age-matched controls from ALSPAC suggested that the observed differences in ORs, comparing obese vs control individuals to controls vs thin individuals, were unlikely to be due to age effects, we cannot completely exclude the possibility that different effects of age and sex in our discovery cohorts (S1 Table), and gene-by-environment interactions, could be influencing some of the results we observe. For example, gene-by-environment interactions and age effects have previously been reported at the FTO locus [38–41], where a larger effect is detected in younger adults. Note, however, that non-additive effects have also been observed at the FTO locus [42].
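The ALSPAC comparison mentioned above tests whether two ORs differ. A minimal sketch follows, under assumptions: we use the allelic (2×2 allele-count) OR with Woolf's standard error as one common way to turn SNPTEST genotype counts into an OR and SE; the function names are hypothetical; and the z-test treats the two ORs as independent, which ignores the SCOOP cases they share, so the p-value is only approximate.

```python
import math


def allelic_log_or(case_counts, ctrl_counts):
    """Log allelic OR and Woolf SE from genotype counts.

    case_counts, ctrl_counts: (n_AA, n_AB, n_BB), where B is the effect allele.
    """
    def alleles(counts):
        aa, ab, bb = counts
        return 2 * bb + ab, 2 * aa + ab  # (effect, other) allele counts

    a, b = alleles(case_counts)
    c, d = alleles(ctrl_counts)
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def or_difference_z(log_or1, se1, log_or2, se2):
    """z-statistic and two-sided normal p-value for the difference of two log ORs."""
    z = (log_or1 - log_or2) / math.sqrt(se1**2 + se2**2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical genotype counts (not study data).
lor_a, se_a = allelic_log_or((400, 500, 156), (2500, 3000, 960))
lor_b, se_b = allelic_log_or((400, 500, 156), (1700, 2300, 850))
print(or_difference_z(lor_a, se_a, lor_b, se_b))
```

Working on the log-OR scale keeps the sampling distribution close to normal, which is what justifies the simple z-test.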
In studying thin individuals, there are often concerns regarding the prevalence of eating disorders, notably anorexia nervosa, amongst participants. We sought to carefully exclude eating disorders at two phases of recruitment (by medical history and by questionnaire). Additionally, we demonstrate that in our cohort of healthy thin individuals, anorexia nervosa is unlikely to be a confounder, as the two traits are only weakly genetically correlated (r = 0.13, 95% CI [-0.02, 0.28], p = 0.09). This was not the case for the UKBB replication cohort, where a positive genetic correlation was observed (r = 0.49, 95% CI [0.22, 0.76], p = 0.0003). The positive genetic correlation with anorexia was still observed after removing individuals with medical conditions that could explain their low BMI (r = 0.62, 95% CI [0.30, 0.92], p = 0.0001; Methods). These results highlight the importance of the careful phenotyping performed in the recruitment phase and the utility of the STILTS cohort as a resource to study healthy and persistent thinness. Amongst the signals from the genome-wide association analyses that we took forward for replication, in addition to established BMI-associated loci, we find a novel BMI association at PKHD1 in the UKBB BMI dataset (rs10456655, β = 0.10, p = 2.3×10⁻¹³, S9 Table); a proxy for this variant (rs2579994, r² = 1 in 1000G Phase 3 CEU) has previously been nominally associated with waist and hip circumference (p = 5.60×10⁻⁵ and p = 4.40×10⁻⁴, respectively) [43]. In addition, we found associations at loci that have only recently been established using very large sample sizes. FAM150B was only suggestively associated at the discovery stage in Tachmazidou et al. (2017) [32] (n = 47,476, p = 2.57×10⁻⁵), whereas it reached genome-wide significance when contrasting SCOOP vs STILTS (n = 2,927, p = 2.07×10⁻⁸, S5 Table).
Also, PRDM6-CEP120 [5] was recently discovered in a Japanese study with a sample size of 173,430 and has not previously been reported in a European population. In our study, a signal near the locus (rs112446794, r² = 0.36) showed suggestive evidence of association in SCOOP vs UKHLS (p = 2.08×10⁻⁶, S6 Table) with a much smaller sample size. Conditional analysis reveals that the lead SNP in this study drives the association of the previously established signal (S8 Table). CEP120 encodes centrosomal protein 120. Variants near this locus have previously been associated with height [44] and with waist circumference in East Asians [45]. Missense variants in the gene itself have been associated with rare ciliopathies [46,47]. Lastly, amongst the signals we took forward for replication, and after removing known and newly established loci, we still observe an enrichment of directionally consistent, nominally significant associations in the analysis of BMI as a continuous trait, suggesting that some of these results may warrant additional investigation, in particular in similarly ascertained thin and obese cohorts. One such example is rs4447506, near PIK3C3, which was nominally significant and directionally consistent not only in the independent UKBB BMI analysis (p = 1.5×10⁻⁶, S9 Table), but also in the Locke et al. (2015) [24] BMI results (p = 0.01) and in the GIANT BMI tails analysis we used for replication (S5 Table). We also note that, despite not reaching genome-wide significance in our discovery cohorts, we observe directionally consistent suggestive associations at a number of loci previously associated with BMI tails and with different obesity classes [20] (S10 Table). Altogether, these results highlight some power advantages of using clinically ascertained extremes of the phenotype distribution to detect associations, and suggest that healthy thinness falls at the lower end of the polygenic BMI spectrum.
Note, however, that these clinically ascertained extremes display evidence of incomplete genetic correlation with BMI, in contrast to previously described obesity classes (S4 Fig), so it is plausible that additional loci might be uncovered by focusing on clinical extremes. As our results are based on clinically ascertained participants who met very specific criteria, these conclusions cannot be straightforwardly extrapolated to the general population. Experiments in animals have identified loci/genes associated with thinness/decreased body weight due to reduced food intake, increased energy expenditure or resistance to high-fat diet-induced obesity [48,49], mechanisms that we hypothesise may contribute to human thinness. The STILTS cohort, which shows no significant genetic correlation with anorexia nervosa, is an excellent resource in which to conduct such additional genetic exploration. Further genetic and phenotypic studies focused on persistently thin individuals may provide new insights into the mechanisms regulating human energy balance and may uncover potential anti-obesity drug targets.

Methods Ethics statement The study was reviewed and approved by the South Cambridgeshire Research Ethics Committee (12/EE/0172). All participants provided written informed consent prior to inclusion. Cohorts SCOOP, STILTS and UKHLS cohorts were used for the heritability, genetic correlation, genetic risk score and established BMI loci association analyses, as well as discovery cohorts in the genome-wide association study (GWAS) and gene-based tests. UK Biobank samples were used for genetic correlation analysis and in the replication stages of the GWAS and gene-based tests. ALSPAC was used as an additional control dataset to UKHLS for comparison against SCOOP in the established BMI loci analysis. STILTS. The aim was to recruit a new cohort of UK European people who are thin (defined as a body mass index < 18 kg/m²) and well. After ethical committee approval (12/EE/0172), we worked with the NIHR Primary Care Research Network (PCRN) to collaborate with 601 GP practices in England. Each practice searched its electronic health records using our inclusion criteria (age 18–65 years, BMI < 18 kg/m²) and exclusion criteria (medical conditions that could potentially affect weight, i.e. chronic renal, liver and gastrointestinal problems, metabolic and psychiatric disease, and known eating disorders). A small number of individuals (n = 43) with a BMI of 19.0 kg/m² were included as they had a strong family history of thinness. The case notes of each potential participant were reviewed by the GP or a senior nurse with clinical knowledge of the participant, in discussion with the study team, to exclude other potential causes of low body weight. Through this approach we identified 25,000 individuals who fitted our criteria for inclusion in the study. These individuals were invited to participate; approximately 12% (2,900) replied consenting to take part.
We obtained a detailed medical and medication history and screened for eating disorders using a questionnaire (SCOFF) that has been validated against more formal clinical assessment [50]. We excluded all participants who stated that they exercised every day or more than 3 times a week, or whose reported activity exceeded 6 metabolic equivalents (METs) for any duration or frequency (http://www.who.int/dietphysicalactivity/physical_activity_intensity/en/). With these rather strict exercise criteria, we sought to limit exercise as a contributor to the thinness of participants in the STILTS cohort. We excluded people who were thin only at a certain point in their lives (often as young adults) to focus on those who were persistently thin throughout life, as we hypothesised that this group would be enriched for genetic factors contributing to their thinness. We asked a specific question to identify these individuals: “have you always been thin?” Only those who answered positively were included. Questionnaires were manually checked by senior clinical staff for these parameters and for reported ethnicity (non-European ancestry excluded). DNA was extracted from salivary samples obtained from these individuals using the Oragene 500 kit according to the manufacturer’s instructions (S1 Table). SCOOP. With ethical committee approval (MREC 97/5/21), we have recruited 7,000 individuals with severe early-onset obesity (BMI standard deviation score (SDS) > 3; onset of obesity before the age of 10 years) to the Genetics of Obesity Study (GOOS) [51]. The Severe Childhood Onset Obesity Project (SCOOP) cohort [31] is a sub-cohort of GOOS comprising ~4,800 British individuals of European ancestry (S1 Table).
SCOOP individuals likely to have congenital leptin deficiency, a treatable cause of severe obesity, were excluded by measurement of serum leptin, and individuals with mutations in the melanocortin 4 receptor gene (MC4R), the most common genetic form of penetrant obesity, were excluded by prior Sanger sequencing. UKHLS. Understanding Society (UKHLS) is a longitudinal household study designed to capture economic, social and health information from UK individuals [52]. A subset of 10,484 individuals was selected for genome-wide array genotyping. This cohort was used as a control dataset for the SCOOP and STILTS cases (S1 Table). UK BIOBANK (UKBB). This study includes 487,411 participants with released genetic data (including ~50,000 from the UK BiLEVE cohort [53]) out of the 502,648 individuals in UK Biobank (UKBB). UKBB samples were genotyped on the UK Biobank Axiom array at the Affymetrix Research Services Laboratory in Santa Clara, California, USA, and imputed to the Haplotype Reference Consortium (HRC) panel [54]. UK BiLEVE samples were genotyped on the UK BiLEVE array, a previous version of the UK Biobank Axiom array sharing over 95% of its markers. To date, 487,411 samples with directly genotyped and imputed data are available; the data were downloaded using tools provided by UK Biobank. Extensive data from health and lifestyle questionnaires are currently available, as well as linked clinical records. BMI and other physical measurements were taken on attendance at the recruitment centre. Severely obese participants were defined as those with BMI ≥ 40 kg/m² (N = 9,706) and thin individuals as those with BMI ≤ 19 kg/m² (N = 4,538).
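The UK Biobank case definitions above, together with the held-out control pool described in the following paragraph, can be sketched as a single partitioning step. This is a hypothetical illustration rather than the study's code: the input format (a dict mapping participant ID to BMI), the function name and the seed are our own, and the sample QC applied afterwards is omitted. Cases are defined on all participants, which is why they can overlap the continuous-BMI dataset, as the text reports.

```python
import random


def partition_ukbb(bmi_by_id, n_pool=35_000, seed=2015):
    """Split participants into obese/thin cases, non-extreme controls and a BMI dataset.

    bmi_by_id: hypothetical dict mapping participant ID -> BMI in kg/m^2.
    """
    rng = random.Random(seed)
    ids = sorted(bmi_by_id)
    # Random pool held out from the continuous-BMI dataset.
    pool = set(rng.sample(ids, min(n_pool, len(ids))))
    # Controls: pool members outside both tails (19 < BMI < 30).
    controls = [i for i in ids if i in pool and 19 < bmi_by_id[i] < 30]
    # Everyone not in the pool forms the BMI dataset.
    bmi_dataset = [i for i in ids if i not in pool]
    # Cases are defined across all participants.
    obese = [i for i in ids if bmi_by_id[i] >= 40]
    thin = [i for i in ids if bmi_by_id[i] <= 19]
    return obese, thin, controls, bmi_dataset
```

Holding the control pool out of the BMI dataset keeps the case-control and continuous-trait analyses from sharing controls, while the 19 < BMI < 30 window matches the control definition used for UKHLS.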
Given that the type I error rate for variants with a low minor allele count (MAC) has previously been shown to be inadequately controlled in very unbalanced case-control scenarios [55], we randomly subsampled 35,000 individuals from the original 487,411 genotyped individuals and removed those with BMI ≤ 19 or BMI ≥ 30 to generate an independent control set. The 25,856 participants remaining after these BMI exclusions formed a non-extreme set of individuals kept as putative controls (S2 Fig). The other 452,411 genotyped samples were kept as the BMI dataset for downstream analyses (S11 Table, S2 Fig). An interim release of a subset of 152,249 UKBB individuals was made available in May 2015. This interim release was imputed to a combined UK10K and 1000G Phase 3 reference panel and contains several variants that are not currently present in the HRC panel; it was therefore used in some of the analyses described. ALSPAC. The Avon Longitudinal Study of Parents and Children (ALSPAC) [27,56], also known as Children of the 90s, is a prospective population-based British birth cohort study. Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. The study website contains details of all the data that are available through a fully searchable data dictionary (http://www.bris.ac.uk/alspac/researchers/data-access/data-dictionary/). Further information about this cohort, including details of the genotyping and imputation procedures, can be found in S2 Appendix. This analysis was restricted to a subset of unrelated (identity-by-state < 0.05 [57]) children with genetic data and BMI measured between the ages of 12 and 17 years (n = 4,964, 48.5% male). The mean age of the children was 14 years and the mean BMI was 20.5 kg/m². Genotyping and quality control SCOOP, STILTS and UKHLS. For the SCOOP cohort, DNA was extracted from whole blood as previously described [31].
For the STILTS cohort, DNA was extracted from saliva using Oragene saliva DNA kits (online protocol) and quantified using Qubit. All samples from SCOOP, STILTS and UKHLS were typed at 30 SNPs on the Sequenom platform (Sequenom Inc., California, USA) for sample quality control. Of the 3,607 SCOOP and STILTS samples submitted for Sequenom genotyping, 3,280 passed quality control filters (90.9% pass rate). Of the 10,433 UKHLS samples, 9,965 passed Sequenom sample quality control (95.5% pass rate). Subsequently, UKHLS controls were genotyped on the Illumina HumanCoreExome-12v1-0 BeadChip. The 3,280 SCOOP and STILTS samples, and 48 overlapping UKHLS samples (to test for possible array version effects), were genotyped on the Illumina HumanCoreExome-12v1-1 BeadChip by the Genotyping Facility at the Wellcome Sanger Institute (WSI). Genotype calling was performed centrally for all batches at the WSI using GenCall. Criteria for excluding samples were as follows: i) concordance against Sequenom genotypes <90%; ii) for each pair of sample duplicates, the one with the higher missingness was excluded; iii) sex inferred from genetic data differing from stated sex; iv) sample call rate <95%; v) sample autosomal heterozygosity rate >3 SD from the mean, computed separately for low (MAF <1%) and high (MAF >1%) MAF bins; vi) magnitude of intensity signal in both channels <90%; and vii) for each pair of related individuals (proportion of IBD (PI_HAT) >0.05), the individual with the lower call rate was excluded. We performed SNP QC using PLINK v1.07 [58]. Criteria for excluding SNPs were: i) Hardy-Weinberg equilibrium (HWE) p < 1×10⁻⁶; ii) call rate <95% for MAF ≥5%, call rate <97% for 1% ≤ MAF <5%, and call rate <99% for MAF <1%. SMARTPCA v10210 [59] was used for principal component analysis (PCA). To verify the absence of array version effects, we applied PCA to the subset of shared controls genotyped on both versions of the array.
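The SNP exclusion rules and the sample heterozygosity rule listed above can be expressed as small filters. This is a sketch, not the PLINK commands used in the study: the function names are hypothetical, and heterozygosity rates are assumed to have been computed beforehand for one MAF bin at a time, as the text specifies.

```python
import statistics


def snp_passes_qc(hwe_p: float, call_rate: float, maf: float) -> bool:
    """Apply the HWE filter and the MAF-dependent call-rate thresholds."""
    if hwe_p < 1e-6:
        return False
    if maf >= 0.05:
        min_call = 0.95
    elif maf >= 0.01:
        min_call = 0.97
    else:
        min_call = 0.99  # rarer variants need a stricter call rate
    return call_rate >= min_call


def heterozygosity_outliers(het_by_sample: dict, n_sd: float = 3.0) -> set:
    """Samples whose heterozygosity rate lies more than n_sd SDs from the mean (one MAF bin)."""
    rates = list(het_by_sample.values())
    mean = statistics.fmean(rates)
    sd = statistics.stdev(rates)
    return {s for s, h in het_by_sample.items() if abs(h - mean) > n_sd * sd}


print(snp_passes_qc(hwe_p=0.3, call_rate=0.96, maf=0.02))  # False: needs >= 97%
```

The stricter call-rate threshold for low-MAF variants reflects that a few miscalled genotypes distort rare-variant frequencies far more than common ones.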
Cut-offs for samples that diverged from the European cluster were chosen manually after inspecting the PCA plot. SNPs with discordant MAFs between the two array versions were excluded. After removal of non-European samples and 13 samples due to cryptic relatedness, 1,456 SCOOP and 1,471 STILTS samples remained for analysis. For UKHLS, 82 samples were removed after applying a strict European filter and 680 related samples were removed after applying a “3rd degree” kinship filter in KING [60]. A total of 9,203 samples remained, of which 6,460 had a BMI >19 and <30 (“controls”). UK BIOBANK. Sample QC was performed using all 487,411 samples. Criteria for excluding samples were as follows: i) mismatches between supplied and genetically inferred sex; ii) heterozygosity and missingness outliers according to centrally provided sample QC files; iii) samples not used in kinship estimation by UKBB; iv) individuals who did not identify as “white British” or did not cluster with other “white British” individuals in PCA; v) samples that withdrew consent; and vi) for each pair of related individuals (KING kinship estimate >0.0442), only one individual was kept, chosen at random except that a case was preferentially kept over a related control. After sample QC, thirteen individuals with underlying health conditions that could influence their BMI were also removed: twelve had BMI <14 and one had BMI >74. In the end, 7,526 obese, 3,532 thin and 20,720 non-extreme controls remained for case-control analyses. In addition, 387,164 samples remained for analysis of BMI as a continuous trait. There is an overlap of 10,282 samples (~2.6% of the BMI dataset) with obese and thin cases (S2 Fig). The same procedure was performed on the interim release of 152,249 UKBB samples to produce a set of 2,799 obese, 1,212 thin and 8,193 controls, and 127,672 individuals for the independent BMI dataset.
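Rule vi) for UK Biobank, which keeps one member of each related pair and prefers cases over controls, can be sketched as a greedy pass over pairs. This is an illustrative assumption about how such a rule might be implemented, not the study's code: the KING-style pair list, the case lookup and the function name are hypothetical, and a pair-by-pair greedy pass is not guaranteed to remove the minimum number of people from larger families.

```python
import random


def prune_related(pairs, is_case, threshold=0.0442, seed=1):
    """Return IDs to exclude so that no retained pair exceeds the kinship threshold.

    pairs:   iterable of (id1, id2, kinship) tuples, e.g. from KING
    is_case: dict mapping ID -> True for cases, False for controls
    """
    rng = random.Random(seed)
    removed = set()
    for id1, id2, kinship in pairs:
        # Skip unrelated pairs and pairs already broken by an earlier removal.
        if kinship <= threshold or id1 in removed or id2 in removed:
            continue
        if is_case[id1] and not is_case[id2]:
            removed.add(id2)  # keep the case
        elif is_case[id2] and not is_case[id1]:
            removed.add(id1)
        else:
            removed.add(rng.choice((id1, id2)))  # same status: pick at random
    return removed
```

Preferring cases preserves as many of the scarcer extreme-BMI individuals as possible, which matters most for the smaller thin group.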
All subsequent analyses on UKBB were also performed on this subset to query variants that are not currently available in the full UKBB release. Imputation and genome wide association analyses SCOOP, STILTS and UKHLS single-variant association analysis. Genotypes from SCOOP, STILTS and UKHLS controls were phased together with SHAPEIT v2 [61] and subsequently imputed with IMPUTE2 [62,63] to the merged UK10K and 1000G Phase 3 reference panel [64], containing ~91.3 million autosomal and chromosome X sites from 6,285 samples. More than 98% of variants with MAF ≥0.5% had an imputation quality score r² ≥ 0.4; however, variants with MAF <0.1% were poorly imputed, with only 27% having r² ≥ 0.4 (S5 Fig). First-pass single-variant association tests were done for all variants irrespective of MAF or imputation quality score (see below). Analyses of 1,456 SCOOP, 1,471 STILTS and 6,460 controls (BMI range 19–30) of European ancestry used the frequentist association test with the EM algorithm, as implemented in SNPTEST v2.5 [65], under an additive model with six PCs and sex as covariates. UKBB BMI dataset single-variant association analysis. For the BMI dataset, we used BOLT-LMM [66] to perform an association analysis with BMI, using sex, age, 10 PCs and UKBB genotyping array as covariates. Heritability estimates and genetic correlation. Summary statistics from the SCOOP vs UKHLS, STILTS vs UKHLS, UKBB obese vs controls, UKBB thin vs controls and UKBB BMI analyses were filtered, and a subset of 1,197,969 HapMap3 SNPs was kept in each dataset. Using LD score regression [67], we first calculated the heritability of severe childhood obesity (SCOOP vs UKHLS) and persistent thinness (STILTS vs UKHLS). For severe childhood obesity, we estimated a prevalence of 0.15%, using the BMI centile equivalent to 3 SDS in children [68]. For persistent thinness (BMI ≤ 19), we based our prevalence estimate on a GP-based cohort, CALIBER [69].
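LD score regression estimates heritability on the observed (case-control) scale; the population prevalences used here (0.15% for severe childhood obesity and 2.8% for thinness, given in the next paragraph) are what allow conversion to the liability scale. The sketch below shows the standard conversion of Lee et al. (2011), which LDSC applies when given sample and population prevalences. The sample prevalences in the example assume a case fraction of cases / (cases + 6,460 UKHLS controls) for each comparison, and the function name is hypothetical; the printed numbers are conversion factors (observed-scale h² set to 1), not study estimates.

```python
from statistics import NormalDist


def observed_to_liability_h2(h2_obs: float, pop_prev: float, samp_prev: float) -> float:
    """Convert observed-scale (case-control) h2 to the liability scale.

    pop_prev:  K, population prevalence of the trait
    samp_prev: P, proportion of cases in the analysed sample
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - pop_prev)  # liability threshold t
    density = std_normal.pdf(threshold)           # standard normal density at t
    k, p = pop_prev, samp_prev
    return h2_obs * (k * (1 - k)) ** 2 / (p * (1 - p) * density**2)


# Assumed sample prevalences: cases / (cases + 6,460 controls).
p_obese = 1456 / (1456 + 6460)
p_thin = 1471 / (1471 + 6460)
print(observed_to_liability_h2(1.0, 0.0015, p_obese))  # conversion factor, obesity
print(observed_to_liability_h2(1.0, 0.028, p_thin))    # conversion factor, thinness
```

The factor depends strongly on K, so the choice of population prevalence directly shapes the reported liability-scale h² values.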
The CALIBER database consists of 1,173,863 records derived from GP practices. For the heritability analysis, we used a prevalence estimate of 2.8% for BMI ≤19 (Claudia Langenberg and Harry Hemingway, personal communication). We also used LD score regression to calculate the genetic correlation of SCOOP with STILTS, SCOOP with UKBB obese, SCOOP with BMI, STILTS with UKBB thin and STILTS with BMI. The genetic correlations of obesity and of persistent thinness with anorexia nervosa were estimated using the SCOOP vs. UKHLS and STILTS vs. UKHLS summary statistics together with the summary statistics from the Genetic Consortium for Anorexia Nervosa (GCAN) available in LD Hub [70]. The same analysis was repeated for UKBB obese vs. controls and UKBB thin vs. controls. Genetic correlation estimates between BMI and Overweight, Obesity Class 1, Obesity Class 2 and Obesity Class 3 were also extracted from LD Hub (S4 Fig).

Comparison with established GIANT BMI associated loci. We obtained the list of 97 established BMI-associated loci from the publicly available GIANT consortium data [24]. We chose this list because we wanted to focus on established common variants in Europeans with accurately estimated effect sizes for the simulations. To test for enrichment of nominally significant signals with a consistent direction of effect, we performed a binomial test using the subset of signals reaching nominal significance in the SCOOP vs. UKHLS and STILTS vs. UKHLS analyses. Variance explained was calculated using the rms package [71] v4.5.0 in R [72], and Nagelkerke’s R2 is reported. Power calculations were performed using Quanto [73]. To calculate ORs and SEs from the ALSPAC BMI summary statistics, we used genotype counts from the SNPTEST output. We then used a z-test to compare the OR calculated from SCOOP and ALSPAC genotype counts with the SCOOP vs. UKHLS OR.

Simulations under an additive model.
We created 10,000 simulations of 1 million individuals for the 97 GIANT BMI loci, using an R script to sample alleles at random based on the allele frequencies from the sex-combined European dataset reported in Locke et al. [24]. For each simulated genotype set, we simulated phenotypes with DISSECT [74] using the GIANT effect sizes, and then removed all samples in the lower tail with a phenotype more than 3 SD below the mean, to better reproduce the actual BMI distribution. We then randomly sampled 1,471 individuals from the bottom 2.8% and 1,456 from the top 0.15% of the distribution, and compared each group against a random set of 6,460 controls drawn from the percentiles equivalent to BMI 19–30. Finally, for each locus, we calculated the absolute difference between our observed OR and the mean OR from the simulations, counted how many times an equal or larger absolute difference occurred in the simulated data, and assigned a p-value accordingly. This was done separately for SCOOP vs. UKHLS and STILTS vs. UKHLS.

Genetic risk score. The R package GTX (https://cran.r-project.org/web/packages/gtx/index.html) was used to convert genotype probabilities into dosages, and a combined dosage score for 97 BMI SNPs [24], weighted by the GIANT effect sizes, was calculated and standardised. We tested whether there was an ordinal relationship between the genetic risk score and BMI category (i.e. thin, normal or obese) using ordinal logistic regression with the clm function in the ordinal R package. While the assumption of equal variance appears to hold (S6 Fig), the proportional odds assumption (that each covariate has the same effect across the cumulative thin/normal/obese splits) is violated for the BMI genetic risk score and for some of the principal component covariates (PC2, PC3 and PC6). As our primary model, we therefore ran a partial proportional odds model adjusting for PC1, PC4 and PC5 and allowing the BMI genetic score, PC2, PC3 and PC6 to vary between BMI categories.
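The additive-model simulation described under "Simulations under an additive model" can be illustrated at reduced scale in pure Python. This is a simplified sketch, not the authors' R/DISSECT pipeline: phenotypes are drawn directly as the sum of per-allele effects (assumed to be in SD units, as GIANT reports them) plus Gaussian noise scaled to unit total variance, in place of DISSECT; the control window `ctrl_q` is a hypothetical stand-in for the percentiles equivalent to BMI 19–30; and all function names are hypothetical. ORs are allelic, with the lower-BMI group as reference.

```python
import random
import statistics

def allelic_or(higher, lower, j):
    """Allelic odds ratio at locus j, using the lower-BMI group as the reference."""
    a_hi = sum(g[j] for g in higher)   # effect-allele count in the higher-BMI group
    a_lo = sum(g[j] for g in lower)    # effect-allele count in the reference group
    return (a_hi / (2 * len(higher) - a_hi)) / (a_lo / (2 * len(lower) - a_lo))

def simulate_once(freqs, betas, n, top_frac, bottom_frac, ctrl_q,
                  n_top, n_bottom, n_ctrl, rng):
    """One replicate: simulate genotypes and phenotypes, sample tails, return per-locus ORs."""
    # Genetic variance implied by the per-allele effects (phenotype on an SD scale)
    vg = sum(2 * p * (1 - p) * b * b for p, b in zip(freqs, betas))
    people = []
    for _ in range(n):
        geno = [(rng.random() < p) + (rng.random() < p) for p in freqs]  # 0/1/2 copies
        y = sum(b * x for b, x in zip(betas, geno)) + rng.gauss(0.0, (1 - vg) ** 0.5)
        people.append((y, geno))
    ys = [y for y, _ in people]
    mu, sd = statistics.fmean(ys), statistics.pstdev(ys)
    # Drop individuals more than 3 SD below the mean, then rank by phenotype
    kept = sorted((q for q in people if (q[0] - mu) / sd >= -3.0), key=lambda q: q[0])
    m = len(kept)
    thin_pool = [g for _, g in kept[:int(bottom_frac * m)]]
    obese_pool = [g for _, g in kept[m - int(top_frac * m):]]
    ctrl_pool = [g for _, g in kept[int(ctrl_q[0] * m):int(ctrl_q[1] * m)]]
    thin = rng.sample(thin_pool, n_bottom)
    obese = rng.sample(obese_pool, n_top)
    controls = rng.sample(ctrl_pool, n_ctrl)
    loci = range(len(freqs))
    return ([allelic_or(obese, controls, j) for j in loci],   # obese vs. controls
            [allelic_or(controls, thin, j) for j in loci])    # controls vs. thin (thin = reference)

def empirical_p(observed, simulated):
    """Share of replicates at least as far from the simulated mean OR as the observed OR."""
    centre = statistics.fmean(simulated)
    d_obs = abs(observed - centre)
    return sum(abs(s - centre) >= d_obs for s in simulated) / len(simulated)
```

Repeating `simulate_once` and collecting the OR at one locus gives the `simulated` list passed to `empirical_p` together with the observed OR. At the scale described above (10,000 replicates of 1 million individuals and 97 loci), a vectorised implementation would be needed; pure Python is used here only for readability.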
To check the robustness of the primary model, we ran a partial proportional odds model adjusting for the first six PCs and allowing only the BMI genetic score to vary between BMI groups, and a full proportional odds model allowing all six PCs and the BMI genetic score to vary between BMI groups (S1 Appendix). We formally tested the proportional odds assumption for the BMI genetic risk score using ANOVA. A genetic risk score was created and an ordinal logistic regression was run for each of the 10,000 simulations. We compared the observed test statistic for whether the odds were the same across BMI categories with the 10,000 simulated test statistics, and calculated the p-value as the proportion of simulations with a test statistic larger than that observed in the real data. A mean genetic risk score was also calculated for each BMI category (obese, thin and controls) across the 10,000 simulations. A t-test was used to test whether the mean observed GRS in each category differed significantly from the mean estimated from the simulations.

Discovery stage GWAS. First-pass single-variant association results were used as the discovery datasets for the GWAS. After association analysis, we removed variants with MAF <0.5%, an INFO score <0.4 or HWE p<1x10-6, as these highlighted problematic regions of the genome, including CNV regions with poor imputation quality. Quantile-quantile plots indicated that genomic inflation was well controlled in SCOOP-UKHLS (λ = 1.06) and STILTS-UKHLS (λ = 1.04), and slightly higher in SCOOP-STILTS (λ = 1.08, S7 Fig). We used LD score regression [67] to correct for inflation not due to polygenicity. To identify distinct loci, we performed clumping as implemented in PLINK [58] using summary statistics from the association tests and LD information from the imputed data, clumping variants within 250 kb of an index variant and with r2 >0.1.
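The discovery-stage filters and the clumping step can be illustrated with a greedy procedure in the spirit of PLINK's `--clump`: variants are visited in order of increasing p-value, and each not-yet-assigned variant below an index threshold absorbs all unassigned variants within 250 kb with r2 >0.1. In this sketch the index threshold `p_index` (set to the 1x10-5 used to select variants for replication) is an assumption, PLINK's secondary p-value threshold is omitted, and `r2` is a caller-supplied function standing in for LD computed from the imputed data. The function names are hypothetical.

```python
def passes_qc(maf, info, hwe_p):
    """Post-association discovery filters: keep MAF >= 0.5%, INFO >= 0.4, HWE p >= 1e-6."""
    return maf >= 0.005 and info >= 0.4 and hwe_p >= 1e-6

def clump(variants, r2, window_bp=250_000, r2_max=0.1, p_index=1e-5):
    """Greedy LD clumping of association results.

    variants -- list of dicts with keys 'id', 'chrom', 'pos', 'p'
    r2       -- function (id_a, id_b) -> LD r^2 between two variants
    Returns a list of (index_id, [ids clumped to that index]).
    """
    ordered = sorted(variants, key=lambda v: v["p"])   # most significant first
    assigned = set()
    clumps = []
    for idx in ordered:
        if idx["id"] in assigned or idx["p"] >= p_index:
            continue
        assigned.add(idx["id"])
        members = []
        for v in ordered:
            if v["id"] in assigned:
                continue
            # Same chromosome, within the window, and in LD above the r^2 cut-off
            if (v["chrom"] == idx["chrom"] and abs(v["pos"] - idx["pos"]) <= window_bp
                    and r2(idx["id"], v["id"]) > r2_max):
                assigned.add(v["id"])
                members.append(v["id"])
        clumps.append((idx["id"], members))
    return clumps
```

Each resulting index variant corresponds to one candidate locus; the conditional analysis that follows is what separates genuinely independent signals sharing long-range LD.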
To further identify a set of likely independent signals, we performed conditional analysis of the lead SNPs in SNPTEST to account for long-range LD. A total of 135 autosomal variants with p<1x10-5 in any of the three case-control analyses were taken forward for replication in UKBB. All case-control results are reported with the lower-BMI group as reference.

UKBB association analysis. We tested 1,208,692 SNPs for association under an additive model in SNPTEST using sex, age, 10 PCs and UKBB genotyping array as covariates. Three comparisons were performed: obese vs. thin, obese vs. controls and controls vs. thin. Variants with an INFO score <0.4 or HWE p<1x10-6 were filtered out of the results. Inflation factors were calculated using HapMap markers. The LD score regression intercepts were 1.0074 for obese vs. thin, 1.0057 for obese vs. controls and 1.009 for thin vs. controls. We used all thin individuals, regardless of health status, as our replication cohort to maximise power. However, removing individuals who had a relevant medical diagnosis before their date of attendance at a UKBB recruitment centre, identified using ICD10 codes and self-reported illness data (S12 and S13 Tables), yielded 2,518 thin individuals and materially equivalent results (S8 Fig).

GIANT, EGG and SCOOP 2013 summary statistics. We obtained summary statistics for the GIANT Extremes obesity meta-analysis [20] from http://portals.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files. Summary statistics for EGG [30] were obtained from http://egg-consortium.org/childhood-obesity.html. We used summary statistics from our previous study of 1,509 early-onset obesity SCOOP cases compared with 5,380 publicly available WTCCC2 controls (SCOOP 2013) [31]. Data for the SCOOP cases are available to download from the European Genome-phenome Archive (EGA) under accession number EGAD00010000594. The control samples are available to download under accession numbers EGAD00000000021 and EGAD00000000023.
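The inflation factors mentioned for the UKBB analyses (and the λ values reported for the discovery analyses) follow the usual genomic-control definition: the median of the observed 1-df association chi-square statistics divided by the median expected under the null (about 0.455). A minimal standard-library sketch, assuming two-sided p-values in (0, 1]; restricting the input to HapMap markers is left to the caller, and the function name is hypothetical. Because λ also rises with true polygenic signal, the LD score regression intercept is the more specific measure of confounding.

```python
from statistics import NormalDist, median

def genomic_inflation(pvalues):
    """Genomic-control lambda from two-sided p-values (each assumed to lie in (0, 1])."""
    nd = NormalDist()
    # A two-sided p-value p corresponds to a 1-df chi-square statistic z^2 with z = Phi^-1(p/2);
    # using p/2 (rather than 1 - p/2) keeps precision for very small p-values.
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in pvalues]
    expected_median = nd.inv_cdf(0.25) ** 2   # median of chi-square(1), about 0.455
    return median(chi2) / expected_median
```

Under the null, p-values are uniform and λ is close to 1; values such as the 1.06 and 1.04 reported for the discovery analyses indicate modest inflation.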
The GIANT, EGG and SCOOP 2013 replication datasets are largely non-overlapping with our discovery datasets and with each other. When a lead variant was not available in a replication cohort, a proxy (r2 ≥0.8) was used in the meta-analysis.

Replication meta-analysis. We meta-analysed summary statistics for the 135 variants reaching p<1x10-5 in SCOOP/STILTS/UKHLS with the corresponding results from UKBB and the study-specific replication cohorts (S5–S7 Tables). For the obese vs. thin and obese vs. controls comparisons, we used fixed-effects meta-analysis correcting for unknown sample overlap in the replication cohorts using METACARPA [75]. For thin vs. controls, we used fixed-effects meta-analysis in METAL [76]. Heterogeneity was assessed using the Cochran’s Q-test p-value from METAL. A signal was considered to replicate if it met all of the following criteria: i) consistent direction of effect; ii) p<0.05 in at least one replication cohort; and iii) a meta-analysis p-value reaching standard genome-wide significance (p<5x10-8). Given that we query additional variants at the lower end of the allele frequency spectrum, a stricter genome-wide significance threshold accounting for the increased number of tests (p≤1.17x10-8) [77] could also be used. In practice, this affected only one previously established signal (SULT1A1, rs3760091) in our obese vs. controls analysis, which narrowly missed this stricter threshold (S6 Table). rs4440960 was subsequently removed from the final results (SCOOP vs. UKHLS and STILTS vs. UKHLS) after closer examination revealed that it lies in a CNV region with poor imputation quality.

Comparison of newly established candidate loci and UKBB independent BMI dataset. We identified eleven signals in SCOOP vs. STILTS, nine in SCOOP vs. UKHLS and two in UKHLS vs. STILTS that were nominally significant and directionally consistent in the UKBB BMI dataset GWAS. A binomial test was used to check for enrichment of signals with a consistent direction of effect (S9 Table).
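The fixed-effects combination, the heterogeneity test, the replication rule and the binomial enrichment test described above can be sketched with the standard library. Several assumptions apply. Weighting is by inverse variance (METAL's standard-error-based scheme), although the scheme actually used is not stated, and the METACARPA correction for unknown sample overlap is not reproduced. Cochran's Q is referred to a chi-square distribution with k−1 degrees of freedom through a closed-form survival function valid for integer df. "Consistent direction of effect" is interpreted as every replication cohort sharing the sign of the discovery estimate. The null probability `p0` of the binomial test is left to the caller, because the null used in the paper is not stated (for example, 0.5 when asking whether nominally significant signals share the expected direction more often than chance). All function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square with integer df >= 1 (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2 * (1 - Phi(sqrt x)) plus a finite series in the normal density
    s = math.sqrt(x)
    total = math.erfc(s / math.sqrt(2))
    term = 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * s
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects meta-analysis with Cochran's Q heterogeneity test."""
    w = [1.0 / se ** 2 for se in ses]
    beta = sum(wi * bi for wi, bi in zip(w, betas)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    p = math.erfc(abs(beta / se) / math.sqrt(2))          # two-sided p-value
    q = sum(wi * (bi - beta) ** 2 for wi, bi in zip(w, betas))
    q_p = chi2_sf(q, len(betas) - 1) if len(betas) > 1 else 1.0
    return beta, se, p, q, q_p

def replicates(discovery_beta, replication, meta_p, alpha=0.05, gw=5e-8):
    """Replication rule; replication is a list of (beta, p) pairs, one per cohort."""
    same_sign = all((b > 0) == (discovery_beta > 0) for b, _ in replication)
    nominal = any(p < alpha for _, p in replication)
    return same_sign and nominal and meta_p < gw

def binomial_upper_tail(k, n, p0):
    """One-sided exact binomial test: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(math.comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k, n + 1))
```

Passing `gw=1.17e-8` to `replicates` applies the stricter threshold discussed above instead of the standard 5x10-8.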
Lookup of previously identified obesity-related signals in our discovery datasets. We took all signals that reached genome-wide significance, or were identified for the first time, in the GIANT Extremes obesity meta-analysis [20] (with either the tails of BMI or obesity classes) and in childhood obesity studies [30,31], and looked up those signals in all three of our discovery analyses (SCOOP vs. STILTS, SCOOP vs. UKHLS and UKHLS vs. STILTS). ORs and p-values from the previous studies and look-up results from our discovery datasets are reported in S10 Table.

Acknowledgments We are indebted to the participants of the STILTS cohort and the patients and families involved in the Genetics of Obesity Study (GOOS) cohort. We thank the staff of the NIHR Primary Care Research Network, and the GPs, physicians and nurses involved in identifying and recruiting participants to STILTS and GOOS. These data are from Understanding Society: The UK Household Longitudinal Study, which is led by the Institute for Social and Economic Research at the University of Essex and funded by the Economic and Social Research Council. The data were collected by NatCen and the genome-wide scan data were analysed by the Wellcome Sanger Institute. The Understanding Society DAC have an application system for genetics data, and all use of the data should be approved by them. This research has been conducted using the UK Biobank Resource (Application Number 14069). Data on the childhood obesity trait have been contributed by the EGG Consortium and were downloaded from www.egg-consortium.org. We are extremely grateful to all the families who took part in the ALSPAC study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. The data used for the analyses described in this manuscript were obtained from the GTEx Portal on 08/Feb/2018. The authors would like to thank Emma Gray, Michelle Dignam and the staff of the WSI Sample Management and Genotyping facilities for their contribution, as well as Konstantinos Hatzikotoulas and Ioanna Tachmazidou for their assistance with the QC of UK Biobank data. Understanding Society Scientific Group members: Michaela Benzeval¹, Jonathan Burton¹, Nicholas Buck¹, Annette Jäckle¹, Meena Kumari¹, Heather Laurie¹, Peter Lynn¹, Stephen Pudney¹, Birgitta Rabe¹, Dieter Wolke².
1) Institute for Social and Economic Research, University of Essex, UK; 2) University of Warwick, UK.