European data came from two cohorts, the 500FG cohort and the 300-Obesity (300-OB) cohort. Both studies were approved by the Ethical Committee of Radboud University Nijmegen, and all participants received detailed printed and oral information and provided written informed consent. Experiments were conducted according to the principles expressed in the Declaration of Helsinki. The 500FG cohort consisted of 534 healthy individuals of Western-European genetic ancestry (). The inclusion of the volunteers took place between 8/2013 and 12/2014 at the Radboud University Medical Center, the Netherlands. Volunteers had a mean age of 28.5 ± 13.9 years, 44.5% were men and 55.5% women. The 300-OB cohort included 302 individuals aged between 55 to 80 at the Radboud University Medical Center in the period between 2014 and 2016 (). All subjects had a BMI above 27 kg/mand most had also participated in the Nijmegen Biomedical Study – Non-Invasive Measurements of Atherosclerosis 1 (NBS-NIMA1) study, a population-based survey of Nijmegen residents (). Among the patients 45% were men and 55% women.

Indonesian samples were taken according to a protocol approved by The Developing-Country Committee of The Danish National Committee on Health Research Ethics. All participants provided informed consent for their participation in this study. The 59 ethnic Bajau and 34 ethnic Saluan individuals analyzed in this study were from the Indonesian villages of Jaya Bakti and Koyoan, respectively. Both villages are found on the eastern tip of the Central Sulawesi peninsula. The individuals had an average age of 41.1 (ranging from 18-85), and were comprised of 75.3% men and 24.7% women.

In the 300-OB cohort, abdominal MRIs were performed to assess the prevalence of hepatic steatosis and the abdominal fat distribution. All MRI examinations were performed using a 3.0 T Magnetom Trio or Skyra (Siemens, Erlangen, Germany). Subjects were examined in the supine position with their arms positioned parallel to the lateral sides of the body. For each subject, a series of thirty T1-weighted TIRM axial MR images of 5 mm each were acquired from the liver region. The images acquired were retrieved from the MR scanner in DICOM (Digital Imaging and Communications in Medicine) format, and analyzed with software developed in the IDL 6.0 environment, called HIPPO FAT (version 1.3, V. Positano 22 ). In HIPPO FAT all thirty slices were assessed for the spleen size. Of the slice with visually the largest spleen size, contour lines were placed manually and the size was calculated automatically. The intraclass correlation coefficients of the spleen size was 0.995.

In the 500FG cohort, blood samples were collected by venipuncture from the cubital vein in the morning, and a large set of circulating mediators, including steroid and thyroid hormones, were measured as previously reported ().

Genomic DNA was extracted from spit samples using the prepIT L2P extraction kit from DNA Genotek. Genomic DNA extracts were quantified using a Qubit™ dsDNA High-Sensitivity Assay. Aliquots containing 75 ng of DNA were transfered to Covaris micro-TUBE-15 and fragmented down to 550 bp on a Covaris M220 ultrasonicator (duty factor 20%, 50 cycles per burst, 23 s). Sequencing libraries were built using the TruSeq Nano DNA Library Preparation Kit on an Illumina NeoPrep instrument, following manufacturer’s instructions, but without normalization step. Each library preparation session included one negative control. Sequencing Libraries were checked for size distribution and molarity on an Agilent BioAnalyser 2100 instrument, using the High-Sensitivity DNA Kit, and pooled equimolar (8 to 16 libraries per pool). Each pool was sequenced 125 Paired-End over one or two lanes on the Illumina HiSeq2500 (version 4 chemistry). Samples were sequenced to an average depth of 5x.

Two other phenotypic measurements, height and weight, were obtained at the same time the DNA samples and spleen measurements were taken. Through a brief interview with each participant, we obtained demographic information including ethnicity, age, gender, and diving history. An individual was defined as a diver if they engaged in breath-hold diving on average at least three times a week, with the last dive having occurred within the last week. Non-divers were those who had never engaged in frequent or non-recreational diving (thereby excluding formerly frequent divers). There were no ambiguous cases in which an individual fell between diver and non-diver.

Spleen measurements were performed using the SonoScape A6 Portable Ultrasound System with a C321 2-5MHz Convex Transducer. Spleen measurements were made in the transverse and longitudinal planes in order to calculate spleen volume according to the methodology outlined inshown to have the highest correlation with values obtained through a computed tomography (CT) scan. All spleen measurements were taken by the same researcher to ensure consistent measurements.

Quantification and Statistical Analysis

Read processing Schubert et al., 2016 Schubert M.

Lindgreen S.

Orlando L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. Li, 2013 Li, H. (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv, arXiv:1303.3997. Li et al., 2009 Li H.

Handsaker B.

Wysoker A.

Fennell T.

Ruan J.

Homer N.

Marth G.

Abecasis G.

Durbin R. 1000 Genome Project Data Processing Subgroup

The sequence alignment/map format and SAMtools. DePristo et al., 2011 DePristo M.A.

Banks E.

Poplin R.

Garimella K.V.

Maguire J.R.

Hartl C.

Philippakis A.A.

del Angel G.

Rivas M.A.

Hanna M.

et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Quinlan and Hall, 2010 Quinlan A.R.

Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. The sequences were basecalled using the Illumina software CASAVA-1.8.2 and de-multiplexed using a full match of the 6 nucleotide index incorporated during library adaptor ligation. The reads were trimmed using AdapterRemoval-2.1.3 () for adaptor sequences and leading/trailing stretches of Ns. Additionally, bases with quality of 2 or less were removed by trimming from the 3′ and only reads larger than 30bp were kept. Retained read pairs after trimming were aligned using BWA mem-0.7.10 () to GRCh37 and processed using samtools-1.3.1 () removing reads with a mapping quality lower than 30 and merged to libraries. Hereafter duplicates were marked using picard-1.127 MarkDuplicates ( https://broadinstitute.github.io/picard/ ), libraries merged to sample level and realigned using GATK-3.3.0 () with Mills and 1000G gold standard indels. Finally, realigned bams had the md-tag updated and extended BAQs calculated using samtools calmd. Read depth and coverage were determined using pysam ( https://code.google.com/archive/p/pysam/ ) and BEDtools ().

Error rate estimation Korneliussen et al., 2014 Korneliussen T.S.

Albrechtsen A.

Nielsen R. ANGSD: analysis of next generation sequencing data. Orlando et al., 2013 Orlando L.

Ginolhac A.

Zhang G.

Froese D.

Albrechtsen A.

Stiller M.

Schubert M.

Cappellini E.

Petersen B.

Moltke I.

et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. To estimate error rates, we analyzed each of the samples of interest separately using a method in ANGSD () that is based on comparison to an error free genome. Briefly explained, the method exploits the fact that if the analyzed sample has no errors it should have the same expected number of derived alleles as an error free genome, and that any observed excess of derived alleles in the analyzed sample compared to an error free genome therefore must be due to errors in the analyzed sample. The method thus bases its error estimates on the excess in derived alleles in the analyzed sample compared to a given error free genome. Since no genomes are entirely error free, the estimates in practice become estimates of excess error compared to a high-quality genome, i.e., relative error rates. However, if a high-quality genome is used, the estimates should be close to absolute error rates (for more detail about the method see). We used the chimp genome from the hg19 multiz46 alignment as an out-group for assessing what alleles are derived. As the high-quality genome, we used NA12778 low-coverage genome from the 1000 Genomes Project’s May 2013 release. Before the analyses, we filtered the data from this genome so only reads with MapQ > 35 and bases with baseQ > 35 were used to ensure that the data used were indeed of high-quality. Also, we only included data from genomic positions where there is coverage on both the chimp, the sample of interest, and the high-quality genome. We used all reads from the samples of interest as opposed to just sampling one per site. However, before performing the estimation we filtered away reads with MapQ < 30 and bases with baseQ < 20 to mimic the filtering we have used in the many of the analyses performed in this study. None of the base type specific error rates stand out for any of the individuals and all overall estimates are below or equal to 0.05% suggesting that all the genomes have low error rates and can be used in downstream analyses.

Relatedness estimation Several of the analyses we performed in this study are based on an assumption that the individuals analyzed are not closely related. We therefore performed analyses to infer to what extent the sequenced individuals are related. More specifically, we estimated the three relatedness coefficients k0, k1, and k2 for each pair of individuals within each of the 2 populations and based on these we made a list of individuals to exclude from analyses of individuals that are not closely related. The coefficients k0, k1, and k2 here denote the proportions of the genome where the pair of individuals analyzed share 0, 1 and 2 alleles identical by descent, respectively. Importantly, the less related a pair of non-inbred individuals is, the higher k0 is expected to be, with k0 being equal to 1 for completely unrelated individuals. We used k0 below 0.75 as a threshold for being closely related, corresponding to the expected value for k0 for first cousins. Korneliussen and Moltke, 2015 Korneliussen T.S.

Moltke I. NgsRelate: a software tool for estimating pairwise relatedness from next-generation sequencing data. Since many of the individuals are sequenced to quite low depth, we used the program NGSrelate () to infer k0, k1, and k2 for each pair. NGSrelate is a maximum-likelihood based program that allows inference of the three relatedness coefficients from genotype likelihoods instead of called genotypes and through that it takes into account the uncertainty of the true genotypes, which is inherent to low-depth sequencing data. When running NGSrelate, we used standard EM for optimization, allowed up to 500,000 EM iterations for each pair and used a stopping criterion of 1e-12 difference in likelihood between two consecutive EM-iterations. Furthermore, we ran each analysis 10 times with different random number seeds to be able to assess convergence of the EM-algorithm. For all pairs, the difference in log likelihood between the 10 NGSrelate solutions was less than 0.002, suggesting convergence was reached. Since several of the individuals reported themselves to be admixed and NGSrelate, like most relatedness estimation method, assume the individuals are not admixed, we ran the analyses both of the full dataset and of the subsets consisting of the 49 Bajau and 23 Saluan individuals that were self-reported to be unadmixed (here termed “unmixed”). We used the latter to check to what extent the potential admixture affected the estimates of the former dataset. Once we had the estimates of the relatedness coefficients, we identified all pairs of close relatives defined as pairs with a k0 estimate below 0.75. Next, we iteratively removed the individual with the highest number of close relatives until no pair of close relatives was left. Korneliussen et al., 2014 Korneliussen T.S.

Albrechtsen A.

Nielsen R. ANGSD: analysis of next generation sequencing data. For these analyses, we only included data from autosomal sites that overlapped with the PanAsia dataset, described further below. For each of the two populations we used ANGSD () to estimate allele frequencies and genotype likelihoods from reads with MapQ < 30 and bases with baseQ < 20 for the selected sites. This was done using the allele information from the SNP chip (-doMajorMinor 3) and with the SAMtools genotype likelihood model (-gl 1). For each of the two populations, we performed the allele frequency estimation and genotype likelihood estimation for both for all samples from the population and for the subset of “unmixed” individuals. The results from the analyses of all individuals are shown in Figure S2 . These estimates revealed numerous close familial relationships including parent-offspring and full siblings. Our subsequent filtering approach resulted in a list of samples to remove from analyses that should not contain data from closely related individuals. This list contains 16 Bajau and 1 Saluan individuals.

Testing for a difference in spleen size between Bajau and Saluan Before performing a selection scan, we first wanted to test if there is a difference in spleen size between the Bajau and Saluan in order to determine if this is an appropriate target for our selection investigation. As an exploratory test of the data, we performed a Welch two-sample t test comparing the spleen sizes of the Bajau to those of the Saluan. This yielded a p value of 3.538e−07, indicating the spleens of the Bajau are significantly larger than those of the Saluan. Importantly, we also performed the Welch two-sample t test within the Bajau, comparing divers with non-divers, and we found no statistically significant difference in spleen size (p value: 0.2663). This result suggests that the observed difference in spleen size between Bajau and Saluan is not attributable to a plastic response of the spleen to diving. However, other factors than whether the individuals are divers may affect the results of the Welch two sample t test. We therefore proceeded with a linear model, which allows us to account for additional factors, like age and gender as well. y = a + β p o p p o p u l a t i o n + β g e n d e r g e n d e r + β d i v e r d i v e r + β a g e a g e + β w e i g h t w e i g h t + β h e i g h t h e i g h t (1)

where y is the spleen size, α is the intercept, population is a binary indicator, which takes the value 0 for Saluan individuals and 1 for Bajau individuals and diver is a binary indicator of whether the individuals are divers. First, we used the R function lm to fit a linear model of the formwhere y is the spleen size, α is the intercept, population is a binary indicator, which takes the value 0 for Saluan individuals and 1 for Bajau individuals and diver is a binary indicator of whether the individuals are divers. pop is significantly different from 0, which would indicate that the Bajau and Saluan have different spleen sizes when taking into account gender, age, height, weight and whether the individuals are divers. We fitted the model using two different datasets to be able to assess if the results are affected by admixture: Dataset 1: all samples that are not closely related

Dataset 2: dataset 1 with samples estimated to have above 5% admixture removed We did this with the goal of testing if βis significantly different from 0, which would indicate that the Bajau and Saluan have different spleen sizes when taking into account gender, age, height, weight and whether the individuals are divers. We fitted the model using two different datasets to be able to assess if the results are affected by admixture: After fitting the model using these two datasets we checked if the residuals were normal distributed by making QQ-plots and running a Shapiro-Wilks test of normality, because the p values provided by lm are based on the assumption of the residuals being normal distributed. Because the Shapiro-Wilks test led to the rejection of the null hypothesis of the residuals being normal distributed for the first of the two datasets, we did not base our conclusions on the p values provided by lm. Instead we performed permutation tests to achieve p values using the function lmp in the R package lmPerm. When using this function, we set the stopping parameter Ca to 0.001 and the maxIter parameter to 109, meaning that the permutation sampling was set to stop when the estimated standard deviation fell below 0.1 of the estimated p value, or 109 permutations had been sampled (the criterion reached first is used). In practice, the function stopped before reaching 109 iterations in all our analyses. Second, we performed the exact same analyses using estimated Bajau ancestry proportions as the population predictor to explore how this would affect the results. pop being 0, which is provided by the standard lm function in R (“lm p-value” in The results of fitting the model in Equation 1 with the two datasets described in the data section can be found in Table S1 . This includes p values for the null hypothesis of βbeing 0, which is provided by the standard lm function in R (“lm p-value” in Table S1 ). However, we note that both QQ-plots and the results of the Shapiro-Wilks test for normality suggest that the residuals for the model fitted with data from all not closely related individuals are not normal distributed ( Figure S1 ). This means that one has to be careful using the p values provided by lm, since these are based on an assumption of the residuals being normal distributed. We therefore also performed permutation testing of the same null hypothesis (“permutation p-value” in Table S1 ) and used that as a base for our conclusions instead. This approach resulted in a β pop estimate significantly above 0 for both datasets, suggesting that the Bajau have a higher mean spleen size than Saluan even when correcting for gender, age, height, weight and whether the individuals are divers. And importantly, the fact that the result holds up for the subset of the individuals with less than 5% admixture, suggests that the results achieved from all not closely related individuals is not simply an artifact of admixture. pop being 0, which is provided by the standard lm function in R (“lm p-value” in The results of fitting the model in Equation 1 with the two datasets using estimated Bajau ancestry proportion instead of a simply binary variable as population predictor can be found in Table S1 . This includes p values for the null hypothesis of βbeing 0, which is provided by the standard lm function in R (“lm p-value” in Table S1 ). However, also for this predictor we note that both QQ-plots and the results of Shapiro-Wilks test for normality suggest that the residuals for the model fitted with data from all not closely related individuals are not normal distributed ( Figure S1 ). We therefore also performed permutation testing of the same null hypothesis (“permutation p-value” in Table S1 ). Importantly, this approach resulted in a β pop estimate above 0 for both datasets, but only significantly so (p value < 0.05) for the subset of the individuals with less than 5% admixture. For the larger admixed dataset that includes admixed individuals, the p value is slightly above 0.05. However, we note that admixture reduces the power to detect population differences if they are caused by genetics.

Bajau selection analysis using Ohana Auton et al., 2015 Auton A.

Brooks L.D.

Durbin R.M.

Garrison E.P.

Kang H.M.

Korbel J.O.

Marchini J.L.

McCarthy S.

McVean G.A.

Abecasis G.R. 1000 Genomes Project Consortium

A global reference for human genetic variation. The original selection analysis dataset contained 153 samples: 59 Bajau, 34 Saluan, and 60 CHS (Han Chinese) samples from the 1000 Genomes project (). After removing closely related individuals from the Bajau and Saluan sets, as described previously, the dataset contained 136 individuals: 43 Bajau samples, 33 Saluan samples, and 60 CHS. Korneliussen et al., 2014 Korneliussen T.S.

Albrechtsen A.

Nielsen R. ANGSD: analysis of next generation sequencing data. Auton et al., 2015 Auton A.

Brooks L.D.

Durbin R.M.

Garrison E.P.

Kang H.M.

Korbel J.O.

Marchini J.L.

McCarthy S.

McVean G.A.

Abecasis G.R. 1000 Genomes Project Consortium

A global reference for human genetic variation. We used ANGSD () to estimate genotype likelihoods from reads with MapQ < 30 and bases with baseQ < 20. This was done using allele information inferred directly from likelihoods (-doMajorMinor 1) and the SAMtools genotype likelihood model (-gl 1). SNPs were filtered for deviations from Hardy Weinberg Equilibrium (-hwe_pval 5e-7), strand bias (-sb_pval 5e-7), and the Wilcox rank sum test for qscore bias (-qscore_pval 5e-9). The full dataset contained genotype likelihoods for 2,665,716 markers. After masking using the 1000 Genomes filter (), it contained 2,333,499 markers resulting in a 12.5% reduction rate. We performed analyses both with and without the 1000 Genomes masking. In annotating markers, we assigned RSIDs to physical genome locations by comparing to them to dbSNP build 147 [Database of Single Nucleotide Polymorphisms (dbSNP). Bethesda (MD): National Center for Biotechnology Information, National Library of Medicine. (dbSNP Build ID: 147), 2016]. This dbSNP build contained 154,822,082 markers, of which 2,660,285 were among our 2,665,716 markers. 5,431 markers (0.2%) did not have assigned RSIDs, but none of these markers were identified in the selection scan as potential candidates for being targets of selection. To estimate genome-wide admixture proportions and the correlation structure and population tree needed for the selection scan, we used the same dataset as for the selection scan. However, to reduce the computational burden in performing these analyses, we down-sampled the dataset from the full 2,333,499 markers to 100,000 randomly chosen markers. The admixture and tree analyses were performed in the same manner as described earlier in the text, but using the Bajau, Saluan, and Han dataset described above. We performed the analyses both for K = 2 and K = 3, but only used the results from K = 3 in the downstream analysis ( Figure S5 ). Coop et al., 2010 Coop G.

Witonsky D.

Di Rienzo A.

Pritchard J.K. Using environmental correlations to identify loci underlying local adaptation. Pickrell and Pritchard, 2012 Pickrell J.K.

Pritchard J.K. Inference of population splits and mixtures from genome-wide allele frequency data. We used the “selscan” program provided in Ohana to detect SNPs that deviate strongly in the Bajau from the genome-wide covariance structure using a likelihood ratio test. For each SNP we introduce a scalar variable that is multiplied onto the variance associated with the Bajau population. This establishes two nested likelihood models, one which assumes the Bajau specific allele frequency change is as predicted from the genome-wide covariance pattern, and one which allows a larger change in the Bajau and hence a higher variance in the Bajau component than expected from the genome-wide pattern ( Figure S5 ). The resulting likelihood ratio test can be used to identify loci in which the Bajau population has experienced a larger than expected change in allele frequency as compared to the prediction from the genome-wide pattern, which is a signature of selection. This method is inspired by a number of similar, recently developed methods that use a Gaussian distribution as an approximation to model the distribution of allele frequencies among populations (). We removed signals driven by a single SNP that might be due to base-calling or mapping errors, and identified potential causal SNPs by only retaining SNPs for which: 1) the SNP does not have any neighbor within ± 100kb (200kb window) that has a higher likelihood ratio; 2) the SNP is not the only one within ± 100kb (200kb window) that is among the most extreme 1% likelihood ratios. Two example peaks from the results are shown in Figure S5

Functional annotation of selected SNPs Kircher et al., 2014 Kircher M.

Witten D.M.

Jain P.

O’Roak B.J.

Cooper G.M.

Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. ENCODE Project Consortium, 2012 ENCODE Project Consortium

An integrated encyclopedia of DNA elements in the human genome. We assessed how potentially disruptive each of the top 25 SNPs identified in the selection scan were through a query of the Combined Annotation Dependent Depletion online tool (CADD v1.3) (). We then focused on the top 5 most disruptive SNPs to determine the nature of their disruptive score by manually inspecting their annotations and visualizing their corresponding ENCODE segmentation tracks () in the Washington University Epigenomics Browser (v.44.1). Hoffman et al., 2012 Hoffman M.M.

Buske O.J.

Wang J.

Weng Z.

Bilmes J.A.

Noble W.S. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Siepel et al., 2005 Siepel A.

Bejerano G.

Pedersen J.S.

Hinrichs A.S.

Hou M.

Rosenbloom K.

Clawson H.

Spieth J.

Hillier L.W.

Richards S.

et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Davydov et al., 2010 Davydov E.V.

Goode D.L.

Sirota M.

Cooper G.M.

Sidow A.

Batzoglou S. Identifying a high fraction of the human genome to be under selective constraint using GERP++. The CADD score for each SNP is listed in Table 1 in the main text. The SNP with the largest CADD score (22.8, PHRED-scaled), rs10483896, lies in a Segway repressor region () upstream of NRXN3 and is expressed preferentially in brain tissues. The position is highly conserved across primates, mammals, and vertebrates (PhastCons scores excluding human genome = 0.98, 1 and 1, respectively ()) with a high GERP++ rejected substitution score (5.26) (). The SNP with the second highest CADD score (12), rs77280170, is in an intergenic region. It is not highly conserved and lies in a heterochromatic region of low regulatory activity. The SNP rs118149708, with the third highest CADD score (9.838), is in a Segway repressor region in an intron of RP11-499F3.2. The SNP with the fourth highest CADD score (8.527), rs16030, is a highly conserved synonymous change that also lies inside a transcription factor binding site in an open chromatin region. It is located in the gene CACNA1A. Visual inspection in the Epigenomics Browser suggests the site is in an enhancer region that overlaps an exon of the gene. Finally, the SNP rs28544477, with the fifth highest CADD score (6.521), is in a regulatory feature (Segway FAIRE-seq region) but is not highly conserved.

Spleen size association testing Skotte et al. (2012) Skotte L.

Korneliussen T.S.

Albrechtsen A. Association testing for next-generation sequencing data using score statistics. Initially, we tested for association between spleen size and each of the 25 SNPs with the highest score in the selection scan using the score statistic based method byas implemented in ANGSD. We chose this method for two reasons. First, it bases its test on posterior genotype probabilities and uses these to take into account the uncertainty of the genotype, which is inherent to low-depth sequencing data. Second, this method is based on a generalized linear model framework. It therefore allows inclusion of covariates in the tests and thus makes it possible to correct for potentially confounding factors, like admixture. y = a + β g e n o t y p e g e n o t y p e + β g e n d e r g e n d e r + β d i v e r d i v e r + β a g e a g e + β w e i g h t w e i g h t + β h e i g h t h e i g h t + β s e q u e n c i n g d e p t h s e q u e n c i n g d e p t h + β P C 1 P C 1 + β P C 2 P C 2 + β P C 3 P C 3 + β P C 4 P C 4 + β P C 5 P C 5

where y is spleen size and genotype is coded as 0, 1, or 2 corresponding to the number of copies of the selected allele carried by an individual. The test we performed for each of the 25 SNPs was whether β genotype is significantly different from 0. When assessing significance of the test results, we used a Bonferroni corrected p value threshold of 0.05/25 = 0.002. When performing these score statistics based tests, we assumed an additive effect model and included the five first principle components (PCs) as covariates to correct for admixture and other population structure. Additionally, we included age, gender, height, weight, sequencing depth, and whether the individuals are divers as covariates. Hence we assumed the linear model:where y is spleen size and genotype is coded as 0, 1, or 2 corresponding to the number of copies of the selected allele carried by an individual. The test we performed for each of the 25 SNPs was whether βis significantly different from 0. When assessing significance of the test results, we used a Bonferroni corrected p value threshold of 0.05/25 = 0.002. We note that an underlying assumption in model used is that the residuals are normal distributed. Unfortunately, ANGSD does not provide any possibility to investigate whether this assumption is violated. Therefore, we performed the tests not only using the raw spleen size measurements, but also using the spleen size quantile transformed to a normal distribution and on the spleen sizes log-transformed to be as sure as possible that any potential association signal is not simply caused by a violation of the underlying normality assumption. To follow up on the results from the initial tests, we performed two additional analyses, both with the purpose of investigating whether the initial results are affected by admixture (despite the fact that we included the first five PCs as covariates). We did this because it is well known that admixture and population structure in general can lead to false positives. First, we applied the same test to a large number of additional SNPs from across the genome and based on the p values from these tests we calculated genomic control inflation factors to assess if there is a general inflation in the –log(P)-values due to population structure. Second, we performed association tests very similar to the initial tests with the only difference being that we did not include any PC as covariates. Instead, we employed genomic control to correct for population structure. We did this to investigate if the results are robust to the approach of correcting for population structure. We based the genomic control correction on results from the same set of additional SNPs as in the previous analysis. 1) The same individuals as in the initial analyses (purpose: to investigate if the association signal from the initial analyses is an artifact of using the score test method)

2) The subset of individuals that are Bajau, excluding the one individual that, according the admixture analyses, has very little (if any) Bajau ancestry (purpose: to investigate if the association result from the first analyses is an artifact of the fact that the Saluan individuals were included in the initial analyses)

3) Three subsets of the Bajau individuals with different, limited amounts of non-Bajau ancestry (purpose: to further ensure the association signal is not an artifact of population structure unaccounted for in prior analyses) All previous score statistic based analyses provided evidence of an association between spleen size and one of the tested 25 SNPs. To further investigate this signal, we used the same linear model as in the initial analyses but this time fitted using the lm function in R based on imputed data. We did this using the following subsets of the data: Importantly, all of these lm analyses also gave us a chance to investigate the direction of the effect, which is not possible using the score-based test. After fitting the models, we checked if the residuals were normal distributed by running a Shapiro-Wilks test of normality, because the results provided by lm are based on the assumption of the residuals being normal distributed. All the score statistic based tests were applied to data from both Bajau and Saluan individuals. Before performing the tests all close relatives were filtered away, which left us with 43 Bajau and 33 Saluan with spleen size measurements. All the imputation based tests were applied to data from these same individuals and subsets of these, based on their amount of estimated Bajau ancestry. The analyses on which we based our relatedness filtering are described previously, as well as the analyses on which we based our admixture filtering. Korneliussen et al., 2014 Korneliussen T.S.

Albrechtsen A.

Nielsen R. ANGSD: analysis of next generation sequencing data. All the score statistic-based tests were performed on posterior genotype likelihood, which were estimated using ANGSD () with the SAMtools genotype likelihood model (-gl 1), and with major and minor alleles were inferred from the genotype likelihoods (-doMajorMinor 1). Only reads with MapQ > 35 and bases with baseQ > 35 were used to ensure that the data used were indeed of high-quality. Sites were restricted to those deemed accessible according to the 1000 Genomes accessibility mask. For the imputation-based tests, we imputed data for the SNP of interest using the same input and settings as for the score statistic-based tests. For phenotypes, we used the spleen size, weight, and height measurements obtained at the same time as DNA material was sampled. For age, gender, and diver (yes/no) we used values obtained through an interview also performed at the same time as the DNA material was sampled. The PCs were calculated using genetic data from the individuals included in the association tests only. We tested for association between spleen size and each of the 25 top SNPs from the selection scan using a score statistic based test. One SNP, rs3008052 in the gene PDE10A, showed evidence of being associated ( Table 1 from the main text). Notably, this result is consistent across three different transformations of the spleen size measurements, suggesting that the result is not simply an artifact of a potential violation of the normality assumption underlying the method used. In the initial tests, we included the five first PCs as covariates to correct for population structure. To investigate if the results we obtained from these tests were affected by population structure (despite this inclusion of PCs), we performed the same test on a large set of SNPs across the genome and, based on the p values from these tests, we calculated genomic inflation factors and made QQ plots ( Figure S6 ). This results suggested that the –log(p values) are in general not markedly inflated due to population. Additionally, QQ-plots of the 25 top SNPs from the selection scan suggest that the p values for these do not in general seem to be inflated either, despite the fact that these SNPs are characterized by having very different allele frequencies in Bajau compared to other populations ( Figure S6 ). We also performed score statistic-based association testing using genomic control instead of PCs to correct for population structure. This gave rise to the even lower p values for rs3008052 (3.8E-05, 3.0E-05, and 7.2E-05 for Untransformed, Qnorm transformed, and log transformed data, respectively). Hence, all in all, we did not detect any evidence that the observed association signal is an artifact of population structure. To further investigate the signal of association between spleen size and rs3008052, we imputed genotypes for rs3008052 and analyzed these for several subsets of the data: the exact same dataset as in the initial, the subset consisting only of Bajau individuals, and three different subsets of these Bajau individuals with only limited amounts of non-Bajau ancestry. First, we plotted the spleen sizes stratified by imputed genotype for all the datasets and all the transformations of the spleens sizes used in the initial tests. Visual inspection of these plots for all datasets showed a clear trend that, the more copies of the selected allele (T), the larger the spleen size, and thus an association between rs3008052 and spleen size in accordance with the initial results. Next, we tested if these associations are statistically significant using the exact same model as in the initial score statistic based tests, but this time using the R function lm to fit the model and perform the tests. We note that for these analyses the residuals were in no case rejected to be normal distributed, suggesting that the underlying assumption of normality of lm is not evidently violated. When performing these tests using data from the same individuals as in the initial score statistic based analyses, we got even lower p values than those obtained using the score statistic based test, suggesting that the initial association signal was not simply an artifact of using the score statistic based framework ( Table S2 ). Similarly, when performing the test based only on the subset of individuals that are Bajau, we also get highly significant p values despite the much smaller sample size (n = 43 as opposed to n = 76), which suggests that the initial association signal is not an artifact of including Saluan individuals in the initial analysis either ( Table S2 ). When performing the test of three different subsets of the Bajau individuals with only limited amounts of non-Bajau ancestry, the results were more mixed. For the largest subset consisting of the 50% of the Bajau individuals with least non-Bajau ancestry (where all individual are estimated to have < 16% non-Bajau ancestry) there was still a marginally significant p value, despite a sample size of only 21. But this was not the case for the even smaller datasets consisting of Bajau individuals with less than 10% and 5% non-Bajau ancestry. However, these two datasets are quite small (n = 18 and n = 15, respectively) and the effect sizes estimates obtained from them are consistent with the estimates obtained from all the larger datasets (main text Figure 4 ). Hence it seems likely that the lack of significance for these two datasets is due to their limited size, rather than a sign that the signals association signal observed in the larger more admixed dataset are caused by population admixture not corrected for by the PCs. It is also worth noting that the effect size estimates are all consistent with a positive effect on spleen size of the selected allele (T) i.e., having more Ts is associated with having larger spleen size.

Thyroid hormone association testing Shabalin, 2012 Shabalin A.A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Using data from the 500FG cohort, we tested for association using a linear regression technique as implemented in the Matrix eQTL package (). The method was applied to imputed “dosage” SNP data, where in imputation the genotypes are not discretely called, but rather values between 0 and 2 are given to each individual. Related and non-European ethnicity individuals were removed, and the tests were corrected for age, gender, BMI, and oral contraceptive usage by including these factors as covariate. Aguirre-Gamboa et al. (2016) Aguirre-Gamboa R.

Joosten I.

Urbano P.C.M.

van der Molen R.G.

van Rijssen E.

van Cranenbroek B.

Oosting M.

Smeekens S.

Jaeger M.

Zorro M.

et al. Differential effects of environmental and genetic factors on T and B cell immune traits. Thyroid hormone concentrations were scaled using an inverse rank transformation (IRT) algorithm, similar to the one applied in. We examined four SNPs: the lead SNP, rs3008052, and three other SNPs in high LD in Europeans with rs3008052 (rs2983527, rs3008050, and rs3008049). All of these SNPs have significant associations with hypothyroidism in the Global Biobank Engine data. We found clear, significant association for our investigated SNPs with the allele at higher frequency in the Bajau significantly associated with elevated T4 circulating plasma concentrations. Results for all tested SNPs are found along with boxplots illustrating the directionality of the associations in Figure S7