Novel SNP Identification and Genotyping with an Exon-Hybrid Capture Array and Sequencing

To identify functional and potentially novel SNPs associated with sensitivity or resistance to the development of obesity with high fat diet consumption present in our well characterized Japanese macaque (Macaca fuscata) model20,21,22,23,24,25,26,27,28,29,30,31,32,33, we designed an exon capture array that enriches for targeted exonic segments with similarity to the array probes (Supplemental Fig. 1). Our choice of utilization of exon hybrid capture for identification of potentially novel SNPs was based on the absence of a Macaca fuscata gene assembly, and subsequent concern for misclassification of whole genome sequencing data which would be inherent to mapping onto a moderate (6–12X) resolution Macaca mulatta rhesus assembly. Genes were selected for inclusion on the array based on a role in obesity, hyperlipidemia, insulin resistance, or lipid metabolism (such as those for cholesterol and triglyceride metabolism). Since the Macaca fuscata genome has not been sequenced, we used the currently available primate genome assemblies, with an emphasis on the rhesus macaque (M. mulatta) genome which diverged from M. fuscata 0.31 to 0.88 million years ago34, to identify sequences for inclusion on the array. Our custom array contained 78,450 probes spanning 783 genes. Our adult colony is genetically stable, and has been closed for 5 decades. However, it is not inbred and the mean kinship among the 186 animals in the M. fuscata breeding group is a grand mean of 0.01802 (inner and outer quartile of 0.01342 and 0.02223, respectively).

For the naïve discovery experiment employing our exon array, we selected a total of 8 M. fuscata adult females exposed to the HFD for a minimum of 3 years. We used data from non-pregnant females collected during their third year on the HFD for trait classification, and classified-animals as resistant (n = 4) or sensitive (n = 4) based on mean % bodyweight gain (22.3 vs 66.5%; range −10.98 to 113.51%), leptin/body weight ratio (0.08 vs 0.79; range 0.07 to 1.12 ng/ml relative to kg body weight), and insulin AUC (2589 vs 10210 μU/ml; range 1307 to 16480 μU/ml). Genomic DNA was extracted, hybridized to the array, and sequenced using the 454 FLX Titanium platform. SNPs of potential further interest for validation were first called based on an odds ratio approximating infinity (i.e., observed in all 4 obese resistant dams, but not in any of the sensitive animals, or in all 4 animals obese dams, but not in any of the resistant). These criteria were sequentially relaxed to allow for 3 of 4 and 2 of 4 in one cohort, but still absent in the comparison cohort (Supplemental Table S1).

To assure high quality SNP identification, we used three tools to call SNPs (Atlas-SNP235, SNPTools36, and ssahaSNP37) and only selected the 1534 SNPs called by all three for further analysis (Supplemental Fig. 2 and Supplemental Tables S1 and S2). The SNPs were initially called using ssahaSNP37 (http://www.sanger.ac.uk/science/tools/ssahasnp) which only identifies homozygous alternative allele SNPs. Genes were identified for further analysis on Atlas-SNP2 and SNPTools if ssahaSNP made homozygous alternative allele calls in all the samples in one group, but none of the samples in the other group (odds ratio approximating infinity). Among the 1534 SNPs called by all three SNP calling tools, Variant Effect Predictor (VEP)38 based on rhesus gene models was used to identify functional consequences of the SNPs. SNPs were then lifted over to the human genome in order to perform SIFT39 and PROVEAN40 predictions of the effect of amino acid differences. PLINK41 was thereafter used to identify associations between the genotypes with HFD resistant or sensitive phenotypes (Fig. 1a). Nonsynonymous SNPs with SIFT or PROVEAN predicted effects on protein function, and PLINK adaptive permutation approaching genome wide significance (odds ratios approaching 2.0 to 2.5, which corresponded to an initial p value < 0.063 in our small discovery cohort) were chosen for further analysis. A SNP in phospholipase A2, group IVA (Cytosolic, Calcium-Dependent) (PLA2G4A) (p = 0.012) (Fig. 1b) and two SNPs in apolipoprotein B (APOB) (p = 0.061 and p = 0.062) (Fig. 1c) met these criteria.

Figure 1 Manhattan plot (a) of exon sequencing identified SNPs compared between high fat diet sensitive and resistant Japanese macaques based on PLINK (initial ssahaSNP odds ratio approximating infinity, PLINK p of 0.012 to <0.063). −log p values are plotted on the y-axis, while chromosome location is plotted on the x-axis. Manhattan plots and linkage disequilibrium plots of SNPs in PLA2G4A (b) and APOB (c) identified as being putatively associated with lean versus obese response to chronic high fat diet in Japanese macaque dams. Full size image

The PLA2G4A and APOB SNPs were further interrogated by two means: functional annotation and validation in an expansive cohort. Functional annotation used a Combined Annotation Dependent Depletion (CADD)42 (Supplemental Table S3), which integrates multiple annotations to predict the likelihood of a SNP having a functional consequence. The PLA2G4A SNP had the highest PHRED scaled C-score (23.6). A PHRED score of ≥20 indicates the SNP is among the 1% of SNPs in the human genome most likely to have a functional effect. The PHRED scores for the APOB chr13:20932165 (9.018) and chr13:20940049 (15.07) approach or surpass a PHRED score ≥10 which indicates they are predicted to be among the 10% most likely SNPS to have a functional effect. Examination of individual constituents of the CADD scores showed that the PLA2G4A and APOB chr2:21235409 loci are highly conserved across primates, including humans.

In an effort to further identify significant associations with multiple additional traits that accompany or lend to the obesity resistant phenotype (i.e., exploration of potential mechanisms rendering an association with obesity resistance in our cohort of strictly experimentally defined animals (i.e., dams resistant to chronic high fat diet feeding for a minimum of 3 years), we took a dual approach. First, we completed expanded phenotyping among a total of 71 animals, inclusive of 43 adult female dams who had been placed on the HFD for at least 3 years and subjected to expansive phenotypic testing and measured analysis (the initial 8 dams, alongside 35 additional unrelated animals; n = 43 high fat diet challenged dams in total; 28 additional animals with partial phenotypic data were similarly included for a total of 71 animals). Second, the PLA2G4A and APOB SNPs were assayed by PCR based genotyping in an extended panel among all 43 adult female dams on the HFD for at least 3 years with extensive trait-correlated and trait-independent phenotypic data (Supplemental Table S4). In an effort to link genotype to the significantly expanded phenotype (as described further below), the extended SNP panel was designed to include the 8 individuals that were assayed by exon capture arrays along with their expanded trait data.

Expanded Phenotyping and Trait Characterization

While human epidemiologic studies have used categorical definitions in units of kg/m2 for ascribing normal weight, overweight, and obesity classifications43, they are in fact continuous variables and both body weight and body composition traits are quantitative in nature. To account for the complexity and heterogeneity of relative body mass, we performed extensive phenotyping of 37 traits on the individuals in the extended trait panel, including weight, percent and distribution body fat as measured by Dual-energy X-ray Absorptiometry (DXA) scan, cholesterol, insulin, glucose, leptin, and glucagon measurements as well as aggregate functions of the measurements such as HOMA-IR (Supplemental Table S5). A scatterplot matrix of correlations between a selected subset of phenotypes that show significant genotype-phenotype associations are shown in Fig. 2. The significant correlations between many of the traits highlights the relatedness of the phenotypic measures that serves to define an individual signature of obesity and insulin sensitivity in response to chronic HFD feeding in adult females. By identifying traits which independently segregate and distinguishing them from those that are co-linear, we were able to differentiate genotypes with pleiotropic affects. The significant genotype-phenotype associations of these phenotypes with the SNPs we identified in APOB and PLA2G4A suggest that the genotypes have pleiotropic effects on the phenotypes.

Figure 2 Scatterplot matrix of correlations among measured phenotypes for high fat diet exposed Japanese macaques. Only phenotypes that show significant genotype-phenotype associations based on NOIA are shown. The diagonal is labelled with the phenotypes and shows the distribution of the values for that phenotype. Scatterplots comparing two phenotypes together with red trend lines are shown below the diagonal. Pearson correlations for comparisons between two phenotypes are shown above the diagonal with significance denoted by (*p < 0.05), (**p < 0.01) and (***p < 0.001). Full size image

At the time of initial allocation onto the HFD, adult animals subsequently classified as obese resistant and sensitive did not significantly differ by virtue of number of prior pregnancies (resistant: nulliparous n 5, multiparous n 12; sensitive: nulliparous n 4, multiparous n 20; p = 0.45), age starting diet (resistant, 5.2 years vs. sensitive, 6.0 years; p = 0.12) nor age at first pregnancy after starting diet (resistant 6.4 years vs. sensitive 7.3 years, p = 0.10), nor initial baseline metabolic phenotyping (Supplemental Table S5). Initial body weight did show a statistically but unlikely clinically significant difference (resistant 7.93 kg vs sensitive 9.06 kg; p = 0.02).

Genotype-Phenotype Association

We first performed PLINK analysis to identify associations between the genotypes in our extended panel and the phenotypes. Case/control association analysis based on the initial sensitive/resistant phenotype classification did not show significant association with genotypes (Supplemental Table S6). Case/control analysis based on the revised sensitive/resistant classification showed a significant (p = 0.005) association between the classification and the PLA2G4A SNP. PLINK quantitative trait association analysis showed significant (p < 0.05) associations of both APOB SNPs with insulin and glucose AUC after year 3 on the HFD (Supplemental Table S7). The APOB.1 SNP showed a significant association with weight and weight change after 3 years on the HFD. In addition, the APOB.1 SNP showed a significant association with the initial BMI at the start of the diet and after 3 years on the HFD.

As a secondary analytic approach, we applied the Natural and Orthogonal InterAction (NOIA)44 model to the extended panel of SNPs suggested by PLINK analysis to refine genotype-phenotype associations. NOIA has the advantage of estimating interaction between genes (or epistasis), which is crucial in the determination of genomic variants in complex diseases and the adaptation and evolution of natural populations44. Since NOIA overcomes the duality of functional and statistical models of epistasis, it serves as a previously validated means to obtain estimates of both functional and statistical genetic effects from data. Moreover, explicit orthogonality is achieved regardless of the genotype frequency, making it an optimal tool for assigning polygenic loci to complex traits in relatively small populations44. We performed NOIA analysis as a three loci model including genotypes of the two APOB and one PLA2G4A SNPs in association with each of the 37 phenotype measurements. NOIA identified 62 significant (p < 0.05) models across 15 phenotypes (Supplemental Table S8). Of the significant models, 23 showed two loci having an effect on the phenotype suggesting a significantly robust polygenic effect. Genotype-phenotype plots were generated from the most significant model for each phenotype. Selected plots are shown in Fig. 3 and the remainder of the plots are in Supplemental Fig. 3.

Figure 3 Selected genotype-phenotype NOIA three loci models based on genotypes from the extended panel of high fat diet exposed Japanese macaques. The Genotype (x-axis) shows the genotypes for each of the loci (ordered as PLA2G4A, APOB.1, and APOB.2) and the genotypes predicted to associate with the phenotype are outlined with colored boxes. The Genotypic Effects (y-axis) shows the NOIA predicted effect of the genotype on a phenotype. The APOB.1 locus demonstrated an additive effect on weight change percent (a), leptin/body weight ratio (b), and total fat based on DXA (c). (d) The PLA2G4A and APOB.2 loci show a combined additive effect on insulin/glucose ratio. (e) The PLA2G4A locus shows an additive effect on triglyceride levels. (f) The APOB.2 locus shows an additive effect on free fatty acid levels. Full size image

Figure 3 projects six of the 37 selected genotype-phenotype NOIA associations in three loci models based on genotypes from the extended panel of high fat diet exposed Japanese macaques (n = 43 animals). The Genotype (x-axis, with 1 annotating the homozygous major allele, 2 the heterozygous state, and 3 the homozygous minor allele) shows the genotypes for each of the loci (ordered as PLA2G4A, APOB.1, and APOB.2). Significance of association is annotated by boxing, with the genotypes and allelic variation predicted to associate with the phenotype in a significant manner being outlined with colored boxes. The Genotypic Effects (y-axis) shows the NOIA predicted effect of the genotype on a phenotype, with the value being specific to the phenotype. For example, in panel Fig. 3a, weight change as a percentage of baseline (ranging from −50% to 100%) is projected on the y axis. In this model, locus 2 with APOB.1 is demonstrated to be significant in an additive model with a p = 0.035. In contrast, neither APOB.2 nor PLA2G4A demonstrated a significant effect on weight change (one phenotypic characteristic which renders of fails to render obesity). In fact, when examining all six other phenotypic characteristics projected in Fig. 3, the APOB.1 locus demonstrated an additive effect on weight change percent (Fig. 3a), leptin/body weight ratio (Fig. 3b), and total fat based on DXA (Fig. 3c). By comparison, the PLA2G4A (Fig. 3d) and APOB.2 loci show a combined additive effect on insulin/glucose ratio. Similarly, the PLA2G4A locus shows an additive effect on triglyceride levels (Fig. 3e) while the APOB.2 locus shows an additive effect on free fatty acid levels (Fig. 3f). This is summarily and systematically repeated with 9 of the 37 phenotypic measures demonstrating significance by NOIA projected in Supplemental Figure S3.