The Chicago Food Allergy Study

Both the discovery and replication samples were enrolled as part of the Chicago Food Allergy Study under a standard study protocol. All participants were recruited from the Chicago area from August 2005 to June 2011. Eligible families were those having either one or both parents with at least one biological child (aged 0–21 years) with or without FA, willing to participate in the study. Eligible FA case or control children (aged 0–21 years) were those with or without FA. For each family or participant, the following procedures were completed: (1) questionnaire interview by trained research staff to obtain information on each family member’s home environment, diet, lifestyle, history of FA and other allergic diseases; (2) clinical evaluation by nurse or trained research staff to obtain height, weight, waist and hip circumference, blood pressure measurement and lung function test; (3) allergy SPT; and (4) collection of venous blood samples for food specific IgE (sIgE) measurement, DNA extraction and subsequent laboratory assays. Detailed information on SPT and sIgE measurement is given in the Supplementary Methods. For each child, we also collected a detailed history of clinical allergic reaction on ingestion of specific foods. The study protocol was approved by the Institutional Review Board of Ann and Robert H. Lurie Children’s Hospital of Chicago and the Institutional Review Board of Johns Hopkins Bloomberg School of Public Health. Written informed consents were obtained from all participants or their legal guardian (for children aged <18 years).

Study sample included in the current GWAS of FA

In the discovery stage, we primarily used samples from nuclear families. A total of 2,759 subjects (853 families) were included. Among these families, 780 families (n=2,678) were included based on the following criteria: (1) at least one child had a convincing history of clinical allergic reaction on ingestion of specific foods and (2) two or more additional family members (parents/siblings) had archived DNA samples. Another 81 children from 73 families without parental genotyping data were also included (29 FA cases and 52 controls).

In the replication stage, we aimed to replicate the identified genetic associations with PA. We included 216 case–control samples (88 PA cases and 128 controls) from the Chicago Food Allergy Study, all independent of the discovery sample.

Definitions of phenotypes of interest

The main phenotypes of interest included ‘any FA’ and the three most common types of FA: PA, egg allergy and milk allergy. As we reported previously, we adopted stringent clinical criteria to define a specific type of FA35: (1) a convincing history of clinical allergic reaction on ingestion of specific foods35 and (2) evidence of sensitization to the same food, defined as having a detectable sIgE (≥0.10 kU l−1; detection limit of the instrument was <0.10 kU l−1) and/or a positive SPT to this specified food. A positive SPT for a specific allergen was defined based on criteria as follows: (1) the MWD for the negative control was <3 mm, the positive control was ≥3 mm and the difference of positive minus negative control was ≥3 mm; and (2) MWD was ≥3 mm for the specified allergen. Accordingly, we defined allergy to nine common foods (peanut, egg white, cow’s milk, soy, wheat, walnut, fish, shellfish and sesame seed) and ‘any FA’ if a child was allergic to any of these foods. Normal controls were defined if a child had neither clinical allergic reaction nor evidence of sensitization to any of the nine foods. All parents were defined as having uncertain FA phenotypes, as data on history of clinical allergic reaction subsequent to ingestion of specific foods were unavailable. We also performed sensitivity tests on FA definitions using other cutoffs for food-specific IgE and SPT, for example, food-specific IgE ≥0.35 kU l−1, SPT MWD ≥5 mm (ref. 36) or either food-specific IgE or SPT MWD≥95% PPV.

Genotyping and quality control steps in the discovery GWAS

Genomic DNA was isolated from EDTA-treated peripheral white blood cells. The concentration and purity was determined using a Quant-iT Broad-range dsDNA Assay Kit on a SpectraMax M2 micro-plate reader. Genotyping was performed using the Illumina HumanOmni1-Quad BeadChip in the Genome Technology Access Center, Washington University in St Louis, MO, according to specifications listed in Illumina’s protocol (Illumina, Inc.). Among 2,759 genotyped samples, 12 failed to yield high-quality genotyping calls (Supplementary Methods), resulting in an overall genotyping success rate of 99.6%.

Genotypes for 2,747 subjects were exported, with a total of 1,011,859 SNPs. We performed rigorous quality-control steps as suggested by Laurie et al.59, using the R/bioconductor package ‘GWASTools’60. Briefly, we examined the following parameters: (1) missing call rate per SNP, per chromosome and per sample; (2) the reproducibility rate among the 100 duplicated samples; (3) duplicate discordance estimates for each SNP to infer SNP quality; 4) genotyping batch effects: measured by comparing the difference in allelic frequencies between each plate and a pool of the other plates, and by comparing variation in log 10 of the autosomal missing call rate in each plate (no significant batch effects were detected); (5) gender identity: based on X chromosome heterozygosity and the means of the intensities of SNP probes on the X and Y chromosome; (6) autosomal heterozygosity; (7) Hardy–Weinberg equilibrium (HWE) test: performed among self-reported Caucasian parents or a sibling without FA if no parent was available. Sex-specific HWE tests were also performed; (8) Mendelian error check of 650 families with both parents available; and (9) pair-wise sample relatedness: pair-wise kinship estimates between every subject were computed using PLINK61.

We filtered 45,100 monomorphic SNPs and 14,948 SNPs with a >5% missing genotyping rate. A total of 595 SNPs with duplicate discordance estimates >2% in 98 pairs of duplicates and 1,784 SNPs that deviated from the HWE test (P<1 × 10−6) were also filtered. Mendelian error checks filtered 2,145 SNPs with Mendelian errors in ≥10 families (>1.5% families). Some SNPs were filtered under more than one criterion. We also removed 162,283 SNPs with minor allele frequency (MAF) <2% and 2,086 SNPs on the Y chromosome or on mitochondrial chromosomes. Finally, a total of 772,141 autosomal SNPs and 17,536 SNPs on the X chromosome were used in the downstream analyses.

We removed one subject with a missing genotyping call rate >5%, 12 subjects with gender discrepancies and 6 subjects with Mendelian errors in >5,000 SNPs. Pair-wise relatedness was checked for each pair of subjects by plotting the proportion of loci where the pair shared one allele identical by descent versus the proportion of loci where the pair shared zero allele identical by descent. A total of 34 subjects for whom the degree of relatedness was inconsistent with self-reported relationship were then removed. In total, 2,694 subjects were available for downstream data analyses.

Genetic ancestry was carefully computed by PCA using Eigenstrat37 and all European, American, African and Asian individuals in the 1000 Genomes Project were used as a reference (phase I, release_v3.20101123), as detailed in Supplementary Methods.

Statistical analyses in the discovery GWAS

To leverage the family-based data with a small number of case–control samples, the MQLS test (for autosomal markers)33 and its’ extension, the XM test34 (for X-linked markers) were applied to explore genetic associations for each phenotype of interest using MQLS-XM ( http://www.stat.uchicago.edu/~mcpeek/software/MQLS_XM/download.html), a programme for dichotomous outcomes. The MQLS can maximally use information available in a complex family structure by the following: (1) distinguishing between unaffected controls and controls of uncertain phenotypes (that is, individuals with unmeasured phenotypes), and incorporating both into the analyses; and (2) incorporating phenotype data for relatives with missing genotype data at each marker tested33. MQLS is a retrospective score test that treats the genotype data on sample individuals as random and the available phenotype information as fixed in the analysis, thus allowing for valid association testing in the presence of phenotype misspecification, and hence the method provides high power at the appropriate type I error rate33. Before the MQLS analysis, using PA as an example, the phenotype of interest was coded as follows: (1) PA-affected cases, (2) non-allergic non-sensitized normal controls and (3) controls of uncertain phenotypes (including children who did not meet the PA case or normal control definition, and all genotyped parents). We also performed a sensitivity test and found that the results were not significantly altered by removing children who did not meet the PA case or normal control definition from the analysis. The MQLS test was performed under an additive genetic model, with a specified prevalence of 5% for any FA or 1% for PA, milk allergy and egg allergy, separately, in the Europeans. We also repeated the analyses, while specifying a higher prevalence (10% for any FA, or 5% for PA, milk allergy and egg allergy, separately) and obtained very similar results. To perform MQLS analyses by conditioning on one of the top SNPs, we first calculated the residual using logit(Y=1)=β 0 +β G *G for subjects with non-missing phenotypes, where Y is the disease status and G is the genotype of the selected top SNP. The residual was set to 0 for subjects with missing phenotype. Similarly, to perform MQLS analyses in 497 non-European subjects adjusting for ancestry, the residual PA status for subjects with non-missing phenotypes was calculated using the first three PCs from the GWAS genotyping data as covariates and the residual for subjects with missing phenotypes was set to 0. The calculated residual was then used as the outcome to perform MQLS analyses using the QM-QXM programme, an approach that is an extension of the MQLS test to quantitative traits ( http://faculty.washington.edu/tathornt/software/QM_QXM/). As the MQLS is a score test and does not estimate effect size, the reported OR and 95% CIs were estimated using GEE models, with adjustment for age and gender in subjects of European ancestry. The first three PCs were also adjusted in the analyses for non-European subjects.

Genotyping and data analyses in the replication sample

The replication sample consisted of 88 PA cases and 128 normal controls from the same Chicago Food Allergy Study. SNPs rs7192 and rs9275596, which were suggestively or significantly associated with PA in the discovery GWAS (P<1 × 10−7), were selected for replication. As we needed to impute population ancestry using a similar strategy as was used for the discovery sample, and impute classical HLA alleles and AA polymorphisms based on a relatively dense SNP set, the Human OmniExpressExome BeadChip was selected for genotyping. DNA samples were prepared using the same lab procedures as for the discovery sample, and cases and controls were distributed evenly in each plate. Genotyping was performed according to specifications listed in Illumina’s protocol (Illumina, Inc.) at the Genomics Core Facility of the Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai. Similar quality-control steps were applied to the replication sample. Two subjects with gender inconsistencies and one subject from a monozygotic twin pair were removed from the subsequent data analysis. GEE models, adjusting for age and gender, were performed to test the association between each SNP and PA under an additive genetic model, in samples with European ancestry. When analyses were performed in samples of non-European ancestry, the first three PCs from the genome-wide SNP genotyping data were also included as covariates to adjust for potential population stratification.

SNP imputation

In the discovery sample, we performed phasing using SHAPEIT38 and imputation using IMPUTE2 (ref. 39) with all individuals in the 1000 Genomes Project as a reference panel. As MQLS does not support analyses using posterior probabilities, we computed best-guess genotypes, using a probability threshold of 0.95, as recently described in the literature62. We applied several post-imputation quality-control metrics including removal of SNPs with an IMPUTE2 info score <0.8, with a missing call rate >0.05, or with a MAF <0.02. A total of 6,459,842 genotyped or imputed SNPs were then analysed for their associations with any FA and three specific types of FA, respectively, using the MQLS test.

Meta-GWAS

SNP imputation was also performed for the replication sample of European ancestry, leading to a combined set of 6,174,271 genotyped or imputed SNPs. We performed the association tests for PA in the replication sample using the GEE model (in a case–control setting), adjusting for age and gender, similar to what was done for genotyped SNPs rs7192 and rs9275596. To maximize power, we performed meta-analysis based on the Stouffer’s weighted z-score method to combine the association results for PA from the discovery and the replication samples. Our GEE analyses in the replication sample did not include SNPs on the X chromosome (N=139,697) because of the small replication sample size and need to perform gender-specific GEE analyses (57 females and 74 males), and thus the meta-analysis was limited to 5,693,167 autosomal SNPs that overlapped in both sample sets.

Imputation of HLA alleles and AA polymorphisms

We used the HLA*IMP41 programme to impute classical HLA alleles from SNP genotyping data via reference to a training data set of over 2,500 samples of European ancestry with dense SNP data and classical HLA allele types. This framework is reported to have high imputation accuracy (92%–98% of imputations agree with lab-derived HLA types)41. We also applied the SNP2HLA framework to impute AA polymorphisms as well as classical HLA alleles, with genotype data from the Major Histocompatibility Complex Working Group of the Type I Diabetes Genetics Consortium as a reference panel42. The imputation was performed for subjects of European ancestry in the MHC class II gene region. We used best-guess genotypes for analyses, as MQLS does not support analysis using posterior probabilities. After applying several quality-control filters to the imputed data (that is, removal of imputed variants with call rate <95% and/or MAF <0.02), 50 four-digit HLA alleles from the HLA*IMP programme, 27 two-digit HLA alleles, 41 four-digit alleles and 165 AA polymorphisms from the SNP2HLA programme were analysed for their associations with PA, using the MQLS test in the discovery sample and using the GEE model in the replication sample, as described above. Multiallelic AA polymorphisms were analysed for associations with PA after converting K-alleles to K-bialleles.

DNA methylation measurement and quality-control steps

A total of 218 unrelated children of European ancestry in the discovery (N=199) or replication (N=19) samples had genome-wide DNAm data measured in genomic DNA isolated from EDTA-treated peripheral white blood cells. DNAm was measured using Infinium HumanMethylation450 BeadChips (including >485,000 CpG sites), according to the manufacturer’s instructions at the Center for Genetic Medicine, Northwestern University Feinberg School of Medicine. Several quality-control steps were performed with the ‘minfi’ framework63, as detailed in the Supplementary Methods. Both β-and M-values (representing methylation ratios) were computed for downstream analyses. M-values are reported as superior to β-values for identification of differential methylation64.

To account for potential batch effects, β- and M-values were ComBat-transformed using the ‘sva’ package65, with chip number as the surrogate for batches. The ComBat-transformed β- and M-values at each CpG site were applied to explore associations between DNAm, genotypes and PA.

Cell heterogeneity in blood may act as a potential confounder55,56 due to cell-specific patterns of DNAm57. Thus, with estimateCellCounts() function included in the ‘minfi’ package63, the distribution of six cell types (CD8T cells, CD4T cells, NK cells, B cells, monocytes and granulocytes) was inferred for each sample based on external reference DNAm signatures of the constituent cell type from Illumina HumanMethylation450 BeadChips55,56. The estimated cell composition was adjusted as a covariate in subsequent analyses.

Statistical analyses on DNA methylation mediation effects

To identify DMPs associated with the two validated PA-associated SNPs, we applied the ‘limma’ package58 in R/bioconductor, to fit a linear regression model in 218 unrelated children of European ancestry, with ComBat-transformed M-values at each CpG site (N=456,513) as a function of each SNP (under an additive genetic model), adjusted for age, gender and estimated cell composition. Genome-wide significance (P<5 × 10−8) cutoffs were applied. To report adjusted methylation differences in each genotype, ComBat-transformed β-values were analysed instead of ComBat-transformed M-values, which did not significantly change the results.

The identified genotype-dependent DMPs were tested for associations with PA, by fitting a linear regression model with ComBat-transformed M-values as outcomes, adjusting for the covariates mentioned above. These analyses were conducted in 73 PA cases and 67 controls, while the remaining 78 children with uncertain PA phenotypes were removed from these analyses. Bonferroni correction was applied to adjust for multiple testing. To report adjusted methylation differences in each group, ComBat-transformed β-values were analysed instead of ComBat-transformed M-values, which did not significantly change the results.

The SNP–DMP–PA relationships were then assessed using the CIT classification as methylation mediated, consequential, or independent40. We focused on the top DMP from each gene that was significantly associated with both SNPs and PA risk. Briefly, the CIT performs statistical tests for four conditions, all of which must be met to conclude that methylation mediation is occurring: (i) genotype and phenotype of interest (PA in the current study) are associated; (ii) genotype is associated with DMP after adjusting for phenotype; (iii) DMP is associated with phenotype after adjusting for genotype; and (iv) genotype is independent of phenotype after adjusting for DMP. The CIT P-value is defined using the intersection-union test framework as the maximum of the four-component test P-values. As the CIT was originally designed for continuous phenotypes, we applied a modified version based on logistic regression to examine the causal relationship for each SNP–DMP–PA pair in this study, which has been reported previously32.

Functional annotation using existing eQTL data sets

To identify potentially causal gene(s) underlying the identified genetic associations with PA, we queried existing eQTL databases in multiple tissues (including subcutaneous/omental adipose tissue43, liver tissue43 and lymphocytes44,45,46 ( http://regulome.stanford.edu)), to assess whether the top PA-associated SNPs were eQTL SNPs. We surveyed both cis- and trans-eQTLs of 10% false discovery rate and found that the two PA-associated SNPs influence gene expression mainly in a cis manner, and that corresponding cis-eQTLs were reported in the paper. For each gene whose expression level was significantly associated with the two PA-associated SNPs, the most significant cis-eQTL in the subcutaneous/omental adipose and liver tissues, separately, and its LD squared correlation coefficient with the PA-associated SNPs were also reported.