A large-sample study of photic sneeze reflex in the Chinese population

A sample of 3417 Chinese individuals were analyzed for the association between 419093 genetic variants and the PSR phenotype (Table S1). 874 participants self-reported as PSR-positive, and the corresponding prevalence of 25.6% (95% CI of [24.1%, 27.1%]) was similarly comparable to previously reported PSR prevalence in several western countries16,23,24. We observed a male prevalence of 30.1% (95% CI of [27.9%, 32.3%]) and a female prevalence of 21.1% (95% CI of [19.2%, 23.2%]), which were statistically different (P < 1 × 10−8). The sample was balanced regarding sex with 1700 males and 1717 females (P = 0.78), and no significant differences were observed in age distribution either between sexes (Fig. S2A) or between case-control groups (Fig. S2B). The aiming population was exclusively defined so all individuals recruited were Chinese citizens. In addition, principal component analysis (PCA) facilitated the classification of all participants into ancestry groups given their genetic variants. Scatter plots with the top principal components (PCs) revealed the clustering of most individuals into a single cohort and the limited range for diverting points (Fig. S3A), suggesting robust homogeneity of Chinese ancestry in the sample. Moreover, the top 10 PCs cumulatively explained over 55% of total genetic variation (Fig. S3B), and thus would be included in later statistical analysis as covariates to minimize any remaining genetic substructure.

GWAS identified both replicative and novel genetic loci

Based on an additive effect model and with a stringent Bonferroni-corrected threshold, GWAS identified 10 hit SNPs associated with PSR in 3 independent autosomal regions (Fig. 1 and Table 1). Given different PSR prevalence between two sexes in our sample, we also performed a sex-separated analysis and identified similar hit SNPs although the smaller sample size led to reduced statistical power and thus to the significance of some hit SNPs falling below the Bonferroni-corrected threshold (Fig. S4A for males and Fig. S4B for females). As quality control, SNP-level scores fell onto a line with the slope of unstandardized λ at 1.011, with no major shift from the line of y = x and thus without systematic bias (Fig. 2). For the additive effect model, the top hit in each region was respectively rs10427255 on chromosome 2, rs1032507 on chromosome 3, and rs6448862 on chromosome 4, to which the other 7 hit SNPs located closely adjacent and scored significant due to high linkage disequilibrium (Fig. S5). All hit SNPs were intergenic rather than within the open reading frame of any genes, thus unlikely to contribute to the PSR phenotype by directly modifying the structure or function of any potential proteins (Fig. 3). Among the 3 top regional hit SNPs, rs10427255 was 848 kb downstream of ZEB2 (Zinc Finger E-Box Binding Homeobox 2) and 2477 kb upstream of ACVR2A (Activin A Receptor Type 2 A); rs1032507 was 3100 kb downstream of GBE1 (1,4-Alpha-Glucan Branching Enzyme 1) and 97 kb upstream of CADM2 (Cell Adhesion Molecule 2); rs6448862 was 694 kb downstream of HS3ST1 (Heparan Sulfate-Glucosamine 3-Sulfotransferase 1) and 1245 kb upstream of RAB28 (Member RAS Oncogene Family). Most of these hit SNPs were without substantial linkage to any measured SNPs within their neighbor genes (r2 < 0.005, Fig. 3), and the only exception appeared to be rs1032507 with mild linkage to a few SNPs in the initial coding region of CADM2 (0.1 < r2 < 0.2, Fig. S6). Potential explanations of the observed associations could be 1). SNPs in promoter region could regulate gene expression (e.g. rs1032507 was possibly in the promoter of CADM2); 2). SNPs could affect phenotype through distal regulation so the nearest genes to the hit variants may often not be the responsibly causal ones; 3). SNPs may influence local epigenetic modifications and thus be indirectly associated with the phenotype. Nevertheless, GWAS identified 3 independent genetic variants significantly associated with PSR, but they did not directly affect the coding of particular genes, and thus the biology of PSR remained mechanistically unclear.

Figure 1 Manhattan plot of GWAS based on an additive effect model. Bonferroni corrected threshold and candidate threshold respectively correspond to 7.30 and 5.30 with regard to −log 10 (P). Full size image

Table 1 List of 10 hit SNPs in GWAS based on the additive effect model. Full size table

Figure 2 Quantile-quantile plot of observed vs. expected −log 10 (P) scores in GWAS. Full size image

Figure 3 Association plot of hit SNPs to SNPs within their nearest upstream and downstream genes. For each gene, SNPs within the region of 5000 bp upstream of start codon and 5000 bp downstream of stop codon were included. Full size image

PSR was reported as a likely autosomal dominant phenotype18, although the single-family-based evidence was not conclusive. Given its unknown mode of inheritance, in addition to the above additive effect model, we applied the same dataset to GWAS based on models of dominant effect or recessive effect. The same 4 hit SNPs on chromosome 2 and the same 3 hit SNPs on chromosome 3 remained significant in the dominant effect model (Fig. S7A); only rs10427255 on chromosome 2 and rs1032507 on chromosome 3 remained significant in the recessive effect model (Fig. S7B). The above results combined suggested that 2 independent genetic variants (rs10427255 and rs1032507) were robustly associated with PSR regardless of the effect models, but rs6448862 was only significant in the additive effect model. In all three models, rs10427255 scored a higher significance of association than rs1032507.

rs10427255 and rs1032507 display independent and combinational associations with PSR

With assumption of the additive effect model, rs10427255 had an odds ratio (OR) of 1.68 so each copy of minor allele (T) conferred a multiplicative 68% higher odds for PSR; in comparison, rs1032507 had an OR of 0.65 so each copy of minor allele (A) conferred a multiplicative 35% lower odds for PSR (Table 2). The effects of both SNPs should be independent given their locations on different chromosomes. Indeed, when combined together into the additive effect model, OR levels for both SNPs were largely unchanged compared to those in models with single SNP (Table 2). Regarding the ROC curve and the AUC value used for classification, rs10427255 displayed slightly better performance than rs1032507 (AUC of 0.628 vs. 0.617, Fig. 4), consistent with its higher association significance; interestingly, when combined, the 2 SNPs together demonstrated a substantial lift in PSR classification (AUC of 0.657, Fig. 4) compared to single SNP (bootstrap-based AUC test: P = 0.04 for combined vs. rs10427255 alone, and P = 2.3 × 10−6 for combined vs. rs1032507 alone), further articulating their independent but yet combinational associations with the PSR phenotype. Such phenomenon was also observed in both dominant effect model (Fig. S8A) and recessive effect model (Fig. S8B).

Table 2 Summary of hit SNPs: rs10427255, rs1032507, and both combined. Full size table

Figure 4 ROC curves for predicting PSR phenotype using rs10427255 alone (red), rs1032507 alone (blue), or rs10427255 and rs1032507 combined (purple). Full size image

Just as large sample size generally confers higher statistical power, the number of study participants is an influential factor for GWAS and serves as an indirect manifestation for the association strength between the hit SNPs and the phenotype-of-interest25,26. By setting the cohort of 3417 individuals as the population and maintaining the prevalence of PSR as observed, we conducted repeated random re-sampling to obtain subsets of different sizes and accordingly evaluated the association significance between PSR and hit SNPs (rs10427255 and rs1032507). All repeated trials were likely to score genome-wide significance with size over 1800 for rs10427255 and size over 2600 for rs1032507, further arguing for the underlying strong associations (Fig. 5). Undoubtedly, these two associated genetic variants were unlikely to be the causal ones and it still remained a challenge to reveal the genetic and physiological mechanisms behind PSR. However, GWAS successfully identified rs10427255 and rs1032507 as independent polygenic risk predictors for PSR in the Chinese population, not only proving the values in self-reported phenotypic studies but also navigating future research to the defined sites and regions within the genome.