To functionally link coronary artery disease (CAD) causal genes identified by genome wide association studies (GWAS), and to investigate the cellular and molecular mechanisms of atherosclerosis, we have used chromatin immunoprecipitation sequencing (ChIP-Seq) with the CAD associated transcription factor TCF21 in human coronary artery smooth muscle cells (HCASMC). Analysis of identified TCF21 target genes for enrichment of molecular and cellular annotation terms identified processes relevant to CAD pathophysiology, including “growth factor binding,” “matrix interaction,” and “smooth muscle contraction.” We characterized the canonical binding sequence for TCF21 as CAGCTG, identified AP-1 binding sites in TCF21 peaks, and by conducting ChIP-Seq for JUN and JUND in HCASMC confirmed that there is significant overlap between TCF21 and AP-1 binding loci in this cell type. Expression quantitative trait variation mapped to target genes of TCF21 was significantly enriched among variants with low P-values in the GWAS analyses, suggesting a possible functional interaction between TCF21 binding and causal variants in other CAD disease loci. Separate enrichment analyses found over-representation of TCF21 target genes among CAD associated genes, and linkage disequilibrium between TCF21 peak variation and that found in GWAS loci, consistent with the hypothesis that TCF21 may affect disease risk through interaction with other disease associated loci. Interestingly, enrichment for TCF21 target genes was also found among other genome wide association phenotypes, including height and inflammatory bowel disease, suggesting a functional profile important for basic cellular processes in non-vascular tissues. Thus, data and analyses presented here suggest that study of GWAS transcription factors may be a highly useful approach to identifying disease gene interactions and thus pathways that may be relevant to complex disease etiology.

While coronary artery disease (CAD) is due in part to environmental and metabolic factors, about half of the risk is genetically predetermined. Genome-wide association studies in human populations have identified approximately 150 sites in the genome that appear to be associated with CAD. The mechanisms by which mutations in these regions are responsible for predisposition to CAD remain largely unknown. To begin to explore how disease-specific gene sequences and disease gene function promote pathology, we have mapped the loci and genes that are downstream of the transcription factor TCF21, which is strongly associated with CAD. By identifying genes that are regulated by TCF21, we have been able to link together multiple other CAD-associated genes and begin to identify the critical molecular processes that mediate atherosclerosis in the blood vessel wall and contribute to the genesis of ischemic cardiovascular events.

To better understand the cellular functions of TCF21 in the SMC lineage, and to gain insights into how such functions might contribute to CAD risk, we performed chromatin immunoprecipitation coupled with high throughput sequencing (ChIP-Seq), examined the downstream target loci and genes that harbor TCF21 binding sites, and employed bioinformatic and experimental approaches to investigate how the target genes work together to mediate the risk of CAD. Pathway analysis of downstream target genes revealed that TCF21 regulates cell-cell and cell-matrix interactions as well as growth factor signaling pathways. Also, we found TCF21 target regions to be over-represented among CAD associated loci, and that genes in these regions assemble into pathways that mediate fundamental processes such as cell cycle, chromatin remodeling, and growth factor signaling. Taken together, these studies elucidate disease-associated genes and pathways that lie downstream of TCF21, and show how SMC related processes may be responsible for a substantial portion of the genetic risk for CAD.

Tcf21 is a member of the basic helix-loop-helix (bHLH) TF family and is critical for the development of a number of cell types during embryogenesis of the heart, lung, kidney, and spleen [2–5]. Tcf21 is expressed in mesodermal cells in the proepicardial organ that give rise to coronary artery smooth muscle cells (SMC), and loss of Tcf21 results in increased expression of smooth muscle markers by cells on the heart surface, consistent with premature SMC differentiation [6]. Knockout animals also exhibit a dramatic failure of cardiac fibroblast development, suggesting a role for Tcf21 in the fate decisions of a precursor cell for SMC and cardiac fibroblast lineages [2,6]. These data are consistent with the hypothesis that early expression of Tcf21 is important for expansion of the SMC compartment of the coronary circulation, with persistent Tcf21 expression being required for cardiac fibroblast development [2,6].

Recent large-scale GWAS have identified 46 genome-wide significant CAD loci and a further 104 independent variants associated at a 5% false discovery rate (FDR), yet the biological and disease-relevant mechanisms for these associations remain largely unknown [1]. It is estimated that at least two-thirds of the disease loci contain causal genes that are not related to known cardiovascular risk factors such as diabetes and lipid metabolism, suggesting that they are involved in disease-promoting processes in the blood vessel wall. Thus the great promise of these genetic findings is the elucidation of atherosclerosis disease pathways, and further investigation of mechanisms by which genes in disease loci work together to regulate cellular and molecular functions that are involved in disease risk is sorely needed. Among the significant loci, a small subset of genes encode transcription factors (TFs), which are likely to impact disease risk by regulating disease-relevant genes and possibly other CAD-associated genes. Further study of downstream targets of these TFs, employing well-established genome-wide methods, would be expected to provide biological insights through links to established pathways and to identify informative relationships among other apparently independent causal CAD loci.

Results

ChIP-Seq studies identify TCF21 core target loci and genes

Our primary interest in these studies was the transcriptional network of TCF21-regulated genes contributing to the development of human CAD. Because of the known role of TCF21 in the embryonic development of coronary vascular SMC, we undertook these ChIP-Seq experiments in primary cultured human coronary artery SMC (HCASMC). Furthermore, we selected culture conditions that maintain these cells in the synthetic, undifferentiated state that most closely reflects the disease phenotype [7]. Two polyclonal antibodies raised against peptides representing different epitopes of TCF21 and previously validated by the manufacturers were employed in these studies. ChIP-Seq was performed with both antibodies (Ab1 and Ab2), with two replicates per antibody and an IgG control condition. We then followed best practices for computational analysis of sequence data as put forth by the ENCODE project, including genome alignment, peak calling, and replicate consolidation using the Irreproducible Discovery Rate (IDR) method to identify high confidence peaks for each antibody [8]. ChIP-Seq using Ab1 identified 10,523 peaks while Ab2 identified 4,900 peaks that largely overlapped with those identified by Ab1. These two sets of peaks were within 50 kb of 12,226 and 7,150 genes, respectively (Table 1).

To better understand the disparity between the numbers of DNA regions immunoprecipitated by the two antibodies, we characterized the distribution of, and relationship between, the two sets of peaks identified. The spatial distribution of each peak set was investigated by graphing the distance between peaks and the transcription start site of the nearest gene (S1A and S1B Fig). These distributions were nearly identical for the two peak sets. In each case, peaks were distributed primarily within 100 kb of the transcription start sites, with 90% of peaks being found within this interval. The similarity between antibody peak localization was further demonstrated by relating the peak coordinates to structural gene features (S1C and S1D Fig). The pattern of distribution for peaks associated with both antibodies revealed that the majority of binding sites were located within intronic and intergenic regions, with a significant number of peaks also being found within the promoter and exonic regions and a very small number of peaks mapping to transcript untranslated sequences.

Table 1. TCF21 binding characteristics with different antibodies alone and when individual antibody data are merged. https://doi.org/10.1371/journal.pgen.1005202.t001

Next, we investigated the overlap in genomic regions represented by peaks associated with each antibody precipitation. Results of this analysis also revealed a high degree of overlap between the two peak sets, with all but 72 of the 4,900 peaks identified by Ab2 sharing one or more base pairs with peaks identified by Ab1 (Fig 1A). Visualization of peaks with the IGV browser provided further evidence of extensive overlap of peaks, although Ab2 frequently showed decreased peak size compared to Ab1 (Fig 1C). Shown here are TCF21 peak regions in three genes that have been identified as replicated CAD GWAS loci: IL6R, SH2B3, and SMG6. Due to this overlap, as well as the similarities in peak binding patterns described above, we intersected the two datasets to refine the number of peaks to those identified by both antibodies (Ab_shared, Fig 1B) and, unless otherwise noted, employed this data set for the analyses presented below.

Fig 1. Two TCF21 antibodies show overlapping patterns of TCF21 chromosomal binding. A) Two replicate experiments with Antibody 2 (Ab2) identified 4828 binding sites. All but 72 of these peaks were also identified by similar replicate experiments with Ab1, which recognized an additional 5695 peaks. B) In addition to analyzing data for each of the two antibody ChIP-Seq datasets, we have intersected those identified with both Ab1 and Ab2 (Ab_Shared), with the smaller of the peaks being employed if there was complete overlap of one versus the other, and the region of overlap used if the two peaks shared incomplete overlap. C) High throughput next-generation sequencing reads were aligned to the genome, peaks present in both biological replicates of each of the two antibody precipitations were identified by IDR, and visualized on the UCSC browser [8]. In addition to the TCF21 ChIP-Seq data, also shown are ATAC-Seq data for HCASMC and DNase I hypersensitivity data obtained with human aortic smooth muscle cells (HAoSMC DHS), indicating that TCF21 peaks localize to regions of active chromatin conformation. https://doi.org/10.1371/journal.pgen.1005202.g001
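
The intersection rule described for the Ab_Shared set (keep the smaller peak when one peak lies entirely inside the other, otherwise keep only the overlapping region) amounts to taking the coordinate-wise intersection of overlapping intervals. The following is a minimal sketch of that idea, not the pipeline used in this study; the function name `intersect_peaks`, the tuple layout, and the half-open coordinate convention are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical peak representation: (chromosome, start, end), half-open coordinates.
Peak = Tuple[str, int, int]


def intersect_peaks(ab1: List[Peak], ab2: List[Peak]) -> List[Peak]:
    """Return the shared portion of every Ab1/Ab2 peak pair overlapping by >= 1 bp.

    When one peak is fully contained in the other, the intersection equals the
    smaller peak; with partial overlap, only the overlapping region is kept.
    """
    # Group Ab2 peaks by chromosome so each Ab1 peak is only compared
    # against peaks on the same chromosome.
    ab2_by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in ab2:
        ab2_by_chrom.setdefault(chrom, []).append((start, end))

    shared: List[Peak] = []
    for chrom, a_start, a_end in ab1:
        for b_start, b_end in ab2_by_chrom.get(chrom, []):
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # at least one shared base pair
                shared.append((chrom, start, end))
    return sorted(shared)


# Toy coordinates, invented for illustration only.
ab1_peaks = [("chr1", 100, 500), ("chr1", 900, 1200), ("chr2", 50, 80)]
ab2_peaks = [("chr1", 200, 300), ("chr1", 1100, 1500), ("chr3", 10, 20)]
shared_peaks = intersect_peaks(ab1_peaks, ab2_peaks)
```

The pairwise comparison is quadratic within each chromosome, which is acceptable for a sketch; for peak sets of the size reported here, a sorted sweep or a dedicated interval tool would be the more usual choice.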

ChIP-Seq peaks are located in regions of open chromatin and identify genes differentially regulated by TCF21

A number of approaches were employed to validate results obtained with ChIP-Seq. First, we investigated the overlap between TCF21 peaks and regions of open chromatin as defined for HCASMC by the assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-Seq) performed in this laboratory [9] and with ENCODE data for human aortic smooth muscle cells (HAoSMC) as identified by DNase hypersensitivity assay. We found that the TCF21 ChIP-Seq peaks significantly overlap with ATAC-Seq signals (P<1.0e-300, fold enrichment = 1.85; S1 Table and Fig 1C). Similarly, the TCF21 ChIP-Seq peaks also significantly overlap with the HAoSMC DHS signals (P<1.0e-300, fold enrichment = 4.29; S1 Table and Fig 1C). Second, we performed technical replication with ChIP employing separately isolated chromatin from HCASMC derived from a different donor, with PCR primers flanking a number of TCF21 peaks. In these studies, the chosen genes showed 4- to 90-fold enrichment compared to a select non-target region (S2 Fig). To investigate whether genes with binding peaks are directly regulated by TCF21, we took advantage of existing co-expression networks to investigate overlap between ChIP-Seq identified target genes and those genes that track with TCF21 gene expression. We retrieved 2705 co-expression modules, each containing highly co-regulated genes, derived from 108 co-expression networks constructed from many different tissues in human and mouse populations [10–19]. We then performed enrichment analysis between the co-expression modules and genes associated with Ab_shared peaks using Fisher’s exact test. The target genes tended to form co-expression modules, and 296 such modules from 10 co-expression networks were significantly enriched with TCF21 target genes at FDR<0.01 (S2 Table).
Vascular endothelial cell and vascular disease related adipose tissue coexpression networks were most strongly associated with TCF21 target genes [20]. Taken together, these data provide evidence that expression of target genes is highly coordinated by TCF21 and that identified peaks functionally regulate target gene expression.
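
To make the module enrichment test concrete, the sketch below computes a one-sided Fisher's exact P-value for the overlap between a co-expression module and a set of target genes, using the equivalent hypergeometric tail. It illustrates the test named in the text under stated assumptions; the function name, argument names, and toy numbers are invented here, and the multiple-testing (FDR) step is not shown.

```python
from math import comb


def fisher_enrichment_p(overlap: int, module_size: int,
                        n_targets: int, n_background: int) -> float:
    """One-sided Fisher's exact P-value for over-representation.

    Probability of observing at least `overlap` target genes in a module of
    `module_size` genes drawn from `n_background` genes, of which `n_targets`
    are TCF21 targets (hypergeometric upper tail).
    """
    max_overlap = min(module_size, n_targets)
    total = comb(n_background, module_size)
    tail = sum(
        comb(n_targets, k) * comb(n_background - n_targets, module_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    # Exact integer arithmetic; a single division at the end yields the float.
    return tail / total


# Toy example (invented numbers): 10 background genes, 5 targets,
# a module of 4 genes, all 4 of which are targets.
p_example = fisher_enrichment_p(overlap=4, module_size=4,
                                n_targets=5, n_background=10)
```

Because the sum is carried out with exact integers, the sketch avoids floating-point underflow for very small P-values until the final division.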

Pathway analyses suggest that TCF21 regulates growth factor signaling as well as cell-cell and cell-matrix interactions

To investigate the molecular and cellular processes downstream of TCF21 and possible mechanisms of disease association, studies were conducted to look for over-representation of TCF21 target genes among well annotated regulatory pathways. Here, we analyzed the common peak set (Ab_shared) with the Genomic Regions Enrichment of Annotations Tool (GREAT) [21]. GREAT assigned genes to peaks and queried a number of functional databases with the resulting gene list (Table 2). Evaluation of gene ontology (GO) Molecular Function, GO Biological Process, PANTHER Pathway, and Pathway Commons databases identified terms related to growth factor signaling (“platelet-derived growth factor (PDGF) receptor binding/signaling,” “vascular endothelial cell growth factor (VEGF) signaling”), cell-matrix interactions (“integrin binding,” “cell adhesion”), matrix biology (“extracellular matrix structural constituent”), and actin contractile function (“actin filament-based processes,” “actin cytoskeleton”). Mouse phenotype database terms included “abnormal cardiovascular system physiology,” “abnormal blood circulation,” and “abnormal blood vessel morphology.” Importantly, MSigDB Predicted Promoter Motifs ontology identified enrichment among the TCF21 target genes for those with JUN family member binding sites in their promoter regions.

Table 2. GREAT pathway terms from analysis of TCF21 Ab_Shared peaks. https://doi.org/10.1371/journal.pgen.1005202.t002

TCF21 peaks contain the CAGCTG E-box motif as well as an activator protein-1 (AP-1)-like motif that mediates JUN factor binding

The availability of sequence information across a large number of TF binding sites allowed identification of the canonical binding sequence for TCF21. We employed the HOMER and MEME-ChIP algorithms for this task, investigating de novo TF motif enrichment within TCF21 peaks shared between Ab1 and Ab2 [22,23]. This analysis identified the nucleotide sequence CAGCTG in 67.2% of peaks (P = 1e-1010) (Fig 2A). This sequence matched the CANNTG sequence that is the common E-box binding motif used by bHLH factors, and is identical to the E-box motif that is known to mediate binding of bHLH partners of TCF21, including TCF12 [24]. An additional E-box motif (CATCTG) was found in 66.7% of peaks (P = 1e-627), and identified as identical to the motif recognized by bHLH factor Olig2, likely representing a second motif that is recognized by TCF21. Interestingly, an additional enriched TF binding motif was also identified in approximately 30% of peaks, corresponding to the bZIP motif TGA(G/C)TCA (P = 1e-336) that is known to bind the AP-1 family of TFs. Other motifs of interest included those that mediate binding of TEAD and CEBP transcription factor families. Graphing the distribution of these motifs in comparison to the summits of TCF21 peaks suggested that AP-1 and possibly ATF1 factors bind in a bimodal pattern flanking TCF21, suggesting a possible steric binding relationship between TCF21 and these bZIP factors (Fig 2B). These motifs likely mediate binding of TFs that cooperate with TCF21 to direct transcriptional programs associated with target genes, as has been characterized for other TFs [25].

Fig 2. Analysis of peak sequences identifies TCF21 binding motifs as well as motifs for JUN related and other transcription factors that likely cooperate with TCF21. A) HOMER analysis of known TF motif enrichment within TCF21 peaks in the Ab_Shared data revealed several distinct motif families. The bHLH motif CAGCTG is identical to that attributed to TCF12, a known heterodimer partner of TCF21 [26], and a second highly enriched bHLH motif CATCTG is attributed to nervous system TF OLIG2, suggesting that TCF21 can bind either of these two motifs. The bZIP motif TGA(G/C)TCA most closely resembles the binding sequence for TFs within the AP-1/ATF super family. Other motifs found to be enriched in the TCF21 peaks include those mediating binding of TEAD, CEBP, and ATF transcription factor family members, and an unknown element identified by ChIP-Seq with NANOG in human embryonic stem cells (ESC). B) Distribution (density) plots for top 7 motifs from panel A: TCF12, OLIG2, AP-1, unknown-NANOG (left), and CEBP, TEAD4, ATF1 (right). C) TCF21 binds in close proximity to JUN and JUND in a number of loci, including developmentally important WT1 and PDGFRB loci. https://doi.org/10.1371/journal.pgen.1005202.g002
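
As a rough illustration of what it means for a motif to be present in a given fraction of peaks, the sketch below scans peak sequences for the three consensus sequences named above. This is simple exact-match counting and is not a substitute for the statistical motif enrichment performed by HOMER or MEME-ChIP; the dictionary `MOTIFS`, the function `motif_fractions`, and the example sequences are assumptions introduced for illustration.

```python
import re

# Consensus sequences taken from the text. CAGCTG is its own reverse
# complement, and TGA(G/C)TCA maps onto itself under reverse complement,
# so only CATCTG needs an explicit reverse-complement alternative (CAGATG).
MOTIFS = {
    "CAGCTG E-box": re.compile(r"CAGCTG"),
    "CATCTG E-box": re.compile(r"CATCTG|CAGATG"),
    "AP-1-like bZIP": re.compile(r"TGA[GC]TCA"),
}


def motif_fractions(peak_sequences):
    """Return, per motif, the fraction of sequences with at least one match."""
    n = len(peak_sequences)
    if n == 0:
        return {}
    counts = {name: 0 for name in MOTIFS}
    for seq in peak_sequences:
        seq = seq.upper()  # tolerate soft-masked (lower-case) genome sequence
        for name, pattern in MOTIFS.items():
            if pattern.search(seq):
                counts[name] += 1
    return {name: counts[name] / n for name in MOTIFS}


# Invented toy sequences, one motif each plus one sequence with none.
toy_sequences = ["ttCAGCTGaa", "GGCAGATGCC", "AATGACTCATT", "ACGTACGT"]
toy_fractions = motif_fractions(toy_sequences)
```

Exact-match counts of this kind do not account for background sequence composition, which is why dedicated tools compare motif frequency in peaks against matched background regions before reporting a P-value.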

Follow-up ChIP-Seq studies verify over-representation of JUN family member binding at TCF21 target genes

We have previously shown that JUN and other AP-1-related transcription factors transactivate TCF21, and disruption of this pathway by disease-associated allelic variation in the related binding site may account in part for the CAD susceptibility observed at this locus [27]. To explore whether JUN factors may also bind in association with TCF21 to co-regulate target genes in arterial smooth muscle cells, we performed ChIP-Seq in HCASMC for JUN and JUND. Samples were processed, sequences aligned to the genome, and peaks called with the same algorithms as for the TCF21 experiments. We quantified the overlap of TCF21 and JUN/JUND binding regions against a background of all regions of open chromatin, with the analyses employing both ATAC-Seq study of HCASMC and DHS study of HAoSMC to define this background. For the analysis with ATAC-Seq regions we found significant overlap of TCF21 with JUN and JUND binding sites (P<4.12e-215, fold enrichment = 2.84), and employing the same methods with HAoSMC DHS regions as background, TCF21 overlap with JUN and JUND peaks remained significant (P<1.79e-183, fold enrichment = 2.21). Example genomic regions with overlap of TCF21, JUN and JUND binding are shown for the developmental WT1 gene and the developmental growth factor PDGFRB gene (Fig 2C). This and other labs have shown that WT1 regulates TCF21 [27,28], but TCF21 binding in the WT1 locus as demonstrated here is novel and provides support for a bidirectional regulatory interaction between these important developmental factors. Collectively, these analyses reveal common genome-wide binding patterns between TCF21, JUN and JUND, providing strong evidence for the coordinated binding of TCF21 and JUN family members in HCASMC.
Taken together with previously published data showing that JUN factors are upstream regulators of TCF21 transcription, these results suggest a compelling functional link between these TF pathways.

Functional SNPs mapped to genes targeted by TCF21 are significantly enriched among CAD GWAS SNPs

To look for more functional relationships between TCF21 target gene SNPs and those associated with other CAD genes, further analyses were conducted employing regulatory SNPs (eQTLs) which have been identified through studies in a variety of tissues investigating the genetic basis of gene expression [11,16,29–31]. We retrieved eQTLs from liver, brain, blood, human aortic endothelial cells (HAEC), and adipose tissues and collected all the eQTLs/functional SNPs mapped to specific target genes [11–16,29,31–33]. eQTLs/functional SNPs mapped to genes targeted by TCF21 were significantly enriched among SNPs with low CARDIoGRAM GWAS P-values (P<0.01) (fold enrichment 1.78 to 2.42, P = 1.24e-12 to 3.69e-95 in all eSNP sets tested; Table 3). Additionally, we obtained all the functional SNPs from RegulomeDB (http://www.regulomedb.org/, based on ENCODE) and evaluated them in the context of their functional annotation. Functional SNPs for TCF21 target genes as defined by the RegulomeDB Category I (i.e., SNPs with highest level of evidence that they have functional influence on genes) showed the highest fold enrichment for SNPs with CAD GWAS P-values < 0.01 (Category I fold enrichment 2.06; P = 1.04e-155). Category II SNPs (SNPs with less functional evidence than Category I) also showed highly significant enrichment for low P-values (fold enrichment 1.42; P = 2.60e-40) (Table 3). All analyses were controlled for LD, with SNPs possessing r² > 0.3 removed.

Table 3. Enrichment of functional SNPs (based on eQTL mapping and ENCODE) with low P-value associations in CARDIoGRAM GWAS for SNPs of TCF21 target genes. https://doi.org/10.1371/journal.pgen.1005202.t003
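
One plausible reading of the fold enrichment reported for low P-value SNPs is the proportion of target-gene SNPs with GWAS P<0.01 divided by the same proportion in a reference SNP set. The sketch below implements that reading only; the exact definition used for Table 3 is not given in this section, so the formula, the function name `low_p_fold_enrichment`, and the toy P-values are all assumptions.

```python
def low_p_fold_enrichment(target_pvalues, background_pvalues, threshold=0.01):
    """Ratio of the low-P fraction in target SNPs to that in background SNPs.

    Assumed definition (for illustration): a SNP counts as "low P" when its
    GWAS P-value is strictly below `threshold`.
    """
    if not target_pvalues or not background_pvalues:
        raise ValueError("both SNP sets must be non-empty")
    target_frac = sum(p < threshold for p in target_pvalues) / len(target_pvalues)
    background_frac = (
        sum(p < threshold for p in background_pvalues) / len(background_pvalues)
    )
    if background_frac == 0:
        raise ValueError("no background SNP passes the threshold")
    return target_frac / background_frac


# Invented toy P-values: 2 of 4 target SNPs and 1 of 10 background SNPs pass.
toy_target = [0.001, 0.5, 0.005, 0.2]
toy_background = [0.001, 0.5, 0.9, 0.3, 0.7, 0.6, 0.4, 0.2, 0.8, 0.05]
toy_fold = low_p_fold_enrichment(toy_target, toy_background)
```

In a real analysis the SNP sets would first be pruned for LD, as the text describes, so that correlated SNPs do not inflate the counts.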

TCF21 target genes show significant over-representation among GWAS genes associated with CAD

We noted previously that a number of the CAD loci that have reached genome wide significance contain TCF21 peaks (Fig 1C), and reasoned that since TCF21 is a transcription factor, one mechanism for its disease association might be through regulation of these other CAD loci. We were thus interested to perform an enrichment analysis with GWAS data to test the hypothesis that TCF21 affects CAD by modulating a larger than expected number of CAD-related genes. Significant enrichment of TCF21 targets among CAD loci would support this hypothesis. To investigate this possibility, we took two complementary approaches to look for over-representation of TCF21 binding regions among CAD GWAS loci. The first approach was based on gene-level overlap by assessing enrichment of TCF21 target genes among candidate genes in the CHD GWAS loci. The second approach was based on SNP-level linkage information by evaluating whether the average linkage disequilibrium (LD) between TCF21 peak SNPs and CAD GWAS loci SNPs is greater than expected by chance. To test the specificity of TCF21 target regions to CAD, we also included additional phenotypes for comparison. The traits/phenotypes that we investigated with both analyses included: i) coronary artery disease phenotypes, ii) risk factors that are known to be associated with CAD, iii) non-atherosclerotic vascular diseases that are not specifically associated with CAD, iv) primarily inflammatory disease phenotypes that are known to involve molecular pathways that are also linked to CAD, and v) disease phenotypes related to tissues where TCF21 is known to not be expressed and which were predicted negative controls. Also, we focused analyses on traits with the largest number of associated variants, with the goal of strengthening the statistical analysis, and primarily employed traits/phenotypes with at least 20 associated variants.
Phenotypes or traits that passed P<0.05 from both methods were deemed significant, yielding a combined cutoff of P<0.0025. Considering ~20 disease sets were tested, this combined statistical cutoff is equivalent to a Bonferroni-corrected P<0.05. In the first analysis, we investigated over-representation of TCF21 binding region genes among CAD locus genes. As a preliminary analysis, to test for possible confounding, we tested whether TCF21 binding is by chance more likely to be near genes mapped for various GWAS phenotypes by running an enrichment analysis between genes linked to TCF21 binding sites and all GWAS genes from the GWAS Catalog [34]. As shown in S3 Table, although there is statistically significant over-representation of Ab2 and Ab_Shared binding site genes among CAD GWAS genes, the fold enrichments are all close to 1, indicating very minor enrichment of overall GWAS signals among the TCF21 targets. We thus assigned genes to TCF21 peaks employing a distance metric of 50 kb (S4 Table), compiled a list of candidate genes for each phenotype/trait, and tested for enrichment of TCF21 target genes among disease/trait candidate genes. Enrichment analyses for chosen phenotypes/traits were conducted using all GWAS genes as background to correct for the slight over-representation of GWAS genes among those that are TCF21 targets. In addition, to correct for any potential bias in the large numbers of GWAS genes for certain traits such as height and CAD, we implemented a permutation strategy by generating 1000 random GWAS gene sets of matching size for each trait to derive permutation-based enrichment P-values. The methodology for this permutation-based analysis is provided in the Methods section. 
Employing genes in the GWAS catalog associated with CAD phenotypes, enrichment was found for TCF21 target genes among CAD genes compared to a background of all GWAS genes (CAD, 1.34-fold enrichment, permutation P = 0.014) and these results did not change substantially with exclusion of lipid trait genes (CAD no lipid, 1.34-fold enrichment, permutation P = 0.03) (Table 4). When CARDIoGRAM+C4D data was included in the analysis (CAD extended) the fold enrichment increased to 1.51 (permutation P<1.0e-03) and again this did not change substantially with removal of lipid trait variation (CAD extended no lipid, 1.53-fold enrichment, permutation P<1.0e-03). We also found that the candidate sets of GWAS genes associated with the CAD related trait platelet number and a disease phenotype related to a dysfunctional immune system, inflammatory bowel disease (IBD), also showed a high degree of enrichment for TCF21 target genes among the GWAS gene sets. At Bonferroni corrected P<0.05 (raw P<0.0022) in this test, height, CAD extended, CAD extended no lipid, IBD, and platelet phenotypes reached statistical significance. Importantly, we found little evidence of enrichment of TCF21 target genes among GWAS candidate genes for risk factors blood pressure, lipids, and glucometabolic related traits.
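
The permutation strategy described above (random GWAS gene sets of matching size drawn from all GWAS genes) can be sketched as follows. This is an illustrative outline, not the procedure documented in the Methods; the function name `permutation_enrichment`, the fold-enrichment definition, the fixed random seed, and the add-one P-value estimator are assumptions made here.

```python
import random


def permutation_enrichment(trait_genes, target_genes, background_genes,
                           n_perm=1000, seed=0):
    """Fold enrichment and permutation P-value for targets among trait genes.

    Random gene sets of the same size as the trait set are drawn from the
    background (all GWAS genes), and the P-value is the share of random sets
    containing at least as many target genes as the observed trait set.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    background = sorted(set(background_genes))
    targets = set(target_genes) & set(background)
    trait = set(trait_genes) & set(background)
    if not targets or not trait:
        raise ValueError("trait and target sets must overlap the background")

    observed = len(trait & targets)
    # Assumed definition: target fraction in trait genes over target fraction
    # in the background.
    fold = (observed / len(trait)) / (len(targets) / len(background))

    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(background, len(trait))
        if len(targets.intersection(sample)) >= observed:
            hits += 1
    # Add-one estimator (an assumption here) avoids reporting P = 0.
    p_value = (hits + 1) / (n_perm + 1)
    return fold, p_value


# Invented toy gene sets: 100 background genes, 20 targets,
# and a 10-gene trait set made up entirely of targets.
toy_background_genes = [f"g{i}" for i in range(100)]
toy_targets = toy_background_genes[:20]
toy_trait = toy_background_genes[:10]
toy_fold, toy_p = permutation_enrichment(toy_trait, toy_targets,
                                         toy_background_genes)
```

Matching the size of each random set to the trait set is what corrects for the large differences in gene counts between traits such as height and CAD.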

Table 4. Enrichment of TCF21 target genes from Ab_Shared among GWAS candidate trait genes using all GWAS genes as background and a permutation strategy to correct for the differences in the numbers of GWAS genes between traits. https://doi.org/10.1371/journal.pgen.1005202.t004

In a second type of analysis conducted at the SNP level, we investigated whether common variants in regions targeted by TCF21 binding tend to demonstrate higher LD with SNPs associated with CAD by GWAS. Such a link would provide additional evidence for the involvement of TCF21 in the genetic pathways that contribute to CAD risk and serve as a complementary approach to the gene-based analysis described above. After pruning SNPs associated with both the CAD loci and TCF21 peaks for LD, we investigated whether SNPs in the TCF21 binding sites were in higher than expected LD with CAD-associated genetic variants compared to random SNPs. A similar analysis was done for other GWAS phenotypes to test the specificity to CAD. For each trait-associated SNP set, permutation analysis was utilized to generate distributions of average r² using 10,000 random sets of TCF21-GWAS SNP pairs, and statistical significance was assigned to those categories where fewer than 5% of permutations produced an average r² greater than or equal to the true data. Results from this analysis showed that SNPs in TCF21 peaks have significantly greater LD than expected by chance with SNPs for CAD related phenotypes: CAD (permutation P = 0.0209) and CAD Extended (permutation P = 0.0086), which analyzed GWAS SNPs plus those from CARDIoGRAM+C4D (Table 5 and Fig 3). With these analyses, the CAD categories without lipid variants were marginally more significant than CAD categories including lipid loci. Interestingly, the greatest enrichment was found for non-CAD phenotypes, including height and IBD as found for the gene enrichment analysis, as well as schizophrenia.
With Bonferroni correction for multiple testing these three non-CAD phenotypes reached statistical significance of P<0.05. However, when considering the consistent phenotypes between this test and the previous gene level enrichment analysis for GWAS phenotypes, the CAD phenotypes CAD, CAD extended, CAD no lipid, and CAD extended no lipid, as well as height, IBD, and platelet number, were found to be significant at P<0.05 in both tests, yielding a combined P<0.0025, which is equivalent to Bonferroni-corrected P<0.05.
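
The SNP-level test compares the observed average r² against averages from random sets of SNP pairs. The sketch below shows the empirical P-value calculation under a simplifying assumption: the null distribution is built by resampling from a supplied pool of background r² values, whereas the construction of random TCF21-GWAS SNP pairs in the study itself is described in the Methods and is not reproduced here. The function name `ld_permutation_p` and the toy values are invented for illustration.

```python
import random


def ld_permutation_p(observed_r2, background_r2, n_perm=10000, seed=0):
    """Observed mean r^2 and the share of permutations with mean >= observed.

    `observed_r2`: r^2 values for the real peak-SNP / GWAS-SNP pairs.
    `background_r2`: hypothetical pool of r^2 values for random SNP pairs,
    from which sets of the same size are drawn without replacement.
    """
    k = len(observed_r2)
    if k == 0 or k > len(background_r2):
        raise ValueError("need 0 < len(observed_r2) <= len(background_r2)")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed_mean = sum(observed_r2) / k
    exceed = 0
    for _ in range(n_perm):
        permuted_mean = sum(rng.sample(background_r2, k)) / k
        if permuted_mean >= observed_mean:
            exceed += 1
    return observed_mean, exceed / n_perm


# Invented toy values: strong observed LD against a weak-LD background pool.
toy_observed = [0.9, 0.8, 0.95]
toy_pool = [0.0] * 50 + [0.1] * 50
toy_mean, toy_ld_p = ld_permutation_p(toy_observed, toy_pool, n_perm=200)
```

Under the criterion stated in the text, a category would be called significant when the returned proportion is below 0.05.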

Fig 3. TCF21 target regions contain variation that is in linkage disequilibrium with CAD GWAS variation. SNPs in TCF21 peaks were evaluated for association with GWAS SNPs on the basis of linkage disequilibrium; r² values for each SNP pair were averaged for the GWAS categories inflammatory bowel disease (IBD), height, CAD, CAD plus CARDIoGRAM+C4D (CAD extended), Parkinson’s disease, and breast cancer. To test for enrichment in each category, permutation analysis was utilized to generate distributions of average r² using random sets of SNP pairs, and statistical significance was assigned to those categories where fewer than 5% of permutations produced an r² greater than or equal to the true data. https://doi.org/10.1371/journal.pgen.1005202.g003

Table 5. Analysis of linkage disequilibrium between SNPs in GWAS loci of select phenotypes and TCF21 peak regions for chosen phenotypes. https://doi.org/10.1371/journal.pgen.1005202.t005