Crohn's disease (CD) is a chronic, heterogeneous inflammatory disorder with distinct patterns of clinical behaviour. CD may present with, or evolve over time into, a more complex phenotype in which patients develop strictures, fistulae and/or abscesses, and many patients respond highly variably to therapy. Genetic associations1,2 and a recently defined lipid metabolism-based gene expression signature predictive of disease involvement3 suggest that molecular or genetic factors are associated with, and may contribute to, disease heterogeneity, but the precise mechanisms are poorly understood. Molecular subtypes defined by gene expression that affect clinical phenotypes have also been documented in other complex diseases, especially cancers.4–6 Whether adult CD can be similarly separated into two or more subgroups, and whether such molecular classes can explain disease phenotypes, remains largely unknown.

Understanding how genes are regulated provides information complementary to gene expression. We and others have studied gene regulation by focusing on accessible chromatin, which allows transcriptional regulators to bind to the otherwise highly condensed nuclear genome. Chromatin accessibility has been associated with promoters, enhancers, silencers and insulators,7 and it changes as cellular identity is established through differentiation and development8,9 or in response to cellular stresses. Chromatin profiling can suggest a mechanism for why expression is changing, and whether observed changes are likely to be transient or persistent. We have shown that chromatin accessibility can differentiate disease subtypes10 and helps to describe genetic and environmental contributors to disease.11 We therefore sought to determine whether multiple clinically distinct subclasses of adult CD exist by examining both gene expression, using RNA-seq, and chromatin accessibility, using formaldehyde-assisted isolation of regulatory elements (FAIRE-seq),12 in unaffected colon mucosa from patients with CD and non-IBD controls.

Results

Whole genome interrogation of the colonic transcriptional and chromatin landscape reveals two distinct molecular classes in CD

To determine, in an unbiased manner, whether gene expression levels separated samples into distinct molecular groups, we performed a principal components analysis (PCA) using gene expression profiles from a combined set of 21 patients with CD and 11 patients with non-IBD. A striking clustering pattern emerged: the patients with CD were divided into two distinct, expression-based subclasses, one of which clustered with the non-IBD controls (figure 1A). To interrogate these two CD subclasses specifically, we identified genes differentially expressed between the two groups of patients with CD (849 genes at false discovery rate (FDR) <0.05; figure 1B; see online supplementary table S1 for the top 20 differentially expressed genes in each CD subclass). Surprisingly, most of the top 25 differentially expressed genes, regardless of direction, had tissue-specific expression patterns that discriminate colon from small intestine (ileum), including NXPE4, CWH43 and CA2 (colon-specific) as well as RBP2, TM6SF2, APOB, MTTP, CREB3L3 and CPS1 (ileum-specific).13 CD samples that clustered with non-IBD controls (figure 1A) exhibited abundant expression of these colon-specific genes, whereas the other CD subclass showed expression patterns more consistent with ileum despite being sampled from the colon. To explore this more globally, we compared all of these differentially expressed genes with 947 genes known to be significantly differentially expressed between colon and ileum (figure 1C).14 We found that 34% of the genes more highly expressed in the colon-like CD samples were indeed markers of normal colon, and 44% of ileum marker genes were more highly expressed in the ileum-like CD samples (p<1×10⁻⁹⁵, hypergeometric test).
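The overlap p value above comes from a hypergeometric test. As a rough illustration (not the authors' code), the sketch below computes the upper-tail probability of observing at least k marker genes among n upregulated genes drawn from a universe of N genes that contains K known tissue markers. It uses exact integer arithmetic from the standard library and should agree with `scipy.stats.hypergeom.sf(k - 1, N, K, n)`. The function name and the toy counts are hypothetical; the section does not state the size of the gene universe that was used.

```python
from math import comb


def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes in universe, K markers, n genes drawn).

    N: all genes tested; K: known tissue-marker genes among them;
    n: genes upregulated in one CD subclass; k: upregulated genes that are markers.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    lo, hi = max(k, 0), min(K, n)
    # Sum the exact probability mass for every overlap size from k up to the maximum possible.
    # math.comb returns 0 when the lower argument exceeds the upper one.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))
    # Python's int / int true division is correctly rounded even for very large integers.
    return numerator / comb(N, n)
```

For example, `hypergeom_upper_tail(10, 4, 3, 2)` is (6·6 + 4·1)/120 = 1/3. Exact integer sums avoid the precision loss that a naive floating-point product would suffer for p values as small as the one reported here.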
To validate these expression differences, we performed reverse transcription-quantitative PCR (RT-qPCR) on 18 CD samples of unaffected colon mucosa (9 colon-like, 9 ileum-like), using CEACAM7 and APOA1 as proxies for colon-like and ileum-like expression patterns, respectively (see online supplementary figure S1A). In agreement with the RNA-seq data, CEACAM7 was significantly more abundant in colon-like CD samples (p=0.017, one-sided t-test), whereas APOA1 was significantly more abundant in ileum-like CD samples (p=0.020, one-sided t-test).

Figure 1 Two distinct molecular subtypes in adult Crohn's disease (CD). (A) Principal components analysis (PCA) of RNA-seq data from colon tissue from 21 patients with CD and 11 patients with non-IBD shows two distinct clusters. (B) Eight hundred and forty-nine genes are differentially expressed between the two CD subclasses (adjusted p<0.05, DESeq), defined as colon-like and ileum-like. Known markers of colon and ileum are highlighted. (C) Genes upregulated in colon-like (top) and ileum-like (bottom) CD subclasses overlap previously defined colon-specific and ileum-specific genes, respectively. (D) Differentially accessible regions (DARs) identified using formaldehyde-assisted isolation of regulatory elements (FAIRE-seq) (p<0.05, two-sided t-test, normalised read counts, 300 bp windows) show distinct profiles in colon-like and ileum-like CD samples. (E) Colon-like-associated DARs are enriched and ileum-like-associated DARs are depleted near genes upregulated in colon-like samples (p≤0.01, permutation test). (F) Ileum-like-associated DARs are enriched and colon-like-associated DARs are depleted near genes upregulated in ileum-like samples (p<0.001, permutation test).
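The RT-qPCR comparisons above are one-sided two-sample t-tests. The section does not say whether equal variances were assumed, so the sketch below assumes the classic pooled-variance (Student's) test. It uses only the standard library, evaluating the t-distribution tail through the regularised incomplete beta function (the continued-fraction method described in Numerical Recipes). The function names are illustrative; in practice `scipy.stats.ttest_ind(a, b, alternative="greater")` performs the same pooled-variance test.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    # Evaluate the continued fraction on the side where it converges quickly
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_sf(t: float, df: float) -> float:
    """Upper-tail probability P(T >= t) for Student's t with df degrees of freedom."""
    half_two_sided = 0.5 * _betai(df / 2.0, 0.5, df / (df + t * t))
    return half_two_sided if t >= 0 else 1.0 - half_two_sided


def one_sided_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Pooled-variance two-sample t-test of H1: mean(a) > mean(b). Returns (t, p)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return t, t_sf(t, df)
```

For example, with the toy values `[2, 3, 4]` versus `[1, 2, 3]`, the sketch gives t ≈ 1.22 on 4 degrees of freedom and a one-sided p of roughly 0.14. Swapping the arguments tests the opposite direction.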
To determine whether these expression changes represented a fundamental shift in the functional cellular identity of these tissues, we investigated chromatin accessibility by performing FAIRE-seq12 on the same samples from both CD subclasses. Supporting a fundamental shift in underlying molecular phenotypes, we identified 3339 regions of 300 bp with significantly different chromatin accessibility between colon-like and ileum-like CD samples (figure 1D; p<0.05, two-sided t-test), hereafter referred to as differentially accessible regions (DARs). These DARs could be divided into two classes based on greater accessibility in colon-like or ileum-like CD samples, and an unsupervised PCA of FAIRE-seq data nearly separated the ileum-like from the colon-like CD subclass (see online supplementary figure S1C). Subclass-specific changes in the chromatin landscape corresponded strongly to differences in the expression of nearby genes (within 50 kb; figure 1E,F; p≤0.01, permutation test). Additionally, both colon-specific and ileum-specific DARs were significantly enriched for CD genome-wide association study (GWAS) loci15 relative to random expectation (colon-specific p=0.018, ileum-specific p=0.006; permutation test), suggesting that changes in chromatin accessibility occur at disease-relevant regions of the genome. We next sought to annotate these DARs using tissue-specific gene regulatory information. Post-translational modifications of histone proteins serve to compartmentalise the genome and demarcate the putative function of regulatory elements.16 Using chromatin immunoprecipitation sequencing (ChIP-seq) data from the Roadmap Epigenomics Project,17 we assessed the enrichment of six histone modifications reflective of underlying regulatory activity (active: H3K4me1, H3K4me3, H3K27ac, H3K36me3; repressive: H3K27me3, H3K9me3) around colon-specific and ileum-specific DARs.
We found that colon-specific DARs were demarcated by H3K27ac and H3K4me1 modifications present in colon but not ileum (see online supplementary figure S1B), suggesting that these DARs function as active regulatory regions only in the normal colon. In contrast, ileum-specific DARs showed H3K27ac and H3K4me1 enrichment found only in normal small intestine, despite these samples originating from colon tissue. These results suggest that regulatory activity at DARs contributes to the colon-like and ileum-like expression patterns. To confirm regulatory activity, we cloned three DARs (two with colon-specific and one with ileum-specific chromatin accessibility) in both orientations into luciferase vectors upstream of a minimal promoter and tested them in THP-1 monocytes (see online supplementary figure S1D). Relative to empty vector controls, two DARs (associated with SATB2-AS1 and DEPDC7) significantly increased luciferase activity (p<0.01, one-sided t-test) in both orientations, strongly suggestive of enhancer function. The third DAR (associated with SLC16A9) also significantly enhanced luciferase activity (p=8.9×10⁻⁵, one-sided t-test), but only in the reverse orientation. Together, these data support the existence of two molecularly distinct subclasses of CD. Furthermore, the chromatin accessibility data suggest that these subclasses reflect stable changes in the genomic architecture of colon tissue cells rather than transient differences driven by external cellular signalling.
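The permutation tests reported above for DARs near differentially expressed genes (within 50 kb) can be illustrated with a gene-label permutation. The section does not specify how the null distribution was generated, so this sketch assumes one choice: count the DARs lying within 50 kb of a target gene set, compare that count with counts from random gene sets of the same size drawn from a background list, and report add-one corrected upper-tail (enrichment) and lower-tail (depletion) p values. Positions are simplified to single coordinates (for example, a DAR midpoint and a gene's transcription start site), and all names are hypothetical. Shuffling DAR positions across the genome would be an equally reasonable alternative null.

```python
import bisect
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Locus = Tuple[str, int]  # (chromosome, position in bp)


def count_near(dars: Sequence[Locus], genes: Sequence[Locus], window: int = 50_000) -> int:
    """Number of DARs lying within `window` bp (inclusive) of at least one gene position."""
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in genes:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    hits = 0
    for chrom, pos in dars:
        tss = by_chrom.get(chrom)
        if not tss:
            continue
        # First gene at or beyond pos - window; it is a hit if it is not beyond pos + window
        i = bisect.bisect_left(tss, pos - window)
        if i < len(tss) and tss[i] <= pos + window:
            hits += 1
    return hits


def permutation_enrichment(dars: Sequence[Locus], target_genes: Sequence[Locus],
                           background_genes: Sequence[Locus], window: int = 50_000,
                           n_perm: int = 1000, seed: int = 0) -> Tuple[int, float, float]:
    """Return (observed, p_enriched, p_depleted) using random gene sets of equal size."""
    rng = random.Random(seed)
    pool = list(background_genes)
    observed = count_near(dars, target_genes, window)
    k = len(target_genes)
    at_least = at_most = 0
    for _ in range(n_perm):
        null_count = count_near(dars, rng.sample(pool, k), window)
        at_least += null_count >= observed
        at_most += null_count <= observed
    # Add-one correction keeps permutation p values strictly above zero
    p_enriched = (at_least + 1) / (n_perm + 1)
    p_depleted = (at_most + 1) / (n_perm + 1)
    return observed, p_enriched, p_depleted
```

With 1000 permutations the smallest attainable p value is 1/1001, which is why reported thresholds such as p<0.001 depend on the number of permutations run.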

Whole genome RNA-seq analysis reveals colon-like and ileum-like subclasses in treatment-naïve paediatric patients with CD

Gene expression profiles in adult patients with CD may vary with treatment history. We therefore sought to determine whether treatment-naïve paediatric patients with CD also segregated into similar molecular classes. We performed PCA on previously published RNA-seq data from ileal biopsies of age-matched paediatric patients with CD (n=201) and non-IBD controls (n=40) generated within the Pediatric Risk Stratification Study.3 Although the clustering was less distinct than in the adult samples, non-IBD ileum samples clustered with some CD samples along the first principal component, whereas the other CD samples were separate (see online supplementary figure S2). To determine whether this pattern was related to the adult CD molecular subtypes, we performed PCA on combined adult colon and paediatric ileum expression data (figure 2A). Unsurprisingly, samples separated predominantly by tissue of origin (colon vs ileum; first principal component). However, a separation indicative of two molecular subclasses was evident along the second principal component, and it correlated nearly exactly with the first principal component of each single-cohort PCA (see figure 1A and online supplementary figure S2). Furthermore, paediatric CD samples fell on a spectrum highly correlated with expression of APOA1, a marker gene of the ileum and an indicator of disease outcome in the paediatric cohort.3 This pattern aligned well with the ileum-like (APOA1-high) and colon-like (APOA1-low) subclasses we identified in the adult CD colon.

Figure 2 Treatment-naïve paediatric Crohn's disease (CD) samples show similar molecular subtypes. (A) Principal components analysis (PCA) of combined RNA-seq data from adult colon tissue and paediatric ileum tissue from patients with CD and non-IBD separates tissue types (PC1) and replicates the ileum-like and colon-like clusters (PC2). Expression of APOA1 (blue to pink, low to high) in paediatric samples aligns well with the subclasses. (B) Hierarchical clustering of RNA-seq data using 500 colon-specific and ileum-specific genes shows clusters of genes associated with ileum-like and colon-like samples across both the adult colon and paediatric ileum cohorts, as well as genes associated with tissue of origin.

To examine the relationship between the two CD subclasses across the adult and paediatric cohorts more closely, we assessed the expression patterns of the 500 most variably expressed known colon and ileum marker genes14 using hierarchical clustering (figure 2B). To focus this analysis, we selected the 50 most colon-like and the 50 most ileum-like paediatric ileum samples based on the PCA (figure 2A, second principal component). Many of the colon and ileum representative genes described above (e.g., APOA1, CEACAM7, MTTP, LEFTY1 and CA2) exhibited highly consistent expression patterns across all samples in a given molecular subclass, regardless of cohort. Interestingly, for these 500 genes, expression patterns were extremely consistent between colon-like CD and non-IBD colon samples, and between ileum-like CD and non-IBD ileum samples. Importantly, a subset of genes still differentiates all colon tissues from all ileum tissues, indicating that tissue-of-origin-specific expression is not completely lost. Together, these data strongly suggest that the colon-like and ileum-like molecular signatures define two forms of CD present regardless of tissue sampling location, patient age or treatment status.
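Selecting the most colon-like and most ileum-like paediatric samples along a principal component can be sketched with NumPy. This is an illustrative reconstruction, not the published pipeline: it assumes a samples × genes matrix of already normalised, log-scale expression, computes PCA scores from a singular value decomposition of the gene-centred matrix and, because the sign of a principal component is arbitrary, orients the chosen component so that higher scores track higher expression of a marker gene such as APOA1. The function names are hypothetical. For the combined analysis described above, `component=1` (the second principal component) and `n_each=50` would mirror the text; the hierarchical clustering step could then be run on the selected samples, for example with `scipy.cluster.hierarchy.linkage`.

```python
import numpy as np


def pca_scores(expr: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Sample coordinates on the leading principal components.

    expr: samples x genes matrix (normalised, log-scale expression assumed).
    """
    centred = expr - expr.mean(axis=0, keepdims=True)  # centre each gene
    u, s, _vt = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]  # scores = U * S


def extreme_samples(scores: np.ndarray, component: int, marker: np.ndarray, n_each: int):
    """Indices of the n_each lowest- and highest-scoring samples on one component.

    The component is flipped if needed so that higher scores correlate with higher
    marker expression (with an ileum marker, the 'high' end is the ileum-like end).
    """
    pc = scores[:, component].copy()
    if np.corrcoef(pc, marker)[0, 1] < 0:
        pc = -pc
    order = np.argsort(pc)
    return order[:n_each], order[-n_each:]
```

Orienting by a marker gene makes the "low" and "high" ends reproducible across runs, since SVD implementations may return either sign for a component.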

Metabolic and immune activation gene expression profiles characterise distinct molecular phenotypes in adult and paediatric CD

Gene expression differences between the colon-like and ileum-like subclasses in the adult and paediatric CD cohorts went beyond the marker genes described above (figure 1B,C). To evaluate this more broadly, we computed and compared pathway-level expression patterns of both CD subclasses and non-IBD controls in both patient cohorts.18,19 We then grouped significantly altered pathways based on similarity in both gene composition and direction of expression difference (see figure 3 and online supplementary figures S3 and S4). First, numerous pathways related to interferon signalling, G-protein coupled receptor (GPCR) signalling and antigen processing were significantly upregulated in patients with CD as a whole relative to non-IBD controls in both cohorts (see online supplementary figure S3: ‘CD-enriched’, red). Given that the adult cohort consisted of disease-unaffected tissue and the paediatric cohort was treatment-naïve, this suggests a basal activation of the immune system in CD. In contrast, many pathways related to RNA processing, translation and transcription were downregulated in CD in the paediatric patients (see online supplementary figure S3: ‘CD-depleted’, blue). These results corroborate studies in mice and humans linking overall defects in cellular protein processing in CD to immune activation and unabated inflammation.20

Figure 3 Pathways enriched for differentially expressed genes associated with Crohn's disease (CD) phenotypes. Pathway enrichments were determined using gene set association analysis (GSAA) (FDR<0.1, permutation test) for all pairwise comparisons between all CD, colon-like CD, ileum-like CD and non-IBD samples. Separate analyses in (A) adult colon and (B) paediatric ileum show similar and unique pathways associated with CD phenotypes. Each circle represents a Reactome-defined pathway, with size reflecting the number of genes in the pathway. Pathways were grouped based on similarity in gene membership, and labels describing multipathway clusters are shown. See online supplementary figures S3 and S4 for CD versus non-IBD comparisons and the full list of pathways.

Next, we identified significantly altered pathways that distinguished the ileum-like and colon-like CD subclasses. Among the most pronounced effects in both cohorts were strong differences in metabolic activity, including pathways involved in lipid metabolism and in metabolism of foreign (xenobiotic) agents (figure 3A: ‘ileum-like-enriched’, red; figure 3B: ‘colon-like-depleted’, blue). Interestingly, energy production by way of the tricarboxylic acid (TCA) cycle was significantly affected in opposing ways: in adult colon tissue, it was upregulated in the colon-like class and downregulated in the ileum-like class (figure 3A: ‘colon-like-enriched, ileum-like-depleted’, pink), whereas in paediatric ileal tissue it was downregulated in the colon-like class and upregulated in the ileum-like class (figure 3B: ‘ileum-like-enriched, colon-like-depleted’, purple). This suggests that energy production increases in patients whose subtype matches the tissue of origin, and decreases when gene expression adopts patterns of the opposite tissue. In addition, several pathways related to GPCR signalling were upregulated in the ileum-like subclass in colon tissue (figure 3A: ‘ileum-like-enriched’, red) and in the colon-like subclass in ileum tissue (figure 3B: ‘colon-like-enriched’, green). GPCRs are highly expressed in monocytes and macrophages, which are central to the development and progression of inflammation in CD, mainly through their migration into and accumulation within inflamed tissues.21 Taken together, dysregulation of metabolic pathways may be a defining feature of CD subtypes.
Although dysregulation of lipid metabolism has been described previously in CD,3 our data suggest that these alterations may be specific to patients within a certain subclass and may depend on the tissue being assayed. Furthermore, despite a striking similarity in the expression of key ileum and colon marker genes (figure 2B), these data indicate that there are key differences in pathway-level expression patterns between adult and paediatric patients with CD, such as in the immune response (e.g., nucleotide-binding oligomerisation domain (NOD) signalling, Toll-like receptor (TLR) signalling and interleukin signalling), which may point towards clinically relevant phenotypes and characteristics of each subclass.
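The pathways above were grouped by shared gene membership and direction of change, but the section does not name the grouping method. A minimal sketch, assuming Jaccard similarity between pathway gene sets and single-linkage grouping (via union-find) at an arbitrary threshold, is shown below. The threshold, function names and toy pathways are hypothetical, and an overlap coefficient would be a reasonable alternative similarity measure.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_pathways(pathways: Dict[str, FrozenSet[str]], direction: Dict[str, int],
                   threshold: float = 0.3) -> List[List[str]]:
    """Single-linkage groups of pathways that share direction and Jaccard >= threshold.

    direction: +1 for pathways up in the comparison, -1 for pathways down.
    """
    parent = {name: name for name in pathways}

    def find(x: str) -> str:
        # Path-halving union-find lookup
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(pathways, 2):
        if direction[p] == direction[q] and jaccard(pathways[p], pathways[q]) >= threshold:
            parent[find(p)] = find(q)

    groups: Dict[str, List[str]] = {}
    for name in pathways:
        groups.setdefault(find(name), []).append(name)
    # Deterministic output: sort members, then sort groups by their first member
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

Because linkage is single, two pathways with little direct overlap can still end up in one group if a third pathway bridges them; a stricter scheme (for example, average linkage) would avoid such chaining at the cost of more fragmented clusters.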