In a first step toward a better understanding of CF pathophysiology, we developed and validated an integrative genomics approach to characterize the extracellular milieu associated with CF by using PBMC responses as the key output ( 28 , 55 ). We examined global transcriptional profiles of healthy donor PBMCs cocultured with plasma from CF patients or age-matched unrelated healthy controls (HCs). We correlated these profiles with common outcome measures including pancreatic function, spirometry, and infection status. We also characterized subjects through immunophenotyping of whole blood immune cells via flow cytometry ( 8 , 45 , 51 ) and plasma cytokines via multiplex enzyme-linked immunosorbent assay (ELISA). Our overall goal was to identify molecular signatures that differentiate the study groups and may lead to candidate biomarkers for assessing prognosis, monitoring disease progression, and predicting and evaluating response to treatment ( 41 ).

Lung biopsies, the ideal sample source for these molecular characterization studies, cannot be obtained from children with CF easily or without considerable risk. Although peripheral blood leukocytes can be obtained from patients, they may not precisely reflect immune activity in the lung. PBMCs are a source of the dense mononuclear infiltrate present at sites of CF airway injury and acquire a unique transcriptional “molecular signature” as a result of repeated passage through the pulmonary vascular bed ( 44 ). Moreover, they respond to soluble disease-associated factors in plasma ( 37 ), including factors from bacterial and viral pulmonary infections ( 30 , 42 ).

Several lines of evidence suggest that differences in CF clinical course, including lung infection status, are associated with differences in expression of genes involved in airway immune defense. Candidate gene association studies and genome-wide association studies have identified several loci involved in immune response and regulation, including components of the response to lipopolysaccharides, that associate with disease severity in CF ( 9 , 10 , 15 , 16 ). Direct profiling of CF patients’ circulating peripheral blood mononuclear cells (PBMCs) revealed a molecular signature that reflects the resolution of pulmonary exacerbations ( 44 ) and identified differentially expressed genes involved in immune response, phagocytosis, and extracellular matrix degradation. Gene expression signatures linked to differences in disease severity have also been extracted from nasal respiratory epithelia from CF patients ( 25 , 38 , 58 ), bronchial tissue biopsy specimens from patients with mild or severe CF ( 53 ), and transformed lymphoblastic cell lines from CF patients ( 35 ). Despite this growing molecular signature data set, studies are needed to further examine variations in the levels of gene transcripts and their products that underlie the extensive heterogeneity in CF presentation, progression, and severity. Such analyses will ultimately lead to a better understanding of CF pathophysiology and the identification of reliable biomarkers that stratify patients based on molecular subtype. This information will allow us to optimize treatment strategies for each individual patient.

Cystic fibrosis (CF) is an inherited multisystem disease characterized by pancreatic insufficiency and progressive deterioration of lung function. CF is attributed to mutations in a single gene leading to dysfunction of the encoded protein, the cystic fibrosis transmembrane conductance regulator (CFTR) ( 43 ). We have gained insights into the pathological mechanisms underlying CF phenotypic heterogeneity through the evaluation of the impact of different CFTR mutations ( 5 , 47 ). Further insights have come from the identification of modifier genes through candidate gene analyses ( 16 ), genome-wide association studies ( 9 , 20 , 57 ), or direct gene sequencing ( 18 ). These studies have revealed the association of mutations in CFTR with changes in the expression of genes that regulate fluid and electrolyte transport, intracellular trafficking, and inflammation. Mutations have also been associated with changes in proteins that directly bind and interact with CFTR, underscoring that CFTR functions in a complex transcriptional and functional framework within the cell ( 59 ). Given the complexity of the system, we do not fully understand the relationships between the abnormal CFTR gene product, modifier genes, environmental exposures, and the development of inflammation and progression of lung disease. This lack of complete understanding limits our ability to predict an individual patient’s clinical course and treatment response.

A nonparametric Mann-Whitney-Wilcoxon test was used to compare levels of cytokines and gene expression between 56 CF patients and six HCs. Cytokines with a P < 0.05 were selected for further analysis. Seven genes met the criteria |logFC| ≥ 2 and FDR ≤ 0.05 and were also present in the set of 216 genes; these seven genes were selected for further analysis. A canonical correlation analysis was performed with the selected 15 cytokines and seven genes. Wilks’ Lambda test statistic was used in the permutation test to examine the significance of canonical correlation coefficients.

Raw cytokine ELISA data (mean fluorescence intensity) were analyzed using StarStation version 1.0 (Applied Cytometry, Sheffield, UK). A five-parameter curve-fitting algorithm was applied for standard curve calculations. Results were presented as mean concentrations for all variables that were examined. Mean concentrations were adjusted to account for dilution. Cytokine concentrations in at least one sample needed to be within the assay range for inclusion in this analysis. The nonparametric Mann-Whitney-Wilcoxon test was used to compare cytokine levels between HC and CF patients (PI and PS combined). Analysis was performed in SAS version 9.4.

Due to the highly interdependent relationship of many genes, we performed nonparametric analyses via a random forest that was a collection of decision trees. Random forest analysis can overcome the limitations of decision trees, such as low prediction accuracy and high variance. Each tree was grown on a different random subsample with approximately two-thirds of the training data. A random sample of potential predictors at each node was selected for splitting at any node. The remaining data were used to evaluate the performance of each tree. We used 500 trees, with the number of predictors considered for each node being approximately the square root of the number of potential predictors, which required two parent nodes. The importance of each variable was assessed by a standard method: a variable in each tree was tested by scrambling its values and measuring the decline in model accuracy. The target variables were the subject groups (HC vs. CF, PI CF vs. all CF, PS CF vs. all CF). These analyses were performed in the SPM Salford Predictive Modeler software suite Random Forests (Salford Systems, San Diego, CA).

To determine the replication consistency of our approach, we also calculated Pearson correlation coefficients and the ratios of the duplicates for five pairs of HC samples and 11 pairs of CF samples. The mean and interquartile range (IQR) for ratios were then calculated for the 1,094 probe sets. To validate the results, we randomly selected 80% of the samples in each group (CF vs. HC, PI CF vs. HC, and PS CF vs. HC) without replacement. P values for the Mann-Whitney-Wilcoxon test and Benjamini-Hochberg FDR, as well as the |log 2 ratios| for the CF and HC groups were calculated. Significance was defined as FDR ≤ 0.01 and |log 2 ratio| ≥ 0.5. This process was repeated an additional nine times, and significance was determined each time. If it was significant for all 10 iterations, then the final result was significant for that gene; otherwise, it was not significant. The analysis was performed in SAS version 9.4 (SAS Institute, Cary, NC).

To verify that the CF and HC samples comprised distinct groups, we performed unsupervised principal component analysis (PCA) of the unfiltered gene expression data with Partek Genomics Suite 6.5. This procedure allows projection of the data onto a three-dimensional space spanned by three principal components, which are linear combinations of normalized expression values of the factors that capture the highest proportion of the variance. To group similarly expressed genes, we performed hierarchical clustering with Genesis ( 49 ) using average linkage clustering with Euclidian distance.

To determine whether the mean of the log expression levels of the probe set was the same across all groups, an analysis of variance (ANOVA) was conducted using Partek Genomics Suite 6.5 (Partek, St. Louis, MO). For repeated measures, a mixed model was used. In particular, for the evaluation of consistency, variables were defined as fixed (age of patient at diagnosis, age of patient at sample collection, P. aeruginosa colonization, sex, pancreatic sufficiency, and sweat chloride level) or random (age of sample, GeneChip lot number, plate number, sample collection date, sample scan date, time until GeneChip expiration). The resulting P values were corrected for multiple hypothesis testing using the Benjamini and Hochberg false discovery rate (FDR) method ( 2 ). Statistical significance of differential gene expression was determined with ANOVA and FDRs using Partek Genomics Suite and the Mann-Whitney test where relevant. A P value below an arbitrary cut-off of 0.01 FDR rejected the null hypothesis. Probe sets differentially expressed with regard to at least one fixed effect, but none of the random effects, were considered universal signatures of these fixed effects.

To satisfy assumptions for statistical tests, continuous variables such as gene expression were log transformed, and other transformations were explored when needed. When parametric assumptions were not met for the outcome of interest by the Shapiro-Wilk normality test, then nonparametric statistical analyses were performed. Two-sided tests were used, and, unless otherwise stated, P < 0.05 was deemed significant.

We used a standard fold change (FC) of log 2 0.5 (equivalent to 1.41-fold) as a threshold. With ~50,000 genes evaluated in 100 CF and 30 HC subjects, a t -test of the log 2 expression at an alpha of 10 −7 would detect a difference of at least 1.539 standard deviations (SD) with at least 90% power. We observed SDs ranging from 0.15 to 0.3. With this variability, we would be able to detect at least 1.17- to 1.5-fold difference, providing adequate power to detect genes of interest.

The issue of batch effect in our microarray data was addressed by empirical Bayes methodology and the Partek Statistic “Remove Batch Effect” function. The observed differential expression was not due to any batch effect [Gene Expression Omnibus (GEO) Series accession number GSE71799]. To control for batch effects, cases and controls were randomized and intermixed on the arrays with equal representation and distribution. For probe-level gene expression, we applied robust multiarray average (RMA) background correction, quantile normalization, and the median polish method for probe set summarization. We also standardized expression data at the probe level when testing for correlation of expression profiles, due to the potential for sizeable probe effects in microarray data that can inflate the correlation among replicates and unrelated samples.

Twenty plasma cytokines [eotaxin 2/CCL24, granulocyte-macrophage colony stimulating factor (GM-CSF)/CSF-2, IFN-γ, interleukin (IL)-1β, IL-1ra/IL1R1, IL-10/CSIF, IL-12/p70, IL-13, IL-15, IL-2, IL-4, IL-5, IL-6, IL-7, IL-8/CXCL8, monocyte chemoattractant protein-1 (MCP-1)/CCL2, MIP-1α/CCL3, MIP-1β/CCL4, transforming growth factor (TGF)-β, and TNF-α] were simultaneously measured in samples from CF and HC samples via ELISA using the Human Procarta Cytokine Assay Kit (Luminex, Ocean Ridge Biosciences, Palm Beach Gardens, FL). In brief, serum Assay Buffer (25 μl; Procarta, eBioscience, San Diego, CA) and serum samples or standards (25 μl) were transferred into the wells of a filtration plate and incubated at room temperature with the beads for 60 min with shaking at 500 rpm. All samples and standards were assayed in duplicate. The wells were washed with a vacuum manifold and incubated for 30 min with a biotinylated detector antibody. The beads were washed again and incubated for 30 min with streptavidin-conjugated phycoerythrin. After washing to remove the unbound streptavidin-phycoerythrin, we analyzed the beads (>50 beads per cytokine) with a Luminex 100 instrument (Luminex, Austin, TX), which monitored the spectral properties of the beads while simultaneously measuring the amount of fluorescence associated with phycoerythrin. TGF-β1 was assayed separately, since it required special sample dilution and acid activation that could not be combined with the 20-plex system. Standard curves for each cytokine were generated from the supplied reference cytokine concentrations.

We determined the capability of neutrophils to undergo oxidative metabolism to produce reactive oxygen species with dihydrorhodamine 123, an uncharged, nonfluorescent indicator that passively diffuses into cells where it is oxidized to the fluorescent rhodamine 123 that remains trapped in the cells. Whole blood was loaded with dihydrorhodamine 123 and stimulated in vitro with phorbol-12-myristate-13-acetate and N-formyl-met-leu-phe, a chemotactic peptide that acts as a physiological ligand. The mean fluorescence of rhodamine 123 in CD16+ (clone 3G8) neutrophils over unstimulated cells constituted the index of production of reactive oxygen species.

PBMCs were labeled with the fluorescent dye 5,6-carboxyfluorescein diacetate succinimidyl ester (Life Technologies), then stimulated in vitro for 4 days with the mitogens concanavalin A, phytohemagglutinin, phorbol-12-myristate-13-acetate, and calcimycin (all from Sigma-Aldrich, St. Louis, MO) and with CD3 (Becton Dickinson). The dye dilutes with each cell division during T lymphocyte proliferation. On day 4 , cells were harvested and stained with antibodies against CD4 and CD8 to determine the percentage of T cell subsets undergoing cell division.

Whole blood was collected from CF patients in sodium heparin tubes. Evaluation of T cells, B cells, and NK cells was performed within 24 h of specimen collection. Blood was stained with antibodies against CD3 (SK7), CD4 (SK3), CD8 (SK1), CD14 (MoP9), CD16 (B73.1), CD19 (SJ25C1), CD20 (L27), CD25 (2A3), CD27 (L128), CD45 (2D1), CD127 (hIL-7R-M21), and TCR-αβ (WT31) from Becton Dickinson; CD56 (N901) from Beckman Coulter (Brea, CA); and CD45 (2D1), CD45RO (UCHL1), and CD69 (CH/4) from Life Technologies. Staining was carried out at room temperature for 15 min. Flow cytometry data were acquired after red blood cells were lysed with FACS Lyse (Becton Dickinson) on a FACS Calibur (Becton Dickinson).

Commercial cryopreserved PBMCs from a single healthy Caucasian male donor were thawed and washed in accordance with the manufacturer’s protocol (Cellular Technology Limited, Shaker Heights, OH). PBMCs were cultured at 37°C in 5% CO 2 with 20% autologous plasma, HC plasma, or CF plasma as previously described ( 28 ). Cultures were prepared from 500,000 cells/well in RPMI 1640 medium supplemented with 100 U/ml penicillin and 100 μg/ml streptomycin in a total volume of 500 μl. After culture (9 h for cryopreserved PBMCs), total RNA was extracted with TRIzol Reagent (Invitrogen Life Technologies, Waltham, MA). With purified total RNA (100 ng), cRNA was synthesized, amplified, and labeled with the Affymetrix Express Kit (Affymetrix, Santa Clara, CA), then fragmented and hybridized to the GeneChip Human Genome U133 Plus 2.0 (Affymetrix). After hybridization, arrays were washed and stained with Affymetrix fluidics protocol FS450_0001 and scanned with a 7G Affymetrix GeneChip Scanner. This microarray interrogates >47,000 transcripts and was selected for its overall comprehensive coverage. Image data were analyzed with Affymetrix Expression Console software version 1.4.1.46 and normalized with Robust Multichip Analysis ( 23 ) ( www.bioconductor.org ) before determining signal log ratios.

At the time of sample collection, all CF patients were screened for microbiological flora; Pseudomonas aeruginosa infection was reported as one positive microbiological growth from nasopharyngeal, sputum, and/or bronchoalveolar lavage specimens within 6 consecutive months of study enrollment. As the Children’s Hospital of Wisconsin is an accredited CF Care Center, patients received standard CF care as outlined by the CF Consortium guidelines for detection, recording of infection, and treatment ( 32 , 40 ). HC subjects were free of known infection at the time of sample collection.

CF diagnoses were based on pilocarpine iontophoresis [sweat chloride levels were obtained in accordance with CF Foundation guidelines ( 27 )], symptoms, pancreatic status, CFTR mutation class, family history of CF, and information about the phenotypes of CFTR mutations ( 3 , 4 , 13 , 19 ). To be included in our analyses, patients needed to have a confirmed CF diagnosis based on at least one sweat chloride value ≥60 mmol/l and/or CFTR genotype documenting two CF-causing mutations, and follow-up at least once per year. CFTR genotyping was performed at the time of diagnosis or, in older subjects, after genetic testing became available. Patients diagnosed via newborn screening were evaluated by the Wisconsin Newborn Screening Laboratory for the recommended American College of Medical Genetics panel of 23 CFTR mutations ( 56 ). Additional genetic testing was carried out for patients with one identified mutation, including expanded mutation panel testing (Genzyme Genetics, Cambridge, MA), modified temporal temperature gradient electrophoresis of CFTR (Ambry Genetics, Aliso Viejo, CA), and multiplex ligation-dependent probe amplification for deletions and duplications (Ambry Genetics). Individuals with CF and documented pancreatic function status were included ( 4 ). Pancreatic insufficient (PI) status was defined as fecal pancreatic elastase <200 μg/g. Clinical lung function measurements confirmed that spirometry values were performed at baseline according to American Thoracic Society-European Respiratory Society Task Force guidelines ( 26 ) and not during a pulmonary exacerbation defined as a forced expiratory volume reduction of 15% absolute volume compared with the best forced expiratory volume in 1 s (FEV1) value during the prior 12 mo. At a routine clinic visit, a single sample of plasma from CF patients and age-matched, unrelated HCs was aseptically collected in acid citrate dextrose solution A or K+ EDTA anticoagulant for the PBMC-based bioassay.

This study was approved by the Institutional Review Boards of Children’s Hospital of Wisconsin (CHW 07/72, GC 390, CTSI 847, CHW 01-15) and Ann and Robert H. Lurie Children’s Hospital of Chicago (IRB # 2015-400). Written informed consent was obtained from subjects or parents/legal guardians during a clinic visit. Minors were included in this study. Assent was obtained from minors who were able to assent. DNA and plasma samples were obtained from patients recruited consecutively during clinic visits for routine care at the Cystic Fibrosis Center at the Children’s Hospital of Wisconsin (Milwaukee, WI). Recruitment was not restricted to patients with certain disease subtypes to ensure that the sample set adequately represented the general CF population seen in the clinic. Our goal was to include >100 CF patients in the study to examine the clinical variation within CF and ensure that less common disease characteristics, such as pancreatic sufficiency, were adequately represented. Both CF patient and HC samples were collected and processed in parallel by the same methods in the Pediatric Translational Research Unit of the Children’s Hospital of Wisconsin.

This case-control study analyzed blood plasma-induced gene expression profiles with a noninvasive PBMC-based bioassay and correlated gene expression profiles with clinical measures of CF ( Fig. 1 ). Gene expression profiles for a subset of CF patients were further evaluated based on immunophenotyping of B cell, T cell, and natural killer (NK) cell percentages and proliferation, as well as assays of neutrophil elastase. An additional historical cohort of HCs ( 45 , 51 ) was also included in the immunophenotyping analysis.

Based on these data, networks were constructed ( Fig. 7 ). The hubs common to all three comparisons (CF vs. HC, PI CF vs. HC, and PS CF vs. HC) centered on TNF, a multifactorial proinflammatory cytokine mainly secreted by macrophages and involved in the regulation of the inflammatory response, cell proliferation, and apoptosis. Gene Ontology (GO) annotations related to this gene included identical protein binding and cytokine activity. All three comparisons also contained the hub TP53, a tumor suppressor protein with transcriptional activity. GO annotations related to this gene included sequence-specific DNA binding transcription factor activity. The PI CF vs. HC network highlighted genes such as those encoding IL17A, IL1A, and CCL5. Uniquely, PS CF vs. HC network hubs included Rnr, ERK, DOT1L, RPSA, and Alp, which are a nuclear receptor, extracellular signal-regulated kinase, histone methyltransferase, ribosomal protein laminin receptor, and secretory leukocyte peptidase inhibitor, respectively. This network included genes encoding IL1B and TLR4. Expression of pro-IL-1b is mediated in part through the activation of TLRs involved in response to lipopolysaccharides.

Ingenuity’s canonical pathway analysis showed that each cohort pair (CF vs. HC, PI CF vs. HC, and PS CF vs. HC) had comparable numbers of unique canonical pathways and common canonical pathways, while the PI CF vs. PS CF comparison yielded far fewer unique canonical pathways and common canonical pathways ( Fig. 6 A ). Common top-ranked canonical pathways upregulated (positive z-score) or downregulated (negative z-score) in the CF vs. HC ( Fig. 6 B ), PI CF vs. HC ( Fig. 6 C ), PS CF vs. HC ( Fig. 6 D ), and PI CF vs. PS CF ( Fig. 6 E ) comparisons included EIF2 signaling and EIF4 regulation, both of which play important roles in identifying the translational start site and recruiting mRNAs.

Other major groups of genes differentially expressed between CF patients and HCs included activators of myeloid cells (such as genes encoding CAMP, CD24, CD40LG, ERK, and FOXXL2), genes that regulate cell migration and the cytoskeleton (such as that encoding ICAM1), host defense genes (such as those encoding TLR2, TLR4, and IL-1RN), and genes encoding cytokine and chemokine signaling molecules (such as CXC chemokine ligand CXCLXX, IL-1B, TNF, and NF-κB). CD83, a host defense protein whose gene was differentially expressed, encodes an Ig supergene family member that is best known as a marker of mature dendritic cells but has also been detected inside monocytes and macrophages and is rapidly expressed on the cell surface upon lipopolysaccharide-induced activation ( 29 ).

Fig. 6. Ingenuity Pathway Analysis reveals the top canonical pathways in CF and HC samples. A : canonical pathway annotations for the indicated comparisons identify unique pathways in each group. Top 10 significant canonical pathways of genes altered in CF ( B ), PI CF ( C ), and PS CF ( D ) vs. HCs, as well as PI CF vs. PS CF ( E ). The upper x -axis and bars indicate the –log ( P value) of overrepresentation, while the orange line indicates the threshold 2.0. The color of the bars indicates the value of the z-score, as indicated in the legends. Only pathways with z-scores greater than 2 or smaller than −2 and P < 0.001 were considered significant. The lower x -axis and orange squares represent ratios calculated from the number of genes detected in the comparison divided by the total number of molecules in the canonical pathway. ARP, actin related proteins; Cdc42, cell division cycle 42; CF, cystic fibrosis; CXCR4, C-X-C chemokine receptor type 4; EIF2, eukaryotic initiation factor 2; iNOS, inducible nitric oxide synthase; FMLP, formylmethionine-leucyl-phenylalanine; GTP, guanosine triphosphate; HC, healthy controls; IL-8, interleukin 8; PI, pancreatic insufficient; PS, pancreatic sufficient; WASP, Wiskott-Aldrich syndrome protein.

The largest group of differentially expressed genes was a collection of upstream regulators ( Fig. 5 ) and canonical pathways ( Fig. 6 ) involved in response to lipopolysaccharides, including transcription-related genes encoding transcription factors, transcriptional activators, chromatin remodeling and modifying proteins, DNA helicases, and other DNA-binding proteins. The upstream regulator that was most significantly increased was interleukin-13 ( IL13 ), which regulates inflammation and inhibits macrophage activity ( Table 3 ). The most significantly decreased regulatory gene was TREM1 , which stimulates neutrophil- and monocyte-mediated inflammation. The second largest group of genes was classified into the category of immune regulation and response (for example, signaling pathway genes).

We utilized Ingenuity Pathway Analysis software for analysis of upstream regulators, canonical pathways, and networks to identify candidate chemokines, cytokines, and signaling intermediaries responsible for the observed plasma-induced gene expression signatures in PBMCs. Upstream regulator analysis of the differentially expressed genes uncovered a consistent pattern of upstream regulator enrichment in CF patients vs. HCs, PI CF patients vs. HCs, and PS CF patients vs. HCs. All three comparisons shared the most enriched molecule types of upstream regulators, including transcriptional regulators, cytokines, transmembrane receptors, enzymes, and kinases ( Fig. 5 , A – C ). A total of 151 functional regulators significantly differed ( P < 0.0015) between CF patients and HCs ( Fig. 5 A ); similar upstream regulators were identified for PI CF patients vs. HCs ( Fig. 5 B ) and for PS CF patients vs. HCs ( Fig. 5 C ). The molecule type “growth factor” dominated the comparison of PI CF patients with PS CF patients ( Fig. 5 D ), but not other comparisons. A list of the top 50 upstream regulators in the CF:HC comparison is provided in Table 3 .

A canonical correlation analysis was performed with data from the 15 cytokines with P < 0.05 and a set of seven probe sets that met the criteria |logFC| ≥ 2 and FDR ≤ 0.05 and were present in the set of 216 genes ( Fig. 3 A ; overlap of t -test, Wilcoxon test, and random forest). The permutation test showed that there was significant correlation between the selected cytokines and genes. MCP-1 was positively correlated with expression of the seven genes, IL-8 was weakly positively correlated, and the rest of the cytokines were negatively correlated.

Fig. 4. Proinflammatory and anti-inflammatory cytokine levels differ in CF and HC plasma samples. The scatter plot shows the distributions of ELISA data for 20 cytokines in CF (both PI and PS subjects) and HC plasma. Each dot represents an individual sample. The horizontal bars represent the range of values for each group. The vertical line indicates the mean value for each group. CF, cystic fibrosis; HC, healthy controls; PI, pancreatic insufficient; PS, pancreatic sufficient. * P < 0.0001.

Plasma cytokine and chemokine levels were measured in a separate set of 56 CF patients (PI and PS) and 16 HCs ( Fig. 1 ). Because CF patients tend to spontaneously secrete cytokines [TNF-α, IL-1, IL-6, and IL-8 ( 12 )] and IGF-1 levels have been shown to be elevated in bronchoalveolar lavage specimens ( 50 ), we measured cytokines in plasma from CF patients and HCs to determine correlations with underlying molecular signatures ( Fig. 4 ). TGF-β, interleukin-1 receptor antagonist (IL-1-RA), GM-CSF, MIP-1β/CCL4, and IL-7 were expressed at significantly higher levels in CF patients than HCs ( P < 0.0001) ( Fig. 4 ). Eotaxin 2 [chemokine (C-C motif) ligand 24; CCL24] and cytokines involved in the regulation of inflammation (such as IL-2, IL4, IL-6, IL-1β, IL-13, IL-15, IL-12/p70, and TNF-α) were also present in plasma at significantly higher levels in CF patients than in HCs ( P < 0.001) ( Fig. 4 ). However, other cytokines secreted by many types of cells, including macrophages, such as MCP-1/CCL2 and IL-8, were expressed at significantly lower levels in CF patients (both PI and PS) than in HCs ( P < 0.001) ( Fig. 4 ). Levels of IL-10, IFN-γ, and MIP-1α/CCL3 did not significantly differ in the plasma of CF patients and HCs ( Fig. 4 ). Of the cytokines that differed significantly between HCs and CF patients, IL-8 and MCP-1/CCL2 showed diminished plasma cytokine levels and expression in the reporter PBMCs ( Figs. 2 C and 4 ). Results for others, such as TGF-β, IL-1β, IL1RA/R1 differed in the plasma and reporter PBMC data.

We sought to determine whether the underlying induced expression signatures were consistent with traditional measures of immune cell subsets. Immunophenotyping of whole blood via flow cytometry was performed to compare immune cells from 40 CF patients and two historical cohorts of HCs ( 45 , 51 ) ( Fig. 1 ), stratified by age, relative to published reference ranges ( 8 , 39 , 45 , 51 ). Specifically, the percentages of total T cells and T helper cells were significantly higher in the CF 2 to 6 yr old age group than in subjects 6 yr of age and older ( Table 2 ). B cell percentages were significantly lower in the 2 to 6 yr old age group than in HCs despite elevated white blood cells and lymphocytes. Counts of cytotoxic T cells were significantly lower in CF 6 to 12 yr olds than in HCs, but those of naïve T cells were significantly higher in 12 to 18 yr olds ( Table 2 ). Counts of activated helper and cytotoxic T cells were significantly lower in CF 2 to 6 yr olds than in HCs ( Table 2 ). Memory T cells remained consistently higher in CF patients than in HCs in all age groups ( Table 2 ). In addition, evaluation of regulatory T cells ( 32 ), defined as CD4+ CD25+ CD127lo FoxP3+ or CD127− in CF patients, yielded a positive correlation with age ( Table 2 ). The number of NK cells also did not differ between CF patients and HCs ( Table 2 ), as reported previously ( 21 ). The capacity of neutrophils to undergo oxidative metabolism remained intact and did not differ between CF patients and HCs. A 4-day in vitro T cell mitogen proliferation assay demonstrated that the functional T cell mitogen responses did not differ between CF patients and HCs.

To determine the replication consistency of our approach, we held back a subset of plasma samples that had already been analyzed and determined Pearson’s correlation coefficient for the entire Affymetrix microarray. Pearson’s coefficient for the whole chip was 0.9854 for five pairs of HC samples (three replicates) plus 11 pairs of CF samples; Pearson’s coefficient was 0.9682 for the entire set of 1,094 differentially expressed probe sets. To check agreement between replicates, we calculated the ratio of the gene expression level for each gene of one replicate of one individual over another replicate of the same individual. The overall mean and IQR was 0.9758 (0.9542–0.9972) for the ratio of the five pairs of HC samples and 11 pairs of CF samples for 1,094 probe sets. In addition, the means (IQRs) of HC samples and CF samples were 0.9511 (0.9049–0.9875) and 0.9870 (0.9706–1.0063), respectively. Integration of the random forest analysis with t -test and Wilcoxon analyses determined that 216 of 1,002 genes had a nonzero standard importance score (overlap) when comparing CF patient data with HC data ( Fig. 3 A ); for PI CF patients vs. HCs, 231 of 1,043 probe sets were identified by this analysis ( Fig. 3 B ). In addition, 168 of 870 probe sets with nonzero standard importance score were identified for PS CF patients vs. HCs ( Fig. 3 C ). When 80% of the data were randomly selected and analyzed, 866 of 1,002 genes were significantly different for each CF vs. HC comparison. Similarly, 890 of 1,043 probe sets differed significantly between PI CF and HC. Only 438 genes were significantly different for PS CF vs. HC, which may be due to the smaller sample size or the milder disease phenotype. Evaluation of PI vs. PS by the same criteria utilized for all the comparisons, t -test FDR ≤ 0.01 and |log 2 ratio| ≥ 0.5, only one gene, GAPT, which negatively regulates B cell proliferation following stimulation through the B cell receptor, was identified as having a significant diminished expression in PI compared with PS ( P = 0.007). The Venn diagrams ( Fig. 3 ) show the overlap of genes selected by three distinct approaches (random forest, t -test, and Mann-Whitney-Wilcoxon) to indicate the level of consistency and robustness in the methodology (see Statistical analysis in materials and methods). Indeed, in a PCA with the group of genes identified by all three methods, the first principal component accounted for 66% of the variation, and the HC and CF sample sets were completely separated. Although it is possible that the results contain false positives, the 216 genes common to the three statistical tests had an FDR < 0.01 in both the Wilcoxon test and the t -test ( Fig. 3 ).

To further characterize the biological significance of the differential gene expression induced by CF vs. HC plasma, we performed unsupervised hierarchical clustering of transcripts related to common clinical variables such as pancreatic function, infection with P. aeruginosa , and lung function. As described above, 1,043 probe sets distinguished between PI CF patients and HCs, and 870 probe sets distinguished between PS CF patients and HCs ( Fig. 2 B ). Fifty-one genes were associated with PS, 224 genes were associated with PI, and 819 genes were associated with both PI and PS ( Fig. 2 B ). The overlapping gene set included many genes in circulating neutrophils that respond to Gram-negative bacteria, as well as a high proportion of transcriptional regulators, protein-conjugating enzymes, chaperones, and immune response proteins. No differentially expressed probe set was associated with measures of pulmonary function that included evaluation of FEV1% predicted, slope of decline, and maximum and minimum values obtained at baseline (not during a pulmonary exacerbation).

To better understand the inflammatory and regulatory status associated with CF and disease subtypes or disease severity, we extended our previous analysis ( 28 ) and compared signatures from CF patients with varying pancreatic function, infection, and lung function with those induced by plasma from HCs. This analysis yielded signatures unique to CF ( Fig. 2 , B and C ). There were 1,094 (total) probe sets (FDR ≤ 0.01, |log 2 ratio| ≥ 0.5) that were significantly differentially expressed between CF and HC cohorts. A total of 1,043 probe sets were identified that distinguish CF patients who were PI from HCs, and 870 probe sets were identified that distinguish CF patients who were pancreatic sufficient (PS) from HCs; 819 genes were present in both comparisons ( Fig. 2 B ). Transcripts downregulated by CF plasma relative to HC plasma included numerous genes related to lipopolysaccharide response and genes associated with immune response and immune regulations, such as those encoding immune receptors (CD36, CCR5, TLR8, TLR4), cytokine receptors (CCL2, IL-1B, IL-6, IL-8, IL-10, TNF), chemokines [macrophage inflammatory protein-1α (MIP-1α/CCL3)], CXCL1, CXCL2, CXCL3, CXCL5, CXCL6, CXCL16), and proteases (CASP1) ( Fig. 2 C ). Serine protease inhibitors (SERPINA1 and B2) were also downregulated. Upregulated genes in PBMCs incubated with CF plasma relative to HC plasma encoded products such as proteins involved in protein ubiquitination (PSME2), transcriptional regulators (BATF, TAF4B, and ERAP2), and cell surface receptors (CD19). These products converge on the same set of cytoplasmic transcription factors, including KLF2, NF-κB, SMAD4, and STAT family members, as determined by Ingenuity Pathway Analysis.

Fig. 2. PCA and hierarchical clustering establish unique plasma-induced signatures associated with CF. Plasma samples from 103 CF patients and 31 HCs were analyzed. A : unsupervised PCA in three dimensions utilized the complete unfiltered data set (54,613 probe sets). PC1, PC2, and PC3 explained 16.5, 5.37, and 3.56% of total variance, respectively. B : one-way hierarchical clustering (probe sets only) of the 1,094 differentially expressed probe sets identified significant probe sets in PI CF patients vs. HCs (PI:HC) and PS CF patients vs. HCs (PS:HC). Pseudomonas aeruginosa is indicated by +/– sign. The Venn diagram ( left ) shows that within the set of 1,094 probe sets, 51 genes were associated with PS, and 224 genes were associated with infection with P. aeruginosa. C, left : combined data for each patient subgroup for representative differentially regulated transcripts are provided. Right : hierarchical clustering (probe sets and individual subjects) of the expression levels of annotated probe set union in individual samples. Expression levels were normalized against the mean expression for both CF and HC cohorts. CF, cystic fibrosis; HC, healthy controls; PCA, principal component analysis; PC, principal component; PI, pancreatic insufficient; PS, pancreatic sufficient; +, positive for P. aeruginosa infection; –, negative for P. aeruginosa infection.

Commercial PBMCs from a single healthy male donor were cultured with autologous plasma, HC plasma, or CF plasma for 9 h ( 28 ), thus acquiring transcriptional signatures associated with the extracellular milieu in these subjects ( 30 , 37 , 42 ). Unsupervised PCA ( Fig. 2 A ) and hierarchical clustering ( Fig. 2 B ) of the unfiltered data set revealed that plasma-induced PBMC gene expression profiles for the CF and HC cohorts were distinct. Three components were extracted from the unfiltered PCA. The first component explained 16.5% of the variance, while the second and third components explained 5.37 and 3.56% of the variance, respectively ( Fig. 2 A ). The first principal component differentiates between the CF and HC groups. In each group, the signal strongly indicates that variation is dominated by the difference between the groups and not age or sex. When we restricted the analysis to probe sets that met our threshold for significance (|log 2 ratio| ≥ 0.5), we identified 1,094 probe sets differentially expressed between CF patients and HCs (1.4-fold; FDR ≤ 0.01) ( Fig. 2 B ). PCA results were not significantly affected by sample storage time, microarray lot number, sample collection date, or time until microarray expiration (data not shown).

The majority of CF patients were PI ( n = 87, 84%) and homozygous for the F508del mutation ( n = 65, 63%). Four CF patients (4%) harbored the F508del +G551D or +R117H-7T mutations in CFTR ( Table 1 ). CF patients were considered to be at their clinical baseline, without clinical evidence of pulmonary exacerbation. Forty-nine CF patients (48%) tested positive for P. aeruginosa within 1 yr of study enrollment ( Table 1 ).

Characteristics of the subjects included in the study are shown in Table 1 . Each of the 134 subjects (103 CF patients and 31 unrelated age-matched HCs) provided a plasma sample for PBMC-based gene expression analysis. Of the 134 subjects, 56 CF patients and 16 HCs provided an additional sample for ELISA-based immunological analysis ( Fig. 1 ). A separate subset of the CF patients ( n = 40) provided a peripheral blood sample for lymphocyte subset analyses via flow cytometry. These additional samples were obtained at the same time as the plasma sample for PBMC-based gene expression analysis. Additional flow cytometry data were obtained from two published cohorts of 270 HCs ( 45 ) and 81 HCs ( 51 ). The median age at sample collection was 14.2 yr (range 2–53 yr) for CF patients and 16.0 yr (range 6–38 yr) for HCs ( Table 1 ).

DISCUSSION

Growing evidence suggests that genetic and molecular differences contribute to the significant variability of the CF phenotype. Genomic technologies, including transcriptomics, offer unprecedented opportunities to understand the contributions of environmental, genetic, and epigenetic factors to CF disease progression. In this study, we used a bioassay in which genes in healthy donor reporter PBMCs are induced by soluble factors in subject plasma. This strategy has previously been applied to study type 1 diabetes and other diseases (6, 24, 55), including CF (28, 44). Our results confirm that plasma samples from CF patients induce a distinct set of gene expression changes, likely caused by altered extracellular milieus within the CF patient cohort due to CFTR mutations. Among the gene expression changes were numerous changes in levels of inflammation- and immune-related genes, consistent with the distinct differences we observed in cytokine and immunophenotyping assessments of peripheral blood of CF patients.

Overall, plasma from CF patients induced a remarkably robust and broad array of alterations in healthy donor PBMC gene expression. We detected significant differences between people with and without CF in the levels of transcripts from multiple gene classes including cytokines, receptors, host defense, and apoptosis-related genes, as well as a large number of genes encoding transcription factors and chromatin remodeling factors. The inclusion of some transcription factors, such as members of the Fos, Jun, and NF-κB families, was predictable, as they have been studied individually in phagocytic cells (1, 34, 54), but the present study indicates a much broader range of transcriptional regulators at play in CF (11, 29, 33, 58). Canonical pathway and upstream regulator analyses implied that the transcriptional signatures in healthy donor PBMCs induced by plasma from patients with CF were consistent with the presence of lipopolysaccharides and the response to lipopolysaccharides. Importantly, clustering our data also revealed patterns of gene expression associated with pancreatic function and infection status, suggesting that our bioassay can be used to define signatures of disease subtypes.

We also found that a well-orchestrated balance of proinflammatory mediators and anti-inflammatory factors may underlie the varied disease course seen among CF patients. Recent reports indicate that CFTR is an important modulator of the inflammatory response (7, 36, 52). The connection between suppressed inflammation response and CFTR activity is supported by the results of our comprehensive genomics-based bioassay. In our functional classification of identified genes, the category of immune system regulators and mediators of gene expression included genes encoding molecules ranging from general RNA polymerase transcription factors to protein chaperones. Downregulation of critical inflammatory response genes was also evident in our data (Fig. 2C), consistent with a previous report of downregulation of genes involved in airway defense and antigen presentation in CF (58). Diminished expression of these genes could paradoxically contribute to the net activated but ineffectual immune response that occurs in CF patients chronically colonized with P. aeruginosa (10, 48). These gene expression patterns have not been reported in other CF investigations that relied on microarray data from CF patient leukocytes. Such differences likely derive from differences in experimental design (PBMC reporter expression vs. CF patient PBMCs), sample number, and sample processing. These differences may also reflect temporal changes in the transcription of early and late response genes. The overlap of genes associated with pancreatic function and infection indicates a possible set of core immunostimulatory response genes impacted by the CFTR genotype.

Equally striking was the prominence of transcription factors and chromatin-regulatory genes in this repertoire. Gene regulation at the level of chromatin structure has been demonstrated for CFTR (44). Alterations in histone acetylation and chromatin structure may play a role in the host’s response to infection, possibly by mediating suppression and fostering an immune tolerance (46). Future studies will evaluate the interplay of these transcriptional factors and chromatin-remodeling elements with immune response and regulation as it relates to the CFTR genotype.

Immunophenotyping of lymphocyte subsets in CF and HC cohorts by flow cytometry displayed increased immune activation in younger CF patients that appeared to normalize as patients aged (Table 2), correlating with the skewing of memory T cell counts in CF patients compared with controls. There was a higher percentage of regulatory T cells in samples from CF patients 6 yr and older vs. historical HCs, although we did not evaluate their functional equivalence or suppressive capacity. Understanding the balance among regulatory immune processes, which are initially robust in CF patients but are later ineffectual, may provide insight into why CF patients become infected and then tolerate colonization by bacterial pathogens. In contrast, persistent immune activation may lead to the immune system in CF patients becoming tolerant of chronic infections due to immune senescence or immune exhaustion after persistent immune activation. The upstream regulators, canonical pathways, and network hubs identified in our analyses, perhaps mediated by overproduction of proinflammatory cytokines, may represent an effective and synergistic response that over time becomes dampened, yielding a dysregulated immune-mediated response that results in chronic infection.

We observed clear differences between HC and CF samples in both the gene expression and cytokine ELISA data, indicating that changes in proinflammatory and anti-inflammatory mediators occur in patients with CF. In our correlation analysis, we identified correlations between cytokines in plasma and expression levels of genes differentially expressed in CF. In particular, MCP-1 and, to a lesser degree, IL-8 were both correlated. MCP-1 is a chemokine that recruits monocytes and basophils.

This study is unique in several aspects. To our knowledge, it is the first study to provide proof-of-principle evidence for a robust new plasma-based genomics signature approach for subtyping disease in CF. If this approach is validated in larger populations, then such profiles may provide a complementary tool for identifying patients with certain risk profiles who may benefit from more focused and individualized treatment. The genes differentially expressed between our cohorts are potential markers of inflammation, disease subtype, or disease severity. Although there are immune-related changes common throughout the CF patient cohort, there is compelling heterogeneity within the group that suggests these changes may be indicative of clinically relevant disease features (Figs. 2 and 4). For several subpopulations of patients with CF, the clinical implications of our findings include the application of personalized medicine to predict clinical course and to develop individualized treatment plans.

A blood-based approach for identifying molecular signatures of CF is especially valuable for two reasons. First, blood samples can be noninvasively obtained from subjects at any age without sedating patients with respiratory compromise, which is particularly attractive for monitoring children with CF. Second, the blood-based signatures we identified in patients with CF may reflect the status of inflammation throughout the lung. In contrast, bronchoalveolar lavage provides information about the status of inflammation in a single segment of the lung. Our approach offers advantages over the common approach of directly profiling patient PBMCs in that immune responses are local events and participant cells may not be highly represented in peripheral circulation; expression differences measured in PBMCs incubated with case vs. control plasma are more robust than expression differences observed using direct gene expression profiling of case vs. control PBMCs (55).

This study had several limitations. First, although the overall sample size was large, the smallest subgroup, PS CF patients, had only 16 samples. Similarly, few CF patients were available in the 2 to 6 yr old age group for immunophenotyping. Though the small numbers in these categories (Table 2) were sufficient for generating hypotheses in this study, future validation studies will be needed to ensure that these patient subgroups are sufficiently represented to enable more rigorous statistical testing with corrections for multiple comparisons. Second, the nature of this study was to document the gene expression changes observed in PBMCs after culturing with plasma and identify potential inflammatory mediators, but this study was not designed to address the specific contents of the plasma that may be mediating any of these changes. Third, this study did not include samples from other lung diseases to control for inflammation not specific to CF. However, we are encouraged by data from Levy et al. (28) in which we compared samples from CF patients and patients with Type I diabetes or bacterial and viral infections and found a distinct CF signature. Fourth, the data in the reporter cell assay were generated from frozen PBMCs from only one male donor; this design was selected to reduce the variability observed in prior studies (28, 55). Commercial providers now offer highly viable, cryopreserved PBMCs wherein a single draw collected from a healthy well-characterized donor by aphaeresis contains billions of cells, sufficient for thousands of assays. We initially tested PBMCs from five donors. All of our subsequent human studies have used UPN727 PBMC (Cellular Technology), as they closely mimicked the mean response of the fresh PBMC of multiple donors previously used (28, 55). Finally, although healthy controls were enrolled and their samples were analyzed in the Affymetrix array, a separate set of historical controls was analyzed by ELISA and flow cytometry.

Future studies to follow up on these results could take two directions. First, studies should be designed to more completely describe the molecular changes in PBMCs exposed to CF plasma. Analyses could include exon arrays, miRNA monitoring, and RNA sequencing to evaluate low-abundance transcripts and biologically critical isoforms. The experiments could also be conducted in CFTR mutation carriers. Second, candidate biomarkers should be identified and validated in independent studies. While we believe that our study begins to define molecular signatures in CF patient subgroups, we do not feel that our data support the identification of individual biomarkers at this time. The next step would be to further explore these signatures in more extensive sample sets, including incorporation of medication use, prospective longitudinal cohorts with larger sample sets, and more extensive representation of disease subtypes. The transcriptomic data we report provide unique evidence of differential gene expression in biologically informative pathways, congruent with underlying concepts that immune responses are relevant to CF lung disease severity and likely dependent on the infecting organism, time of infection, and host response. However we did not evaluate the effect of antibiotic use. Experiments could also be conducted in CFTR mutation carriers. These additional data would empower us to narrow the signatures to the most informative biomarker(s) and identify the contents of the CF plasma that induce these changes.

In conclusion, we utilized a novel blood-based bioassay to characterize the gene transcriptional profile of CF patients and identified distinct molecular signatures associated with CF. Our findings lay the foundation for the development of a noninvasive bioassay that can be used in clinical settings to comprehensively identify molecular subtypes of CF with potential applications in determining therapeutic response. Similar patient-based model systems may better capture the disease complexity of CF compared with current systems and may have applications in other chronic lung diseases such as asthma and chronic obstructive pulmonary disease.