Analysing patient groups with EVD

To identify changes in the host response following exposure to EBOV, the transcriptome of blood samples from infected patients was analysed. These samples were collected by the European Mobile Laboratory in Guinea during 2014 and 2015. The samples were taken with the foremost aim of diagnosing the presence of EBOV using qRT-PCR, which was then used in patient management in the Ebola Treatment Centre (ETC). For this purpose, RNA was extracted in the setting of the European Mobile Laboratory in Guinea. Discarded samples were then held in an archive and used in this study under the auspices of the EVIDENT project– Ebola Virus Disease correlates of protection, determinants of outcome and clinical management. These discarded samples were then analysed by RNA-sequencing (RNA-seq) to identify and quantify messenger RNA (mRNA) expression. Following sequencing of 138 individual samples from individual patients, sample selection criteria were used to identify and remove datasets from samples that showed evidence of having degraded mRNAs (expected from field sample collection) (Table 1). These criteria included the removal of samples that showed poor mapping and low correlation values (Fig. 1 and ‘Methods’). Here, samples were selected that had a similar within-sample transcriptional profile (with an average correlation co-efficient above 0.8), thus reducing extreme variation within one patient group due to mRNA quality issues. From an initial set of 138 individual sequenced samples from separate patients, application of these selection criteria led us to discard 26 and analyse the data from 112 unique patients: acute-survivor (n = 24) and acute-fatal (n = 88). The outcome for these patients was unknown at the time of sample collection and subsequently recorded as they either succumbed to a fatal infection or survived EVD. These were from a mixture of Plasmodium (the causative agent of malaria) negative and positive patients. Importantly, there was no significant difference in the time between symptom onset (as reported by the patient) and taking of the sample during acute illness. The mean time to the onset of symptoms for the acute survivors was 6.4 days and for acute fatalities was 5.9 days with a range of 2–19 and 2–22 days, respectively.

Table 1 Details of discarded diagnostic samples from patients used in the study. The numbers in parentheses indicate the number of patients in each group Full size table

Fig. 1 The sample selection criteria based on correlation value to within group expression. A mean correlation within the acute fatal (a) and acute survivors (b) was used to determine samples with evidence of unreliable sequencing. A cut-off value of 0.8 was used (dashed line) and samples that fell below this within-group mean correlation were removed from analysis. This led to selection of 88 acute fatal samples and 24 acute survivors Full size image

To provide a baseline for gene expression, a blood sample was taken from a separate group of 16 survivors that were EBOV-negative by PCR and convalescent for the disease and recovered from infection (recovered control group) (Table 1). This control group represented a critical comparison population. During the 2013-2016 outbreak, amid the breakdown of the in-country healthcare system and the stigma of being seen associated with ETCs, acquiring samples from non-infected patients was not possible. The EBOV survivors represented a known EBOV-negative population that were also tested and confirmed to be Plasmodium-negative with no other overt signs of a febrile illness (which could have impacted the results in a control group). We also made use of historical datasets obtained from the RNA-seq analysis of peripheral blood taken from healthy volunteers based in British Columbia (Canada) (n = 6) and thus would not have been exposed to EBOV or a range of other pathogens present in West Africa [18] (GEO Number GSE53655) (Table 1).

Analysis of gene responses of patients that differentiate fate after EBOV infection

We compared the transcriptomes from these different groups to identify mRNAs in survivors and in fatal cases that showed greater than twofold changes (false discovery rate (FDR) 5%) in abundance compared to the recovered control group. In the acute-survivor group, almost 1300 genes were increased in transcript abundance compared to the survivor control group. The corresponding value for the acute-fatal group was 2200 of which half were redundant with those in the acute-survivor group and approximately 1200 were unique to the fatal cases (Fig. 2a and Additional file 1). Additional file 1 shows any gene transcript with an absolute log2 fold change greater than 2 with a FDR of 5%. Most of the gene transcripts were common to both the acute-fatal versus acute-survivor group. Additionally, some gene transcripts that had abundance differences were restricted to the acute-survivor group. Thus, in the blood from individuals with acute EBOV infection, mRNAs of various pro-inflammatory factors such as CXCL10, CCL2/MCP-1, CCL8/MCP2 and CXCL11 showed increased abundance when acute-fatal cases were compared to acute-survivors (Additional file 2). Similar changes in mRNA abundance were observed in a publicly available dataset from non-human primates (NHPs) that survived the challenge with EBOV (Additional file 2).

Fig. 2 Host transcriptional responses in acute EBOV infection. a Venn diagram representing genes that are differentially expressed from control to fatal (blue) and control to survivor (red). Shared genes are shown in magenta. b Histogram illustrating genes increased in abundance in different functional groups. Genes significantly increased in abundance in fatal only (red), survival only (green) or in both (blue) are shown. c Heatmap of pathway upregulation intensity calculated by Ingenuity Pathway Analysis (IPA) for acute-fatal to convalescent control (F/C), acute-survivor to convalescent control (S/C) and acute-fatal to acute-survivor (F/S). Stars within boxes represent calculated p value significance of increased abundance (* < 0.05, ** < 10–3, *** < 10–6). d Spider plot of the z scores from the IPA. Increased abundance in fatal to control is in red, survival to control in green Full size image

Analysis of gene pathways differentiating fate after EBOV infection

Gene set enrichment analysis (GSEA) showed that in acute patients both surviving and having a fatal infection were associated with a significant enrichment of genes from within the same signalling pathways (Fig. 2b). The most significantly represented included gene sets associated with interferon signalling, complement, coagulation, hormone receptor and acute phase signalling. Many interferon-stimulated genes (ISGs) were strongly increased in abundance in all acute infection cases compared to the recovered control group. There was greater increased in abundance of these gene transcripts in acute-fatal compared to acute-survivors.

In addition to GSEA, we also analysed the patterns of differentially expressed genes using Ingenuity Pathway Analysis (IPA) (Qiagen Bioinformatics) to identify signalling pathways associated with infection. Pathways showing significant upregulation in acute infection are shown in Fig. 2c. This analysis identified many of the same pathways identified in GSEA such as complement, acute phase signalling and coagulation factors, but also identified signalling pathways not identified by GSEA, such as IL-6 and IL-8, indicating strong activation of these cytokines in EVD. IL-6 signalling has previously been shown to be associated with fatal EBOV infections in some studies [8, 19] but not others [11]. Our data support the conclusion that IL-6 signalling was upregulated in both the acute-survivor and acute-fatal cases. Spider plot analysis (Fig. 2d) showed that in almost all gene expression categories, acute patients that succumbed to infection showed a more robust immune response.

There were also distinctive differences in gene expression patterns when the transcriptomes from acute-fatal cases were compared to those from acute-survivors. A total of 246 transcripts were identified as being differentially abundant (log2FC > 1, FDA 5%), of which 220 transcripts or more increased (higher in fatal cases) and 26 decreased (Additional file 3). This differential abundance was most strongly observed in genes associated with coagulation and acute phase signalling. Figure 3 shows the five most strongly differentially expressed genes, of which four were associated with the clotting cascade genes, including fibrinogen (FG) alpha chain (FGA, +54-fold), beta chain (FGB, +28-fold) and gamma chain (FGG, +19-fold). The increased abundance of these mRNAs is again consistent with deposited transcriptomic data from EBOV-infected NHP studies [20] (Additional file 4).

Fig. 3 Coagulation associated mRNAs accumulate in the blood of EBOV-infected patients. Box and whisker plots illustrate the expression of the top acute phase genes that are differentially expressed between controls (blue), fatal (red), and survivors (green) based on log2 fold change. From left to right, the graphs illustrate mRNA expression for albumin (ALB), fibrinogen alpha (FGA), fibrinogen beta (FGB), fibrinogen gamma (FGG) and fibrinogen gamma-like 1 (FGL1). Black brackets indicate significance between control and survivor or fatal levels and grey brackets indicate significance level between survivors and those who will succumb to disease. All brackets represent a log2(fold change) > 2 and FDR < 5 % Full size image

The transcriptome profile of peripheral blood from the recovered control group may have been influenced by any remaining EBOV antigens and/or other infections (although the patients were Plasmodium-negative at the time the peripheral blood sample was taken to confirm convalescence). This may therefore have skewed any transcriptome comparison to the acute patients. To test this hypothesis, we made use of a historical dataset deposited on GEO (GSE53655) that was an analysis of peripheral blood from healthy volunteers [18] (n = 6) (Table 1) and compared this to the transcriptome profile from the recovered control group. Given the location of the study for the health volunteers was the University of British Columbia in Canada [18], we assumed that these individuals had not been exposed to EBOV, Plasmodium or other pathogens prevalent in Guinea. Differential gene expression (DGE) and GSEA indicated no significant difference in the transcriptome of peripheral blood between the healthy volunteers and the recovered control group (Fig. 4).

Fig. 4 Comparison of the convalescent survivors to healthy controls. The healthy controls comprised a group of healthy volunteers in British Colombia and were found in GEO under GSE53655. a The top fold change differentially expressed genes when comparing convalescence survivors (blue) to acute infections and when comparing acute survivor (green) to acute fatal (red). Healthy controls are shown (purple). The healthy controls show no significant expression of these genes similar to the convalescent survivors. b The comparison of the two control groups as a PCA plot of their overall expression values after normalisation using edgeR. The healthy controls are indicated in blue and the convalescent survivors in red. c The results of the gene set enrichment analysis comparing the convalescent survivors to the healthy controls. Genes significantly upregulated are in red and downregulated in blue. Through a Fisher’s exact test, no group of genes had a significant enrichment when comparing the two groups (p cutoff of 0.05) Full size image

Changing immune cell types in the acute phase are related to patient outcome

Changes in mRNA abundance in the blood sample between the acute-survivor and acute-fatal patient groups may have been due to changing gene regulation in the cell types present or changing proportions of specific immune cell types. We used digital cell quantification (DCQ) [21] to identify which cell types may have been differentially abundant in acute-survivors and acute-fatal relative to the recovered control group. In both the acute-survivor and acute-fatal groups, DCQ predicted a decrease in CD4 T cells [22] and a significant increase in CD8 memory T cell signature in acute-survivors that is consistent with flow cytometry analysis from patient samples [9] (Fig. 5a and Additional file 5). This analysis also demonstrated a potential decrease in the population of circulating monocytes during acute infection with EBOV (Fig. 5b) with a stronger potential decrease in the acute-fatal compared to the acute-survivor group (Additional file 5).

Fig. 5 Differentially abundant cell types present in human blood samples. a Relative abundance of specific T cell types with acute-fatal on the left of the heatmap and acute-survivor on the right as predicted by DCQ compared to convalescent survivors. Within the heatmap, darker blue represents a decrease in the abundance of a given cell type and darker red represents an increase in the abundance of a given cell type. The colour bar on the left is showing if the given cell type is significantly differentially abundant (greater than or less than 0 with a p value < 0.05) in acute-survivors only (green), acute-fatal only (red) or both (blue). b Similar heatmap for dendritic cell types and (c) is for natural killer cell types. Loss of circulating monocytes characterises fatal EVD. d Box and whisker plots depicting frequencies of peripheral blood monocytes in fatal EVD patients (black), EVD survivors (blue), malaria patients (brown) and other febrile patients (green) as shown by FACS. Horizontal bars represent median values and the edge of the boxes represent 10–90 percentiles. Statistical analysis was performed via non-parametric Kruskal–Wallis test followed by Dunn’s post-test. ns non-significant, *p ≤ 0.05. e Correlation analysis indicating positive correlation between frequency of CD14+ monocytes and the Ct values. Non-parametric Spearman’s rank correlation analysis was performed. The line indicates a linear regression Full size image

To investigate the DCQ-based predictions, flow cytometry analysis was used to compare blood samples taken from an independent and geographically separate group of patients (in Guinea) classified as acute-fatal (n = 20), acute-survivor (n = 17), EBOV-negative but with Plasmodium (n = 5) and other febrile illness (but EBOV-negative) (n = 10) (Table 1). The flow cytometry analysis indicated that in the patients with EVD, who went on to have a fatal outcome, they displayed low frequencies of circulating CD14+ classic monocytes in peripheral blood (Fig. 5d and Additional file 6) compared not only with surviving patients, but also with patients with other acute infectious diseases including malaria (Fig. 4d), consistent with the predictions from the DCQ analysis. In addition, the frequency of blood monocytes in patients positively correlated with viral load, which indicated a statistical association between the loss of circulating monocytes and high viremia (Fig. 5e). Natural killer (NK) cells were predicted to accumulate in acute-survivors when compared to the acute-fatal group (Fig. 5c). Indeed, we predicted that NK cell types were more abundant in acute-survivors than in the acute-fatal group (Fig. 5c), which was consistent with an increase in mRNA abundance in the acute-survivor group of the NK markers interferon gamma (+4 fold over convalescent controls) and perforin (+3 fold over convalescent controls).

Predictive models based on differentiated gene responses between patient groups

Differential gene expression analysis might provide suitable biomarkers for predicting the eventual outcome of infection in individual patients and thereby better direct their treatment. We assessed if a subset of differential expression (DE) genes could be correlated with outcome using three independent machine learning-based methods. The first method used support vector machine (SVM) analysis. Predictive models based on the top ten genes ranked by p value (79% accuracy) performed better than models based on the level of EBOV RNA in a qRT-PCR [14]. However, genes identified through this approach were not linked to known components of the immune response and both models showed wide dispersions for each gene, which contributed to overlaps between the acute-survival and acute-fatal groups (Additional file 7). Including samples from the recovered control group in the analysis improved predictive performance slightly and included some of the invoked genes that showed DE responses to EBOV exposure.

To investigate the predictive power of a host-response-based gene set, a substitution method on groups of ten randomly selected genes was performed to define a better SVM classifier (Figure 6c and Additional file 8). The resulting group of ten genes improved the accuracy of outcome predictions from 79% to 85% (97 of 112 case outcomes predicted correctly). Figure 6a illustrates the resulting separation of acute-fatal cases from instances of acute-survival, while Fig. 6b shows a receiver operating characteristic (ROC) curve compared to Ct. The optimistic bias for the area under the curve (AUC) prediction for this model for the training and testing data was 0.11.

Fig. 6 Identification and testing of a small set of host mRNAs whose expression predicts survival during acute EBOV infection. a Mean correlation plots where each sample is represented. Survivors in green, fatal in red. Line indicates prediction inflection point. Individuals above the line would be predicted to be survivors using these mRNAs, while individuals below would be predicted to succumb to the disease. b ROC comparing prediction of survival using EBOV PCR Ct value (green) and host mRNA expression classifier (blue). The error bars represent SD of the average true positive rate. c Box and whisker plots illustrating expression changes in log2(CPM) for the ten genes in the Host Classifier in control (blue), fatal (red) and survivors (green) Full size image

Two other methods were also used to identify gene sets that correlated with outcome. The second method used random forest (RF) and this improved accuracy to 89% (Additional file 8 and 9), again with ten genes as classifiers of outcome. The third method was an entirely different approach and used an intensive optimisation protocol to establish a panel of gene pairs (Additional file 8) for which the consensus expression profile across the panel for acute-survivors were maximally differentiated from the acute-fatal group (‘Methods’, Additional files 10 and 11). We termed this the Paired Gene Profiling method. Classification of a query patient was achieved by correlation of his/her expression profile across the panel to one or other of the alternative classifier profiles. This technique achieved 92% accuracy, using a select group of ten genes that were included within the panel of classifier genes for the improved SVM model (Additional file 8). Thus, the ability to predict outcome was independent of the method used to achieve it.

To test whether these predictions of patient fate were not overly optimistic, we adopted a ‘bootstrapping’ technique. The random resampling of subsets of the overall patient data provides a large number (i.e. 1000) of independent estimates of the predicted status of each patient. The consistency of the outcomes can then be explored from the resulting histogram (Additional file 9), which if normally distributed, as indicated by the linearity of the corresponding QQplot (Additional file 9), is described by a mean ± 95% confidence interval (CI). We show for each of the different predictive methods employed that the distributions were well described by a normal distribution, and that the 95% CIs for predictive accuracy for the RF test were 0.758–0.876, while those for the paired gene profiling (PGP) test were 0.793–0.953. These all lay within ± 8% of the corresponding mean values. Given the absence of data for independent validation, this outcome offers good support to a highly consistent predictive accuracy across the different models.

Effects of viral load on predictive outcome and testing on a separate group of patients with EVD

The DGE analysis and gene classifiers were generated from patients that had a diverse range of viral load. We wanted to examine whether both were independent of viral load and instead just reflected the host response to EBOV infection. To investigate this, we determined the transcriptome of blood samples taken from an independent group of patients from that used in the initial DGE analysis and for gene classifiers. We focused specifically on samples from new patients whose viral loads were not significantly different (Ct = 20–22) between a fatal (n = 10) or non-fatal outcome (n = 10) (Additional file 12). This is a crucial range for viral load as between these Ct values the outcome of EVD cannot be predicted as the case fatality rate in this group of patients was approximately 50% from the outbreak in Guinea. From a disease biological perspective, this allowed us to investigate whether the observed differentially expressed genes between a fatal and survivor outcome were potentially related to viral load. For example, a lower viral load promoted survival or whether DGE was potentially host-moderated. All of these selected patients were antigenically negative for Plasmodium and thus the outcome and the host response were not complicated by malaria. DGE analysis between these two groups of patients identified the enrichment of very similar transcript abundance to that described in the original DGE analysis on the acute-fatal and acute-survivor patients groups (Additional file 13), suggesting that the host response (at least at the transcript level in the blood) was mainly independent of viral load. There was also good correlation between the top five differentially abundant gene transcripts expressed in the n = 20 study and the n = 112 patient study (Additional file 14). When the ten gene SVM classifier was compared to the Ct value in this group, it greatly outperformed the Ct value in predicting survival as seen in the ROC and partial (pROC) in Fig. 7. The ten gene RF classifier was also able to predict survival in this group of patients where the Ct value failed. This shows that the ability of the biomarker to predict outcome was independent of Ct value and is useful predicting outcome in patients where the Ct value is uninformative.