Enteric infection leads to extensive changes in transcript isoform ratios

We used RNA-sequencing data generated from the whole guts of 38 DGRP lines that were infected with Pseudomonas entomophila (P.e.). Among these 38 lines, respectively 20 and 18 lines are susceptible and resistant to oral infection with P.e [13]. In addition, we sequenced the guts of control flies, which were fed sucrose, for a total of 76 samples (Additional file 2). Since the lines are highly polymorphic, we opted to use individualized genomes and gene annotations for our analyses using available single nucleotide polymorphism (SNP), indel, and structural variation data for each line [14] (see the “Methods” section). Given the focus of this study on gaining insights into changes in isoform composition of each gene after infection, we used a multivariate distance-based approach described in [15]. Briefly, we estimated the isoform ratios, that is, the relative ratio of alternative isoforms of each gene, using MISO [16]. We then identified genes showing significant infection-induced differences in isoform ratios [17]. Of the 1877 genes that passed filtering (see the “Methods” section), 40% were significantly changed after infection (Fig. 1a, p value of homogeneity > 0.05, BH-corrected p value < 0.05, effect size > 0.2, Additional file 3). Interestingly, only 25% of the differentially spliced genes were among the 2471 genes that were differentially expressed after infection, suggesting that gene-level differential expression-type analyses may overlook important molecular aspects of the gut transcriptional response to enteric infection (Additional file 3). Gene ontology analysis revealed that genes associated with mRNA splicing, organelle organization and biogenesis, as well as tissue development are enriched within the set of differentially spliced genes (Fig. 1b). Surprisingly however, this set was not enriched for immunity terms. This may reflect different regulatory properties of genes involved in the immediate innate immune response (i.e., in the resistance mechanisms [20]), many of which are significantly induced after infection, versus those involved in homeostasis (i.e., the tolerance mechanisms [20]), which might be required to function in the normal and infected state. When comparing resistant and susceptible lines within each condition, we were not able to find differentially spliced genes, although some genes showed modest trends (Additional file 1: Figure S1a).

Fig. 1 Enteric infection leads to extensive changes in transcript isoform ratios and to greater isoform diversity. a Top: schematic illustrating how genes with different isoform ratios are compared between two conditions. MISO [16] was used to calculate the ratios of different annotated isoforms, and thereafter, the rasp package [15] was used to determine significance (p-homogeneity > 0.05, BH adjusted p value < 0.05, effect size > 0.1). Bottom: Venn diagram of the number of expressed genes whose isoform ratios were significantly altered after infection. b Graphical representation of enriched biological process gene ontology terms based on the list of genes whose isoform ratios were altered after infection. The GO analysis was performed using the GOstats R package [18] (Hypergeometric test p value < 0.005), and REVIGO [19] was used to reduce redundancy in the ontology groups and plot them by semantic similarity (allowed similarity = 0.7). The size of each circle indicates the number of genes belonging to a certain GO category, and the color indicates enrichment significance. c The distribution of Shannon entropies of transcript ratios of each gene per DGRP gut transcriptome. Uninfected (control) and P.e.-infected samples are depicted in gray and brown, respectively. The densities were obtained using R’s base density function. d Breakdown of average Shannon entropy per sample by isoform number, susceptibility class, and treatment condition Full size image

The transcriptional response is characterized by higher isoform diversity

We next examined the effect of infection on the diversity of the transcriptome by calculating the gene-based Shannon entropy for each sample. This is a measure of the evenness of the proportions of a gene’s isoforms. We found that infection leads to a small but consistent increase in diversity in the infected state (p value for treatment effect on average Shannon diversity = 3.7e−05, Fig. 1c, Additional file 1: Figure S1b-c, Additional file 4). The density plot of Shannon entropies revealed that after infection, there is a bias towards increase in the number of genes with higher diversity, and consequently fewer genes with lower diversity, where across the different DGRP strains, there is an average of 20, and a maximum of 330, more genes that increase in diversity after infection (Fig. 1c). This net increase was consistent in 37 different strains irrespective of their resistance class (Additional file 1: Figure S1b), suggesting that this is not a stochastic phenomenon. Interestingly, a breakdown by isoform number revealed that for genes with 2, 3, or 4 isoforms, resistant lines show a tendency for greater average diversity than susceptible lines (Fig. 1d, Additional file 1: Figure S1c). With the exception of genes with four isoforms in the infected state (linear model p value for resistance class = 0.0192), this tendency is not statistically significant. These observations suggest that upon infection, the transcriptional output of many genes is less dominated by a single or few isoforms. This phenomenon is more marked in lines that are resistant to P.e. infection, which may point to a link between increased isoform diversity and greater infection resistance.

The effect of natural variation on splicing increases after infection

We have thus far established that enteric infection leads to a change in isoform abundance of a large set of genes, thereby increasing overall isoform diversity. We next sought to establish whether genetic variation affects isoform ratios. To this end, we identified local splicing quantitative trait loci (local-sQTLs) in the two infection states using sQTLseekeR [21]. We restricted our analysis to SNPs within a 10-kb window around each gene (see the “Methods” section), hence our annotation of “local-sQTLs”. We identified 359 and 646 control- and infection-specific local-sQTLs, and 282 local-sQTLs that are common to both conditions (Fig. 2a, Additional file 5). Interestingly, there were around 80% more local-sQTLs in the infected state, affecting more than twice as many genes as in the control state (96 vs. 39 genes) although a similar number of genes were tested in the two conditions (1238 vs 1248 for controls and infected, respectively). In addition, a greater percentage of genes with a local-sQTL in the infected state showed significant differences in isoform ratios upon infection (Fig. 2a). These results demonstrate that inter-strain differences in isoform ratios can be attributed to alterations in the genomic DNA sequence and that enteric infection unmasks a substantial amount of otherwise silent genetic variants that affect splicing.

Fig. 2 The effect of natural variation on splicing is enhanced by infection. a Venn diagram showing the result of the local-sQTL analysis (and number of associated genes) using sQTLseekeR [21] (BH adjusted p value < 0.05, maximum difference in ratio > 0.1). The barplot shows the number of genes with a local-sQTL as well as the overlap with the set of genes with significantly different isoform ratios after infection. b GO enrichment of the genes with local-sQTLs. The analysis is similar to that in Fig. 1, but the three groups in a were tested separately, then the GO categories were pooled in REVIGO. The color of each circle’s outline indicates the gene subset that is enriched with a specific term. c Metaplot of the pooled local-sQTL results with respect to normalized gene length, and d intron length. Orange bars represent the density of local-sQTLs, while gray bars represent the density of a random sample of variants that matches the sQTL allele frequencies and is within 10 kb of genes. e ESE and ISE locations were predicted along all gene bodies using pattern matching to the reference genome after which the percentage of local-sQTLs that overlapped a predicted element was computed and plotted in red. A null distribution of the percentage overlap was produced by randomly sampling variants within gene bodies with a similar allele frequency distribution as the local-sQTLs. This was repeated 100 times and the percentage, as well as the mean (blue solid line) and standard deviations (dashed lines) were computed. A solid line shows the maximum overlap obtained through random permutations Full size image

To gain insights into which biological processes are enriched in the genes that have local-sQTLs, we performed separate gene ontology enrichment of the three sets of genes: control, infected, and shared local-sQTLs genes. Figure 2b shows a combined graphical representation of the three GO enrichment results. In the control state, we observed an enrichment of GO terms related to cellular and nitrogen compound metabolic processes. In the infected state, other categories emerged, namely cellular response to stress, cell cycle, and aging. As in the enrichment for infection-induced splicing changes, we did not find any enrichment for immune-related processes, but mostly homeostatic mechanisms. This could either mean that splicing is not a major regulator of canonical immunity pathways or that there is strong selective pressure against genetic variation that affects splicing in immunity-related genes.

Next, we examined the location of the detected local-sQTLs in relation to their respective genes. We used a gene-centric and intron-centric approach to obtain metaplots. Since natural variation density along genes is not uniform, and tends to be higher towards the 5′ ends [14], we generated a null distribution by considering sets of randomly selected variants that are located within 10 kb around genes and that have a matching allele frequency spectrum to the local-sQTLs. We found that both the null and the observed local-sQTL distributions show a peak around the TSS of genes (Fig. 2c, Additional file 1: Figure S2a). However, while the null distribution had a single symmetrical peak with wide tails, the local-sQTL density one had a higher density at the main 5′ end, as well as an elevated plateau along the metagene body. This density distribution could be the reflection of multiple possible effects of variants on isoform ratios. One such effect is at the DNA level, where alternative TSS selection could be affected by variants around the 5′ end. Other effects can be through directly modulating splicing all along the transcript. A third type can be modulating transcript stability, which can also be located anywhere on the gene body.

To gain further insights into how local-sQTLs could be mediating differences in splicing, we also calculated the local-sQTL density distribution around introns as well as a respective null distribution. Interestingly, we observed a pattern that is very distinct from the null distribution. While the latter showed a wide peak that is centered around the 5′ end of introns, the local-sQTL distribution exhibited a sharp peak at the 5′ end, with a much greater density of sQTLs immediately upstream compared to downstream of the intron (Fig. 2d, Additional file 1: Figure S2a). In addition, the number of sQTLs dropped sharply at the boundaries of introns. As may have been expected, these data support the notion that genetic variants that affect splicing largely act by inducing differences in the processes that are required for splicing, predominantly around the 5′ splice site. One such local-sQTL example is in the gene Psi, which has a local-sQTL at a splice site (Additional file 1: Figure S2b-d). Lines with different alleles at this locus showed markedly different splicing patterns, with a clear shift in the major isoform produced in both conditions. However, not all local-sQTLs could be assigned such a direct mechanism of action, as some might have more subtle effects, for example by affecting exonic and intronic splicing enhancers (ESEs and ISEs) that affect the recruitment of RNA binding factors. To assess this possibility, we asked whether it is more likely that a local-sQTL overlaps with an ESE or ISE. Since these splicing enhancer sequences are short hexamers, predicting them along the genome produces many false positives. Nevertheless, we considered a set of 330 published enhancers [22] and looked for matches along all the gene bodies (Additional file 5). We then counted the overlap between the local-sQTLs and 100 random sets of variants with a matching allele frequency spectrum. Interestingly, 70% of the local-sQTLs overlapped a predicted enhancer, which is 10% higher and 6.1 standard deviations away from the mean of random samples (Fig. 2e). This enrichment indicates that some of the local-sQTLs that lie within ESEs and ISEs could be mediating isoform ratios by affecting splicing enhancer function. Taken together, our local-sQTL data shows that we can detect effects of natural variation on splicing, even more in the infected state, and suggests that these effects are due to direct changes in splice sites, as well as other mechanisms predominantly at or around the splice donor site. These results also again indicate that splicing changes in the infected state are regulated processes and not merely a result of stochastic perturbations.

Post-infection transcripts tend to be longer, mainly because of longer 5′UTRs

We next sought to characterize the effect of the splicing changes on the length of the produced transcripts. To do so, we estimated an effective length measure for each gene. Briefly, for each gene in each sample, we estimated the effective gene length as the weighted mean of its individual transcripts (taking indels of individual lines into account) by the isoform ratios (Additional file 6). Similarly, we extended this method to specific regions within the transcript, namely the 5′UTR, 3′UTR, and the coding sequence. We then compared the effective length before and after infection to determine the number of genes with an increased, decreased, or unchanged effective length (Fig. 3a). We generated a null distribution of effective length differences by performing 100 permutations of the data, by randomly assigning infection status to the samples, and compared this to our observed set using G-tests. The effect of indels on the coefficient of variation in feature length—that is when we calculate the effect that indels have on the sequence length in the DGRPs without taking expression levels into account—was most prominent in 3′UTRs. However, when we factor in isoform ratios, and calculate the variation in effective lengths, 5′UTRs showed the highest variation (Additional file 1: Figure S3a, Additional file 6). 3′UTR lengths deviated the most from the null distribution, and their infection-induced differences were lower than expected. However, the proportion of those that increased in effective length was close to those that decreased in response to infection (23.2% vs. 24.1 respectively, Fig. 3b, Additional file 1: Figure S3b-c). Furthermore, by classifying genes based on how 3′UTRs may affect their effective length, we found no difference in the contribution of polyadenylation site usage and splicing (Additional file 1: Figure S3d). In contrast to the 3′UTR, we found that around 7% more genes increase rather than decrease in transcript and 5′UTR effective length (paired t test p values = 1.9e−05 and 1.2e−06 respectively). Predicted polypeptide length, however, did not show differences from the null distribution nor any skew. Importantly, the distribution of this shift in effective length was consistent across the DGRP lines, with transcripts and 5′UTRs having an excess of increased effective lengths, thus supporting that this is a reproducible and genotype-independent phenomenon (Additional file 1: Figure S3b-c). To show which feature contributes to the effective length change the most, we performed a similar analysis, this time calculating the infection-induced change in transcript effective length after the removal of a specific feature. We found that the removal of 5′UTR length and not the predicted polypeptide nor 3′UTR abolished this skew in the proportions (Fig. 3c). Together, these results suggest that infection-induced differences in isoform ratios preferentially affect 5′UTRs and favor the production of isoforms with longer 5′UTRs across genotypes.

Fig. 3 Post-infection transcripts tend to be longer, mainly due to the generation of longer 5′ UTRs. a The line-specific effective length of each gene’s transcript, CDS, 5′UTR, and 3′UTR length was obtained by calculating the weighted sum of each gene’s isoform features by its isoform ratios. The difference in effective length between the P.e. infected state and the uninfected (control) state was then calculated for each line. b The percentage of features that increased, decreased, or did not change in average length (across samples) after infection. Error bars are the standard deviation. A null distribution was generated by performing 100 permutations by randomly shuffling the samples. The gray bars indicate the average obtained by permutations. Repeated G-tests were used to compare the feature length change in each line to the null distribution. The boxplots show the –log 10 (p values) of the tests, with the dotted red line representing a Bonferroni-corrected p value threshold. c Similar to previous panel, but this time the effective length of each transcript without either the predicted polypeptide, 3′UTR, or 5′UTR was calculated Full size image

Intron retention increases following infection and its prevalence scales with the degree of pathogenicity

The increase in effective gene length prompted us to investigate splicing at the intron level. Using an available annotation that is specific to intron retention events from the MISO annotation website, we estimated the percent spliced-in (PSI or Ψ) value for each of the 32,895 introns using MISO [16] (Fig. 4a, Additional file 7). This annotation was established based on RNA sequencing of 30 whole animal samples from 27 distinct development stages as part of the modENCODE project [23]. The reliance on two sources of annotations, a gene-centric one with full transcript isoforms from Ensembl and intron-centric one, renders the task of mapping the effect of changes in individual events on whole isoform abundance non-trivial, especially when using short-read sequencing. A limitation that we therefore acknowledge is that not all intron retention events can be directly mapped to an annotated gene. However, despite this limitation, we hypothesized that if a systematic and consistent increase in intron retention based on intron-centric annotations is detected, this may explain why transcripts tend to be longer after infection.

Fig. 4 Enteric infection with different pathogens leads to widespread, directed changes in intron retention. a Diagram depicting how intron retention changes are calculated. For each sample, delta PSI values for different splicing events [23] were calculated by subtracting the PSI value of the uninfected control sample from that of the infected one. b Histogram of average delta PSI values of intron retention (RI) events whose PSI values are significantly different after infection in at least 4 DGRP lines. c, d Histogram of delta PSI values of intron retention events whose PSI values are significantly different (Bayes factor > 10, delta PSI > 0.2) from the control (sucrose fed) state 4 h after infection with c Pe and d Ecc15 in w1118 flies. e Venn diagram of the overlap between events that are significant in 1 DGRP line, at least 4 DGRP lines, w1118 strain infected with Pe, and w1118 strain infected with Ecc15 Full size image

PSI reflects the number of intron retention reads (i.e., spanning the exon-intron boundary as well as the reads in the intron) divided by the sum of the number of intron-retention and intron-splicing reads (i.e., spanning the exon-exon boundary as well as in the flanking exons). In contrast to steady-state analyses, our population-level data from two conditions allowed us to investigate infection-induced changes in intron retention and whether they are restricted to specific transcripts or reflect mere random splicing events. We thereby defined introns with increased retention as introns that significantly increase in PSI (positive delta PSI, bayes factor > 10) whereas introns with reduced intron retention are those that significantly decrease in PSI (negative delta PSI, bayes factor > 10). As shown in Fig. 4b, we uncovered a large number of introns with increased retention (535) and decreased retention (331) that are significant in at least 5 DGRP lines (bayes factor > 10, delta psi > 0.2, also see Additional file 1: Figure S4a-b). These data thus suggest that DGRP strains react similarly to infection. For instance, among the 535 events with increased intron retention in 5 strains, 510 never decreased in retention, 13 decreased in one DGRP strain, 11 in two strains, and one in four strains. Moreover, using the R package SuperExactTest [24], we found that the overlap of introns with increased retention between strains was highly significant. For example, the expected overlap in two and four DGRP lines is less than 10 and 0.001 events, respectively, whereas the median observed overlap was 133 and 59, again suggesting non-random RNA splicing changes. Interestingly, there were 1.6 times more events with a positive compared to a negative delta PSI (535 vs 331 respectively), indicating a net increase in retention post-infection.

It is not clear whether the observed intron retention change is specific to P.e and whether different pathogens induce a similar response. We addressed this point by generating paired-end RNA-sequencing data of adult female guts of the widely used w1118 strain infected with the lethal P.e. and a non-lethal pathogen, Erwinia carotovora carotovora 15 (Ecc15). Adult female flies were either fed with sucrose (1.5X), P.e (OD 600 = 100 and 1.5X sucrose), or Ecc15 (OD 600 = 100 and 1.5X sucrose). When we compared the two infection conditions to the uninfected control state, we found that both conditions differed from the control in intron retention events (Fig. 4c, d, 493 and 200 events in P.e. and Ecc15 respectively, bayes factor > 10, delta psi > 0.2). In addition, we found a high degree of overlap among the DGRP lines, as well as between the DGRP and the w1118 data (Fig. 4e), supporting the notion that this phenomenon deterministically affects a specific set of introns. Nevertheless, Ecc15 infection yielded fewer differences overall and had proportionally fewer retention events, 40% of which were shared with the P.e. condition (Additional file 1: Figure S4c-d). While we only tested infection as an insult in this study, we nevertheless speculate that other interventions may lead to similar changes in splicing. Thus, we postulate that infection-induced splicing differences occur in response to different pathogens, and scale with the degree of virulence, infection severity, or stress.

Introns with increased retention have exon-like characteristics and are enriched for known RNA-binding motifs

We next aimed at characterizing the retained and spliced introns. A meta-analysis of the location of introns with increased and decreased retention showed that the density of introns with increased retention is very high at the 5′ end of transcripts, which partly explains why longer UTRs are being produced after infection (Fig. 5a). We then compared their length and GC content, both of which are known parameters that determine exon and intron specification [28, 29]. In terms of length, introns with increased retention tend to be shorter than the ones with decreased retention (Fig. 5b, Additional file 1: Figure S5a). In addition, their GC content tends to be higher, and consequently, the difference in GC content between the introns and their flanking exons was lower (Fig. 5c). Next, we performed RNA Polymerase II ChIP-seq on female guts under control and infected conditions to consider its intron occupancy as an additional characterization parameter (see the “Methods” section). Interestingly, we found that introns with increased retention also show greater enrichment for RNA polymerase II irrespective of treatment condition (Fig. 5d, Additional file 1: Figure S5b, see the “Methods” section). We did not find any enrichment of biological processes for the genes affected by intron retention. Together, these results suggest that retained introns tend to exhibit exon-like characteristics. To formally and independently validate this hypothesis, we overlaid a list of experimentally verified Drosophila upstream Open Reading Frames (uORFs) with our data [30]. We found that introns with significantly increased retention in more than 4 DGRP lines are more likely to contain a uORF (paired one-tailed t test p value = 8.2e−8, Fig. 5e, see the “Methods” section). In fact, when we investigated introns with increased retention in each DGRP line separately, we found that there is generally a greater proportion that overlaps a uORF (Additional file 1: Figure S5c). Thus, our observations suggest that many introns with increased retention may act as uORFs.

Fig. 5 Introns with increased retention have exon-like characteristics. Throughout the figure, blue and gray represent retained and spliced out introns, respectively. a The density of intron retention events along the normalized length of the gene. b Length of introns (in log 2 ) with significant intron retention changes (one-tailed t test p value < 2.2e−16). c GC content of those introns and their flanking exons. d Normalized PolII ChIP-seq signal of these introns and their flanking exons in the P.e.-infected state. e Proportion of significant intron retention events that overlap with a uORF (paired one-tailed t test p value = 8.2e−8). f The enrichment of D. melanogaster RNA binding motifs [25] calculated using AME [26], in the MEME suite [27]. Blue and gray points indicate enrichment among the sequences of introns with increased and decreased retention, respectively Full size image

The extensive overlap in introns with increased retention among DGRP lines suggests that this process is driven by a deterministic mechanism, possibly involving specific RNA-binding proteins whose differential activity may be responsible for the observed differences. Indeed, it is known that RNA-binding proteins contribute to splicing by binding specific targets in nascent transcripts in a context-dependent manner [31, 32]. We therefore assessed enrichment of RNA-binding motif (RBM) sites in the introns with decreased and increased retention, using as background those introns that did not change significantly. We used AME [26], from the MEME suite [27], to determine enrichment of experimentally derived RBMs in the sequences of introns and the 50 bases flanking them from each side [25]. We found enrichment of many RBMs in the introns with decreased retention, but few RBMs in those with increased retention (Fig. 5f, Additional file 1: Figure S5d,e). Furthermore, upon scanning for motif sequences in these introns, we observed that introns with increased retention not only have more predicted motif binding sites, as expected because of their longer sequences, but also tend to have more motif matches close to the introns’ 5′ splice site. These results suggest that introns with increased retention after infection have generally weaker and fewer splicing signals than those introns that efficiently undergo splicing.

The RNA-binding protein Lark mediates gut immunocompetence

The lower number of enriched RBMs in the introns with increased retention may indicate that intron retention is generally driven by infection-induced splicing impairments. However, the fact that these introns are shared across inbred lines and distinct pathogens suggests the involvement of a non-random process. To further address this hypothesis, we focused on Lark, since its RBM was the most enriched in the sequences of introns with increased retention, and investigated its possible involvement in the gut’s response to infection. Lark is the ortholog of human RBM4, an RNA-binding protein implicated in splicing, translation, and the stress response. In humans, it has been shown to be activated through phosphorylation by the p38 MAPK pathway in response to stress, where it shuttles out of the nucleus and affects translation of different targets [5]. The MAPK pathway, specifically through p38c, has been shown to mediate the Drosophila gut immune response to enteric infection through its effect on the transcription factor Atf-2 [33].

To investigate Lark’s involvement in the defense response, we performed overexpression and knockdown specifically in the adult gut enterocytes using the Myo1A-Gal4 driver in conjunction with tub-Gal80ts (Myo1Ats). Surprisingly, we observed that both knockdown and overexpression of lark in adult enterocytes resulted in enhanced survival compared to WT (Myo1Ats > w1118), with the overexpression transgenic flies being the most resistant to P.e. infection (Fig. 6). We validated lark knockdown and overexpression by performing RT-qPCR on dissected guts and found that indeed, there was up to 80% knockdown and 80–100 times overexpression in comparison to WT levels. Our observations point to a significant contribution of Lark in the gut response to infection, whereby modulation of its expression levels (either up or down) significantly impacts on overall pathogen susceptibility.

Fig. 6 Lark dosage perturbation leads to global changes in gene expression as well as enhanced survival to infection. a Left: general schematic of the crosses to generate enterocyte (EC)-specific expression of transgenes in adult female flies. Myo1Ats virgins were crossed to either UAS-lark RNAi, UAS-lark-3HA, or w1118 males, and their F1 progeny were maintained at 18 °C. After eclosion, adults were kept at 29 °C for 7 days, then infected with P.e. Middle: survival of lark overexpression and knockdown flies driven by the Myo1Ats Gal4 driver. Right: relative ratio of lark in dissected guts of those flies 4 h after infection with P.e. All experiments were performed with three biological replicates and n > 30 flies or guts. b Gene set enrichment analysis of the lark perturbation effect and infection effect as obtained by gene-level differential expression analysis. Each point is a gene set from the biological process gene ontology whose normalized enrichment score (NES) is plotted in two analyses. Overexpression and knockdown lead to similar changes in gene expression and common pathway enrichments Full size image

The experiments described above do not provide insights, however, into whether Lark affects intron retention. We therefore performed RNA-sequencing of control and infected guts of flies in which lark was overexpressed or knocked-down in adult enterocytes. We first performed gene-based differential expression analysis to characterize Lark-mediated differences. Interestingly, compared to the control and in line with our phenotypic observations, both Lark perturbations led to similar expression differences in terms of genes and gene sets (Fig. 6b, Additional file 1: Figure S6b, Additional file 8). Notably, we observed an enrichment for gene sets related to cell fate determination and cell recognition in the upregulated genes.

We performed the same intron retention analysis as before, but this time, we compared guts with perturbed lark expression to wild type (control and infected). We observed a similar increase in intron retention in all the genotypes, meaning that Lark is not strictly required for infection-induced intron retention (776, 918, and 829 events in the control, knockdown, and overexpression flies, Fig. 7a). However, when compared to infected wild-type guts, their lark knockdown counterparts exhibited less intron retention (318 vs 691 events, Fig. 7b). Interestingly, overexpression of lark led to an important increase in intron retention, even in the control state (474 and 691 in control and infected, respectively, Fig. 7b), and the distribution of introns with increased retention remained concentrated at the 5′UTR, especially when lark was overexpressed (Fig. 7c, d). In addition, enrichment of the Lark RBM in introns that were retained due to infection was proportional to lark levels (Fig. 7e). Moreover, introns with increased retention due to lark overexpression in the uninfected state were also enriched for the Lark RBM (Fig. 7f), indicating that increasing Lark levels directly leads to intron retention of a specific set of genes. We also found an enrichment of the Lark RBM in the introns that are less retained in the knockdown compared to controls (Fig. 7f), providing further evidence for the direct contribution of this RNA-binding protein in infection- and stress-induced splicing regulation.