In this Review, we will focus on the taxonomic composition, dynamics, and spatial structure of bacteriophage populations, as well as certain aspects of their interactions with their hosts in the healthy human gut. We will also review the available methodology, specifically discussing the challenges associated with metagenomic analysis pipelines. We will not discuss phageome interactions with the host immune system, the role of the phageome in various disease states, or phageome-based therapeutic approaches to treat gut disorders, as these topics are covered by other reviews in this Phage Focus issue of Cell Host & Microbe.

Widespread bacteriophage predation and lysogenic conversion in bacterial populations play a major role in regulating bacterial biomass, maintaining biodiversity, mediating horizontal gene transfer, and driving biogeochemical cycles in the Earth's biosphere (). With phage-to-bacterium ratios of ∼1:1 in the human gut (), we can expect bacteriophage predation, lysogeny, and gene transfer to play major roles in controlling the density, diversity, and network interactions of gut-associated symbiotic bacterial communities as well. Importantly, specific and lasting changes in phageome composition have been detected in a number of diverse gut-related and systemic conditions, such as inflammatory bowel disease (IBD), malnutrition, and AIDS (). Additionally, evidence of the efficacy of sterile fecal filtrate transfer in the treatment of C. difficile infection (CDI) points toward the potential ability of gut phages to restrict pathobiont growth and promote normal richness of the gut microbiota (). Interestingly, however, the majority of gut bacteriophages seem to engage in lysogenic interactions with their hosts, thereby persisting for prolonged periods of time and evolving much more slowly than the minority of virulent bacteriophages (). The lack of evidence for “kill-the-winner” dynamics and Red Queen co-evolution () suggests that the ecological strategies and modes of interaction between gut bacteriophages and their hosts are fundamentally different from those observed in other well-studied ecosystems, such as the oceans, where lytic lifestyles prevail and play a central role in shaping and controlling bacterial populations ().

The advent of high-throughput metagenomic sequencing technology has allowed us to appreciate the complexity and richness of human gut bacteriophage populations (). The first metagenomic studies of fecal viromes revealed that most bacterial viruses in the gut (81%–93%) are novel and can be neither assigned a taxonomic position nor linked to a bacterial host (). This is further complicated by the fact that human gut phageomes are highly individual-specific, with only a small overlap between subjects (). The term “viral dark matter” has been coined to describe this gap in knowledge about the taxonomic composition and population structure of the gut phageome (). A description of the viral landscape in the gut would be incomplete without mentioning minority populations of circular, replication initiator protein (Rep)-encoding, single-stranded DNA (CRESS-DNA) eukaryotic viruses () and even pathogenic plant RNA viruses, which are likely of dietary origin but retain infectivity during transit through the gut ().

It is often postulated that viruses of bacteria are the most numerous biological entities on the planet and that in many environments they outnumber their prokaryotic hosts by a factor of 10 (). The original hypothesis of a linear virus-to-microbe ratio (VMR), based on early data from marine and freshwater microbial communities (), has recently been revised, with power-law and unimodal models appearing to reflect the extensive variation in VMR (2.6–160 in the oceans) more accurately (). It has long been known that abundant and diverse communities of non-pathogenic viruses, mainly tailed bacteriophages, colonize the mammalian gut (). Until the last decade, however, the phageome remained the “known unknown” of the gut microbiome. This was mainly due to a very limited toolkit, which included direct observation and counting of virus-like particles (VLPs) by transmission electron microscopy (TEM) and epifluorescence microscopy (EFM), as well as isolation of individual bacteriophages infecting specific host strains in culture. Microscopic methods helped reveal a large diversity of viral morphotypes (up to several tens per individual), with total counts of bacterial viruses in human feces, caecal contents, and colonic mucosa reaching ∼10–10VLPs g). These were largely members of the order Caudovirales, represented by the families Siphoviridae, Podoviridae, and Myoviridae. Culture-based methods were mainly used to isolate bacteriophages against a limited set of model and clinically important microorganisms, such as Escherichia/Shigella (), Enterococcus faecalis (), Clostridioides difficile (), and a few other bacteria. Because >95% of bacteria residing in the distal gut, including non-pathogenic strict anaerobes of the families Bacteroidaceae, Prevotellaceae, Ruminococcaceae, Lachnospiraceae, and others, are difficult to culture, the available collections of phage strains of human fecal origin clearly do not yet reflect the true diversity of human gut bacteriophages.

The human body has been referred to as a “superorganism” () in which microbial cells are present in numbers (∼10^13) comparable to those of human cells (). An overwhelming majority (>99%) of these microbes reside in the distal segments of the gastrointestinal tract (GIT), where they occupy different ecological niches in the gut lumen and on mucosal surfaces, forming complex biochemical interaction networks among themselves and with the host organism. The dynamic equilibrium of the gut microbiome is essential for normal host physiology. For instance, gut microbes participate in host metabolic processes (), stimulate normal development of immunity and brain functions in early ontogenesis (), provide a barrier against incoming pathogens, and balance local immune responses throughout life (). This has led to the appreciation of the human gut microbiome as a “forgotten organ,” an essential, albeit genetically and antigenically foreign, component of the human body (). The gut microbiome contains members of all three domains of cellular life, Bacteria, Archaea, and Eukarya, as well as viruses, albeit at very different relative concentrations (Figure 1). Bacteria and Archaea account for more than 99% of the unique characterized gene repertoire and biomass () and have received most of the attention in human microbiome studies. At the same time, recent work has also highlighted the role of fungi and protozoa, microbial eukaryotes that constitute a smaller but potentially important part of the gut microbiome ().

Colonization and succession within the human gut microbiome by archaea, bacteria, and microeukaryotes during the first year of life.

A number of software tools, databases, and websites have been specifically designed for processing high-throughput virome sequencing data. A concise selection of such software, along with some general-purpose tools useful for steps ranging from read filtering, trimming, and assembly to gene prediction, host prediction, taxonomic classification, and multivariate analysis of community composition, is listed in Table 1.

De novo identification of novel viral genomes in metagenomic datasets against a background of bacterial and eukaryotic DNA contamination is an extremely challenging task. Viral lineages are polyphyletic in origin, and the fast evolution of many viruses leads to further sequence diversification, sometimes to such an extreme degree that no DNA sequence similarity can be detected even between members of the same viral family (). Unlike cellular life forms, viruses lack universally conserved phylogenetic marker genes, which prevents easy identification and taxonomic assignment of novel uncultured viruses. We use a rigorous multi-stage approach to filter out bacterial DNA contamination from VLP-enriched metagenomic sequencing samples. Putative viral contigs are identified via several selection steps, including positive results with the VirSorter classifier (), alignments to viral genomes in the NCBI RefSeq database and our in-house database of crAss-like phages (), and the presence of multiple genes with above-threshold similarity to conserved bacteriophage proteins in the pVOGs database () and/or circular contig topology (Figure 2).
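The multi-stage contig selection described above can be pictured as a simple decision rule applied to evidence collected for each contig. The sketch below is illustrative only: the `Contig` fields, the `MIN_PVOG_HITS` cut-off of 3, and the choice to accept a contig when any single criterion is met are assumptions rather than the thresholds of our pipeline, and the VirSorter, alignment, and pVOG searches are assumed to have been run beforehand.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Evidence gathered for one assembled contig (hypothetical field names)."""
    name: str
    virsorter_positive: bool  # flagged as viral by a VirSorter run
    refseq_viral_hit: bool    # significant alignment to an NCBI RefSeq viral genome
    crass_db_hit: bool        # alignment to a crAss-like phage reference set
    pvog_hits: int            # genes above a similarity threshold to pVOG proteins
    circular: bool            # assembled with overlapping ends (circular topology)


# Hypothetical cut-off for "multiple" pVOG-matching genes.
MIN_PVOG_HITS = 3


def viral_evidence(contig: Contig) -> list:
    """Return the names of the selection criteria a contig satisfies."""
    evidence = []
    if contig.virsorter_positive:
        evidence.append("virsorter")
    if contig.refseq_viral_hit or contig.crass_db_hit:
        evidence.append("reference_alignment")
    if contig.pvog_hits >= MIN_PVOG_HITS:
        evidence.append("pvog")
    if contig.circular:
        evidence.append("circular")
    return evidence


def select_viral_contigs(contigs):
    """Keep contigs supported by at least one line of viral evidence."""
    return [c.name for c in contigs if viral_evidence(c)]


contigs = [
    Contig("k1", True, False, False, 0, False),
    Contig("k2", False, False, False, 1, False),
    Contig("k3", False, False, False, 5, True),
]
print(select_viral_contigs(contigs))
```

Accepting a contig on circularity alone would also admit plasmids, so a stricter variant of this sketch could require circular topology together with pVOG hits.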

Methods based on density gradient ultracentrifugation can deliver phage nucleic acid samples that are virtually free of contaminating bacterial DNA (). These methods are impractical for routine use because of their high manual workload (and hence operator-to-operator variability), low throughput, and tendency to introduce bias by excluding viruses with atypical buoyant densities (). Therefore, most metagenomic studies of the human gut virome use more practical methods based on filtration, followed by precipitation or ultrafiltration of VLPs (). These protocols can leave considerable amounts of residual bacterial DNA in the sample (). Combined with the incompleteness of bacteriophage genomic databases and the need to identify viral genomes de novo against a bacterial background, this can lead to frequent misinterpretations in gut virome studies. For instance, a study of the murine gut virome under antibiotic treatment claimed an expansion of the resistome and of other functions potentially beneficial to bacterial hosts in the phage metagenome (). These claims, however, were refuted by a subsequent independent re-examination, which demonstrated that the majority of these genes were likely associated with bacterial DNA contamination (). Several metrics have been suggested to measure the amount of contaminating bacterial DNA in virome samples, including the percentage of reads aligned to bacterial 16S rRNA and cpn60 gene databases ().
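The read-percentage contamination metrics mentioned above reduce to simple ratios once reads have been aligned to marker-gene references with an aligner of choice. In this minimal sketch, the function names, the example read counts, and the `max_pct` acceptance threshold are all hypothetical and not a published standard.

```python
def marker_read_pct(aligned_reads: int, total_reads: int) -> float:
    """Percentage of reads that align to a bacterial marker-gene database."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * aligned_reads / total_reads


def contamination_report(marker_counts: dict, total_reads: int, max_pct: float = 0.01) -> dict:
    """Per-marker percentages plus a flag if any marker exceeds max_pct.

    max_pct is a hypothetical acceptance threshold, not a published standard.
    """
    pcts = {marker: marker_read_pct(n, total_reads) for marker, n in marker_counts.items()}
    return {"pct": pcts, "flagged": any(p > max_pct for p in pcts.values())}


# Hypothetical sample: 2 million reads, some aligning to 16S rRNA and cpn60 references.
report = contamination_report({"16S_rRNA": 120, "cpn60": 30}, total_reads=2_000_000)
print(report)
```

Because such percentages depend on the reference databases and alignment settings, they are most useful for comparing samples processed in the same way.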

Given that the majority of reads cannot be aligned to a closed-reference database, the alternative is an open-reference approach, in which viral reads are assembled into contigs that are then classified and annotated through reference-based and de novo annotation steps. Reads are then aligned back to the assembled contigs, and matrices of alignment counts are built. This makes it possible to quantify viral species and perform α- and β-diversity analyses independently of the taxonomic position of any of these contigs (). High-quality assembly of short reads is a significant hurdle in viral metagenomics. The choice of assembler software and algorithm parameters is critical and can lead to dramatic differences in the results (). Bacteriophage populations of the human gut represent a particularly challenging target for de novo assembly due to (1) high levels of diversity (); (2) the modular structure of bacteriophage genomes and high levels of genetic mosaicism (); (3) population microdiversity and high heterogeneity at the strain level (); (4) a high incidence of repeat and hypervariable regions (); and (5) wide variation in relative abundance and hence in sequence coverage (). This leads to highly fragmented assemblies and hampers annotation and interpretation of alignment results. The most radical solution to the “metagenomic assembly” conundrum could be physical separation of individual viral particles and sequencing of individual genomes (i.e., single-virion genomics), or the use of long reads spanning nearly complete viral genomes ().
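As a minimal sketch of the open-reference quantification step above, the code below turns (sample, contig) read assignments into a count matrix and computes one common α-diversity index (Shannon) and one common β-diversity measure (Bray-Curtis). The choice of these two indices and the example assignments are assumptions for illustration; studies use a variety of indices.

```python
from collections import Counter
from math import log


def count_matrix(assignments, samples, contigs):
    """Build a samples x contigs matrix from (sample, contig) read assignments."""
    tally = Counter(assignments)
    return [[tally[(s, k)] for k in contigs] for s in samples]


def shannon(row):
    """Shannon diversity H' = -sum(p_i * ln p_i) over non-zero counts."""
    total = sum(row)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * log(c / total) for c in row if c > 0)


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two count vectors (0 = identical, 1 = disjoint)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    if den == 0:
        raise ValueError("both samples are empty")
    return num / den


# Hypothetical read-to-contig assignments for two samples.
reads = [("S1", "vc1"), ("S1", "vc2"), ("S2", "vc2"), ("S2", "vc3")]
m = count_matrix(reads, ["S1", "S2"], ["vc1", "vc2", "vc3"])
print(m)                        # [[1, 1, 0], [0, 1, 1]]
print(shannon(m[0]))            # ~0.693 (ln 2)
print(bray_curtis(m[0], m[1]))  # 0.5
```

In practice, raw counts are usually normalized (for example, by contig length or sequencing depth) before such indices are compared across samples.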

Perhaps the most critical shortcoming of the metagenomic approach to studying the human gut phageome is the severe discrepancy between the demonstrable diversity of gut viruses and the number of genomes of known gut-associated bacteriophages and eukaryotic viruses in public databases. Viral metagenomics of the human gut yields 75%–99% of reads that do not produce significant alignments to any known viral genome (). This is in stark contrast to the current status of human gut bacteriome research, where the Human Microbiome Project () and other efforts have enabled the isolation and complete genome sequencing of >1,000 predominant gut bacterial species (), accounting for >90% of total gut microbial diversity at the species level (). Despite this obvious shortfall, many studies of the human gut virome in health and disease published so far have relied on aligning individual reads or assembled contigs to these nascent viral sequence databases and hence were able to interpret only a small minority of the sequencing data (). In a study with just 13 human donors, we were able to assemble 8,920 putative non-redundant complete and partial viral genomes, of which only 161 (1.8%) could be assigned to known viral taxa (with >50% identity over 90% of contig length). Of these, 157 were bacteriophages, two were human papillomaviruses, and one was a plant RNA virus ().

The use of WGA results in a significant distortion of viral taxonomic composition, especially when the starting DNA concentration is extremely low. For instance, the popular φ29 DNA polymerase-based kits cause a significant over-representation of viruses with small circular ssDNA genomes (phage families Microviridae and Inoviridae, and eukaryotic CRESS-DNA viruses such as Circoviridae and Anelloviridae) and of plasmids (). In addition, amplification leads to a general reduction of diversity and obscures detection of some rare viral groups (). Therefore, a new generation of library preparation kits compatible with Illumina sequencing platforms, suitable for extremely low dsDNA and ssDNA inputs and minimizing amplification steps, should be adopted as standard practice in gut virome/phageome research (). Single-virion genomics (SVG) will probably be more widely used in future studies ().

If we were to assume that the gut phageome is composed entirely of coliphage T4 (genome of ∼170 kb; an example of a relatively large Myoviridae phage) at a titer of 1 × 10^10 pfu g^−1, the total mass of virion heads (MW = 1.94 × 10^8 Da) would be 3.23 μg per g of feces (). Consequently, the total mass of its 1.04 × 10^8 Da genomic double-stranded DNA (dsDNA) would be 1.7 μg per g of feces. Similarly, if the entire fecal phage population consisted of coliphage φX174 (genome of ∼5 kb; an example of a small Microviridae phage), the total genomic single-stranded DNA (ssDNA) content would be only 30 ng per g of feces. Since the human gut phage community is a complex mixture of species with different genome sizes and nucleic acid types, actual DNA yields can be as high as 250–500 ng of viral DNA per gram of feces (), although some samples yield as little as 4–5 ng. Thus, whole-genome amplification (WGA) techniques (typically multiple displacement amplification [MDA] with phage φ29 DNA polymerase) are often required to obtain sufficient DNA for downstream processing (). If a reverse transcription step is included, MDA offers the added advantage of converting single-stranded cDNA into a double-stranded form compatible with common sequencing library preparation techniques ().
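The mass estimates above follow from multiplying particle number by molecular weight and dividing by Avogadro's number. The sketch below reproduces them using the titer and molecular weights given in the text; the φX174 genome length of 5,386 nt is a known value, while the average of 330 Da per ssDNA nucleotide is an assumed round figure.

```python
AVOGADRO = 6.02214076e23  # particles per mole


def total_mass_g(particles: float, mw_da: float) -> float:
    """Mass in grams of `particles` molecules, each of molecular weight `mw_da` (Da = g/mol)."""
    return particles * mw_da / AVOGADRO


TITER_PER_G = 1e10     # phage particles per g of feces (value used in the text)
T4_VIRION_MW = 1.94e8  # Da, T4 virion heads (value used in the text)
T4_GENOME_MW = 1.04e8  # Da, T4 genomic dsDNA (value used in the text)
PHIX_NT = 5386         # phiX174 genome length, nucleotides
AVG_NT_MW = 330.0      # assumed average mass of one ssDNA nucleotide, Da

t4_heads_ug = total_mass_g(TITER_PER_G, T4_VIRION_MW) * 1e6
t4_dna_ug = total_mass_g(TITER_PER_G, T4_GENOME_MW) * 1e6
phix_dna_ng = total_mass_g(TITER_PER_G, PHIX_NT * AVG_NT_MW) * 1e9

print(f"T4 heads: {t4_heads_ug:.2f} ug/g; T4 dsDNA: {t4_dna_ug:.2f} ug/g; "
      f"phiX174 ssDNA: {phix_dna_ng:.0f} ng/g")
```

The results (about 3.2 μg, 1.7 μg, and 30 ng per gram) agree with the figures in the text to within rounding.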

In contrast, shotgun metagenomic studies of total community DNA in human feces suggest a much higher proportion of bacteriophage sequences in the gut, from an average of 5.8% () up to 22% in extreme cases (). Assuming an average bacteriophage genome size of 50 kb and an average bacterial genome size of 4 Mb, the 5.8% figure would imply a VMR of ∼4.64. However, shotgun metagenomic data cannot distinguish DNA packaged in phage particles from prophage sequences in bacterial genomes. At the same time, it seems likely that both EFM/flow cytometry counts and viral metagenomic methods underestimate viral loads in the extremely dense microbial and viral communities of the gut, owing to a number of factors, including inefficient elution of VLPs from feces (). It is also possible that binding of phages to bacterial cells and cell debris, as well as aggregation of VLPs, leads to underestimated EFM counts.
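The VMR figure above can be reproduced as the phage share of DNA scaled by the ratio of bacterial to phage genome size. This sketch uses the genome sizes assumed in the text; the `exact` option, which divides by the non-phage share of the DNA instead of the total, is an added illustration and not part of the original estimate. Like the text's estimate, both variants ignore the fact that some phage reads derive from prophages.

```python
def vmr_from_phage_fraction(phage_fraction: float,
                            phage_genome_bp: float = 50_000,
                            bacterial_genome_bp: float = 4_000_000,
                            exact: bool = False) -> float:
    """Estimate phage genomes per bacterial genome from the phage share of shotgun DNA.

    exact=False: phage_fraction * (bacterial genome / phage genome), as in the text.
    exact=True:  uses the non-phage share (1 - phage_fraction) as bacterial DNA instead.
    """
    if not 0 < phage_fraction < 1:
        raise ValueError("phage_fraction must be between 0 and 1")
    ratio = bacterial_genome_bp / phage_genome_bp
    if exact:
        return phage_fraction / (1 - phage_fraction) * ratio
    return phage_fraction * ratio


for f in (0.058, 0.22):
    print(f, round(vmr_from_phage_fraction(f), 2),
          round(vmr_from_phage_fraction(f, exact=True), 2))
```

For 5.8% phage DNA, the simple form gives ∼4.64 and the exact form ∼4.93; the difference grows for the 22% extreme.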

A number of studies that employed direct counting of VLPs stained with DNA/RNA intercalating dyes suggested that viral counts in the human gut are significantly lower than the expected level of 10^12 VLPs g^−1 (which would hold if the postulated ∼10:1 phage-to-bacterium ratio were maintained). Hoyles et al. reported an average of 3 × 10VLPs obtained by filtration per gram of feces in healthy adult subjects (), while a study of colonic mucosa biopsy samples revealed the presence of 1.2 × 10VLPs per biopsy in healthy individuals and significantly higher viral loads of 2.9 × 10VLPs per biopsy in IBD patients (). Recently, we used viral metagenomics to quantify fecal bacteriophages by comparing the total number of DNA reads in VLP-enriched fractions with the number of reads aligned to a standardized amount of an exogenous phage deliberately spiked into the fecal samples (lactococcal phage Q33) (). In a small sample, we estimated viral loads of 1.46 × 10–1.81 × 10VLPs g. Taking 1 × 10^10 VLPs g^−1 as a rough estimate, one could conclude that the true VMR in the human gut is reversed relative to other environments and could be as low as 0.1.
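The spike-in quantification described above can be approximated by a ratio estimator: endogenous particles ≈ spike particles × (endogenous reads / spike reads), optionally corrected for genome length because read yield scales roughly with genome size. This is a simplified sketch, not our exact calculation; all parameter names and example numbers are hypothetical, and the fecal bacterial density of 10^11 cells per gram is an assumed order of magnitude.

```python
def spike_in_vlps_per_g(spike_particles: float,
                        spike_reads: int,
                        total_reads: int,
                        sample_mass_g: float,
                        length_ratio: float = 1.0) -> float:
    """Estimate endogenous VLPs per gram from reads of a known spike-in phage.

    length_ratio = spike genome length / mean endogenous genome length. Reads scale
    roughly with genome length, so 1.0 (the default) assumes equal average sizes.
    """
    if spike_reads <= 0 or sample_mass_g <= 0:
        raise ValueError("need spike reads and a positive sample mass")
    endogenous_reads = total_reads - spike_reads
    return spike_particles * (endogenous_reads / spike_reads) * length_ratio / sample_mass_g


# Hypothetical run: 1e7 spike phage particles added to 0.5 g of feces.
vlps = spike_in_vlps_per_g(1e7, spike_reads=2_000, total_reads=2_002_000, sample_mass_g=0.5)
bacteria_per_g = 1e11  # assumed order-of-magnitude fecal bacterial density
print(f"{vlps:.2e} VLPs per g; VMR ~ {vlps / bacteria_per_g:.2f}")
```

Such an estimate still inherits losses during VLP elution and nucleic acid extraction, unless the spike-in is added before those steps and behaves like the endogenous particles.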

Each of the available purification and analysis methods has its limitations and introduces bias. For example, the use of glass beads for sample homogenization, high centrifugation speeds, and small-pore filters leads to a dramatic loss of large virions. Chloroform extraction of PEG-precipitated VLP samples removes enveloped viruses. Conversely, using large-pore filters and omitting the chloroform step improves the recovery of some viruses but introduces considerable bacterial contamination (). Similarly, CsCl density gradient purification yields very pure samples, ideal for TEM and metaproteomic studies, but fails to recover enveloped viruses and those with atypical buoyant densities (). Here, we focus on some of the most significant biases and unsolved problems associated with metagenomic analysis of viral populations in the human gut.

Currently, deep sequencing using high-throughput short-read technologies (Roche 454; Illumina MiSeq/NextSeq/HiSeq/NovaSeq; Ion Torrent) remains the primary approach to characterizing unculturable viral communities in the gut. However, assembly, mapping, and classification of short reads arising from mostly novel and unknown viral genomes (viral dark matter) represent a considerable bioinformatic challenge (). In recent years, two long-read sequencing technologies have become available (Pacific Biosciences and Oxford Nanopore), which, despite considerably lower per-base accuracy, could be an interesting complement to short-read sequencing. Specifically, they can be used to assist in scaffolding large novel viral genomes, to obtain information on methylation patterns (potentially useful for host prediction), and to study population structure at the single-virion level, since long reads can, in some cases, represent complete or near-complete viral genomes (). Additional future complementary approaches may include gut viral metatranscriptomics (RNA-seq) and viral metaproteomics.

An integrated gut virome analysis pipeline, which amalgamates different methods and approaches reported in the literature, is presented in Figure 2. A crude VLP-containing fraction of feces (fecal filtrate; FF) is typically prepared by vigorous homogenization, followed by centrifugation and microfiltration of the supernatant to remove bacterial cells and dietary debris. Absolute quantification of VLPs in FF and mucosal samples can be performed by direct counting of particles stained with DNA/RNA intercalating dyes (SYBR Green II, SYBR Gold, DAPI) under EFM or by flow cytometry (). Flow cytometry has the added advantage that specific fractions of particles, selected on the basis of size, granularity, and fluorescence intensity, can be collected and further analyzed (). To obtain more concentrated VLP samples, ultracentrifugation at ∼120,000 × g, precipitation with polyethylene glycol and NaCl or ZnCl2, or tangential flow filtration can be employed (). A concentrated FF sample can then be treated with nucleases to remove free, capsid-unprotected DNA/RNA, yielding a viral fraction suitable for metagenomic sequencing (). Alternatively, even purer VLP samples can be obtained by collecting fraction(s) of specific buoyant density after ultracentrifugation in CsCl step or continuous gradients (). VLP fractions can be examined by TEM (), metagenomic DNA and cDNA sequencing, or metaproteomics.
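Direct EFM counts are converted to absolute concentrations by scaling the mean count per microscope field up to the whole stained filter area and then back to the original sample. The sketch below shows this standard scaling; all parameter names and example values are hypothetical, and a real protocol would also account for blank-filter background counts.

```python
from statistics import mean


def efm_vlps_per_g(field_counts, field_area_um2, filter_area_um2,
                   volume_filtered_ml, ml_filtrate_per_g, dilution=1.0):
    """Convert stained-particle counts per microscope field into VLPs per gram of feces.

    field_counts        particles counted in each field of view
    field_area_um2      area of one field of view
    filter_area_um2     effective (stained) area of the filter
    volume_filtered_ml  volume of diluted fecal filtrate passed through the filter
    ml_filtrate_per_g   mL of filtrate obtained per g of feces
    dilution            dilution factor applied before filtering
    """
    per_filter = mean(field_counts) * (filter_area_um2 / field_area_um2)
    per_ml = per_filter / volume_filtered_ml * dilution
    return per_ml * ml_filtrate_per_g


# Hypothetical values for illustration only.
estimate = efm_vlps_per_g([48, 55, 47], field_area_um2=1e4, filter_area_um2=2e8,
                          volume_filtered_ml=1.0, ml_filtrate_per_g=10, dilution=100)
print(f"{estimate:.2e} VLPs per g")
```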

A number of studies have reported very high levels of individual specificity of the gut phageome, with inter-individual differences being the primary source of variance at the population level (). Despite that, the identification of a common set of bacteriophages found in 20%–50% of individuals formed the basis of the healthy core gut phageome concept (). A significant fraction of bacteriophages was found to be shared between twins and their mothers, as well as between IBD-affected and healthy members of the same household (). A common presumption is that newborns are born sterile, in which case bacterial viruses would not be expected in their gut. Rapid colonization of the newborn gut in the first days of life by a dynamic assembly of bacteriophages has been reported (). The neonatal gut phageome is complex and relatively unstable, preying on low-abundance microbial hosts (). However, progressive maturation of the infant gut microbiome leads to a reduction in viral abundance and diversity, accompanied by an increase in the abundance and diversity of the bacterial component. Interestingly, the abundance and diversity of Caudovirales and Microviridae show opposite trends in early postnatal development, with a gradual decrease of the former and an increase of the latter up to 2.5 years of age. Birth mode has a profound effect on phageome composition that is still detectable at 1 year of age (). Little is known about phageome progression later in life. However, unusually high abundances of Gokushovirinae were detected in one cohort of elderly subjects, possibly reflecting a shift toward Firmicutes in their bacteriomes (). Geographic and ethnic differences are interesting but understudied aspects of individual specificity in the human gut phageome. We have observed stark contrasts in crAss-like phage composition between Western and African populations, with the former predominantly colonized by candidate genus I (likely host Bacteroides) and the latter by candidate genera VIII and IX (likely host Prevotella) ().
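A core phageome of the kind described above is typically defined by prevalence: the fraction of individuals in which a viral contig is detected. In this minimal sketch, the 20%–50% prevalence range comes from the text, while the `min_reads` presence threshold and the example counts are hypothetical; real studies often define presence by coverage breadth instead.

```python
def prevalence(counts, min_reads=1):
    """Fraction of samples (rows) in which each contig (column) has >= min_reads reads."""
    n_samples = len(counts)
    n_contigs = len(counts[0])
    return [sum(1 for row in counts if row[j] >= min_reads) / n_samples
            for j in range(n_contigs)]


def core_contigs(counts, names, min_prevalence=0.2, min_reads=1):
    """Names of contigs present in at least min_prevalence of individuals."""
    return [name for name, p in zip(names, prevalence(counts, min_reads))
            if p >= min_prevalence]


# Hypothetical counts: 5 individuals x 3 viral contigs.
counts = [[5, 0, 0],
          [3, 0, 1],
          [0, 0, 0],
          [2, 1, 0],
          [4, 0, 0]]
names = ["vc1", "vc2", "vc3"]
print(prevalence(counts))
print(core_contigs(counts, names, min_prevalence=0.5))
```

Because the core set depends strongly on both thresholds, reporting it across a range of prevalence cut-offs (for example, 20% to 50%) is more informative than a single value.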

Microviridae phages are an ever-present component of the human gut microbiome, even though the relative abundance of this family in the phageome remains controversial (). These viruses possess small circular ssDNA genomes (4–7 kb) packaged into icosahedral capsids. A large diversity of these viruses has been detected in both marine and animal-associated microbiomes (). However, only a small sample of Microviridae phages has been isolated in culture, mainly enterobacteria phages of the subfamily Bullavirinae and Chlamydia and Bdellovibrio phages of the subfamily Gokushovirinae. The latter subfamily, along with the subfamily Alpavirinae, is especially predominant in the human gut (). As the only Microviridae group capable of a temperate lifestyle, Alpavirinae prophages are frequently detected in the genomes of Bacteroides and Prevotella, integrated through an unconfirmed mechanism that possibly involves the cellular chromosome dimer resolution machinery (). The host range of gut Gokushovirinae cannot be directly established. However, the acquisition of putative Gokushovirinae peptidase genes by some strains of predominant gut Firmicutes, in particular Faecalibacterium prausnitzii, suggests that these anaerobes may serve as hosts for some Gokushovirinae strains ().

In 2014, a previously unrecognized 97 kb dsDNA phage genome was described in human gut metagenomic datasets and termed crAssphage (crAss, cross-assembly) (). Its high relative abundance (up to 90% of the total viral load in the gut of some individuals) and wide distribution in the human population (>50% of Western individuals colonized) attracted considerable attention, both because of its apparent significance for human health and because of its potential use as a marker of fecal pollution (). Despite lacking significant homology to any known bacteriophage, it was predicted to infect Bacteroides on the basis of co-abundance and CRISPR spacer matches (). Analysis of its genome assigned functions to ∼50% of genes, predicted a Podoviridae-like morphology, and identified a whole family of similar crAss-like bacteriophages present in diverse environments, such as human and insect guts, oceans, and terrestrial and groundwater samples (). We recently described an expansive collection of uncultured, human gut-associated crAss-like bacteriophage genomes, which could be classified into ten candidate genera and four subfamily-level taxa. Taken together, representatives of this proposed novel viral family are present in 77% of individuals in diverse human populations, with relative abundances of up to 95% of the total viral load in the gut (). In a separate study, we reported the propagation of the first member of the family (φcrAss001) in a pure culture of Bacteroides intestinalis (). Preliminary results suggest that members of this family maintain stable colonization of the human gut and can engage in an unusual carrier-state type of interaction with their bacterial hosts both in vitro and in vivo (). Reyes et al. reported stable engraftment and long-term persistence of two human fecal viruses (φHSC04 and φHSC05), which we later identified as crAss-like phages belonging to candidate genera I and VII, in mice colonized by a defined microbial community of 15 strains of anaerobic bacteria, including 8 different Bacteroides species (). Further, a recent report highlighted the engraftment and persistence for up to 1 year of allogeneic crAssphage in humans after fecal microbiota transplantation (FMT) (). These interesting phenomena provide additional clues regarding the host range of this unusual viral family and also support in vitro observations of its unusual ability to persist long term in the presence of a sensitive host.


Members of the order Caudovirales have linear dsDNA genomes ranging in length from ∼16 kb (streptococcal podovirus C1) to hundreds of kilobases in large Myoviridae. Temperate phages of this order engage in lysogenic interactions with their hosts either by integrating their genome into the host chromosome (e.g., coliphages λ and Mu) or by persisting through generations as an autonomously replicating episome (e.g., coliphage P1). The frequent occurrence of prophages in gut commensal bacteria makes it possible to identify hosts for some of the temperate phage genomes detected in metagenomic surveys (). The host range of the order Caudovirales is very broad and includes all major bacterial phyla found in the gut: Firmicutes, Bacteroidetes, Proteobacteria, and Actinobacteria. Typical gut Siphoviridae (identified de novo or using reference databases) have linear genomes of moderate size (∼35–50 kb), often containing a lysogeny module with a gene for a serine or tyrosine integrase (). Some representatives of this group infecting common members of the human gut bacteriome, such as Bacteroides and Clostridium, have been isolated in culture (). However, the presence of large Siphoviridae-type virions in TEM images (sometimes with tails >1 μm long) suggests the existence of potentially virulent viruses with larger genomes, since virion volume can serve as an (imperfect) predictor of genome size (). Furthermore, virulent tailed phages tend to have larger genomes, which supply the complement of nucleic acid metabolism genes needed for efficient lytic cycles. Data on Myoviridae and Podoviridae phages remain scarce, but available results suggest wide variation in virion size, genome length, and potentially host range (). Recently, an uncultured megaphage with a genome length of >540 kbp, predicted to infect Prevotella and showing properties of the Myoviridae family, was detected in the microbiota of humans from Bangladesh and Tanzania ().
Faecalibacterium and Bifidobacterium are among the most abundant bacteria in the healthy human microbiota in adulthood and infancy, respectively (), but efforts to isolate bacteriophages infecting these genera have so far been unsuccessful, despite the presence of numerous prophages in their genomes. Recently, prophage induction and release of Siphoviridae- and Myoviridae-type viral particles from Faecalibacterium and Bifidobacterium have been reported ().

As discussed above, the scarcity of bacteriophage genomes in reference databases makes it impossible to directly stratify metagenomic sequences of the human gut phageome by their position in viral taxonomy, their functional properties, or their specificity for bacterial hosts. Network-based de novo clustering approaches (e.g., the vConTACT framework) could offer an alternative. Such methods, based on analysis of the content of conserved protein-coding genes, attempt to represent evolutionary and functional relationships among both cultured and uncultured viral genomes in a reticulate fashion and to categorize them into clusters, agnostic to established taxonomy (). However, further efforts will be required to reconcile these viral clusters with established taxonomic systems and to predict the biological properties and host ranges of newly discovered bacteriophages.
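Gene-sharing networks of the kind built by vConTACT link genomes that share more protein clusters than expected by chance. The sketch below follows that general idea under simplifying assumptions. Each genome is represented as a set of hypothetical protein-cluster identifiers. Each pair is scored as −log10(P × number of pairwise comparisons), where P is a hypergeometric tail probability, and pairs with a score of at least 1 are linked. Clusters are then taken as connected components, a deliberately simple stand-in for the MCL or ClusterONE clustering used by actual vConTACT versions. This is not vConTACT's code or API.

```python
from itertools import combinations
from math import comb, log10


def shared_pc_pvalue(a: int, b: int, c: int, n: int) -> float:
    """P(X >= c) for the number of protein clusters (PCs) shared by genomes with
    a and b PCs, drawn from n PCs in total (hypergeometric tail)."""
    hits = sum(comb(a, i) * comb(n - a, b - i) for i in range(c, min(a, b) + 1))
    return hits / comb(n, b)


def gene_sharing_clusters(genomes: dict, min_score: float = 1.0):
    """Link significantly similar genomes and return connected components."""
    names = sorted(genomes)
    n = len(set().union(*genomes.values()))  # total PCs in the dataset
    t = len(names) * (len(names) - 1) // 2   # number of pairwise comparisons
    adj = {g: set() for g in names}
    for x, y in combinations(names, 2):
        c = len(genomes[x] & genomes[y])
        if c == 0:
            continue
        p = shared_pc_pvalue(len(genomes[x]), len(genomes[y]), c, n)
        score = -log10(p * t) if p > 0 else float("inf")
        if score >= min_score:
            adj[x].add(y)
            adj[y].add(x)
    # Connected components via iterative depth-first search.
    seen, clusters = set(), []
    for g in names:
        if g in seen:
            continue
        seen.add(g)
        stack, component = [g], []
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbor in adj[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Hypothetical genomes described by their protein-cluster memberships.
genomes = {
    "A": {f"p{i}" for i in range(10)},
    "B": {f"p{i}" for i in range(9)} | {"q0"},
    "C": {f"r{i}" for i in range(10)},
    "D": {f"s{i}" for i in range(30)},
}
print(gene_sharing_clusters(genomes))
```

With these made-up inputs, genomes A and B share nine protein clusters and are linked, while C and D remain singletons.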

In addition to bacteriophages, cryptic human and microeukaryotic CRESS-DNA viruses (Anelloviridae, Circoviridae, and Genomoviridae), as well as human Herpesviridae and Papillomaviridae, are consistently detected (), but these fall outside the scope of this Review. Interestingly, the only group of RNA viruses consistently found in the healthy gut is plant viruses of the family Virgaviridae (). These viruses of dietary origin maintain infectivity after passage through the human GIT (). The presence of giant amoebal viruses () in the gut has never been reported but cannot be completely ruled out, because the methods used by most studies are incapable of recovering them. The same would be true of some “jumbo” bacteriophages (with genome sizes of >500 kbp) and of viruses with unusual morphologies and physical properties, such as Autolykiviridae, a family of highly prevalent ocean phages that avoided detection until very recently (). Archaeal viruses include some morphotypes shared with bacteriophages, as well as unusual ones specific to the archaeal domain (). None have been detected in human gut samples to date.

These findings were further corroborated by metagenomic studies of the gut phageome involving sequencing of both viral genomic DNA and RNA. While >80% of viral sequences did not match closed-reference databases, most of the classifiable metagenomic reads aligned to Siphoviridae phage genomes (). In more recent studies focusing on DNA viruses, only 7%–13% of recovered viral contigs could be assigned to known viral families, mostly the order Caudovirales (dsDNA genomes) and the family Microviridae (ssDNA genomes), based on the presence of family-specific hallmark genes (). The abundance of Microviridae was, however, likely overestimated owing to the use of MDA, as described earlier. In a recent study, we found that the majority of viral contigs identifiable to the family level belonged to the Caudovirales (). However, large numbers of Microviridae contigs were also detected. The presence of lysogeny genes in the majority of complete Caudovirales contigs suggests temperate lifestyles. We were unable to detect RNA bacteriophages (e.g., family Leviviridae) in gut phage communities, likely because of low viral loads (Leviviridae virions are very small and can be resistant to precipitation) and the nucleic acid extraction procedures used ().

Microscopic studies have shown that gut bacteriophages are almost exclusively tailed viruses with icosahedral capsids, belonging to the order Caudovirales. In the majority of cases, they can be robustly classified into families based on tail morphology: Siphoviridae with long, flexible, non-contractile tails; Myoviridae with long, stiff, contractile tails; and Podoviridae with very short tails ( Figure 3 ). Unique assemblages of up to several tens of distinct morphotypes of these three families can be recognized in feces collected from different human donors (). Some studies report detecting bacteriophage families outside the order Caudovirales, e.g., Cystoviridae, Inoviridae, and Microviridae (). It should be noted, however, that recognizing small non-tailed icosahedral or filamentous capsids in fecal samples against a background of dietary debris can be a challenging task ( Figure 3 ).

Spatial Structure and Dynamics of Bacteriophage Populations in the Gut

The ocean microbiome provides a classical ecological model for studying general principles of population dynamics and the ecological-evolutionary relationships between bacteriophages and their bacterial prey (). A number of models have been developed to explain particular aspects of phage population dynamics and phage-host co-evolution. These include the “arms race” model (continuous, Red Queen-like directional selection of mutations, leading to broadly resistant hosts and highly infective parasites), the fluctuating selection model (density-dependent fluctuating selection based on a trade-off between the benefits of resistance and its metabolic costs), and “kill-the-winner” (an extension of the fluctuating selection model that takes abiotic factors into account), each with its own assumptions and limitations ().
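The core intuition behind kill-the-winner, that a fast-growing host does not simply outgrow everything else because it feeds a larger population of its own specific phage, can be sketched with a minimal Lotka-Volterra-type host-phage model. This is a simplified illustration, not Thingstad's full formulation (which also includes resource limitation), and all parameter values are hypothetical. At the non-trivial equilibrium, host density depends only on phage parameters, while phage density scales with host growth rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhageHostPair:
    """One host strain and its specific phage; the values used below are hypothetical."""
    mu: float     # host specific growth rate (per hour)
    a: float      # phage adsorption rate constant (ml per phage per hour)
    burst: float  # burst size (phages released per lysed cell)
    m: float      # phage decay/removal rate (per hour)


def rates(p: PhageHostPair, B: float, V: float) -> tuple[float, float]:
    """Lotka-Volterra-type rates: hosts grow and are lysed; phages are produced and decay."""
    dB = p.mu * B - p.a * B * V
    dV = p.burst * p.a * B * V - p.m * V
    return dB, dV


def steady_state(p: PhageHostPair) -> tuple[float, float]:
    """Non-trivial equilibrium where both rates above are zero."""
    B_star = p.m / (p.burst * p.a)  # host density set by phage parameters only
    V_star = p.mu / p.a             # phage density set by host growth rate
    return B_star, V_star


fast = PhageHostPair(mu=1.0, a=1e-8, burst=50, m=0.1)  # the potential "winner"
slow = PhageHostPair(mu=0.2, a=1e-8, burst=50, m=0.1)

for name, pair in [("fast", fast), ("slow", slow)]:
    B, V = steady_state(pair)
    print(f"{name}: host {B:.2e} cells/ml, phage {V:.2e} per ml")
```

With these illustrative values both hosts sit at the same equilibrium density, and the faster-growing one simply supports five times more phages. In this simple form the equilibrium is neutrally stable, so populations cycle around it rather than settling on it.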

The mammalian gut presents a much more complicated system than the ocean, with a number of biotic and abiotic factors at play. These include the complex anatomy of the gut at both the macroscopic level (longitudinal segmentation, valves, haustra, peristalsis and mass movements, and secretion of bile and pancreatic juice) and the microscopic level (villi, microvilli, intestinal and colonic glands, M cells, glycocalyx, and secreted mucus); the action of the local immune system (secretion of sIgA into the lumen); a constant influx of new phages and their hosts from the environment; and the chemical composition, amount, and consistency of dietary residue. As a result, phage population dynamics in the gut are fundamentally different from those in ocean ecosystems. In the absence of observable biomass control by phages, bacterial densities (10¹¹ cfu g⁻¹ of feces) reach the carrying capacity of the habitat, while phage titers and VMR remain comparatively low (<0.1). An unresolved question is why, in such a dense community, we do not see frequent phage bursts leading to much higher VMRs.
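Since VMR is the ratio of viral particles to bacterial cells, the figures above imply an upper bound on phage density. This is simple arithmetic on the quoted values, treating cfu counts as a proxy for total bacterial cells (an approximation, since not all gut bacteria are culturable):

```latex
\mathrm{VMR} = \frac{N_{\mathrm{phage}}}{N_{\mathrm{bacteria}}}
\quad\Longrightarrow\quad
N_{\mathrm{phage}} = \mathrm{VMR}\times N_{\mathrm{bacteria}}
< 0.1 \times 10^{11}\ \mathrm{g^{-1}} = 10^{10}\ \mathrm{g^{-1}}\ \text{of feces}
```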

Contrary to the fluctuating selection model and in agreement with “piggyback-the-winner” (in which phages increasingly favor lysogeny over lytic replication at high host densities), gut phageome composition is stable over time, with 80%–95% of phage contigs retained in a single individual over observation periods of 1 to 2.5 years (). As predicted by the latter model, mutation accumulation rates were low for temperate Caudovirales phages but significantly higher for obligately virulent Microviridae phages (>1 nt substitution per 10⁵ nt per day).
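To put a per-nucleotide daily substitution rate into genome-scale terms, the expected number of substitutions in one lineage is simply rate × genome length × time. The sketch below uses illustrative assumptions only: a rate of 10⁻⁵ substitutions per nucleotide per day and a 5-kb genome (roughly the size range of Microviridae), followed for one year.

```python
def expected_substitutions(rate_per_nt_per_day: float, genome_length_nt: int, days: float) -> float:
    """Expected substitutions in one genome lineage, assuming a constant per-site rate.

    Ignores back-mutation and multiple hits at the same site, so it is only
    meaningful while rate_per_nt_per_day * days is much smaller than 1.
    """
    return rate_per_nt_per_day * genome_length_nt * days


# Assumed, illustrative values: 1e-5 substitutions per nt per day, 5 kb genome, one year.
print(expected_substitutions(1e-5, 5_000, 365))
```

Under these assumptions a genome accumulates roughly 18 substitutions per year, i.e., well under 1% of its sites, which is why such changes become visible only in longitudinal sampling over months.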

“Piggyback-the-winner” offers an elegant explanation of the low VMR despite high bacterial counts in the gut. This model, however, fails to explain all observed phenomena. For example, strictly virulent Microviridae phages are able to persist in the gut for extended periods of time (). Likewise, a virulent Myoviridae phage (φHSC03) and crAss-like phages (φHSC04 and φHSC05) were able to engraft and persist stably at high levels in mice colonized with a 15-strain artificial bacterial community, without significant changes to the latter (). In the same experiment, the temperate Siphoviridae phage φHSC01 and the Microviridae phage φHSC02 were capable of only briefly colonizing the mice, causing a transient decline of their hosts (Bacteroides caccae and Bacteroides ovatus) followed by rapid recovery of the microbiota and elimination of the phages.

A number of studies have examined phage-host dynamics in the gut using germ-free mice monocolonized with either non-pathogenic or pathogenic strains of E. coli and challenged with well-characterized, strictly virulent bacteriophages (). Despite a dramatic expansion of bacteriophage populations, little or no decrease was seen in E. coli colonization levels. Interestingly, while colonization by phage T4 was transient, bacteriophage T7 was maintained at ∼10¹¹ pfu g⁻¹ of feces, with its host stable at ∼10¹⁰ cfu g⁻¹ of feces (). Similar dynamics were reported with a three-phage cocktail, which caused no detectable changes in colonization levels (). Notably, the eradication of phage T4 from the gut despite the presence of large numbers of sensitive hosts was not due to genetic resistance in E. coli. Only 20% of E. coli clones became resistant upon prolonged in vivo exposure to phage T7, suggesting, on the one hand, a metabolic cost of such resistance and, on the other, additional factors in the living gut that prevent the virulent phage from completely wiping out its host. We observed rapid emergence of resistance to a crAss-like phage in an in vitro co-cultivation system with its host, but this never resulted in a complete takeover by resistant clones (). The high mutation rate suggests that a genetic switch mechanism, rather than random point mutations, may be responsible for resistance. Furthermore, some of the mutants readily reverted to the sensitive phenotype, again suggesting a metabolic cost associated with resistance.
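Why a reversible, costly resistance switch never leads to complete takeover can be illustrated with a toy two-state model. This is a hypothetical sketch, not the analysis performed in the cited studies: each generation, sensitive cells survive phage attack with probability `1 - p_kill`, resistant cells grow at relative fitness `1 - cost`, and cells then switch sensitive to resistant at rate `r_fwd` and back at rate `r_rev`. All parameter values are assumptions.

```python
def resistant_fraction(p_kill: float, cost: float, r_fwd: float, r_rev: float,
                       f0: float = 0.5, generations: int = 2000) -> float:
    """Iterate a toy selection-then-switching model and return the resistant fraction."""
    f = f0
    for _ in range(generations):
        s_sel = (1.0 - p_kill) * (1.0 - f)  # sensitive cells surviving phage exposure
        r_sel = (1.0 - cost) * f            # resistant cells paying a growth cost
        total = s_sel + r_sel
        # Reversible switching: resistant cells revert, sensitive cells switch on.
        f = (r_sel * (1.0 - r_rev) + s_sel * r_fwd) / total
    return f


# Hypothetical parameters: 50% killing of sensitive cells, 10% cost of resistance,
# switching on at 1e-3 and reverting at 1e-2 per generation.
with_phage = resistant_fraction(p_kill=0.5, cost=0.1, r_fwd=1e-3, r_rev=1e-2)
without_phage = resistant_fraction(p_kill=0.0, cost=0.1, r_fwd=1e-3, r_rev=1e-2)
print(f"resistant fraction with phage: {with_phage:.3f}, without phage: {without_phage:.4f}")
```

With these illustrative values, resistance dominates under phage pressure (roughly 98%) but reversion maintains a sensitive subpopulation, and once phage pressure is removed the cost drives resistance down to around 1%.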

Since strictly virulent bacteriophages are also present in the gut lumen without causing any significant disturbance to the bacteriome, models other than “piggyback-the-winner” are required to explain the ecological and evolutionary forces that maintain equilibrium in the tripartite host-bacteriome-phageome system. CRISPR-Cas systems give bacterial cells a powerful means of rapidly acquiring resistance at the population or community level upon initial contact with a new bacteriophage. Of special interest are CRISPR arrays encoded by temperate bacteriophages, which add a further layer of complexity to the system in the form of phage-versus-phage antibiosis (). The enormous gene pool of the gut microbiome and the high frequency of lateral gene transfer promote the generation of diversity in both bacteriophages and their hosts, with host switches occurring at much higher rates than those seen in reductionist in vitro systems (). Pseudolysogeny, which seems to be a common mechanism of persistence for gut bacteriophages, was shown to promote the accumulation of mutations in infected hosts and a build-up of resistance (). However, despite rapid diversification, there is no evidence of Red Queen dynamics in the gut, which would otherwise lead to continuous directional selection of both multi-resistant hosts and generalist bacteriophages. Instead, it seems that the metabolic cost of resistance slows bacterial growth, while the generally lower efficiency of generalist phages toward any particular host prevents them from outcompeting specialists (). Furthermore, the “enhanced infection” model predicts that cost-free resistance mutations (e.g., alterations of the cell envelope) can render a bacterial clone sensitive to a different phage, leading to passive host switching ().
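CRISPR-based immunity, whether the array sits in the bacterial chromosome or in a temperate phage, rests on a spacer matching a protospacer in the invading phage genome. A deliberately simplified sketch of that check follows. It assumes exact string matches of a spacer or its reverse complement and ignores PAM requirements and mismatch tolerance, both of which matter in real CRISPR-Cas systems. The function names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (non-ACGT characters are left as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def targeted_by_array(spacers: list[str], phage_genome: str) -> bool:
    """True if any spacer, or its reverse complement, occurs exactly in the phage genome.

    Simplifications (assumptions of this sketch): exact matches only,
    no PAM check, and no tolerance for mismatches.
    """
    genome = phage_genome.upper()
    for spacer in spacers:
        s = spacer.upper()
        if s in genome or reverse_complement(s) in genome:
            return True
    return False


# Toy sequences, not real phage or CRISPR data.
phage = "ATGAAACCCTTTGGG"
print(targeted_by_array(["GGGTTT"], phage))    # matches via the reverse complement
print(targeted_by_array(["TTTTTTTT"], phage))  # no protospacer in this genome
```

Searching both strands matters because a spacer can be acquired from either strand of the phage genome.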

The available experimental data, however, suggest that physiological and epigenetic resistance (growth phase, expression of surface receptors), as well as physical “abiotic” factors of the gut environment (the refuge model), can play a decisive role in protecting bacterial cells from extermination by virulent bacteriophages (). There is a significant gap in our knowledge of how gut anatomy at the macroscopic and microscopic levels restricts phage-host interactions. Further studies should focus on the biogeography of the gut phageome along both the longitudinal and radial axes. It will be of special interest to investigate the mechanisms of physiological and epigenetic regulation of infection by prominent members of the gut phageome, such as crAss-like phages and Microviridae phages infecting Bacteroides, Prevotella, and Faecalibacterium.