Viruses are unable to replicate independently. To generate copies of itself, a virus must instead invade a target cell and commandeer that cell's replication machinery. Different viruses are able to invade different types of cell, and a group of viruses known as bacteriophages (or phages for short) replicate within bacteria. The enormous number and diversity of phages in the world means that they play an important role in virtually every ecosystem.

Despite their importance, relatively little is known about how different phage populations are related to each other and how they evolved. Many phages contain their genetic information in the form of strands of DNA. Using genetic sequencing to find out where and how different genes are encoded in the DNA can reveal information about how different viruses are related to each other. These relationships are particularly complicated in phages, as they can exchange genes with other viruses and microbes.

Previous studies comparing the genomes (the complete DNA sequences) of relatively small numbers of phages that infect the Mycobacterium group of bacteria have found that the phages can be sorted into ‘clusters’ based on similarities in their genes and where these are encoded in their DNA. However, the number of phages investigated so far has been too small to conclude how different clusters are related. Are the clusters separate, or do they form a ‘continuum’ with different genes and DNA sequences shared between different clusters?

Here, Pope, Bowman, Russell et al. compare the individual genomes of 627 bacteriophages that infect the bacterial species Mycobacterium smegmatis. This is by far the largest number of phage genomes analyzed from a single host species, and it allowed a much clearer understanding of the complexity and diversity of these phages. The isolation, sequencing and analysis of the hundreds of M. smegmatis bacteriophage genomes were performed by an integrated research and education program called the Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES) program. This enabled thousands of undergraduate students from different institutions to contribute to the phage discovery and sequencing project, and to co-author the report. SEA-PHAGES therefore shows that it is possible to successfully incorporate genuine scientific research into an undergraduate course, and that doing so can benefit both the students and researchers involved.

The results show that while the genomes can be categorized into 28 clusters, the clusters are not completely separate from one another. Instead, a spread of diversity is seen, as genes and groups of genes are shared between different clusters. Pope, Bowman, Russell et al. further reveal that the phage population is in a constant state of change, and continuously acquires genes from other microorganisms and viruses.