We are one of a kind because our genetic background says so. When our parents conceived us, one set of chromosomes from each of them merged to create a brand new diploid cell called a zygote, which is already unique. Yet such singularity doesn’t start with the zygote itself, but much earlier, with our progenitors. The reason lies in the way a peculiar kind of cell, the gamete, is produced. To make a gamete (a haploid cell with half the number of chromosomes), a process called meiosis must happen first and, for it to progress accurately, the chromosomes each parent inherited from their own parents must exchange genetic information, making the resulting chromosomes slightly different from the originals. This process is an important source of individual variability, meaning that we are indeed unique. Furthermore, there is another source of uniqueness in traits that may be heritable but are not caused by changes in our genetic background: the epigenetic changes that arise in response to the environment we live in.

In the unlikely event you’re not yet convinced, there is still a chance you will be; let me explain. Back in 1998, the term metagenomics was coined by Jo Handelsman to describe the study of the genetic composition of samples taken directly from the environment. That was a rather visionary idea, as neither sequencing nor computational methods were in full bloom yet. Scientists came to confirm that more than 99% of microorganisms can’t be cultured in the lab (something they had suspected for decades) and, consequently, the immense majority of them were unknown to us. Improvements in the handling of environmental samples (creation of BAC libraries, etc.), along with faster and cheaper sequencing, have been paramount in dealing with escalating sample complexity and are indeed helping to unravel the secrets of this microscopic universe.

A particularly interesting application of metagenomics arises in the study of the human microbiome, which comprises the entire community of microorganisms inhabiting our body, whose role is vital to us. Besides curiosity, there is growing interest in the field because several studies have shown a correlation between changes in the composition of our microbiome and disease, even though the direction of cause and effect is not yet clear.

On average, the human intestinal microbiome includes around 160 bacterial species, although up to 1000 different bacterial species can be present. There are enormous differences between the microbiomes of different individuals, as highlighted by a study showing that gut samples from Japanese individuals carry the enzymes porphyranase and agarase (present in bacteria found on seaweed), which are absent in samples from North Americans.

The diversity at the species level is relatively well known, but until now no comprehensive study had looked at strain diversity (diversity within species) in the gut microbiome. To fill this gap, Sharon Greenblum and colleagues performed a large-scale study to determine copy number variation at the strain level in 109 gut samples from Danish and Spanish individuals who were healthy, obese or had inflammatory bowel disease (IBD).

To do so, they developed a design in which sheared total DNA from each sample (short reads of 75 bp) is aligned to 260 reference genomes of prevalent human gut microbiome strains obtained from NCBI’s GenBank (including draft genome submissions). Reference genomes were then grouped into genome clusters corresponding to species (e.g. Bacteroides ovatus or Roseburia intestinalis), with each reference genome within a cluster representing a single strain of that species.

In parallel, gene-coding regions from the reference genomes were annotated with Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology groups, called KOs. The genes in each KO are responsible for a specific function, such as motility, sugar metabolism or drug resistance, in a given species.

To determine the abundance of each genome cluster, 13 representative single-copy marker genes (present in one copy in more than 95% of reference genomes), mostly coding for ribosomal proteins, were chosen. Using these as a baseline, they converted the calculated coverage of each KO in each cluster to a copy-number estimate. They then identified specific KOs in specific genome clusters (KO-cluster pairs, or KCs) whose copy number varied across samples.
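The normalisation step can be illustrated with a small sketch. It assumes that coverage is the number of aligned bases divided by gene length and that the baseline is the median coverage of the marker genes; both are simplifying assumptions made here for illustration, not necessarily the exact procedure used in the study. The function names and the read counts are hypothetical, and only five markers are used instead of 13 to keep the example short.

```python
from statistics import median

def coverage(read_count, read_length, gene_length):
    """Average per-base coverage: total aligned bases divided by gene length."""
    return read_count * read_length / gene_length

def copy_number(ko_coverage, marker_coverages):
    """Estimate copies per genome by dividing the KO coverage by the
    median coverage of single-copy marker genes from the same cluster
    (using the median is an assumption of this sketch)."""
    baseline = median(marker_coverages)
    if baseline == 0:
        raise ValueError("no marker coverage for this cluster in this sample")
    return ko_coverage / baseline

# Hypothetical read counts for one genome cluster in one sample,
# with 75 bp reads as in the study.
markers = [coverage(n, 75, 1000) for n in (200, 210, 190, 205, 195)]
ko = coverage(800, 75, 2000)
estimate = copy_number(ko, markers)
print(estimate)  # 2.0, i.e. roughly two copies of this KO per genome
```

Dividing by a single-copy baseline is what makes estimates comparable between samples in which the same species is present at very different abundances.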

Their analysis returned 735 variable KCs across 38 genome clusters, with the number of highly variable KCs differing widely between clusters and reaching up to 47 in the Roseburia intestinalis cluster, for instance. Besides looking at KCs that were highly variable across samples, they looked at KCs that varied only in a small proportion of samples (set-specific variable KCs). These were far more common than the highly variable ones, revealing more subtle variation in other genes. In some of them the copy number was higher than in all other samples, while in others it was lower.
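As a rough illustration of what flagging a variable KC might look like, the sketch below marks KO-cluster pairs whose copy number variance across samples exceeds a cut-off. The variance criterion, the threshold of 0.25, the KO labels and the copy-number values are all assumptions made for this example; the text above does not state which statistical test the authors applied.

```python
from statistics import pvariance

def variable_kcs(copy_numbers, min_variance=0.25):
    """Return the KO-cluster pairs whose copy number variance across
    samples exceeds min_variance (a hypothetical cut-off)."""
    return sorted(
        kc for kc, values in copy_numbers.items()
        if pvariance(values) > min_variance
    )

# Hypothetical copy-number estimates for two KCs across five samples.
# The KO labels are placeholders, not real KEGG identifiers.
estimates = {
    ("KO_transport", "Roseburia intestinalis"): [1.0, 1.1, 3.0, 0.1, 2.0],
    ("KO_ribosomal", "Roseburia intestinalis"): [1.0, 1.0, 0.9, 1.1, 1.0],
}
flagged = variable_kcs(estimates)
print(flagged)  # only the transport KC is flagged
```

A set-specific variable KC would need a different criterion, for example a few samples lying far from an otherwise tight distribution, which a plain variance threshold can miss.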

Interestingly, transport was the function showing the highest copy number variation (in 10 genome clusters). High variation in sugar transport systems predominated in the phyla Firmicutes and Actinobacteria, while iron transport was prone to variation in Bacteroidetes. Proteins involved in transport have been suggested to be a primary adaptive mechanism, which suggests that strains have evolved to adapt to their host. In the Eubacterium rectale genome cluster, motility was also affected. When set-specific variable KCs were analysed, a number of other functions were found to be susceptible to copy-number changes too. Among these, cell growth and sporulation were identified in the two genome clusters corresponding to Clostridium sp. Genes associated with transitions between virulent states stood out in a number of bacterial species: streptomycin biosynthesis in Acidaminococcus sp. or lysozyme production in Bacteroides ovatus. Overall, much of the variability occurs in genes that facilitate adaptation to the specific gut environment, although similar genes change regardless of the particular gut environment.

Copy number seems to depend, at least to some extent, on the health status of the volunteer. The copy number of 24 KCs was significantly linked to IBD, and that of three KCs to obesity. Furthermore, some of the affected genes have previously been associated with disease, as is the case with thioredoxin-1 in Clostridium sp. (thioredoxin reductase was present in the fecal microbiome of mice fed a high-fat diet). Its role is related to the maintenance of redox equilibrium. A second example of a gene affected in obesity and found in this study is the loss of a ubiquinone-reducing gene from Bacteroides plebeius. Other findings were novel, such as the variation observed in particular bacterial species in IBD: in Roseburia inulinivorans, variation in a drug efflux protein (linked to antibiotic resistance) correlated with IBD.

Another outcome of the study was that the proportions of strains within genome clusters differed between samples, even though the analysis was restricted to known strains and excluded those that are still unknown. In some cases, like the Escherichia coli cluster, 76% of the variation in copy number could be explained by the reference genomes included from databases, but for other species the observed variation couldn’t be explained by any combination of known reference genomes.
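To make the idea of variation being "explained" by known strains more concrete, here is a deliberately simplified sketch. It assumes a sample contains a weighted mixture of known strains, predicts the copy number of each gene from that mixture, and reports the fraction of variance explained (R squared). The strain profiles, proportions and observed values are invented, and the authors’ actual model is not described in the text above, so this should be read as an illustration of the concept only.

```python
def predicted_copy_numbers(strain_profiles, proportions):
    """Expected copy number of each gene for a weighted mixture of strains."""
    n_genes = len(strain_profiles[0])
    return [
        sum(p * profile[g] for p, profile in zip(proportions, strain_profiles))
        for g in range(n_genes)
    ]

def variance_explained(observed, predicted):
    """Fraction of the variance in observed values captured by the prediction."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    if ss_total == 0:
        raise ValueError("observed values do not vary")
    ss_resid = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1 - ss_resid / ss_total

# Hypothetical gene copy numbers in two known strains of one species.
strain_a = [1, 0, 2, 1]
strain_b = [1, 2, 0, 1]
# Hypothetical observations: the last gene has more copies than any
# mixture of the two known strains could produce.
observed = [1.0, 0.5, 1.5, 2.0]
predicted = predicted_copy_numbers([strain_a, strain_b], [0.75, 0.25])
r_squared = variance_explained(observed, predicted)
print(round(r_squared, 2))  # 0.2
```

In this toy case most of the variation comes from a gene that no known strain accounts for, which is the kind of situation that points to strains missing from the reference databases.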

In summary, these results provide evidence for the idea of a “microbiome fingerprint” for each individual. Maybe this doesn’t mean much at the moment, but we can be certain that in the near future the information contained within this sort of fingerprint will be useful, hopefully for good.