As of July 2014, the EMP database [9] contains samples from more than 200 collaborators, spanning more than 40 biomes defined by broad categories such as marine pelagic water, freshwater lake sediment, and human-associated habitats. From a ‘30,000-foot’ perspective, the EMP is identifying the environmental characteristics that correlate with microbial community structure within and between these biomes. However, because the EMP is a collection of individual projects, each with a core hypothesis, it is also possible to discuss the immediate observations from individual studies. For example, a study of human saliva from obese versus normal-weight individuals showed that, although saliva was able to alter the aromatic properties of wine, only a few microbial taxa were likely to be responsible [10]. This preliminary study suggests that oral microbes may influence the aromatic properties of food and drink, altering our satiation response. In soil systems, the EMP sequenced microbial communities from prairie soils across the Midwest of the United States of America. This ecosystem has largely been replaced by agricultural land use, and the study showed that the major shifts in microbial community composition are driven almost exclusively by the changing relative abundance of Verrucomicrobia and its influence on carbon dynamics [11]. These analyses could help to improve prairie restoration efforts. In deep soil samples from the Russian permafrost, the EMP characterized microbial communities associated with buried organic matter, helping to identify the bacteria that were degrading the soil organic matter in these systems [12]. In deep-sea sediments from the Gulf of Mexico, the EMP data have provided insight into how the microbial communities responded to oil pollution from the Deepwater Horizon oil spill [13],[14].
Another example of investigating human impact is the analysis of freshwater river sediments along a gradient of human influence, in which the EMP data on the microbial communities reveal impact-specific signals [15]. The diversity of study sites and research questions embedded in these first 30,000 samples is extraordinary, yet this is just the tip of the iceberg. Initial analysis of 10,000 of the samples identified approximately 6 million bacterial taxonomic units (genus- or species-level taxa), only a small fraction of which could be mapped to known phylogenies using 16S rRNA databases such as GreenGenes [16]. The frequency and distribution of these taxa enable us to address interesting questions, for example, about how taxa are distributed across different soil ecosystems: the EMP datasets suggest that there is considerable overlap in taxa between sites, with organisms that are abundant at one location being extremely rare at another, as previously demonstrated for marine sites [17].
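The pattern described above, in which taxa are shared between sites but abundant at one and rare at another, can be illustrated with a minimal sketch. The site names, taxon labels, read counts, and abundance thresholds below are hypothetical and chosen only for illustration; they are not EMP data, and this is not the EMP analysis pipeline.

```python
from typing import Dict, List

# Hypothetical read counts per taxon at two sites (illustrative values, not EMP data).
site_a: Dict[str, int] = {"taxon_1": 900, "taxon_2": 95, "taxon_3": 5}
site_b: Dict[str, int] = {"taxon_1": 2, "taxon_3": 600, "taxon_4": 398}


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw counts to fractions of the site total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


def jaccard_overlap(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Fraction of all observed taxa that are present at both sites."""
    taxa_a, taxa_b = set(a), set(b)
    union = taxa_a | taxa_b
    return len(taxa_a & taxa_b) / len(union) if union else 0.0


def abundant_here_rare_there(
    here: Dict[str, int],
    there: Dict[str, int],
    abundant: float = 0.10,  # assumed cutoff for "abundant"
    rare: float = 0.01,      # assumed cutoff for "extremely rare"
) -> List[str]:
    """Shared taxa that exceed `abundant` at one site and fall below `rare` at the other."""
    ra_here = relative_abundance(here)
    ra_there = relative_abundance(there)
    return sorted(
        taxon
        for taxon, frac in ra_here.items()
        if taxon in ra_there and frac >= abundant and ra_there[taxon] <= rare
    )


print(jaccard_overlap(site_a, site_b))
print(abundant_here_rare_there(site_a, site_b))
print(abundant_here_rare_there(site_b, site_a))
```

In this toy example, half of the observed taxa occur at both sites, yet each shared taxon dominates one site while contributing well under 1% of the reads at the other. The thresholds are arbitrary; a real analysis would also need to account for sequencing depth and sampling effort.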

A small number of concerns regarding the existing data have been raised by communities focusing on specific systems or taxa. For example, as with all studies using PCR, there are biases associated with the EMP PCR primers: they do not amplify marine Pelagibacter ubique targets efficiently. As a result, new primers have been designed that should amplify Pelagibacter, an important taxon in marine systems, more efficiently; however, we need to determine how efficiently these new primers amplify the other bacteria from other environments. To that end, a study is underway to investigate whether rescuing Pelagibacter has deleterious consequences for other taxa or systems. Moreover, because DNA extraction protocols themselves can have different biases depending on the environmental matrix from which the DNA is extracted [18], and PCR reagents can contain contaminants that may influence amplification [19], the number of potential biases that could influence analysis is large, and the key to cross-system analyses is consistent protocols. We are taking all sensible precautions to catalogue and determine potential biases: by recording all procedural and analytical variables, it will be possible to determine which specific protocol elements may influence interpretation and whether these technical sources of variation limit our ability to identify important factors structuring microbial diversity.