Description of the DWW sample geochemistry

Descriptive statistics for geochemical parameters in the full DWW dataset are compiled in Supplementary Table S1. The results demonstrate that most geochemical parameters were not normally distributed, as shown by both the Shapiro-Wilk and Anderson-Darling tests. In general, Cu, Zn, Hg and MeHg varied by the greatest amount, by up to two orders of magnitude. High levels of these metals were observed in the DWW, with mean ± SEM (standard error of the mean) values of 376 ± 43 μM, 170 ± 19 μM, 21 ± 1.2 μM, 1.1 ± 0.2 μM and 9.2 × 10−4 ± 5.7 × 10−4 μM for Cu, Zn, Ag, tHg and MeHg, respectively. MeHg levels in DWW were highly variable ranging from 4.0 × 10−3 to 8.2 × 10−5 μM. Total Hg in the DWW was much less variable than MeHg, ranging from 0.12–6.1 μM. The highest MeHg concentration detected (4.0 × 10−3 μM) was in DWW16 and the lowest (8.2 × 10−5 μM) in DWW25 (Fig. 1a). MeHg to tHg ratios (MeHg/tHg) were also highly variable (Fig. 1a,b) spanning two orders of magnitude from 0.01–1.5%. MeHg/tHg versus tHg followed a log-log trend with a slope of −0.88±0.16 (R2 = 0.46, P = 1.8 × 10−6). In comparison, the regression of MeHg/tHg versus tHg in Schaefer et al., 200437, would suggest ratios of approximately 0.01% for the high levels of tHg observed in the present study; at the lowest end of what we observed (Fig. 1b). Based upon these ratios of MeHg to tHg, we observed that a cluster of samples had very high levels of MeHg while a second cluster had lower ratios (Fig. 1a). These 14 samples were the focus of more detailed analysis and the geochemical parameter data for this sample subset is summarized in Table 1.

Table 1 Geochemical characteristics of the 14 DWW sample subset selected for pyrosequencing and ARISA analysis. Full size table

Figure 1 Correlation of MeHg to tHg in DWW. (a) Shown are log-transformed tHg and MeHg (μM) concentrations. Fourteen DWW samples with the highest and lowest MeHg/tHg ratios were selected for 16S rRNA gene pyrosequencing (red squares with sample number). Dashed line represents a MeHg/tHg ratio of 1:1000. (b) The comparison of the regression of MeHg/tHg versus tHg (in Schaefer et al., 2004)37 illustrates the high levels of tHg observed in the present study. Full size image

F−, Br−, phosphate and nitrate levels were frequently observed below the detection limit (BDL). Interestingly, sulfate levels increased with decreasing MeHg levels (but the trend was not statistically significant). The sulfate levels were less variable compared to sulfide levels in all the samples and were positively correlated with each other (r = 0.55, P < 0.05; Table 2). No significant difference was observed among the groups (Supplementary Figure S1a–d). The pH was lowest in DWW14 (6.9) and highest in DWW11 (8.2). All samples with high MeHg had low pH (6.9–7.5) except DWW19 (7.8); whereas all samples with low MeHg levels had high pH levels >7.8 (Supplementary Figure S1e). In addition, significant correlations were observed between the pH and MeHg levels (r = -0.64, P < 0.01, Table 2, Fig. 2a,b), whereas no correlation was observed with tHg levels. Total Hg levels were correlated with MeHg/tHg ratios and neutral bioavailable Hg(HS) 2 species (P < 0.05, Table 2). As expected, the pH was correlated with neutral bioavailable Hg(HS) 2 species in DWW (r = 0.52, P < 0.05, Table 2). Clinic activity during the study period was also variable, ranging from 1–4 fillings and 2–9 surfaces on sampling days during clinic operations (Table 1). In general, Ag and Cu did not vary significantly with day of week or with number of fillings, while tHg varied with clinic usage (Supplementary Figure S2a–h). Both MeHg and tHg exhibited significant temporal differences both during the study and with the day of the week. MeHg levels were high in weeks 3 and 4 and tHg levels were high in weeks 6, 7 and 8 (Fig. 3). The levels of MeHg were high on Tuesdays and Thursdays of the week and tHg levels were high on Mondays and Tuesdays of the week (Supplementary Figure S2c). Surprisingly, MeHg was not correlated with clinic usage (Supplementary Figure S2g), while tHg was strongly correlated with the clinic usage (Supplementary Figure S2d,h). As expected, tHg was highly correlated (P < 0.001) with the number of fillings for all days of the week.

Table 2 Pearson correlation matrix for geochemical characteristics of the DWW samples by linear regression analyses. Full size table

Figure 2 The correlations between pH and MeHg (a), pH and tHg (b). Red and green dots represent high and low MeHg samples, respectively. Linear regressions were used to test Pearson’s correlation between DWW samples with high and low mercury levels and pH. Full size image

Figure 3 Relationship between proteobacterial diversity and tHg and MeHg over time in DWW samples (DWW1-DWW40) collected from the 12-chair dental clinic. Primary vertical axis reflects the variations in relative abundance of Proteobacteria diversity and the secondary vertical axis shows the variation in tHg and MeHg concentration during the 8 week sample collection period. (•) Represents samples further selected for pyrosequencing. MeHg at DWW25 (8.1E-5 μ M ) is not shown for clarity. Full size image

Interestingly, equilibrium speciation modeling data suggests that anionic HgHS 2 − and HgS 2 2− complexes (non-bioavailable forms) were dominant in samples with high MeHg levels compared to samples with low MeHg levels (Fig. 4a) and thus Hg would appear to be less bioavailable to bacteria in high MeHg DWW samples. However, when we look at the actual predicted concentration of the neutral bioavailable Hg(HS) 2 species in these samples (Fig. 4b), a clear demarcation is observed where all high MeHg samples have Hg(HS) 2 concentrations near or above 0.1 μ M , while all low MeHg samples have concentrations at or below 0.01 μ M . This strongly suggests that it is the actual concentration of the neutral species that is important, not just the fraction of the total. Although studies have shown positive correlations between DOC and Hg methylation under sulfidic conditions (possibly suggesting DOC-mediated abiotic methylation)22, we did not observe such a correlation in DWW.

Figure 4 Speciation modelling in DWW samples. (a) Fraction of total Hg species and (b) Hg species as predicted by equilibrium Hg geochemical speciation modeling in DWW samples. Full size image

General sequencing statistics and DWW bacterial community composition

We performed Tag-encoded FLX amplicon pyrosequencing screens on the 14 DWW subset samples to characterize the bacterial community composition and phylogenetic diversity in DWW. For the 454 pyrosequencing, 14 DWW samples yielded a total of 110,292 sequencing reads. Following the quality trimming protocol, 85,000 high quality reads (>300 bp) were RDP-classified as bacteria, four reads classified as Archaea and 11 reads as unclassified roots. A total of 13,040 reads could not be identified and remained as unclassified bacteria. An additional 23,000 sequences (21%, <300 bp) were short in length and were removed before further analysis. Employing deep sequencing of 16S rRNA gene amplicons, we estimated the bacterial diversity of the DWW microbiome.

The total DWW bacterial community contained 24 phyla, unclassified bacteria and Archaea from the sequencing data. Sequences which showed less than 80% homology to the unclassified bacteria constituted 26% of the microbial community. The obtained taxonomy data demonstrates that DWW is remarkably biodiverse; covering a broad spectrum of known and unknown bacterial taxa (Table 3). The bacterial community consisted of Proteobacteria (21 to 98%), unclassified Bacteria (0.4 to 96%), Firmicutes (0.1 to 74%), Cyanobacteria (0.02 to 15%), Bacteroidetes (0.2 to 12%), Chloroflexi (0.3 to 7%), Fusobacteria (0.2 to 6%), Actinobacteria (0.2 to 3%), Synergistetes (0.04 to 2%) and Acidobacteria (0.02 to 1%). The remaining 16 phyla were grouped as other phyla, which were detected in lower abundance (<1%) in 10 out of 14 DWW samples. A total of 170 bacterial taxa, 58 unclassified families and unclassified bacteria were identified in 14 DWW samples. The dominant phyla in all samples belonged to Proteobacteria and were detected in all 14 DWW samples studied. They were apportioned as Gammaproteobacteria (1.5 to 98%, detected in all 14 samples), Deltaproteobacteria (0.1 to 39%, detected in 12 out of 14 samples), Betaproteobacteria (0.6 to 38%, detected in 13 out of 14 samples), Alphaproteobacteria (0.1 to 27% detected in 13 out of 14 samples) and Epsilonproteobacteria (1.8 to 7.7%) was the least common of the Proteobacteria, found in only six out of 14 samples. A total of 2,300 reads could not be classified below Proteobacterial class level and remained as unclassified Proteobacteria accounting for ~4% of the total Proteobacteria.

Table 3 Most abundant bacterial families in the 14 DWW sample subset. Full size table

Comparing the eight most abundant bacterial families (Table 3) and dominant predicted methylators in each sample (Table 4,33) reveals large taxonomic differences between the DWW samples. The families Rhodocyclaceae (10–35%), Xanthomonadaceae (30–90%) and the SRB (Desulfovibrionaceae and Desulfobulbaceae) (5–25%) were highly abundant in high MeHg samples and either absent or less common in low MeHg DWW samples (Table 3). Given the very high levels of potentially bacterio-toxic metals like tHg (>6 μ M ), Cu (>350 μ M ), Zn (>150 μ M ) and Ag (>20 μ M ) in DWW (Supplementary Table S1), these results suggest that metal-resistance is phylogenetically-broadly distributed among DWW bacterial communities.

Table 4 Predicted mercury methylators identified in DWW samples (this study). Full size table

Proteobacteria diversity

Alphaproteobacterial diversity was lowest in all DWW compared to other classes of Proteobacteria. DWW29 had the highest (27%) Proteobacteria diversity compared to other DWW samples. Alphaproteobacteria were detected in all DWW samples except DWW39. A total of 11 families with unclassified Alphaproteobacteria and 28 genera were detected in all DWW samples. Betaproteobacteria were dominant in DWW11 and DWW19 (28 to 38%) and were detected in all 14 DWW samples except DWW7. A total of eight Betaproteobacterial families were identified with 31 genera and unclassified Betaproteobacteria. In all DWW samples, Betaproteobacteria and Alphaproteobacteria were found to be more evenly distributed among samples. Gammaproteobacteria were detected in all DWW samples as the dominant class of bacteria (1.5 to 98%) in the total bacterial community. They were abundant in DWW7 (98% of total Bacteria), DWW25 (95% of total Bacteria) and DWW40 (92% of total Bacteria). Deltaproteobacteria were the second most dominant class of Proteobacteria in the DWW. These results are consistent with the hypothesis that SRB are important methylators in DWW, as virtually all known Hg methylators detected are in the Deltaproteobacteria (Table 4). DWW15 had the largest proportion of Deltaproteobacteria (39% of the total Bacteria), followed by DWW26 (35% of total Bacteria). DWW14 and DWW11 had approximately equal proportions (~22%) of total Bacteria. The samples with high MeHg levels had higher abundances of Deltaproteobacteria (12 to 42%) and Epsilonproteobacteria (5 to 22%), compared to the samples with low MeHg levels (except DWW11 which had higher abundance of both groups). No Deltaproteobacterial reads were detected in DWW7 and DWW39 (the former was classified as a high MeHg sample). A total of 15 Deltaproteobacteria families were identified representing 26 genera in the DWW samples: Desulfovibrionaceae, Desulfobulbaceae, Syntrophobacteraceae and Desulfobacteraceae. The dominant genera identified are Desulfovibrio, Desulfobulbus, Desulforhabdus, unclassified Desulfovibrionaceae, unclassified Desulfobacteraceae, Desulfomonile, Geobacter, Pelobacter and Syntrophus. In DWW16, (the sample with the highest MeHg levels) Desulfovibrio sp. were identified as the dominant SRB species, which include many known Hg methylators (Table 4). Overall, the abundances of Epsilonproteobacteria were low (2 to 8%) and these bacteria were detected in 6 out of 14 DWW samples (DWW11, DWW19, DWW14, DWW15, DWW26 and DWW25, in order of high to low Epsilonproteobacteria sequence reads). Genera detected in these samples include Sulfurospirillum, Campylobacter, Sulfuricurvum and Sulfurimonas. Details of the other important bacterial groups identified in 14 DWW samples are given in the supplementary data. Correlations between MeHg, tHg geochemical characteristics and microbial diversity at different taxonomic level were also studied. Significant Pearson correlations among environmental variables and microbial taxa at different levels are shown in Table 5.

Table 5 Correlations between the relative abundances of the microbial diversity at dominant bacterial taxa and DWW characteristics by linear regression analyses. Full size table

Diversity patterns in the DWW bacterial communities are linked with water geochemistry

The microbial community composition in the 14 DWW sample subset was evaluated using Cluster analysis (CA) and Principal Coordinate Analysis (PCoA). Hierarchical cluster analysis based on the Bray-Curtis (BC) similarity matrices for the bacterial community composition at the phylum and class level revealed sample specific differences (Fig. 5a–c). Cluster analysis at the class (Fig. 5a) and phylum level (Fig. 5b) showed conservation of community composition in DWW14, DWW19, DWW15, DWW26, DWW11 and DWW16. These samples were similar to each other and clustered separately from DWW33 and three other clusters. These samples had the highest proportions of Betaproteobacteria and Deltaproteobacteria groups. In contrast, DWW33 had the highest percentage of 16S rRNA gene sequences not clustered into any known taxonomic group and was dominated by unclassified bacteria (>95% of total bacteria). DWW7, DWW40, DWW25 and DWW10 were similar in community composition and clustered separately from DWW39, DWW24 and DWW29. DWW7, DWW40 and DWW25 had the largest proportions of Gammaproteobacteria (92 to 98%) and the lowest proportions of unclassified bacteria (0.01 to 0.4%). DWW39 (a Wednesday sample) had the largest proportion of Cyanobacteria, which were not detected in DWW7, DWW40, DWW25 and DWW10. DWW39, DWW29 and DWW24 had high proportions of Firmicutes and other bacterial groups. It is not known how cyanobacteria (a photosynthetic microorganism) were present in the dark environment of the DWW sewer. This was a mid-week sample and may have included DWW that was recently produced from the clinic. Interestingly, Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria and Epsilonproteobacteria clustered together in high MeHg samples, whereas Gammaproteobacteria were clustered together in low MeHg samples. Together, these results demonstrate that there is distinct clustering of the microbial communities between those samples with high MeHg and those with low MeHg.

Figure 5 Comparison of bacterial diversity between the different DWW samples. (a) Stacked column graph representing relative percent abundance of classes within the phylum Proteobacteria. (b) Stacked column graph representing the relative percent distribution of the dominant phyla in the 14 DWW samples. Values in parentheses corresponds to the number of samples (out of 14) in which that particular phyla or class were detected. (c) Cluster analysis of bacterial diversity at the phylum level using the Bray-Curtis matrix of similarity. Low abundance phyla are grouped together as “other phyla” and their cumulative abundances are listed. Full size image

CA and PCoA analysis further revealed that the bacterial composition was similar at the phylum level (Fig. 6a,b) but slightly changed at the class (Fig. 6c,d) and genus level (Fig. 6e,f). The PCoA analysis (Fig. 6b,d,f) results showed consistent clustering at the class and genus level, as was observed in the cluster analysis (Fig. 6a,c,e). DWW11 and DWW25 did not follow this trend and grouped together in PCoA analysis with high MeHg samples (Fig. 6d–f). Similarly, DWW24 grouped with low MeHg samples even though it had higher MeHg. These clustering patterns demonstrate agreement between both the CA and PCoA distance metric results.

Figure 6 Cluster and principle coordinate analysis of 14 DWW samples. Cluster analysis are shown at phylum level (a), at the class level (c) and at the genus level (e). The analysis is conducted using the unweighted pair group mean averages (UPGMA) based on the Bray-Curtis distance. Dotted lines show the similarity cutoff level to cluster and group 14 DWW samples. The principle coordinate analysis are shown at phylum level (b), at the class level (d) and at the genus level (f). Filled red circles represent samples with high MeHg and green circles represent samples with high tHg and low MeHg levels. Note different x axis scales. Full size image

Canonical correspondence analysis (CCA)

CCA was performed to identify major environmental parameters linking to DWW microbial community at different taxonomic levels. According to the CCA, the effects of individual environmental factors on DWW microbiome varied across the samples. MeHg, tHg, Hg(HS) 2 , pH and sulfate levels were the most important parameters shaping the overall microbial community. The first two axes explained 95% of the taxonomic information at phylum level (Fig. 7a). The length of the arrow of environmental parameter in the ordination plot indicates the strength of the relationship of the parameter to the DWW community composition. CCA in phylum-level diversity indicated that the phylum Cyanobacteria was tightly associated with the tHg, the phylum Actinobacteria was associated with the Hg(HS) 2 and pH and Proteobacteria was correlated with MeHg levels (Fig. 7a). Classes of Beta and Epsilonproteobacteria were associated with the sulfate levels, Alphaproteobacteria with pH- and Gammaproteobacteria were correlated with tHg levels. Deltaproteobacteria were associated with both the MeHg and Hg(HS) 2 levels in DWW. The first two axes explained 92% of the taxonomic information at Proteobacteria class level (Fig. 7b). CCA in genus level diversity of mercury methylating Deltaproteobacteria group indicated that Desulfobulbus and Desulfuromonas were correlated with pH, Geobacter and Desulfovibrio were associated with tHg, Hg(HS) 2 and sulfate levels and unclassified Deltaproteobacteria with MeHg levels (Fig. 7c).

Figure 7 Canonical correspondence analysis plots. First two axes of CCA1 and CCA2 represent the relationships between environmental variables and bacterial diversity at (a) phylum level, (b) class level for Proteobacteria taxonomic class, (c) genus level (mercury methylating Deltaproteobacteria groups). Symbols indicate phyla/classes/genus (blue dots), high MeHg samples (red dots), low MeHg samples (green dots) and environmental variables (black arrows). Arrows indicate the direction and magnitude of measurable variables associated with bacterial community structures. The percentage of variation explained by each axis is shown in axis label. Full size image

Mantel test analysis also revealed significant correlations between microbial community structure and different environmental factors. Different parameters such as number of OTUs, (r = 0.38, P = 0.06) were correlated with Bray-Curtis dissimilarity matrix for Proteobacteria class. Hg methylating Deltaproteobacteria members Desulfobulbus (MeHg/pH/tHg, r = 0.36, P = 0.05), unclassified Deltaproteobacteria (MeHg, r = 0.53, P = 0.1) and Geobacter species (pH, r = 0.31, P = 0.04; tHg, r = 0.34, P = 0.06) were highly correlated with pH, MeHg and tHg. Overall, the consensus for different parameters revealed that pH, MeHg, tHg and sulfate were the major factor affecting microbial taxon abundance in different DWW samples (Supplementary Table S2). Partial least squares-discriminant analysis (PLS-DA) also illustrated a distinct microbial diversity segregation at the genus level for the DWW microbiome primarily related to geochemical factors and MeHg conditions (Fig. 8).

Figure 8 Microbial diversity at genus level reveals hierarchical partitioning of high and low MeHg DWW samples. Bacterial communities were clustered using partial least squares-discriminant analysis (PLS-DA). Red circles represent samples with high methyl Hg levels and green circles represent samples with low methyl Hg. Full size image

Similarity Percentage (SIMPER) analysis

Since a difference in the bacterial community was observed between the high and low MeHg group, SIMPER analysis was performed to show which taxa contributed to this difference the most. The SIMPER results for main taxa are shown in Table 6. The taxa that contributed the most of the differences at phylum level were Proteobacteria (16.7% avg. dissimilarity), unclassified bacteria (10.8% avg. dissimilarity) Firmicutes (8.6% avg. dissimilarity) and Cyanobacteria (2% avg. dissimilarity). In particular, Proteobacteria and Firmicutes were highly abundant in the high MeHg samples, whereas unclassified bacteria and Cyanobacteria were abundant in low MeHg samples. Together, these taxa accounted for 41.6% of the difference. Rest of the four phyla accounted for 0.5 to 1% of the difference (Table 6). The high contribution of Proteobacteria is of particular interest, because it was the dominant taxa in all the DWW samples, SIMPER analysis was also performed at Proteobacteria class level.

Table 6 SIMPER analysis identifies top abundant taxa that contributed most of the dissimilarities (≥1%) between the microbial communities from high and low MeHg samples. Full size table

Gammaproteobacteria (19.2% avg. dissimilarity) and Deltaproteobacteria (10.7% avg. dissimilarity) contributed the most of the difference, followed by Beta- (8.1%) Alpha- (4.9%) and Epsilonproteobacteria (1.8%). Overall, 42.7% difference was accounted by these classes. Gamma and Betaproteobacteria were abundant in both the groups, whereas Deltaproteobacteria were dominant in high and Alphaproteobacteria were dominant in low MeHg group. These results revealed that the abundance of Proteobacteria class had a high influence on dissimilarities observed among the groups. Linear regressions also showed a high correlation with number of OTUs detected for Gammaproteobacteria (r = 0.71, P < 0.001, Fig. 9a) and Deltaproteobacteria (r = 0.75, P < 0.001, Fig. 9b), in high and low MeHg samples.

Figure 9 Linear regressions were used to test Pearson’s correlation between number of OTUs detected for Gamma- and Deltaproteobacteria in high and low MeHg DWW samples. (a) The correlations between number of OTUs and relative abundance of Gammaproteobacteria. (b) OTUs and relative abundance of Deltaproteobacteria in DWW. Pearson correlation coefficient and P values are shown. Red and green dots represent high and low MeHg samples, respectively. Full size image

Fold change analysis

Nearly half (46%) of the total bacterial taxa exhibits a fold change (FC) greater than 1.5 (Fig. 10a). These results clearly illustrate that the high levels of MeHg, tHg and other heavy metals are primarily responsible for changes in relative community abundance. More specifically, in the high MeHg group, important mercury methylating bacteria (Desulfobulbus, Desulfovibrio, Desulfobacter, Geobacter, Syntrophobacter and Geoalkalibacter) significantly increased by 2.6–8.7% units (1.8–3.1 fold change) in Fig. 10b. Figure 10c, represents the box and whisker plots of bacterial taxa abundance change in known Hg methylating bacteria. Clear and highly statistically significant differences are observed between the levels of each class of known methylators in high and low MeHg samples. The fold change values were calculated as the ratio between high and low MeHg group means using normalized data (Supplementary Figure S2).

Figure 10 Fold change (FC) analysis between high and low MeHg group means. (a) FC analysis of the bacterial taxa abundance between the high and low MeHg samples (the effect size and direction of the correlation is presented by the FC value and color). (b) The FC value of the significant correlations for important mercury methylating groups. (c) The FC values were calculated as ratio between high and low MeHg group means using normalized data. Shown are box and whisker plots of bacterial taxa abundance change in important Hg methylating bacterial taxa. Full size image

Mercury resistant bacterial taxa

In DWW we detected the presence of nine mercury resistant bacterial genus that have been reported to tolerate and resist high mercury levels, representing the following genera - Stenotrophomonas, Staphylococcus, Pseudomonas, Tolumonas, Escherichia, Aeromonas, Bacillus, Burkholderia and Streptococcus. In all DWW samples Stenotrophomonas were abundant (25% of total). Staphylococcus were more represented in high MeHg group compared to low. Pseudomonas and Escherichia were slightly higher in low MeHg samples. The percent abundance of these bacterial species was variable among the groups (Table 7a). We further performed the Pearson correlation and found Stenotrophomonas, Staphylococcus, Pseudomonas, Tolumonas, Escherichia, Aeromonas, Bacillus, Burkholderia were correlated with different geo-chemical parameters such as, pH, tHg, MeHg/tHg, sulfate and sulfide levels (Table 7b).

Table 7 Full size table

There was no statistically significant correlation observed (student’s t-test, P = 0.8) among the mercury resistant bacteria (genus level) in high and low MeHg samples. Also no significant correlation (r = −0.3, P = 0.4) was observed among the tHg and total abundance of mercury resistant bacteria in DWW (Supplementary Table S2). However, Mantel test showed a weak association in Bray-Curtis matrix of dissimilarity in mercury resistant taxa and Euclidean distance matrix of environmental variables. Stenotrophomonas (pH/tHg/MeHg, r = 0.42, P = 0.04) and Pseudomonas (MeHg, r = 0.15, P = 0.02) were significantly correlated with some of the geochemical parameters (Supplementary Table S2).

DWW microbial Community richness and diversity

We grouped 16S rRNA gene sequences with their nearest neighbours as shown by BLASTn and RDP searches. Clone clusters are comprised of one or more operational taxonomic units (OTUs)/phylotypes and sequences with more than 97% similarity were considered to be the same OTU and were clustered together to calculate rarefaction and nonparametric estimators. The frequencies of the OTUs obtained are shown (Table 8). Diversity index quantifies the diversity in a community and describes its numerical structure. The analysis demonstrates that bacterial diversity in the 14 DWW samples have been sufficiently covered by pyrosequencing (Table 8). A total of 2,546 OTUs were obtained from 14 DWW samples. Clustering the unique sequences into OTUs resulted in 24 to 808 different species-level OTUs per DWW sample. Diversity indices calculated for each of 14 DWW samples ranged from 1.0 to 4.8 (Shannon index) and from 34 to 631.5 (Chao1 index). The highest bacterial richness and diversity were found in DWW11 and DWW26 and were lowest in DWW24 and DWW7. The Shannon diversity index values and Chao1 richness estimates for the DWW11 were 4.8 and 631 and 1.0 and 37 for DWW7 respectively. Together, the number of OTUs, Chao 1, ACE and Shannon indices indicate that the species richness varied by 2–3 times among the different DWW samples (Supplementary Figure S4a–c). The OTU abundances and Shannon index based on both ARISA and pyrosequencing had a similar trend in the 14 DWW samples (Supplementary Figure S5a).

Table 8 Richness and diversity measures in the 14 DWW sample subset including number of OTUs, observed and estimated species richness. Full size table

Species evenness (E) for the DWW samples ranged from 0.3 to 0.7 (Table 8). Species evenness was highest in DWW19 and DWW26 (0.74 and 0.73 respectively) and lowest in DWW24 and DWW7 (0.31 and 0.32 respectively). The sample coverage using Good’s method was 88 to 99%. In high MeHg samples, OTUs and Shannon indices were high (24 to 435 and 1 to 4.5, respectively), whereas in low MeHg samples they were low (27–112 and 1 to 3 respectively). The lone exception to this trend was DWW11 (808 and 4.8, respectively). These values indicate that the diversity and evenness are high in DWW samples, in contrast to our expectations that the DWW would be dominated by SRB methylators and a few metal resistant bacterial species. Rarefaction analysis of the bacterial diversity in all DWW samples revealed four curves that did level off but not all had fully reached an asymptote (Supplementary Figure S5b). Three of these four were high MeHg containing samples (DWW 14, 15 and 26). This indicates that additional species diversity may be present in these samples, consistent with the quantitative diversity estimator analysis indicating that high MeHg samples were more biodiverse.

Phylogenetic analysis

Bacterial species known to produce MeHg to date are in the class Deltaproteobacteria, but only about half of the species tested so far have the ability to produce MeHg. We constructed a 16S rRNA gene-based phylogeny of the class Deltaproteobacteria to explore the potential Hg methylating bacteria within DWW (Fig. 11). Our alignment included the 79 representative OTUs identified within class Deltaproteobacteria from DWW and 19 reference strains for which Hg methylation has been tested and for which sequences were available in the literature (strain D. desulfuricans LS could not be included in analysis). Among the Deltaproteobacteria, Desulfovibrio is a large and diverse genus; and thus all of the Desulfovibrio type strains could not be included in the tree.

Figure 11 16S rRNA gene phylogeny for the predicted methylators identified in DWW samples. Type strains are indicated by – ‘T’. Desulfobacter postgatei was used as a reference and outgroup taxon. Strains reported to produce MeHg are shown with a filled red circle , those unable to do so are shown with an open blue circle and strains with a filled green circle are predicted methylators (from this study). The evolutionary history was inferred using the neighbor-joining method and the evolutionary analyses were conducted in MEGA. Full size image

The analysis yielded well-supported deep branching main clades and a number of less well-supported smaller groups. Most of the well-studied Hg methylating Desulfovibrio desulfuricans strains, including D. desulfuricans strains Essex 6, MB, estuarii and G20, fall into this group. Most of the strains tested for methylation to date fall in this group. This clade includes another closely related group of strong Hg methylators, strain ND132 and Desulfovibrio strain BerOc1. However, the group is also closely related to D. desulfuricans strain E1 Agheila (Accession number: M37316), which does not have the ability to produce MeHg. Weakly related to this group is D. africanus; both the type strain and strain ADR13, are Hg methylators. They are both weakly related to D. vulgaris type strain, (a non Hg methylating strain) and Hg methylating strain T2 from Chesapeake Bay. The strain T2 clusters closely together with eight representative Desulfovibrio species identified within DWW. This group of species with the ability to utilize a broader set of substrates clusters loosely together. This group contains Hg methylating strain X2 which clusters closely together with many Deltaproteobacteria species identified in DWW. To date, no other species in this group have been tested for Hg methylation potential. Desulfuromonas palmitatis type strain SDBY1 and Desulfobacter curvatus strain DSM3379 are both known Hg methylating species and they clustered closely together with other Desulfuromonas and Desulfobacter species identified in DWW. Desulfococcus multivorans DSM2059 (a known Hg methylating strain) clustered closely together with other Desulfococcus and related species identified in DWW. Interestingly, this group is not closely related to another Hg methylating strain Geobacter sulfurreducens PCA, which clusters closely with many Geobacter spp. identified in DWW. Desulfobulbus propionicus DSM2032 Hg methylating strain is clustered closely with other Desulfobulbus species identified in DWW.

Recent studies have shown that most of the commonly studied Desulfovibrio strains align into a clade with few Hg methylators, while most of the Hg methylating strains fall into the halophiles group20. The finished genomes for Desulfovibrio strain ND132, Desulfovibrio africanus strain Walvis Bay and Desulfovibrio africanus strain PCS are the first complete genomes for the strains that generate MeHg38,39,40. This will aid in to our studies with comparison of full genomes of known Hg methylators identified in DWW and comparative metagenomics of Hg contaminated ecosystems.