Physico-chemical characteristics

The locations where the samples were taken are shown in Supplementary Figure S1. Some physico-chemical and biological properties of the samples are described in Supplementary Table S1. Salinity, as represented by electrical conductivity, is much higher in Mar Menor than in the Mediterranean Sea. Albufera waters, however, appeared as highly mineralized freshwaters showing a certain influence of the sea, with values of 2.8 mS cm−1, compared to freshwaters from the area, which commonly show conductivities of around 1 mS cm−1. Even though Albufera is well separated from the sea, open inlets controlled by hydraulic gates sometimes allow some connection, and the aquifers providing water to the lagoon are slightly influenced by marine waters. Our two sampling sites therefore represent the two extremes of salinity, the main environmental condition determining the ecology of coastal lakes, with one highly saline and one freshwater lagoon. As is typical of well mineralized waters, both samples were mildly alkaline (pH was 8.4 for Mar Menor and 7.69 for Albufera), but alkalinity (and bicarbonate concentrations) in Albufera was lower than in surrounding freshwater systems. The lower alkalinity in Albufera is mainly due to the high rates of planktonic primary production in this hypertrophic system, which consume large amounts of inorganic carbon and thus decrease the alkaline reserve, mainly formed by bicarbonate. The saline content of Albufera, though much lower than that of Mar Menor, is quite balanced among the anions bicarbonate, chloride and sulphate, whereas that of Mar Menor is mostly due to chloride. These data indicate the differences in the relative importance of continental and marine inputs in these two systems.

Total nitrogen (TN) and total phosphorus (TP) concentrations, taken together with chlorophyll concentrations, better reflect the extent and effects of eutrophication on both lagoons, as they show the amount of nutrients that are incorporated into biomass, mainly phytoplankton, in the form of particulate nutrients. Both TN and TP were around two and a half times higher in Albufera than in Mar Menor, showing that, in addition to salinity, these two systems also maintain a large difference in another quite important environmental feature, namely, the trophic status.

Chlorophyll-a concentration reveals even larger differences than TN and TP, as the chlorophyll levels of Mar Menor (3.94 µg/l) are actually very similar to those of the deep chlorophyll maximum (DCM) of the Mediterranean7 (3.4 µg/l), while Albufera displayed levels corresponding to extremely hypertrophic conditions (271.31 µg/l). Following OECD criteria47, all these values categorized Albufera as a hypertrophic system, whereas those of Mar Menor correspond to a mesotrophic system, but with a strong trend towards eutrophication as indicated by the concentrations of soluble nutrients. Remarkably, both nitrogen and phosphorus are mostly included within the particulate fraction in Albufera, with comparatively low amounts in the soluble forms of phosphorus (soluble reactive phosphorus, mainly orthophosphate) and, even lower, of nitrogen (ammonia), compared to overall amounts that are mainly owed to the biomass of phytoplankton, as shown by Chl-a concentrations. Because of its long residence time during most of the year, Albufera acts as a bioreactor that converts most of the incoming nutrients into phytoplankton biomass, most of which is later retained in the sediments and represents a strong internal load that further supports hypertrophic conditions. Moreover, most of this phytoplankton biomass is composed of cyanobacteria, as shown by the dominance of taxa-specific carotenoids from these phytoplanktonic organisms, such as zeaxanthin (Supplementary Table S1), and as further confirmed by microscopic and molecular analyses. In contrast, most nitrogen and phosphorus in Mar Menor were detected as soluble forms, with relatively low levels of phytoplankton biomass that are still comparable to productive areas of the sea, such as the DCM, but much lower than in Albufera. Ammonium is, in contrast to Albufera, the main form of nitrogen in the waters of Mar Menor.
The very high planktonic biomass in Albufera quickly assimilates available nutrients, especially those which are limiting, and ammonium is the preferred form of nitrogen to be assimilated by organisms as it has the same redox status as organic nitrogen. In Mar Menor, however, the high availability of soluble (biologically available) forms of nitrogen and phosphorus compared to the low chlorophyll levels indicates recent peaks of nutrient inputs into this lagoon, occurring shortly before the sampling, that have not yet had time to be converted into biomass. Massive occasional nutrient inputs are a common feature of this lagoon and are associated with time-restricted discharges of wastewaters48 or increased agricultural runoff linked to heavy rains. These inputs commonly cause algal blooms that are associated with such nutrient dynamics4. Recent modelling estimated that, accounting only for agricultural sources associated with irrigation procedures, more than 2000 tonnes of nitrogen and around 60 tonnes of phosphorus enter the Mar Menor per year, which, together with other sources such as urban wastewaters, explains the high levels of soluble nitrogen found in this lagoon. In addition to this modelling, previous empirical evidence of the high amounts of nutrients received by Mar Menor was given by Velasco et al.49, who during a hydrological cycle measured nutrient inputs as high as 2010 tonnes of inorganic nitrogen and 178 tonnes of soluble reactive (biologically available) phosphorus in a year. Thus, our measurements of dissolved inorganic nitrogen, even if chlorophyll concentrations are not so high, reveal a relatively high (mesotrophic to eutrophic) trophic status of Mar Menor compared to the coastal waters of the nearby Mediterranean Sea, though much lower than that of Albufera, where nutrients are likely quickly bioconverted into phytoplankton biomass.
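As a rough illustration of how the chlorophyll-a values reported above map onto trophic categories, the following sketch applies the commonly quoted OECD fixed-boundary limits for mean chlorophyll-a (1, 2.5, 8 and 25 µg/l). These boundaries are an assumption taken from the general limnological literature and are not stated in this paper; the full OECD scheme also uses total phosphorus, peak chlorophyll and Secchi depth, which are ignored here. The function name is hypothetical.

```python
def trophic_class_from_chl(chl_ug_per_l: float) -> str:
    """Assign a trophic category from mean chlorophyll-a (µg/l).

    Boundaries are the commonly cited OECD fixed-boundary limits
    (an assumption, not values given in the paper).
    """
    if chl_ug_per_l < 0:
        raise ValueError("chlorophyll concentration cannot be negative")
    # (upper limit in µg/l, category) pairs, checked in ascending order
    boundaries = [
        (1.0, "ultra-oligotrophic"),
        (2.5, "oligotrophic"),
        (8.0, "mesotrophic"),
        (25.0, "eutrophic"),
    ]
    for upper, label in boundaries:
        if chl_ug_per_l < upper:
            return label
    return "hypertrophic"


# Values reported in the text
for site, chl in [("Mar Menor", 3.94), ("Albufera", 271.31)]:
    print(site, trophic_class_from_chl(chl))
```

Under these assumed boundaries the chlorophyll values alone place Mar Menor in the mesotrophic range and Albufera far into the hypertrophic range, in line with the categories given in the text.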

Phytoplankton diversity and abundance

In contrast to the very different abundance of phytoplankton (quantified as Chl-a concentration), both systems showed similar densities of heterotrophic bacterioplankton (in the range of 4–5 × 10^6 cells per ml), higher than those commonly found in surface waters of the Mediterranean Sea50,51. However, the abundance of phototrophic picoplankton, mainly unicellular Synechococcus-like cyanobacterial cells, was almost twenty times higher in Albufera than in Mar Menor. These autotrophic picoplankton (APP) cells are similar to those of surface waters, phycocyanin-rich cells mostly lacking phycoerythrin52. However, although APP abundance is much higher in Albufera, they represented up to 9.4% of phytoplankton biomass (biovolume) in Mar Menor. This contribution was 3.3% in Albufera, where filamentous cyanobacteria, diatoms and chlorophytes accounted for most of the biomass (Supplementary Figure S2). The relatively high diversity of phytoplankton in Albufera revealed by our sampling (Figure 1, Supplementary Figure S3) is a relatively recent development in this lake, associated with sewage diversion53 in recent years, in contrast to previous decades, when filamentous cyanobacteria such as Planktothrix agardhii, Pseudanabaena galeata and Geitlerinema sp. widely dominated the community54. This relatively high diversity, related to the increased relevance of chlorophytes and diatoms compared to cyanobacteria, is also shown by taxa-specific pigments. In addition to the high concentrations of the cyanobacterial-specific carotenoid zeaxanthin, high concentrations of the diatom-marker carotenoid fucoxanthin were also found (Supplementary Table S1). The high contribution of chlorophytes to total phytoplankton biomass is mostly due to the presence of very large colonial species of Pediastrum (P. boryanum and P. duplex), which at the time of sampling accounted for 46.6% of total phytoplanktonic biovolume (Figure 1; Supplementary Figure S2) but only 1.3% of phytoplankton individuals; this is likely the reason that chlorophyte-specific carotenoids are not so abundant. Sewage diversion, together with increased flushing during some periods associated with rice cultivation, sometimes promotes clear water phases in late winter and spring, as occurred in 2010, when sampling was performed and the most evident clear water phase of the last four decades was reported. In contrast, dinoflagellates dominated the phytoplankton in Mar Menor by far, both in terms of total phytoplankton biomass and number of cells (excluding APP for the latter count), with relevant contributions also from diatoms and unicellular picocyanobacteria (Figure 1; Supplementary Figure S2). This is also reflected in the abundance of the taxa-specific carotenoids (Supplementary Table S2), which, although at much lower concentrations than those of Albufera, also show the relative importance of the dominant phytoplankton groups. Neither Albufera nor Mar Menor holds planktonic anoxygenic phototrophic bacteria, as revealed by the absence of bacteriochlorophylls.

Figure 1 Pairs of microphotographs, DAPI stain (blue, top) and photosynthetic pigment autofluorescence (red, bottom), of samples from Albufera (A and B) and Mar Menor (C and D) showing different microorganisms. A) A colony of unicellular picocyanobacteria. B) Several filamentous cyanobacteria and coenobia of the chlorophytes Pediastrum sp. and Scenedesmus sp. C) Different morphologies of heterotrophic bacterioplankton (cells not showing red autofluorescence in lower pictures) and autotrophic picocyanobacteria (cells showing red autofluorescence in lower pictures). D) Heterotrophic bacterioplankton and autotrophic picocyanobacteria with a eukaryotic nanoflagellate. White bar corresponds to 10 μm in all pictures.

GC Content

We obtained nearly equal amounts of sequence data from each of the three filter sizes for each dataset (0.1, 0.8 and 3.0 µm, see Supplementary Table S2). The sequence data from the three filters of Mar Menor show some differences in GC content (Supplementary Figure S4), with the 3 µm filter showing a high GC peak, likely because of the increased number of eukaryotic sequences captured in this filter. A comparison of the GC content of the two smaller filter sizes (0.1 and 0.8 µm) with other available marine and hypersaline metagenomes (DCM, PC6 and SS19) is shown in Supplementary Figure S4. The Mar Menor metagenome shows a single distinct peak at ~50%; it is similar to the marine metagenome in being unimodal, but has a very different GC% and a broader GC range, and it is also distinct from the other hypersaline datasets (both PC6 and SS19), which have clear bimodal GC distributions. All the hypersaline metagenomes do have at least a single peak at around 50% GC. The figure indicates that a diverse range of GC content may be found across a range of salinities (from 3.5% to 19%). Moreover, the Mar Menor GC distribution appears to be quite different from that of PC6, although both habitats have nearly identical salinity (however, as no physical-chemical data other than salinity are available for the PC6 dataset, the factors underlying these differences cannot be adequately discussed). There does not appear to be an abundance of very high GC organisms (~70% GC, as in PC6) in Mar Menor (Supplementary Figure S4). On the other hand, sequences from all three filters of Albufera tended to show a GC profile skewed towards high GC content (Supplementary Figure S4). Comparison to three other freshwater datasets (Lake Gatun in Panama, Lake Lanier in Atlanta, US46 and the River Amazon24) (Supplementary Figure S4) does not show any kind of clear pattern, apart from a low GC peak (~45–50%) in all datasets except Albufera.
In this initial examination, therefore, the GC% profiles of both Mar Menor and Albufera appear quite different from those of other metagenomic datasets, which is already an indication that the communities in these ecosystems differ from those in other related available datasets.
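The kind of GC% profile compared above can be sketched as a simple per-read histogram. The snippet below is a minimal illustration only: the function names, the 5% bin width and the handling of ambiguous bases are assumptions made here, not details of the pipeline used in this study.

```python
from collections import Counter


def gc_percent(seq):
    """Return GC% of a read, ignoring ambiguous bases; None if no A/C/G/T."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return None
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


def gc_histogram(reads, bin_width=5):
    """Fraction of reads falling in each GC% bin (keyed by bin lower edge)."""
    counts = Counter()
    for read in reads:
        gc = gc_percent(read)
        if gc is None:
            continue  # skip reads made only of ambiguous bases
        # Reads with exactly 100% GC are placed in the last bin
        bin_start = min(int(gc // bin_width) * bin_width, 100 - bin_width)
        counts[bin_start] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {b: counts[b] / total for b in sorted(counts)}


# Toy reads (hypothetical, for illustration only)
print(gc_histogram(["ATGC", "ATGC", "GGCC", "AATT"]))
```

A unimodal community such as that described for Mar Menor would show one dominant bin range in such a histogram, whereas the bimodal hypersaline datasets would show two separated maxima.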

Community Structure

Among prokaryotes, classification of the 16S rRNA sequences and comparison of all reads to the NR database indicated almost exclusively the presence of Bacteria (Supplementary Figures S5, S6 and S7; Supplementary Tables S3, S4, S5 and S6). No archaeal 16S rRNA reads were detected in the Mar Menor dataset and an extremely low number (<1%, n = 322) of all metagenomic reads could be assigned to Archaea in Albufera. This extremely low fraction of reads from Albufera was assigned primarily to Euryarchaeota. This is indeed a little unusual, as Archaea are usually at least minor components of most systems (with exceptions, e.g. solar saltern crystallizer ponds), typically in the range of 5–10%24, but in Mar Menor we have barely detectable levels of archaeal sequences.

Phages

Among the most abundant organisms recruiting the maximum number of reads from the Mar Menor metagenome was a viral genome, that of Roseobacter phage SIO1 (see Supplementary Table S3). Roseophages are lytic podoviruses of Roseobacters, first isolated on Roseobacter SIO67, an aerobic, heterotrophic alphaproteobacterium. The currently sequenced Roseophages have been isolated from California near-shore locations. Comparative genome analysis of Roseophages has revealed largely conserved genomes, with three distinct pockets of variability (the thyX gene, phosphate metabolism genes and structural genes like the tail-fiber protein). However, our sequence data indicate the presence of a population of organisms belonging to the order Rhodobacterales (see 16S rRNA section above). The average %identity of the metagenomic hits mapping to the Roseobacter genome was ~40%, i.e. rather low, so the dominant phage might be an abundant podovirus, similar to Roseophages, but its host specificity is as yet uncertain, as the host itself is as yet undescribed. In comparison to the nearly 11% of reads in the Mar Menor metagenome assigned to phages, only ~3.6% of reads could be assigned to phages in Albufera. Even so, a phage genome, that of Prochlorococcus phage P-SSM2, recruited several hits in Albufera (Supplementary Table S4). P-SSM2 is a myovirus that is specific for cross-infections between Prochlorococcus strains55. However, there is no Prochlorococcus population in Albufera, so these reads likely belong to an abundant myovirus, which might be infecting the abundant Synechococcus or even Cyanobium.

Alphaproteobacteria

Alphaproteobacteria form a large part (~33%) of the community in Mar Menor (Figure 2), similar to the DCM. The marine metagenome of the DCM is dominated by Candidatus Pelagibacter (belonging to the SAR11 cluster). In Mar Menor as well, the majority of alphaproteobacterial 16S sequences could be ascribed to the SAR11 cluster and nearly half (43.6%, n = 69) of all 16S reads to which we could assign a tentative genus could be affiliated to Candidatus Pelagibacter (Supplementary Table S5). Moreover, alphaproteobacterium HIMB114, also a member of the SAR11 cluster (a marine microbe, isolated from Hawaii) was among the organisms that recruited the maximum number of reads from the metagenome (Supplementary Table S3). These results point towards the abundance of a SAR11 representative in Mar Menor. In addition to these organisms (both belonging to the order Rickettsiales), a number of hits were classified into the order Rhodobacterales (~30% of all alphaproteobacterial reads), that are known to comprise, among others, abundant microbes (e.g. Marivita, Cetrimonas, Roseisalinus, Roseovarius were identified). Only a small number of reads were classified into the order Rhizobiales (~10% of all alphaproteobacterial reads).

Figure 2 Comparison of the distribution of 16S rRNA sequences from selected abundant high-level bacterial taxa in the Mar Menor and Albufera metagenomes with those in several freshwater and saline metagenomes. Results from all filters have been combined.

In contrast, in the similarly hypersaline lagoon of Punta Cormoran, which has a similarly high percentage of alphaproteobacterial reads, the SAR11 clade does not appear to have any abundant representatives, with only a very small minority of reads assigned to Candidatus Pelagibacter. The major taxa belonged to the order Rhodobacterales (e.g. Dinoroseobacter, Roseovarius, Loktanella etc.) and, to a lesser extent, Rhizobiales (rhizobacteria) (e.g. Parvibaculum, Mesorhizobium etc.)17.

It therefore appears, firstly, that an as yet unknown but abundant SAR11 cluster representative inhabits the hypersaline lagoon of Mar Menor, as supported by the recruitment plots of both Candidatus Pelagibacter and alphaproteobacterium HIMB114 (Figure 3). Secondly, the Alphaproteobacteria inhabiting two hypersaline lagoons of similar salinity are substantially different. Punta Cormoran, the pristine lagoon, appears to have a thriving Roseobacter community, whereas Mar Menor has both SAR11 representatives and Roseobacter species.

Figure 3 Fragment recruitment plots of selected organisms versus the Mar Menor and the Albufera metagenomes. The comparisons were done using BLASTN; an alignment with a minimum length of 50 bp and an e-value threshold of 1e-5 was considered a hit. The X-axis is scaled in Mb and the Y-axis shows the %identity.
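The hit criteria given in the caption of Figure 3 can be sketched as a filter over BLASTN results. The snippet below assumes the standard 12-column tabular output (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the output format actually used in this study is not stated, the function name is hypothetical, and the sketch keeps every passing hit rather than only the best hit per read.

```python
def recruitment_points(blast_tab_lines, min_length=50, max_evalue=1e-5):
    """Return (genome position in Mb, %identity) for hits passing the filter.

    Assumes 12-column tab-separated BLAST output; this format is an
    assumption, not a detail taken from the paper.
    """
    points = []
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        pident = float(fields[2])
        length = int(fields[3])
        sstart, send = int(fields[8]), int(fields[9])
        evalue = float(fields[10])
        if length >= min_length and evalue <= max_evalue:
            # Plot each hit at the midpoint of its alignment on the genome
            midpoint_mb = (sstart + send) / 2 / 1e6
            points.append((midpoint_mb, pident))
    return points


# Hypothetical example hits (not data from this study)
example = [
    "r1\tgenome\t98.0\t100\t2\t0\t1\t100\t1000000\t1000099\t1e-40\t180",
    "r2\tgenome\t85.0\t40\t6\t0\t1\t40\t2000000\t2000039\t1e-8\t60",
    "r3\tgenome\t80.0\t60\t12\t0\t1\t60\t3000000\t3000059\t1e-3\t40",
]
print(recruitment_points(example))
```

Only the first example hit passes: the second is shorter than 50 bp and the third has an e-value above 1e-5. The resulting pairs correspond to the X (Mb) and Y (%identity) axes of a recruitment plot.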

In contrast to Mar Menor, in the freshwater Albufera, the Alphaproteobacteria are in a minority (~9% of all reads). This is surprising as they are usually detectable across a range of freshwater bodies31. Based on 16S rRNA phylogenetic analyses, freshwater Alphaproteobacteria have been divided into a number of different lineages, called alfI, alfII, alfIII, alfIV, alfV (LD12 sister group to SAR11), alfVI and alfVII31,56. In general, freshwater Alphaproteobacteria are not a well studied group and we have very little information regarding their ecology, functional roles or genomic characteristics. However, the freshwater datasets chosen here do show clearly that there appears to be a wide variation in the abundance and occurrence of the freshwater alphaproteobacterial lineages (Supplementary Figure S8), particularly the complete absence of the LD12 clade in Albufera, which could mean that LD12 distribution might be affected by nutrient status, as supported by our results.

Cyanobacteria

Cyanobacteria form a sizeable percentage of the marine microbial community, especially at the deep chlorophyll maximum7, and have been shown to progressively decrease in numbers with increasing salinity17. Here also, we can clearly see (Figure 2) that the number of cyanobacterial sequences declines from the marine dataset to 5% salinity and is finally nearly absent at 19% salinity. The total percentage of cyanobacteria identifiable in the Mar Menor dataset is similar to that of the marine metagenome of the DCM and nearly twice that of Punta Cormoran. The top organisms identified as cyanobacterial were only Synechococcus strains (e.g. WH7803, WH7805), which have been identified before in hypersaline habitats, both by 16S rRNA cloning studies57 and by metagenomic analyses17. Comparisons of the metagenomic reads against Synechococcus genomes show a very high level of fragment recruitment (Figure 3), indicating close relatedness between the free-living cyanobacteria in Mar Menor and the already sequenced strains. However, in comparison to the DCM, where the cyanobacterial population comprises both Prochlorococcus and Synechococcus, it appears that among free-living unicellular picocyanobacteria, Synechococcus alone contributes to the primary productivity of this system, where it accounts for ~10% of phytoplankton biomass (Figure 2), and the range of Prochlorococcus does not extend into the high salinity waters of Mar Menor. Indeed, at higher taxonomic levels, there appears to be very little difference between Mar Menor and the DCM, e.g. similar levels of Alphaproteobacteria, Cyanobacteria, Gammaproteobacteria and Bacteroidetes. Differences appear to emerge at the organismal level, e.g. the absence of Prochlorococcus in Mar Menor, a heterogeneous alphaproteobacterial population etc.

The most striking characteristic of Lake Albufera is clearly the high abundance of cyanobacterial sequences (comprising nearly 35% of all 16S rRNA sequences and nearly 23% of all metagenomic reads, see Figure 2 and Supplementary Figure S7). Albufera exhibits a highly hypertrophic status, which distinguishes it from other freshwater bodies previously studied, like the Amazon river, Lake Gatun and Lake Lanier (and even from the other saline/hypersaline datasets), which do not display such cyanobacterial abundances. Both the Amazon and Lake Gatun show only a very small percentage of cyanobacteria (<2%), while Lake Lanier appears to have a little more (~6%). In comparison to Mar Menor, where we were able to identify mainly Synechococcus, the diversity of cyanobacteria in Albufera is clearly higher, with a number of different and abundant genera, e.g. Synechococcus, Cyanobium, Pseudanabaena and Merismopedia, all of which have been previously isolated from freshwaters and most of which have been detected in this lake3.

Microscopic counts using the Utermöhl sedimentation technique on an inverted microscope (Supplementary Figure S2) are useful for distinguishing morphologically different species of a certain size, mainly ranging from nanoplankton to bigger planktonic microorganisms, including filamentous cyanobacteria and eukaryotic algae. This does not apply to picocyanobacteria, such as Cyanobium and Synechococcus, which jointly accounted for up to 48% of the 16S rRNA sequences assigned to a genus in samples from Albufera (Supplementary Table S6). Larger cyanobacteria, like colonial forms of the genus Merismopedia or filamentous species like Pseudanabaena, were detectable both by sequencing techniques (Supplementary Table S5) and by microscopy (Supplementary Figure S3), showing partial agreement between the two methods.

Even though the measured chlorophyll a levels in Albufera are far higher (271.31 µg/l) than Mar Menor (3.94 µg/l), the difference in the percentage of cyanobacteria (by 16S rRNA analysis) is not proportionately larger (~35% in Albufera and ~12% in Mar Menor). This is likely due to the presence of an enormous diversity and abundance of eukaryotic photosynthetic algae in Albufera, that are not very well detected by sequencing due to much larger eukaryotic genome size but are identified clearly under the microscope (Figure 1, Supplementary Figure S3). Similarly to Mar Menor, we found no evidence for presence of Prochlorococcus in the metagenomic data from Albufera although Prochlorococcus-like populations have been reported in freshwater systems before58 and a study on Yellowstone Lake has also detected Prochlorococcus ecotypes in freshwater59. Also, even though we detected small amounts of chlorophyll b (Supplementary Table S1) and Prochlorococcus cells have characteristic divinyl derivatives of chlorophyll a and b, this may most likely be attributed to chlorophytes that also have these pigments and were identified as very abundant by microscopic counts (Supplementary Figure S2). Another indication of the relative homogeneity of the cyanobacterial populations in Mar Menor, compared to Albufera, is also visible in the GC% profile of the cyanobacterial reads (Supplementary Figure S9), i.e. a single peak at ~62%, while in Albufera, there are two distinct peaks (one at ~55% and the other at ~62%).

Verrucomicrobia

This group of microbes is another point of difference between the pristine Punta Cormoran and Mar Menor. Verrucomicrobiae are widely distributed and have been isolated from a number of different habitats, e.g. soils, lakes, marine sediments, hot springs and even man-made ecosystems like acid-rock drainage and municipal solid-waste landfill leachates60. They are recognized as an increasingly significant group of soil bacteria and according to several estimates may comprise up to 10% of total bacteria in soil60. In Mar Menor we find that a microbe related to Coraliomargarita akajimensis (isolated from seawater61) is quite abundant (Supplementary Table S3). Another abundant organism (by 16S rRNA) was Haloferula, which lacks a sequenced genome. However, Haloferula species have been isolated from marine environments62, so it is likely these are close relatives. Moreover, it is clear from Figure 2 that the Verrucomicrobia are abundant in Albufera as well. However, here, instead of Coraliomargarita or Haloferula (which appear to be more salt-tolerant), there is a verrucomicrobium related to Chthoniobacter63 (which was isolated from soil).

Actinobacteria

Actinobacteria have been primarily thought of as soil bacteria. This can be attributed to the ease of cultivation of this group, which have been referred to as high GC microbes. However, several studies using different approaches (16S rRNA, FISH and metagenomics) have now shown that Actinobacteria are very common and abundant members of freshwater communities24,30,56,64,65 and many are not even high GC17,23,24. The abundance of Actinobacteria varies greatly across the datasets (Figure 2). In the Albufera metagenome we were able to assign only 5–6% of reads to this group (by 16S and all reads). This is an extremely low percentage relative to the other datasets (e.g. Amazon ~20%, Lake Gatun ~40% and Lake Lanier ~20%). This reduced relevance of Actinobacteria is indeed striking.

Some saline datasets also show an abundant actinobacterial presence, e.g. ~24% of all 16S rRNA reads in the hypersaline lagoon Punta Cormoran are actinobacterial. This is in sharp contrast to the very low numbers in the DCM (~2%) or Mar Menor and SS19 (~5% each). In addition, most of the actinobacterial reads from the Mar Menor metagenome were high GC (Supplementary Figure S10) while those from Albufera showed three clear GC% peaks, indicating that in spite of the low number of actinobacterial reads, there might be at least three different clades of Actinobacteria present here.

We examined the 16S rRNA actinobacterial reads from all these datasets in the framework of a well-defined taxonomy (Figure 5), in which the freshwater taxa have been classified into seven lineages (differing by ~10–15% in 16S rRNA sequence from each other)31. Each lineage is subclassified into clades (≥95% identity to at least one member) and clades into tribes (≥97% identity to at least one member). The results of this classification show the variation in abundance of these lineages across all the datasets. However, apart from these differences, it is very difficult to arrive at further conclusions, as there is not yet a single sequenced representative from the low GC Actinobacteria.
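The identity cut-offs quoted above (≥95% for clades, ≥97% for tribes) can be written as a small decision rule. The sketch below is only an illustration of those two thresholds: the function name is hypothetical, the input is assumed to be the best %identity of a read to any member of a reference group, and no lineage-level cut-off is applied because the text does not give one explicitly.

```python
def taxonomic_resolution(percent_identity):
    """Finest rank a 16S read can be assigned to, given its best %identity
    to a reference member (clade >= 95%, tribe >= 97%, as quoted in the text).
    """
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity >= 97.0:
        return "tribe"
    if percent_identity >= 95.0:
        return "clade"
    # Below the clade threshold the read can at best be placed in a lineage
    return "lineage or unassigned"


# Hypothetical identities, for illustration only
for identity in (98.2, 95.0, 90.0):
    print(identity, taxonomic_resolution(identity))
```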

Figure 5 Classification of actinobacterial 16S Reads from Albufera, Mar Menor and several other metagenomic datasets into known lineages of freshwater Actinobacteria. The numbers above the bars indicate the total number of actinobacterial 16S sequences detected in each dataset.

Both Mar Menor and Albufera contain a very similar percentage of actinobacterial reads. However, they differ in the type of their resident Actinobacteria. The majority of reads in Mar Menor could be affiliated to the Luna1, Luna3, acIII and acIV lineages. Albufera also has acIII and acIV, but the Luna3 lineage is absent and several others are present in small numbers. The Amazon River and Lake Gatun show very similar populations, with the acI-C clade being the most abundant. Lake Lanier is quite different from both of these and has acI-A as the dominant clade. But Albufera is drastically different from any of the other freshwater datasets, with the acI lineage completely absent. Instead, acIII and acIV are nearly equally dominant. The saline samples also show a different trend. Very few reads are detected in the DCM, so these might not be very reliable, but Mar Menor and PC6 actually appear quite similar in their actinobacterial load, apart from the additional presence of the acSTL lineage in PC6. However, one of the most striking results is the total dominance of the Luna1 lineage (previously called acII) in the SS19 dataset. It is also present in significant amounts in Mar Menor and Punta Cormoran. Broadly, however, the data clearly show the separation between the various lineages on grounds of salinity. For example, the acI lineage, without doubt among the most abundant freshwater lineages, is restricted to freshwater alone and is not found in saline habitats. However, none of these lineages has a sequenced representative yet, so we cannot speculate further on the nature of these differences.

Betaproteobacteria

Betaproteobacteria are among the most dominant taxa in freshwater systems. This has been shown by several approaches (16S rRNA, FISH, metagenomics)24,64,66. In terms of simple abundance, when Albufera (~6%) is compared with other freshwater datasets, only Lake Gatun has a similar abundance of betaproteobacteria (~3%), while the Amazon and Lake Lanier appear to have very high levels (nearly 20%) (Figure 2). Although betaproteobacteria are detectable in Albufera, the most prominent, nearly universally present and arguably the best studied freshwater betaproteobacterium, Polynucleobacter, was conspicuous by its absence. Moreover, only a handful of betaproteobacterial 16S rRNA sequences could be affiliated to known genera (Methylibium-3 and Thauera-1), while the others were all unclassified betaproteobacteria. Comparisons using all reads did suggest that nearly 6% of all sequences in Albufera were betaproteobacterial (Supplementary Figure S7). The ubiquity and absence of Polynucleobacter across a wide variety of lakes of different characteristics (altitude, pH, water chemistry, landscape position, trophic status etc.) has been discussed extensively before28 and it has been suggested that high levels of dissolved organic carbon are negatively correlated with the abundance of this microbe. In Albufera, we detected very high values of dissolved organic matter (data not shown) and it is likely that this factor is important in the absence of this ubiquitous microbe from this habitat.

Freshwater betaproteobacteria are broadly divided into seven lineages (betI to betVII) based on 16S rRNA phylogenetic analyses31. We classified the betaproteobacterial reads from several datasets into these lineages (Supplementary Figure S11). The Amazon and Lake Lanier both showed a wide variety of lineages, with nearly equal amounts of the betI lineage. Indeed, the betI lineage does appear to be nearly universal across all freshwater datasets. This lineage does have some cultured representatives (e.g. Limnohabitans67,68). However, some lineages of betaproteobacteria found in the two datasets are different, e.g. betIII (order Burkholderiales) is dominant in Lake Lanier and betIV (order Methylophilales) in the Amazon.

The betII lineage, to which Polynucleobacter belongs, is seen only in two datasets (Amazon and Lake Lanier). Albufera also contained sequences belonging to the betVI lineage (~50%). In Albufera, the betaproteobacterial sequences appear nearly evenly divided between the betI and the betVI lineages, with a small amount of betIV sequences. However, the betI and the betVI lineages appear to be widely distributed in all freshwater datasets. But apart from this, there does not appear to be any kind of simple commonality regarding the distribution of the lineages within the datasets studied, with each dataset having its own characteristic features. More metagenomic datasets complemented with environmental data will be required to elucidate more clearly the various reasons for distribution of these lineages.

We also examined in more detail the distribution of Polynucleobacter specifically in all the metagenomic datasets compared in this study. In the betII lineage there are four different “tribes”, namely PnecA, B, C and D, named after Polynucleobacter31. The four tribes refer to different Polynucleobacter species, i.e. PnecA, B, C and D refer to P. rarus, P. acidiphobus, P. necessarius and P. cosmopolitanus respectively. Only the Amazon and the Lake Lanier datasets showed evidence of the presence of Polynucleobacter. However, both are quite enriched in Polynucleobacter (betII lineage). More specifically, all four tribes PnecA, B, C and D were identified in the Amazon dataset, while only PnecB was identified in the Lake Lanier dataset. We could not identify any Pnec 16S sequences in any of the other datasets.

It is clear, however, that betaproteobacteria are not numerically abundant in saline waters; e.g. in Mar Menor, only ~1% of all the reads could be assigned to betaproteobacteria (Supplementary Figure S7). They are at similarly low levels in the Deep Chlorophyll Maximum, Punta Cormoran and SS19 datasets as well. This agrees with previous reports of the low abundance of betaproteobacteria in marine metagenomic datasets8.

Eukaryotes

From the collected metagenomic data it was possible to identify eukaryotic sequences (~12% in Mar Menor, ~2% in Albufera) by comparison to the complete NR database. The number of eukaryotic reads increased progressively with increasing filter size (Supplementary Figure S5). The total numbers of 18S sequences identified in Mar Menor and Albufera were 28 (~5% of total SSUs) and 22 (~5% of total SSUs), respectively. The main eukaryote identified in Mar Menor was Alexandrium (~18%, n = 5), a marine armored dinoflagellate that produces neurotoxins that cause paralytic shellfish poisoning. Alexandrium is well known in coastal lagoons in the Mediterranean69 and has both autotrophic and heterotrophic species. Alexandrium blooms are harmful and are commonly referred to as red tides. The toxins it produces can have adverse effects when consumed by humans, usually in the form of contaminated seafood (shellfish, fish etc.)70. Moreover, these blooms are common in coastal habitats; they affect marine trophic structure, increase mortality of marine fish, birds and mammals, and disrupt recreational activities70. Dinoflagellate blooms are usually correlated with increased levels of reduced nitrogen sources, particularly ammonia and urea (at least for Alexandrium)71. Photosynthetic dinoflagellates can supplement photosynthetic growth with organic sources, and increased levels of inorganic nutrients (particularly nitrogen and phosphorus)72, coupled with their ability to produce paralyzing toxins, make them strong competitors in eutrophic systems, affecting multicellular and unicellular life alike73. However, toxin production by Alexandrium is inconsistent and not all species are toxic. In addition to Alexandrium, other dinoflagellates were also detected (e.g. Gymnodinium, Protoceratium).

Another abundant organism detected by 18S rRNA in Mar Menor was Chrysochromulina (n = 4), a haptophyte of the class Prymnesiophyceae. Haptophytes (e.g. Chrysochromulina, Phaeocystis, Prymnesium) are all bloom-forming organisms. The distinctive feature of haptophytes is the haptonema, a retractile, coiled protuberance that is only superficially flagellum-like and performs several functions (e.g. sensory responses, prey capture)74. Chrysochromulina is also photosynthetic and, like some Alexandrium species, can supplement photosynthetic growth by mixotrophic feeding. Some Chrysochromulina species are also euryhaline, with an optimum salinity for growth75 much higher than marine levels.

In a microscopic examination and enumeration of the planktonic species, we identified a number of abundant diatoms (e.g. Cyclotella, Entomoneis, Nitzschia). Cyclotella, a well-known and abundant centric diatom, was also identified by its 18S rRNA sequence in the metagenomic data. Some Cyclotella species are known to be associated with high nutrient concentrations, particularly phosphorus, and thus with polluted, eutrophic waters76,77. However, the most abundant organism by far identified by microscopy was the dinophyte Gyrodinium. In contrast with diatoms, whose main sequences (Cyclotella sp.) corresponded to taxa already identified by microscopic observations, molecular identifications of dinoflagellates (dinophytes) did not coincide with microscopic determinations. This indicates that the taxonomy of this group is still far from resolved, even though in our microscopic determinations we only considered autotrophic or mixotrophic species that hold chloroplasts.

Apart from protists, crustacean copepods, which are among the most important groups of marine invertebrates78, particularly for the carbon flux in the food web of the oceans79, were also identified in Mar Menor (e.g. Paracyclopina, Oithona, Diarthrodes). These can be considered the zooplanktonic community of the lagoon. Paracyclopina species can be found in brackish waters but are tolerant to high salinities as well80.

The planktonic community of protists and zooplankton in Albufera was clearly different from that of the dinoflagellate- and copepod-dominated Mar Menor. The 18S rRNA sequences from Albufera could be assigned primarily to diatoms (e.g. Cymbella, Nitzschia, Sellaphora), which also matched microscopic determinations (Supplementary Figure S2), or to ciliates (e.g. Halteria, Strombidium). While microscopic observations confirmed the presence of Nitzschia and Cyclotella, the vast majority of organisms in the sample were identified as filamentous or colonial nanoplanktonic cyanobacteria (prokaryotes), primarily Merismopedia, which forms a dense layer of loosely arranged cells in a roughly planar (rectangular or square) arrangement, sometimes enclosed by a mucilaginous matrix. Merismopedia is commonly found floating in freshwater; several species are planktonic and can also be found in somewhat saline habitats (e.g. coastal areas) or even in thermal springs. The genus is distributed all over the world81. In addition to the abundant cyanobacterial (prokaryotic) taxa, several types of chlorophytes were detected (e.g. species of Pediastrum, which accounted for a large portion of the phytoplankton biovolume, Coenochloris, Chlamydomonas, Tetraedron, Scenedesmus, etc.). The photosynthetic organisms in Albufera clearly dwarfed those in Mar Menor, both in sheer numbers and in diversity.

Rhodopsins

We identified 52 rhodopsin sequences in the Mar Menor dataset and 34 in Albufera. In Mar Menor, though Firmicutes represented less than 1% of the classified sequences, 10 sequences (nearly 20% of all rhodopsin sequences) appeared related to firmicute rhodopsins (Exiguobacterium sp.). Nearly all other sequences in Mar Menor were related to proteobacterial rhodopsins (primarily from Alphaproteobacteria and Gammaproteobacteria). In Albufera, the phylogenetic distribution of rhodopsins appeared more diverse, with the majority affiliated to Proteobacteria (11 sequences) and Planctomycetes (10 sequences). In addition, actinorhodopsins (7) and firmicute rhodopsins (4) were also found.

Metagenomic assembly

Assembly of the metagenomes resulted in a total of 104 contigs from Mar Menor and 35 contigs from Albufera (see methods for details). About three-quarters of the contigs assembled from Mar Menor (77%, n = 80) were primarily alphaproteobacterial (average GC 51.7%, average length 3.1 kb, total length 250 kb). The only other sizeable fraction of assembled contigs could be assigned to viruses (12%, n = 8, total length = 34 kb). A small number of actinobacterial contigs (n = 6, average length 2.8 kb, total length 17 kb) could also be assembled. Five of these contigs had high GC content (57 to 60%), while the remaining contig (size 4.3 kb) had a much lower GC content (43%) and contained, among some hypothetical genes, the genes coding for the alpha and beta subunits of ribonucleotide reductase, which are crucial for the conversion of ribonucleotides to deoxyribonucleotides.
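The per-group figures quoted above (number of contigs, average GC, average and total length) are simple summaries over a set of sequences. The following is a minimal sketch of how such a summary could be computed; the function names and the toy sequences are hypothetical and are not taken from the assembly pipeline described in the methods.

```python
def gc_percent(seq: str) -> float:
    """GC content as a percentage of unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def summarize(contigs: dict[str, str]) -> dict[str, float]:
    """Contig count, total length, mean length and mean per-contig GC%."""
    if not contigs:
        raise ValueError("no contigs to summarize")
    lengths = [len(s) for s in contigs.values()]
    gcs = [gc_percent(s) for s in contigs.values()]
    n = len(contigs)
    return {
        "n": n,
        "total_length": sum(lengths),
        "mean_length": sum(lengths) / n,
        "mean_gc": sum(gcs) / n,
    }


# Toy sequences (hypothetical, not from the Mar Menor assembly).
toy = {"contig_1": "ATGCGC", "contig_2": "ATATGC"}
stats = summarize(toy)
print(stats)
```

Note that the mean GC here is an unweighted mean over contigs; a length-weighted mean would be an equally reasonable choice, and the text does not say which was used.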

We performed a principal component analysis on the tetranucleotide frequencies of the assembled contigs (see methods) (Figure 4). In this analysis it is possible to observe, besides the actinobacterial cluster formed by five (high GC) of the six actinobacterial contigs (shown in yellow), four other clusters corresponding to cyanobacterial, gammaproteobacterial, alphaproteobacterial and viral contigs. The largest cluster is formed by the alphaproteobacterial contigs. This cluster is not close to the reference genomes of the two SAR11 cluster organisms found by recruitment, namely Candidatus Pelagibacter (GC = 29.7%) and Alphaproteobacterium HIMB114, but is instead closer to Candidatus Puniceispirillum marinum (GC = 48.9%), a member of the SAR116 cluster. A total of 320 genes were predicted in the 80 alphaproteobacterial contigs; of these, 120 genes gave a best hit to Rhizobiales (mean similarity 72.38%), while 108 genes gave best BLAST hits to Rhodobacterales (mean similarity 73%). The cluster did not appear to be related to Rickettsiales. So even though 16S rRNA analysis indicates at least two unidentified microbes in Mar Menor, one related to Rickettsiales (SAR11) and the other to Rhodobacterales, we assembled reads only from the latter and none from the SAR11-related microbe.

Figure 4 PCA of tetranucleotide frequencies of assembled contigs from Mar Menor and Albufera. Only those contigs longer than 2 kb that had a consistent phylogenetic profile are shown (see methods).
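To make the idea behind the tetranucleotide PCA concrete, the sketch below counts the 256 possible tetranucleotides in each sequence, converts the counts to relative frequencies, and projects the resulting matrix onto its first two principal components using an SVD. It is an illustration only: it assumes NumPy is available, does not merge reverse-complement tetranucleotides, and uses hypothetical toy sequences. The actual settings used for Figure 4 are those given in the methods.

```python
from itertools import product

import numpy as np

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # 256 tetranucleotides
INDEX = {k: i for i, k in enumerate(KMERS)}


def tetra_freqs(seq: str) -> np.ndarray:
    """Relative frequencies of the 256 tetranucleotides in one sequence.

    Windows containing ambiguous bases (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = np.zeros(len(KMERS))
    for i in range(len(seq) - 3):
        j = INDEX.get(seq[i:i + 4])
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def pca(matrix: np.ndarray, n_components: int = 2):
    """PCA via SVD of the column-centred matrix.

    Returns (scores, explained_variance_ratio) for the leading components.
    """
    centred = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    var = s ** 2
    return scores, var[:n_components] / var.sum()


# Toy contigs (hypothetical): two AT-rich and two GC-rich repeats.
contigs = {
    "at_1": "ATTATAATTTAAATAT" * 20,
    "at_2": "TATAATATTAAATTTA" * 20,
    "gc_1": "GCCGCGGGCCGGCGCC" * 20,
    "gc_2": "CGGCGCCGGGCCGCGG" * 20,
}
X = np.array([tetra_freqs(s) for s in contigs.values()])
scores, ratio = pca(X)
for name, (pc1, pc2) in zip(contigs, scores):
    print(f"{name}\tPC1={pc1:+.3f}\tPC2={pc2:+.3f}")
```

In this toy case the first component separates the AT-rich from the GC-rich sequences, which mirrors how contigs of similar composition group together in a plot such as Figure 4.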

In one of these assembled alphaproteobacterial contigs, we identified a nearly complete cluster of Sox genes, which provide the necessary apparatus for sulfur oxidation. This cluster has been shown to operate in photo- and chemotrophic Alphaproteobacteria that oxidize thiosulfate to sulfate without forming inorganic sulfur globules as a free intermediate. It was first described in the alphaproteobacterium Paracoccus pantotrophus, a facultative lithoautotrophic organism that grows with thiosulfate (and other electron donors, e.g. molecular hydrogen) as an energy source82. The cluster of P. pantotrophus coding for sulfur-oxidizing proteins comprises at least two transcriptional units with 15 genes. Seven genes, soxXYZABCD, code for proteins essential for constituting a periplasmic system for sulfur oxidation in vitro and are induced by thiosulfate. The SoxY gene encodes a C-terminal invariant binding site motif (VKVTIGGCGG) that binds different oxidation states of sulfur83. The exact motif was present in the SoxY gene of the assembled contig, providing more confidence in the function assignment and assembly. Although several pathways of thiosulfate oxidation are known, two main pathways exist, and the difference between them relates directly to the presence or absence of the SoxCD genes84. In the presence of SoxCD proteins, thiosulfate is converted to two sulfate molecules, which is the pathway in P. pantotrophus. In the absence of SoxCD, only a single sulfate is produced, and the other sulfur atom is deposited in the form of inorganic globules (e.g. in Beggiatoa). The assembled contig clearly possesses a SoxC gene, while the SoxD part was likely not assembled. So it appears that the organism to which this cluster belongs is able to fully oxidize thiosulfate to two sulfate molecules and does not deposit sulfur granules either intra- or extracellularly.
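The motif check described above amounts to asking whether a predicted SoxY protein ends in VKVTIGGCGG. A minimal sketch of such a check follows; the function name, the `tail` tolerance parameter and the example sequences are hypothetical and serve only to illustrate the idea, not to reproduce the annotation procedure actually used.

```python
SOXY_MOTIF = "VKVTIGGCGG"  # C-terminal motif quoted in the text


def has_cterminal_motif(protein: str, motif: str = SOXY_MOTIF, tail: int = 0) -> bool:
    """True if `motif` ends within `tail` residues of the C-terminus."""
    protein = protein.upper().rstrip("*")  # drop trailing stop-codon symbols
    pos = protein.rfind(motif)
    if pos == -1:
        return False
    trailing = len(protein) - (pos + len(motif))
    return trailing <= tail


# Hypothetical sequences for illustration only.
candidate = "MKTLLAVSAGA" + SOXY_MOTIF
mutated = "MKTLLAVSAGA" + "VKVTIGGSGG"  # cysteine replaced by serine

print(has_cterminal_motif(candidate))  # exact motif at the C-terminus
print(has_cterminal_motif(mutated))    # motif not present
```

The `tail` argument allows for a few extra residues after the motif, which can be useful when a gene prediction overshoots the true stop codon.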

Comparison of the assembled Sox gene cluster with the sox gene clusters of Roseobacter sp. MED193 and Aurantimonas manganoxydans SI85-9A1 (Supplementary Figure S12) showed nearly complete synteny between the genomic regions and the assembled contig. This suggests that the organism to which these contigs belong is a novel sulfur-oxidizing alphaproteobacterium, likely adapted to a higher salinity. This is interesting because the close relatives of this microbe, e.g. Candidatus Pelagibacter, Alphaproteobacterium HIMB114 and Candidatus Puniceispirillum, do not have the Sox cluster in their genomes and are likely incapable of sulfur oxidation.

Some other contigs that were assembled from the data for Mar Menor could be assigned to cyanobacteria (Figure 4). These contigs appeared closely related to Synechococcus species. However, the assembly from Albufera represented a much more diverse set of contigs, with several contigs assembled, primarily from cyanobacteria (66%) but also from other taxa (Viruses 9%, Bacteroidetes 11% and Betaproteobacteria 6%).

A more focused analysis of the assembled actinobacterial contigs from Mar Menor, in the context of actinobacterial contigs from other metagenomic datasets, is shown in Figure 6. We collected actinobacterial contigs from Lake Gatun and Punta Cormoran (see methods) and also three fully sequenced actinobacterial fosmids from Lake Kinneret85. We could identify at least six distinct clusters, each representing a dominant lineage of freshwater actinobacteria. Because of the presence of 16S sequences in the contigs, at least three of the clusters can be assigned a tentative name, i.e. two subgroups of acI (acIA and acIB1) and the lineage acIV. In comparison to those from Lake Gatun, the contigs from Punta Cormoran appear to have higher GC content (see Clusters 4, 5 and 6 in Figure 6). Also, of the six assembled contigs from Mar Menor, five cluster very clearly with Punta Cormoran contigs in Cluster 5 (GC% 55–62). Only a single, low GC contig clusters with the acIB1 cluster (Cluster 1). So it does appear that there is a minor low GC actinobacterial population in Mar Menor (also seen in the GC% profile of Mar Menor actinobacterial reads, Supplementary Figure S10). No rRNA sequences were detected in the contigs from either Punta Cormoran or Mar Menor, so assignment of names to clusters 4, 5 or 6 was not possible. Also, since a number of different actinobacterial clades are nearly equally abundant (Figure 5), it is not possible to associate these contigs with the known actinobacterial lineages with any degree of confidence.