General characteristics of marine myxobacterial genomes

Five draft genomes are currently available from obligatory marine myxobacteria: Plesiocystis pacifica DSM 14875, Haliangium ochraceum DSM 14365, Enhygromyxa salina DSM 15201, and, related to the latter one, Enhygromyxa salina SWB005 and Enhygromyxa salina SWB007, of which the last two were recently sequenced from our working group19 (Table 1). The quality of the draft genomes differs and the number of contigs varies between 1 for H. ochraceum DSM14365 to 330 for E. salina DSM 15201. However, all strains possess large genomes ranging from 9 to 10.6 Mbp. Like in terrestrial myxobacteria, the GC content is rather high, i.e. between 67 and 71% and the number of predicted gene coding sequences (CDS) is around 7,000–8,500, which is in accordance to the large genome size of these strains.

Table 1 General characteristics of available marine myxobacterial genomes. Full size table

A phylogenetic tree of marine myxobacteria was constructed based on a nucleotide sequence alignment of the core genomes (see below) (Fig. 1). The E. salina strains belong to the order of Myxococcales and the P. pacifica DSM 15201 type strain is the closest relative to the E. salina clade. They are part of the Nannocystaceae family. However, the first isolated marine myxobacterium H. ochraceum DSM 14365T belongs to the family of Kofleriaceae and the core genome of this strain is distinct from the other marine myxobacteria (see below).

Figure 1 Phylogenetic tree of selected marine myxobacteria. Available genomes of marine myxobacteria were used to build the tree based on nucleotide sequence alignment of the core genomes. The closely related halophilic strain Nannocystis exedens ATCC 25963, as well as the terrestrial Myxococcus xanthus DK1622, which represents the outgroup, were included. Tree for 7 genomes, build out of a core of 645 genes per genome, 4515 in total. The core has 838,246 bp per genome, 5,867,722 in total. The tree topology was evaluated in 500 bootstrap iterations and showed a branch conservation of 100%. Full size image

Genome comparison

A synteny plot of the reciprocal best blast hits of all CDS within the contiguous contigs was constructed using the EDGAR pipeline. The genome of E. salina SWB007 was chosen as reference for synteny analysis, because (i) it is bigger in size, thereby the chance to cover genomic parts of the other strains is higher, (ii) it is of high quality, and (iii) due to the high relationship between the genera Enhygromyxa and Plesiocystis, which excludes Haliangium as reference. According to the synteny plot, there are many CDS located in different positions compared to the reference genome of E. salina SWB007. However, there is still rather good synteny of orthologous genes within the areas that reside inside contig boundaries of E. salina SWB007 and the genomes of E. salina SWB005 and DSM 15201, as well as P. pacifica DSM 14875. The latter showed slightly lower synteny (Fig. S3A). This result indicates a low degree of genome divergence within these marine myxobacteria. On the nucleotide level, E. salina SWB007 and DSM 15201 are highly similar, while the identity ratio of E. salina SWB005 is slightly lower and further decreases for P. pacifica DSM 14875 and H. ochraceum DSM 15365, respectively (Fig. S3B).

Based on in silico parameters which determine if genomes belong to the same species (i.e. ANI value ≥ 96%, isDDH value ≥ 70%, and difference in G + C content ≤ 1%)30,39, both strains, E. salina SWB005 as well as SWB007, can be considered as a distinct new species of the genus Enhygromyxa. The ANI value between E. salina SWB007 and E. salina DSM 15201 is 85% with an isDDH value of 29% and a G + C difference of 0.7%. The ANI values between E. salina SWB005 and the other E. salina strains is 79% with an isDDH value of 23% and a G + C difference of more than 1% (Figs 2 and S4). On the amino acid level, all E. salina strains and P. pacifica DSM 14875 show 74.7–92.7% average amino acid identity (AAI) between each other (Fig. S5). Therefore, the orthologous genes in these strains probably perform the same functional roles. However, the function of the orthologous genes in H. ochraceum DSM 14365 is more uncertain, since the AAI is only 48% towards other strains (Fig. S5).

Figure 2 Average Nucleotides Identity (ANI) matrix of the available marine myxobacteria genomes. All values are given in percent. Full size image

In order to obtain further insights into the degree of similarity between the analyzed genomes, the number of core genes, as well as of singletons was determined. (Fig. 3A). Even for the most closely related strains investigated here, i.e. the E. salina strains, more than 1600 CDS represent singletons, which is equivalent to 21–23% of each genome (Fig. 3A). This value duplicates if the next further relative, i.e. P. pacifica DSM 14875 is considered, since this strain has 3365 singletons (equivalent to ~40% of the genome). H. ochraceum DSM14365 has 5220 (equivalent to 74% of the genome) singletons (Fig. 3A). The core genome of these marine myxobacteria consists of 1130 CDS. This relatively low number, equivalent to 13–16% of the CDSs per strain, is due to the inclusion of the more distantly related H. ochraceum DSM 14365 genome to the analysis. For comparison, the core genome of six Myxococcus genomes, including 4 different species and three M. xanthus strains consists of 4,693 CDS. This accounts for 56.6–63% of the CDSs in each genome40. If only the E. salina strains are considered, they have > 4600 CDSs in their core genome, and inclusion of P. pacifica DSM 14875 in the analysis results in a core genome of >3600 CDSs (Fig. 3B). Hence, the pan genome increases by about 2000 CDSs by every additional E. salina strain. If the other marine myxobacteria are included, the pan genome increases further by almost 3500 CDSs of P. pacifica DSM 14875 and by 5000 CDSs of Haliangium ochraceum DSM14365, respectively (Fig. 3B).

Figure 3 (A) Venn diagram of the CDS counts in different subsets of the genomes (singletons are given as percentage of the respective genome). (B) Core vs. pan genome plot of the genomes. Full size image

Analysis of specialized metabolite biosynthetic gene clusters in the genomes

In order to estimate the potential of the strains for the production of specialized metabolites, the genomes were screened in silico for the presence of biosynthetic gene clusters (BGCs) putatively coding for the production of such compounds3. All organisms investigated have a high variety of BGCs in their genomes, i.e. 30–46 BGCs were identified in each strain (Table 2). These numbers even doubled, if a more general cluster finder algorithm was applied to estimate the cluster boundaries (assigning putative BGCs) based on frequencies of locally encoded protein domains detected by Pfam3. In terms of novel metabolites, the numbers of identified BGCs by AntiSMASH which had similarities to known BGCs from the MIBiG database41 were counted (Table 2). 10–11 BGCs of each E. salina strain, 5 from P. pacifica DSM 14875, and 17 from H. ochraceum DSM 14365 matched partly or completely to validated gene clusters.

Table 2 Overview of predicted biosynthetic gene clusters (BGCs)a. Full size table

Analyzing the classes of metabolites predicted from the identified BGCs, revealed that PKs (2–11 per strain), fatty acids (5–9), and terpenes (3–9) represent the majority of predicted specialized metabolites, followed by bacteriocins (4–6). NRPs (0–4) and mixed PK/NRPs (0–3) are less common (Fig. 4). However, it should be noted that because draft genome sequences were analyzed, big BGCs such as PKS and NRPS can be split across contigs and the real number of BGCs might be overestimated.

Figure 4 Distribution of different BGC types in the five genomes. Genomes (first) and BGC types (second) are segmented in descending order of the BGCs counts. (A) Color code of the respective strain and number of BGCs in each strain. (B) ribbon color is set to respective BGC type. (C) BGC color code. Full size image

To get additional insights into the nature of the metabolites putatively corresponding to a BGC, an analysis using the ARTS webserver was performed35. This tool aims to enable prioritization of BGC, which correspond to antibacterial compounds. It is based on the fact that, to avoid suicide, an antibiotic producer harbors resistance genes often within the same BGC responsible for manufacturing of the compound. Known resistance, as well as possible resistant housekeeping genes are detected35. Using this analysis, several resistance model hits were identified (Table 2 and Fig. 5) suggesting that these specific BGCs code for antibacterial compounds. 7 to 13 resistance model hits were identified among the E. salina strains, including beta-lactamase, ABC-transporters, and other efflux systems. In P. pacifica DSM 14875 and H. ochraceum DSM 14365 only 4 hits pointing toward antibacterials were identified.

Figure 5 Similarity network of the predicted biosynthetic gene clusters (BGCs) in the five analyzed genomes. (A) Unique and shared similar BGCs (connected by a line). ARTS hits for resistance (R) and essential core genes (C) are labeled inside the respective nodes. (B) Venn diagram displaying node counts according to distribution in strains (H. ochraceum is excluded, since this strain has only 1 BGC which is similar to a BGC of the other strains). Interactive network is available at http://www.ndexbio.org under the title (Fig. 5) or by the DOI (http://doi.org/10.18119/N9F30V). Full size image

In a next step, we analyzed if BGCs encoding specialized metabolites are shared between the myxobacterial strains. A similarity network of all detected BGCs in the genomes was created based on the Pfam similarity metrics34. Out of the 351 BGCs identified, 124 (35%) can be found in at least two strains (Fig. 5A). The closely related strains E. salina SWB007 and E. salina DSM 15201 have the biggest overlap, whereby more than two third (71%) of the BGCs show similarities (Table S1A). E. salina SWB005 and P. pacifica DSM 14875 share several similar gene clusters with at least one other strain in the network (19.3% and 8.9%, respectively). Conversely, H. ochraceum DSM 14365 has only one BGC in common with other strains. This BGC is annotated as putatively related to 3-hydroxybutyryl-CoA biosynthesis. In fact, excluding H. ochraceum, only seven BGCs equivalent to 9–11% of the BGCs in each strain are similar between all E. salina strains and P. pacifica (Fig. 5B). However, 38.7% of the shared similar BGCs are categorized as putative, meaning that no corresponding metabolite class can be predicted (Table S1B). From the predictable BGCs, PKS clusters contribute to the highest share with 14.5%, followed by terpene (12.1%), and fatty acid (11.3%) BGCs (Table S1B). If only the strain specific (unique) BGCs are considered, half of them (50.7%) are classified as putative. The other half of the unique BGCs can be linked to the biosynthesis of polyketides (9.7%), fatty acids (8.8%), others (6.1%), and further less abundant ones (Fig. 5A and Table S1B). BGCs coding for peptidic metabolites, e.g. encoding NRPSs, PKSs/NRPSs and RiPPs, are mostly strain specific in the investigated strains.

In a next step, the predicted biosynthetic pathways were analyzed in more detail.

Terpenes

Many of the shared specialized metabolite BGCs encode for the biosynthesis of terpenes. The E. salina strains harbor six to nine terpene BGCs, P. pacifica DSM 14875 five and H. ochraceum DSM 14365 only three. In silico metabolic analysis using RAST revealed that all strains harbor the potential to generate the building blocks necessary for terpene assembly (Figs S6–S8). Several of the identified terpene BGCs could be linked to known terpene BGCs, including geosmin, squalene, sterols and carotenoids.

The predicted geosmin BGC shows high similarity to the BGC of Nostoc punctiforme PCC 73102 (ATCC 29133), which was investigated before42. Beside the gene encoding the geosmin synthase/cyclase, two genes encoding transcription regulators were also detected (Fig. S9). All strains except P. pacifica DSM 14875 harbor this cluster. The same gene cluster can be also found in the closely related halophilic myxobacterium Nannocystis exedens ATCC 25963 (Fig. S9). Interestingly, in this bacterium, 2-methylisoborneol and geosmin were identified as the main volatile compounds43. A squalene BGC was detected in all five investigated strains. This BGC encodes two squalene synthases (HpnC and D) and a squalene-associated FAD-dependent desaturase (HpnE), necessary to convert farnesyl diphosphate (FPP) to squalene (Fig. S10). In addition, the E. salina strains harbor three conserved squalene/hopene cyclases in other locations of their genomes, while P. pacifica DSM 14875 harbors two. The squalene/hopene cyclases detected in one of the BGC conserved in all E. salina strains and P. pacifica DSM 14875 showed BLAST hits towards different described sterol synthases including lanosterol and cycloartenol synthases (Fig. S11). H. ochraceum DSM 14365 does not harbor any additional squalene/hopene cyclase. Furthermore, a carotenoid BGC was found to be shared between all investigated strains. The essential genes for geranylgeranyl-CoA diphosphate synthase, a phytoene synthase, two dehydrogenases and a polyprenyltransferase are present44 (Fig. S12).

Polyketides (PKs)

The biggest group of specialized metabolite BGCs is linked to polyketides, i.e. 11.4% of all BGCs (Fig. 5). The total count of polyketide BGCs is 9–11 for E. salina strains and P. pacifica DSM 14875, while H. ochraceum DSM 14365 harbors only two. The genes coding for biosynthesis of starter and extender units for polyketide assembly were identified (see SI for details). Beside the standard extender unit malonyl-CoA (mCoA), the results indicate that the strains also possess the potential to synthesize methylmalonyl-CoA (mmCoA) and propionyl-CoA (pCoA). The latter is formed in the catabolism of isoleucine and valine (Fig. S7) and can serve as precursor for mmCoA. Ethylmalonyl-CoA (emCoA) can be biosynthesized through carboxylation of butyryl-CoA (bCoA). Carboxylation of bCoA is a described side activity of the propionyl-CoA carboxylase (PCC), which is part of the mmCoA biosynthesis (see above). Another pathway yielding emCoA is the conversion of crotonyl-CoA (cCoA) to emCoA by the catalytic activity of a cCoA carboxylase/reductase (CCR). A gene putatively coding for this conversion was identified in E. salina SWB007, i.e. annotated as crotonyl-CoA reductase /alcohol dehydrogenase (accession: WP_106090768), 61% identity to Leu10 and 51% identity to TgaD, which are part of leupyrrin and thuggacins BGCs in Sorangium cellulosum45,46. It is of interest that none of the polyketide BGCs in these bacteria could be linked to any known polyketide BGC and also they are just partly similar to BGCs of terrestrial myxobacteria and streptomycetes. For example, a putative type 1 PKS BGC is shared between E. salina strains and P. pacifica DSM 14875, shows some similarities to the thuggacin BGC from Chondromyces crocatus45 (Fig. S13). However, the corresponding metabolite to this BGC is unknown.

In addition, there are some type III polyketide synthase (PKSIII) BGCs found in analyzed strains except Haliangium ochraceum DSM 14365. P. pacifica DSM 14875 harbors one and E. salina DSM 15201, SWB007, and SWB005 harbor two, three and four PKSIII BGCs, respectively. One PKSIII BGC is shared between E. salina strains and P. pacifica DSM 14875, while another PKSIII BGC is shared only between the E. salina strains. Furthermore, E. salina SWB007 carries a unique PKSIII BGC, consisting of genes encoding a PKSIII, a methyltransferase, and an oxidoreductase. In its vicinity, genes encoding a polyprenyl synthetase and a polyprenyl transferase were detected (Fig. S14).

Non-Ribosomal Peptides (NRPs) and PKs/NRPs hybrids

Almost all of NRPS and PKS/NRPS hybrid BGCs were strain specific and only identified in E. salina strains and H. ochraceum DSM 14365. In E. salina SWB007, a strain specific type 1 PKS/NRPS BGC was identified, showing high homology to the reported leupyrrin BGC from Sorangium cellulosum So ce69046. In depth investigation of the gene cluster revealed that all genes necessary for leupyrrin biosynthesis are present (Fig. S15).

Bacteriocins

Several bacteriocin BGCs were identified in each strain. The similarity network (Fig. 5) indicated many of the them to be similar BGCs, i.e. 11 out of 19 have at least one counterpart, if the E. salina strains and P. pacifica DSM 14875 are considered.

Arylpolyenes

Arylpolyene (APE) BGCs were detected only in E. salina strains and P. pacifica DSM 14875. One of them is well conserved within all with homologous gene clusters from different marine photobacterium strains and closely resembles the APE BGC of Escherichia coli CFT073 and of Vibrio fischeri ES114 (100% of the biosynthetic genes show similarity, Fig. S16)2. Another APE BGC was only found in E. salina SWB007 and E. salina DSM 15201. However, the latter one did not show high similarity to any known BGCs.

Siderophores

Siderophore BGCs (NRPS-independent) were only shared between the E. salina strains and P. pacifica DSM 14875. Each strain harbors two distinct siderophore BGCs. One of them contains only one conserved gene from the IucA/IucC family of siderophore biosynthesis enzymes and the other encodes two IucA/IucC-like proteins and a lysine/ornithine N-monooxygenase.

Ectoine and hydroxyectoine

A complete ectoine/hydroxyectoine BGC was detected only in E. salina SWB005 and SWB007. In H. ochraceum DSM 14365 only an ectoine synthase gene was detected, while all the other necessary genes were absent. In addition, the ectoine BGC in E. salina SWB007 contains a glycine/sarcosine N-methyltransferase (GSMT) and a sarcosine/dimethylglycine N-methyltransferase (SDMT), which are responsible for betaine biosynthesis (Fig. S17)47.

Indole

All E. salina strains harbor a conserved indole prenyltransferase. However, the adjacent genes are either rearranged or not conserved (Fig. S18).

Ribosomally synthesized and post-translationally modified peptides (RiPPs)

BGCs coding for RiPPs were only found as unique BGCs in the genomes of E. salina SWB007 and H. ochraceum DSM 14365. A lanthipeptide and a thiopeptide BGC were detected in the genome of E. salina SWB007, and in H. ochraceum DSM 14365 a lanthipeptide and a lassopeptide BGC were detected.

Putative gene clusters

Many of the putative BGCs (29%) were shared as similar BGCs between E. salina strains and P. pacifica DSM 14875. They are mostly related to the biosynthesis of primary metabolites, such as a BGC putatively linked to the production of 3-crotonyl-CoA and 3-hydroxybutyryl-CoA. In addition, a conserved PHB synthase identified in E. salina strains and P. pacifica DSM 14875 are probably involved in the synthesis of polyhydroxybutyrate (PHB) from 3-hydroxybutyryl-CoA (Fig. S19).

Metabolomic analysis of four marine myxobacterial strains

Next, we aimed to analyze and compare the metabolomes of the marine myxobacterial strains, in order to see if the bioinformatics results are translatable into actual metabolites. For this type of analysis, the more closely related strains E. salina SWB005, SWB007, DSM15201 and P. pacifica DSM 14875 were selected. The strains were cultivated in liquid medium containing adsorber resin and subsequently extracted and fractionated. The crude extracts and all fractions were analyzed with HPLC coupled with high-resolution mass spectrometry and automated fragmentation (HPLC-HRMS/MS). The resulting MS2 data were used to generate a molecular network consisting of 1251 nodes after removal of media blanks (Fig. 6A). The ion distributions were counted and summarized in a Venn diagram (Fig. 6B). E. salina SWB005 and DSM15201 display the highest metabolic diversity of the four strains with 584 and 556 nodes, respectively, contributing to the network. Interestingly, all four strains show a relatively high percentage of strain-specific nodes, i.e. nodes that were only found in one strain. The most unique metabolome shows E. salina SWB007, where more than half of all nodes (173 of 343) were found to be strain-specific. Surprisingly, only 6–11% of the nodes in each strain were shared in the network by all four strains. Taken together, this analysis points to a large degree of unique metabolism in all four investigated strains under laboratory conditions.

Figure 6 (A) Molecular network of E. salina SWB005, SWB007, DSM15201 and P. pacifica DSM 14875 extracts and fractions. Network is color-coded according to detection from single or multiple strains. Identified specialized metabolites are marked. (B) Venn diagram displaying node counts according to ion distribution in strains. Interactive network is available at http://www.ndexbio.org under the title (Fig. 6.) or by the DOI (http://doi.org/10.18119/N9988C). Full size image

Only a few nodes in the network could be dereplicated as specialized metabolites using the GNPS and our metabolite libraries (Table S3). Salimyxin A and salimyxin B were previously isolated from E. salina15. Both compounds were detected as strain specific metabolites of E. salina SWB005 in this analysis. Retention time and exact mass of all compounds correspond to an authentic standard. Enhygrolide A15, was found in extracts from E. salina SWB005 and SWB007. In order to extend the metabolomic results, the completely uninvestigated E. salina strain SWB006 was included to the investigation. Hence, this strain was fermented, extracted and its metabolomic profile analyzed. Also from this strain, enhygrolide A was identified (Fig. S20). Large scale cultivation of this strain was required to enable isolation of the compound for verification. By this approach, enhygrolide A was isolated and its structure confirmed by NMR spectroscopy (Figs S21 and 22).

One metabolite cluster from the network with a mass range between 883.3554-1332.5815 m/z displayed several characteristic mass shifts of 86.04 Da, which correspond to the loss or gain of hydroxybutyric acid (Fig. S23A). In addition, in the MS2 spectra of these compounds, several hydroxybutyric acid mass shifts were observed (Fig. S23B). Thus, we conclude that this metabolite cluster consists of different molecule weight fragments of the polymer polyhydroxybutyric acid (PHB), which is produced by all the strains. These biopolymers gained interest due to their biodegradability, biocompatibility, the possibility of biosynthesis from renewable resources, and similar physical and chemical characteristics to the ones of petrochemical polymers48. Other compound clusters in the network could be dereplicated with the help of the GNPS library search tool. These include a number of ions annotated as triterpenes/sterols and a large group of phospholipid-related molecules. Finally, with the DEREPLICATOR+ tool available on the GNPS platform49, one metabolite cluster produced by E. salina SWB005 and P. pacifica could be annotated with high confidence as meroditerpenoids related to tetraprenyltoluquinols isolated from marine algae50.