The candidate phylum Poribacteria is one of the most dominant and widespread members of the microbial communities residing within marine sponges. Cell compartmentalization was postulated along with their discovery about a decade ago, and their phylogenetic association with the Planctomycetes, Verrucomicrobia, Chlamydiae (PVC) superphylum was proposed soon thereafter. In the present study we revised these features based on genomic data obtained from six poribacterial single cells. We propose that Poribacteria form a distinct monophyletic phylum contiguous to the PVC superphylum together with other candidate phyla. Our genomic analyses supported the possibility of cell compartmentalization in the form of bacterial microcompartments. Further analyses of eukaryote-like protein domains stressed the importance of such proteins, with features including tetratricopeptide repeats, leucine-rich repeats, and low-density lipoprotein receptor repeats, the latter of which are reported here for the first time from a sponge symbiont. Finally, examining the most abundant protein domain family on poribacterial genomes revealed diverse phyH family proteins, some of which may be related to dissolved organic phosphorus uptake.

Funding: This publication was funded by the German Research Foundation (DFG) and the University of Wuerzburg in the funding programme Open Access Publishing. URL: http://www.bibliothek.uni-wuerzburg.de/en/homepage/ . Financial support to U.H. was provided by the SFB630-grant TPA5, the SFB567-grant TPC3, and by the Bavaria California Technology Center (BaCaTeC). T.W., C.R., P.S., N.I. and K.M. were funded by the United States Department of Energy Joint Genome Institute, Office of Science of the United States Department of Energy under Contract No. DE-AC02-05CH11231. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Single-cell genomics is a powerful tool to describe genomes of as yet uncultivated organisms from diverse environments [1] , [2] . Recently it allowed a first glimpse into the vast functional diversity represented by genomes of previously largely uncharacterized candidate phyla [3] . This method further revealed the glycobiome of the candidate phylum Poribacteria, symbionts of marine sponges, based on six single-amplified genome (SAG) sequences [4] . In this study we further examined these SAGs for phylogenetic and additional functional features of Poribacteria. Poribacteria were first discovered as highly abundant symbionts of marine sponges [5] and as of now lack any cultivated representatives. Through amplicon sequencing studies based on 16S rRNA genes they were also detected in seawater, albeit in low abundances [6] – [8] . Poribacteria are one of the most predominant taxa inhabiting the extracellular matrix (mesohyl) of sponge species around the world [9] – [11] . These symbionts are vertically transmitted over larval stages from the adult sponge to the next generation [7] , [12] . Initially, the candidate phylum Poribacteria showed a moderate phylogenetic relationship to Planctomycetes, Verrucomicrobia, and Chlamydiae (PVC superphylum) based on monophyletic clustering in 16S rRNA gene analysis [5] . Later, Poribacteria were classified as members of the PVC superphylum, although the exact position within the superphylum could not be completely resolved [13] . Similar to some members of the PVC superphylum, Poribacteria were also suspected to have a compartmentalized cell plan [5] . In this study we revisited the features of phylogeny and cell compartmentalization based on the sequence data of six single-cell derived genomes from the candidate phylum Poribacteria. We further reveal a large abundance and diversity of eukaryote-like domain containing proteins as well as phyH-like proteins in Poribacteria.

For the calculation of the bacterial phylogenetic tree we followed the procedure described by Rinke et al. [3] based on a custom marker set of 83 bacteria-specific markers ( Table S1 ) described in that study. Briefly, single-cell genome assemblies of Poribacteria were translated into all six reading frames, and marker genes were detected and aligned with hmmsearch and hmmalign included in the HMMER3 package [22] using HMM profiles obtained from phylosift ( http://phylosift.wordpress.com/ ). Extracted marker protein sequences were used to build concatenated alignments of up to 83 markers per genome. Alignments were included into the database constructed by Rinke and coworkers [3] and reference sequences were selected for phylogenetic tree construction. Phylogenetic inference methods used were the maximum likelihood based FastTree2 [23] and a custom RAxML bootstrap script originally provided by Christian Goll and Alexandros Stamatakis (Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Germany) and modified by Douglas Jacobsen (Bioinformatics Computing Consultant, LBNL, Berkeley, USA). The script requires two input files, the alignment file in PHYLIP format and a starting tree calculated by RAxML-Light [24] . The script workflow is briefly summarized as follows: First, RAxML version 7.3.5 [21] creates bootstrap replicates of the multiple sequence alignments and stepwise addition order parsimony trees as starting points for the maximum likelihood search, based on user-defined rate heterogeneity and substitution models. Next, RAxML-Light [24] is run on every bootstrap replicate. After all RAxML-Light runs are finished, the resulting replicate trees are fed into RAxML to calculate the bootstrap support values, which are drawn upon the starting tree. The rate heterogeneity and amino acid evolution models used were GAMMA and LG for the custom RAxML bootstrap script, and CAT approximation with 20 rate categories and Jones-Taylor-Thornton (JTT) for FastTree2. 
To evaluate the robustness of the protein trees we used seven different out-group taxon configurations ( Table 1 ).
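The bootstrap logic behind the workflow above can be illustrated with a short Python sketch. It is not the RAxML or RAxML-Light implementation and does not reproduce the custom script; it only shows the two ideas the procedure rests on: resampling alignment columns with replacement to create a replicate, and expressing support for a bipartition of the starting tree as the percentage of replicate trees that contain it. The function names, the toy alignment, and the representation of a tree as a set of taxon splits are assumptions made for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    `alignment` maps taxon name -> aligned sequence (all equal length).
    Columns are drawn with replacement, so the replicate has the same
    dimensions as the input. (Hypothetical helper, not part of RAxML.)
    """
    if not alignment:
        raise ValueError("alignment is empty")
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have equal length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def split_support(reference_splits, replicate_split_sets):
    """Percentage of replicate trees that contain each reference split.

    A split is a frozenset of taxon names on one side of a branch; each
    replicate tree is represented as a set of such splits (an assumed,
    simplified tree representation).
    """
    n = len(replicate_split_sets)
    if n == 0:
        raise ValueError("no replicate trees given")
    return {
        split: 100.0 * sum(split in rep for rep in replicate_split_sets) / n
        for split in reference_splits
    }


if __name__ == "__main__":
    toy = {"taxonA": "MKVLAT", "taxonB": "MKILST", "taxonC": "MRVLAS"}
    replicate = bootstrap_alignment(toy, random.Random(42))
    for name, seq in replicate.items():
        print(name, seq)
```

In the actual procedure a tree is inferred from every replicate before support values are mapped onto the starting tree; that inference step is deliberately left out here.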

Sequences for 16S rRNA gene based phylogenetic analysis were selected from the SILVA 16S rRNA database version 108 [17] in the ARB software package (V5.3) [18] . All poribacterial 16S rRNA sequences (≥1100 bp) available in GenBank by June 2013 and the 16S rRNA sequences of poribacterial single-cell genomes were included. Additional sequences for the candidate phyla Aerophobetes (CD12) and Hydrogenedentes (NKB19) were obtained by BLAST searches [19] of reference sequences (accession number JN675971 for CD12 and CR933119 for NKB19) against the GenBank nr/nt database in June 2013, selecting the 100 best hits with >75% sequence identity and sequence length ≥1100 bp. All sequences added to the original database were aligned using the SINA aligner [20] and included into the ARB database for further manual refinement. Alignments were exported from ARB for phylogenetic tree construction using RAxML (v7.3.2) [21] . Maximum likelihood trees were constructed using sequences ≥1100 bp only and 50% conservation filters. Bootstrap analysis was carried out with 500 resamplings. Trees were reimported into ARB and sequences <1100 bp were added to the tree using the parsimony interactive tool in ARB without changing tree topology.
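The hit selection criteria stated above (identity >75%, sequence length ≥1100 bp, 100 best hits) can be written as a small filter. The Python sketch below is an illustration only: the `Hit` record and its field names are hypothetical, ranking "best" hits by bit score is an assumption, and so is the order of operations (filter first, then keep the top 100), since the text does not specify either.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One database hit (hypothetical record layout for illustration)."""
    accession: str
    identity: float   # percent sequence identity to the query
    seq_length: int   # length of the hit sequence in bp
    bitscore: float   # assumed ranking criterion for "best" hits


def select_hits(hits: List[Hit],
                min_identity: float = 75.0,
                min_length: int = 1100,
                max_hits: int = 100) -> List[Hit]:
    """Keep hits with identity > min_identity and length >= min_length,
    then return at most max_hits of them, highest bit score first."""
    kept = [h for h in hits
            if h.identity > min_identity and h.seq_length >= min_length]
    kept.sort(key=lambda h: h.bitscore, reverse=True)
    return kept[:max_hits]


if __name__ == "__main__":
    # Made-up example records, not real accessions or search results.
    example = [
        Hit("hit_1", 92.4, 1450, 2100.0),
        Hit("hit_2", 74.9, 1500, 1300.0),   # fails the identity threshold
        Hit("hit_3", 88.0, 1050, 1500.0),   # fails the length threshold
        Hit("hit_4", 80.1, 1100, 1250.0),
    ]
    for hit in select_hits(example):
        print(hit.accession, hit.identity, hit.seq_length)
```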

Please also note that the initial version of genome WGA 3A (first published as WGA A3 with accession number ADFK00000000 version ADFK01000000) [14] was found to be flawed. It was corrected accordingly and the submission to Genbank was updated (version ADFK02000000) [4] . All genomic information of WGA 3A in this manuscript is based on the latest version of the genome, which should be used for all future studies. For a detailed description of all steps from sample collection to genome assembly and annotation please refer to Kamke et al. [4] . Genome sequences were automatically annotated via the IMG pipeline [15] and manually curated in IMG/MER. All analyses were conducted using the tools in IMG/MER unless further specified.

Results and Discussion

Eukaryote-like Repeat Proteins

Eukaryote-like repeat domain containing proteins have received much recent attention in sponge microbiology, and their involvement in mediating host-microbe interactions has been postulated. Ankyrin (ANK) and tetratricopeptide repeats (TPR) in particular have been the focus of such investigations [37]–[39]. To examine the role of these domains on poribacterial SAGs we searched for proteins with pfam hits to repeat and eukaryote-like domains in the IMG/MER database and compared these to all finished genomes of free-living marine bacteria available in the IMG database in July 2013 (n = 98). We detected 41 such domains on poribacterial SAGs. The majority of these showed a higher domain frequency per total genes on at least one poribacterial SAG when compared to the average frequency of this domain on genomes of free-living marine bacteria (Fig. 4, Table S7). For 14 pfam domains the frequency on poribacterial genomes was even higher than the maximum frequency of this domain on the genome of any free-living marine bacterium. Many domains occurred simultaneously on the same genes, with a total of 668 domains in all poribacterial SAGs on 490 encoded proteins (3A: 15 domains on 11 genes, 3G: 335 domains on 240 genes, 4C: 95 domains on 75 genes, 4CII: 24 domains on 16 genes, 4E: 181 domains on 135 genes, and 4G: 17 domains on 8 genes).
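The comparison described above, domain frequency per total genes on a SAG against the average and maximum frequency across reference genomes, amounts to a simple calculation. The Python sketch below uses made-up counts rather than values from Table S7, and the function names are hypothetical. It assumes frequency is expressed as a percentage of total genes, as in the percentages quoted elsewhere in this section.

```python
def domain_frequency(domain_count, total_genes):
    """Frequency of a pfam domain as a percentage of total genes."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    return 100.0 * domain_count / total_genes


def compare_to_reference(sag_frequency, reference_frequencies):
    """Compare one SAG frequency with a set of reference genome frequencies.

    Returns the reference mean and maximum plus two flags stating whether
    the SAG value exceeds each of them.
    """
    if not reference_frequencies:
        raise ValueError("no reference frequencies given")
    mean_ref = sum(reference_frequencies) / len(reference_frequencies)
    max_ref = max(reference_frequencies)
    return {
        "reference_mean": mean_ref,
        "reference_max": max_ref,
        "above_average": sag_frequency > mean_ref,
        "above_maximum": sag_frequency > max_ref,
    }


if __name__ == "__main__":
    # Illustrative numbers only; not taken from the study's tables.
    sag = domain_frequency(domain_count=12, total_genes=3000)
    reference = [domain_frequency(c, g)
                 for c, g in [(1, 4000), (3, 3500), (0, 2800)]]
    print(round(sag, 3), compare_to_reference(sag, reference))
```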


Figure 4. Bar plot showing frequency of eukaryote-like pfam domains found on poribacterial SAGs in comparison to the average and maximum frequency on all finished genomes of marine free-living bacteria available in IMG in July 2013. https://doi.org/10.1371/journal.pone.0087353.g004

Among the most abundant domains were TPRs with pfams 13414, 00515, 07719, 13432, 13174, and 13181; TPRs were also represented by eight other pfams (13424, 13374, 13371, 09976, 13431, 13429, 13428, and 13176) but in lower abundances. We also found Sel1 repeat-like protein domains, which have a similar structure to TPRs [40], encoded on poribacterial SAGs 3G and 4E (0.02 and 0.15% of total genes, respectively). In total, TPRs represented the highest frequency of repeat domains on poribacterial SAGs. Furthermore, WD40 domains (pfam00400) were highly abundant on poribacterial SAGs, as were two-copy leucine-rich repeats (LRR) (pfam 12799) and the VCBS domain (pfam 13517), a domain found in high numbers in the genera Vibrio, Colwellia, Bradyrhizobium and Shewanella. Pfam domain 07593 (ASPIC and UnbV) was also present on several poribacterial SAGs in multiple copies. ANK repeat domains (pfam 12796, 13637, 13857, and 00023) were detected in lower numbers on a total of 14 genes on SAGs 3G, 4C and 4E (Table S7). The frequency of genes with pfam domains representing ankyrin repeats was often higher than the average for the genomes of free-living marine bacteria (Table S7). The occurrence of low-density lipoprotein (LDL) receptor repeat class B domains (pfam00058) on poribacterial genomes seemed noteworthy. We found these domains on one gene each in SAGs 4C and 4E as well as on five genes in SAG 3G. Outside of Poribacteria this domain has only been found in proteins of 14 bacterial genomes, and in no archaeal genomes, publicly available in the IMG/MER database in July 2013. 
Most of these bacterial hits, however, do not show the tandem repeats that are characteristic of this domain in eukaryotes. Such tandem repeats were only detected in the poribacterial proteins and in proteins of four other bacterial genomes. Amongst these were free-living marine cyanobacteria (Cyanothece species, Pleurocapsa sp. PCC 7327), the marine deep sea piezophile Moritella sp. PE36, and the strictly anaerobic bacterium Paludibacter propionicigenes WB4, DSM 17365. The LDL receptor is best described in mammals, where it transports ligands into the cell for degradation by lysosomes and plays a role in cholesterol homeostasis [41]. The LDL repeat domain class B is part of the region of the LDL receptor that is responsible for ligand release and receptor recycling [42]. Virtually nothing is known about such domains in bacteria, and it remains to be investigated whether there is a real connection to eukaryotic domains. Although the limited data did not allow for any functional assignments of the LDL receptor genes, a role on the cell surface seems very likely in Poribacteria, since all of the discovered genes with these domains had either predicted transmembrane helices (TMHs) (∼86%), with the majority of the protein located outside of the cell, or signal peptides (SPs) (∼14%). TMHs and SPs were also frequently predicted on genes representing other eukaryote-like proteins of Poribacteria (Table S8 and S9). High abundances (≥50% of genes with this pfam) of either TMHs or SPs were found on genes encoding bacterial Ig-like domain proteins, PQQ enzyme repeat containing proteins, and fibronectin type III domain and cadherin domain proteins. Genes with some of the pfam domains representing LRRs and TPRs also showed strong representation of TMHs and SPs. 
Additionally, many poribacterial eukaryote-like domain genes (especially WD40 repeats) encoded a domain potentially belonging to the Por secretion system C-terminal sorting domain family (TIGR04183) (Table S9), which is characteristic of proteins with outer membrane locations [43]–[45]. Since structural genes of the Por secretion system were not found on poribacterial genomes, a potential secretion pathway for gene products with this domain remains to be revealed. Our findings support previous reports of repeat and eukaryote-like domains being highly abundant in symbionts of marine sponges. The identification of proteins with these domains from the microbial communities of the sponge Cymbastela concentrica by means of metaproteogenomics [46] might point towards an active functional role of these proteins. ANK domain proteins of sponge symbionts have been suspected to be involved in preventing phagocytosis by the sponge host, in analogy to similar functions of ANK domain proteins in the bacterial pathogens Legionella pneumophila and Coxiella burnetii [39], [47]. Indeed, in a recent paper Nguyen et al. [48] were able to show that ANK proteins from a marine sponge symbiont that were expressed in E. coli prevent phagocytosis of the bacterial cells by amoebae. The authors suggested this to be a function of sponge symbionts to avoid digestion by their host [48]. Thus, poribacterial ANK proteins may fulfil similar functions. LRRs have been found in proteins of pathogenic bacteria such as Yersinia species, where LRRs are part of important virulence factors [49], or Listeria monocytogenes, which encodes the LRR-containing protein InlB that aids in host cell invasion [50]. TPRs were also shown to be involved in different functions of pathogenesis [51], and fibronectin domains were shown to play a role in host-pathogen interactions as well, although in this case bacterial proteins bind to the fibronectin domains of the host protein [52], [53]. 
It would be interesting to explore whether bacterial fibronectin domains might be used in a similar way. Furthermore, fibronectin III domains have been found in polysaccharide-degrading extracellular enzymes of Clostridium thermocellum [54]. Hentschel et al. [47] speculated that such functions in sponge symbionts could be connected to interactions with molecules of the sponge host extracellular matrix, and our recent investigations of poribacterial carbohydrate degradation potential [14] support this hypothesis. However, at the current stage, we are just beginning to decipher the real functions of eukaryote-like proteins in Poribacteria. As many of these proteins may not be located outside of the poribacterial cell, as indicated by the large number of proteins detected without TMHs or SPs (Table S9), it appears likely that at least some mediate intracellular protein-protein interactions.