Data reporting

No statistical calculations were performed to pre-determine the sample size.

Sample collection and culture establishment

Surface seawater was collected in the Republic of Palau (latitude = 7.181386° N, longitude = 134.336947° E) on 27 October 2015. Approximately 500 mL of the seawater sample was filtered through 5 µm Isopore membrane filters (Millipore, Billerica, USA), and was subsequently concentrated to approximately 10 mL using 0.6 µm Isopore membrane filters (Millipore, Billerica, USA). The concentrated sample was serially diluted across 12 wells of a 96-well culture plate filled with ESM medium44. Samples were incubated at 24 °C in the dark for 10 days. ‘Candidatus Uab amorphum’ and other bacterial cells were observed in several wells under a light microscope following the incubation period. Single ‘Ca. Uab amorphum’ cells were isolated from the incubated sample with a micropipette, and were transferred into a new 96-well cell culture plate filled with ESM medium. A small amount of the incubated sample was also added to the new plate together with each isolated ‘Ca. Uab amorphum’ cell in order to inoculate bacteria as food sources. As a result, a xenic culture consisting of clonal ‘Ca. Uab amorphum’ and unidentified bacteria was established. The xenic culture was maintained in a culture flask or a culture plate in ESM medium at 20 °C in the dark, and was subcultured into fresh medium every month.

To establish a monoxenic culture of ‘Ca. Uab amorphum’, Alteromonas macleodii (NBRC 102226), obtained from the NITE Biological Resource Centre (NBRC), was inoculated into a 96-well culture plate filled with ESM medium. A single cell of ‘Ca. Uab amorphum’ isolated with a micropipette was added to the culture plate. The monoxenic culture of ‘Ca. Uab amorphum’ was maintained under the same conditions as the xenic culture. No antibiotics were used during culture establishment. The monoxenic culture was deposited at the Japan Collection of Microorganisms (JCM) as JCM 39082.

Light and time-lapse microscopic observation

Cells of ‘Ca. Uab amorphum’ and unidentified prey bacteria in the xenic culture were inoculated into glass-bottomed dishes filled with ESM medium. Cells were incubated for 1–5 days prior to observation. Light micrographs and time-lapse videos were taken using an Olympus IX71 inverted microscope (Olympus, Tokyo, Japan) equipped with an Olympus DP73 CCD camera (Olympus, Tokyo, Japan). Sodium azide (76.9 mM) was added to the xenic culture 15 min before observation. For the feeding experiment, cells of Debaryomyces hansenii JCM 1439, Lactobacillus farciminis JCM 1097 and Staphylococcus condimenti JCM 6074 (obtained from JCM) were added to glass-bottomed dishes together with ‘Ca. Uab amorphum’ and observed immediately under a light microscope. Cells of Bathycoccus prasinos NIES-2670 (obtained from the National Institute for Environmental Studies, NIES) were added to glass-bottomed dishes together with ‘Ca. Uab amorphum’, and were incubated for one day prior to light microscopic observation and specimen preparation for transmission electron microscopic (TEM) observation.

DNA extraction, PCR and sequencing

Cells of ‘Ca. Uab amorphum’ and unidentified prey bacteria in the xenic culture were collected by centrifugation. Total DNA was extracted using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. PCR was performed using a primer pair specific for the bacterial 16S rRNA gene (27F and 1492R)45,46. The PCR consisted of 30 cycles of denaturation at 96 °C for 10 s, annealing at 55 °C for 30 s and extension at 68 °C for 2 min. The amplicon was purified with the QIAquick Gel Extraction kit (Qiagen, Hilden, Germany), and was then cloned into the pGEM-T Easy vector (Promega, Tokyo, Japan). Ten clones were completely sequenced with a 3130 Genetic Analyser (Applied Biosystems, Foster City, CA), using the BigDye Terminator v3.1 cycle sequencing kit (Applied Biosystems). Sequenced clones were assembled into six sequences by the CodonCode Aligner (CodonCode Co., Centerville, MA) with a sequence similarity threshold of 99%. The 16S rRNA gene sequence of ‘Ca. Uab amorphum’ was deposited in the GenBank database with accession code LC496071.

Fluorescent microscopy

Fluorescein isothiocyanate (FITC)-labelled oligonucleotide probes EUB338 (ref. 47) and PLA886 (ref. 48) were purchased from FASMAC Co., Ltd. (Kanagawa, Japan). EUB338 completely matched all sequences derived from the xenic culture, while PLA886 matched only the 16S rRNA gene sequence of ‘Ca. Uab amorphum’, and had at least three mismatches to other sequences (Supplementary Fig. 2). The xenic culture of ‘Ca. Uab amorphum’ was cultured on coverslips that were treated with 0.1% (w/v) poly-L-lysine (Sigma Chemical Co., St. Louis, MO) for one week prior to observation. Fixation and hybridization were performed as described in a previous study47. Cells grown on coverslips were fixed with 4% (w/v) paraformaldehyde for 16 h at 4 °C, and were then treated with a series of 50%, 80% and 96% ethanol for 3 min each and air-dried. Hybridization buffer containing 0.9 M NaCl, 20 mM Tris/HCl (pH 8.0), 0.01% (w/v) SDS, 5 ng/μL probe and a specific amount of formamide (Supplementary Table 12) was mounted on coverslips and incubated at 46 °C for 1.5 h in humid chambers. In the case of PLA886, an equimolar amount of a competitor oligonucleotide (cPLA886) was added to the hybridization buffer to avoid binding to non-target bacteria. Coverslips were washed in pre-heated (48 °C) washing buffer containing 0.9 M NaCl, 20 mM Tris/HCl (pH 8.0) and 0.01% (w/v) SDS, and were then placed in the washing buffer for 20 min at 48 °C. They were then washed with distilled water and air-dried. Each coverslip was incubated with PBS containing 0.1 µg/mL 4′,6-diamidino-2-phenylindole (DAPI) for 10 min in the dark. Coverslips were washed with distilled water, air-dried, mounted with SlowFade Diamond (Invitrogen, Carlsbad, CA) and sealed with nail polish. Specimens were then observed using a Leica DMRD microscope (Leica, Wetzlar, Germany) equipped with an Olympus DP73 CCD camera (Olympus, Tokyo, Japan).

AcGFP1-labelled Escherichia coli was prepared by transformation using TOP10-competent cells (Thermo Fisher Scientific, MA, USA) and pAcGFP1 Vector (Takara, Tokyo, Japan). Cells of AcGFP1-labelled E. coli were added to glass-bottomed dishes together with ‘Ca. Uab amorphum’ and immediately observed under a Nikon A1 confocal microscope (Nikon, Tokyo, Japan). For detection of acidic compartments in ‘Ca. Uab amorphum’ cells, ‘Ca. Uab amorphum’ and AcGFP1-labelled E. coli in glass-bottomed dishes were incubated with 1 μM LysoTracker Red DND-99 (Thermo Fisher Scientific) in the dark for 1 h. Cells were washed three times with ESM medium and observed under the Nikon A1 confocal microscope. For detection of reactive oxygen species, ‘Ca. Uab amorphum’ in glass-bottomed dishes was incubated with 1 μM LysoTracker Red DND-99 and 1 mM DCFH-DA (OxiSelect Intracellular ROS Assay Kit, Cell Biolabs, CA, USA) in the dark for 1 h. Cells were washed three times with ESM medium and observed under the Nikon A1 confocal microscope.

Electron microscopy

For scanning electron microscopy, cells of ‘Ca. Uab amorphum’ were cultured on 8.5-mm-diameter glass SEM plates (Okenshoji Co., Tokyo, Japan) treated with 0.1% (w/v) poly-L-lysine (Sigma Chemical Co., St. Louis, MO) for one week. Cells were pre-fixed with vapour of 4% (w/v) osmium tetroxide (OsO₄) for 30 min, and were subsequently post-fixed with 1% (w/v) osmium tetroxide in culture medium for 2 h. Fixed cells were dehydrated in a series of 15–100% (v/v) ethanol. After dehydration, specimens were placed once in a 1:1 mixture of 100% ethanol and t-butyl alcohol, and then twice in 100% t-butyl alcohol, and chilled in the freezer. The specimens were freeze-dried using a VFD-21S (SHINKU-DEVICE, Ibaraki, Japan) freeze drier, and were then mounted on aluminium stubs using carbon paste. Specimens were sputter-coated with platinum–palladium using a Hitachi E-102 sputter-coating unit (Hitachi High-Technologies Corp., Tokyo, Japan), and were observed using a JSM-6360F field emission SEM (JEOL, Tokyo, Japan).

For TEM observation, cells were collected by centrifugation, and were pre-fixed with a mixture containing 1% glutaraldehyde, 0.1 M cacodylate buffer and 0.25 M sucrose for 1 h. Pelleted cells were washed twice with 0.2 M sodium cacodylate buffer (pH 7.2), and were post-fixed with a mixture containing 1% osmium tetroxide and 0.2 M cacodylate buffer. Fixed cells were dehydrated in a series of 30–100% (v/v) ethanol. After dehydration, cells were placed in a 1:1 mixture of 100% ethanol and acetone for 10 min, and 100% acetone for 10 min two times each. Resin replacement was performed with a 1:1 mixture of acetone and Agar Low Viscosity Resin R1078 (Agar Scientific Ltd., Stansted, England) for 30 min, and then with Low Viscosity Resin R1078 for 2 h. Resin was polymerized at 60 °C for 12 h. Ultra-thin sections were prepared on a Reichert Ultracut S ultramicrotome (Leica, Vienna, Austria). Cell sections were double-stained with 2% (w/v) uranyl acetate and lead citrate, and were observed using a Hitachi H-7650 electron microscope (Hitachi High-Technologies Corp., Tokyo, Japan) equipped with a Veleta TEM CCD camera (Olympus, Tokyo, Japan).

Molecular phylogenetic analyses using 16S rRNA gene

We prepared a dataset containing 16S rRNA gene sequences of major bacterial lineages and ‘Ca. Uab amorphum’. The sequences were first automatically aligned in MAFFT v7.273 (ref. 49) with the G-INS-i algorithm at default settings, and were then manually edited with SeaView version 4.6 (ref. 50). Ambiguous regions in the alignment were trimmed with SeaView. The final alignment consisted of 71 OTUs and 1,307 sites. The best substitution model was searched for using jModelTest 2.1.10 (ref. 51), and the GTR + Γ + I model was selected. A maximum likelihood (ML) tree was heuristically searched for using RAxML version 8.1.15 (ref. 52) under the GTR + Γ + I model. Tree searches began with 20 randomized maximum-parsimony trees, and the tree with the highest log likelihood (lnL) was selected as the ML tree. A nonparametric bootstrap analysis with 1,000 replicates was conducted using RAxML under the GTR + Γ + I model. A Bayesian analysis was run using MrBayes 3.2.2 (ref. 53) with the GTR + Γ + I model for each dataset. One cold and three heated Markov chain Monte Carlo chains with default temperatures were run for 5 × 10⁶ generations; lnL values and trees were sampled at 100-generation intervals. The first 2.2 × 10⁶ generations, during which the average standard deviation of split frequencies (ASDSF) was greater than 0.01, were discarded as “burnin”. Bayesian posterior probabilities (BPP) and branch lengths were calculated from the remaining trees.
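The sampling arithmetic above can be checked with a short Python sketch. The function name `retained_samples` is hypothetical. The sketch assumes that a sample is written at every multiple of the sampling interval and that the sample at the burn-in boundary is itself discarded; neither detail is stated in the text. It counts samples for a single run only.

```python
def retained_samples(total_generations: int, sample_interval: int,
                     burnin_generations: int) -> int:
    """Count sampled generations g with burnin_generations < g <= total_generations.

    Assumption (not stated in the text): samples fall on exact multiples of
    sample_interval, and the sample at the burn-in boundary is discarded.
    """
    if sample_interval <= 0:
        raise ValueError("sample_interval must be positive")
    if not 0 <= burnin_generations <= total_generations:
        raise ValueError("burn-in must lie within the run")
    return (total_generations // sample_interval
            - burnin_generations // sample_interval)


# Values taken from the 16S rRNA analysis described above.
n_kept = retained_samples(5_000_000, 100, 2_200_000)
print(n_kept)
```

Under these assumptions, 28,000 sampled trees per run remain for calculating posterior probabilities and branch lengths.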

In order to conduct molecular phylogenetic analysis of environmental sequences related to ‘Ca. Uab amorphum’, we prepared a 16S rRNA gene dataset including major subgroups of Planctomycetes, which were selected based on a previous phylogenetic study54. We added several Chlamydiae and Verrucomicrobia OTUs to the dataset as outgroups. Environmental sequences related to ‘Ca. Uab amorphum’ were screened using the BLASTn program, with the 16S rRNA gene sequence of ‘Ca. Uab amorphum’ as query. The top 300 BLASTn hit sequences were added to the dataset. Sequences in the dataset were automatically aligned in MAFFT with the G-INS-i algorithm at default settings, and were then manually aligned and trimmed using SeaView. A preliminary maximum likelihood tree was constructed by RAxML with rapid bootstrap analysis (100 replicates) under a GTR + Γ model. We removed BLASTn-derived environmental sequences that did not form a clade with ‘Ca. Uab amorphum’ from the final dataset. Sequences in the final dataset were aligned and trimmed as described above; the final alignment consisted of 100 OTUs and 1,351 sites. The best model was identified using jModelTest, and the GTR + Γ + I model was selected. The ML tree was heuristically searched for using RAxML under the GTR + Γ + I model. Tree searches began with 20 randomized maximum-parsimony trees, and the tree with the highest lnL was selected as the ML tree. A nonparametric bootstrap analysis with 1,000 replicates was conducted using RAxML under the GTR + Γ + I model. A Bayesian analysis was conducted using MrBayes with the GTR + Γ + I model for each dataset. One cold and three heated Markov chain Monte Carlo chains with default temperatures were run for 5 × 10⁶ generations; lnL values and trees were sampled at 100-generation intervals. The first 1.0 × 10⁶ generations, during which ASDSF was greater than 0.01, were discarded as “burnin”. BPP and branch lengths were calculated from the remaining trees.

We screened for sequences related to ‘Ca. Uab amorphum’ from Tara Oceans 16S rRNA gene sequences10 using BLASTn, with the ‘Ca. Uab amorphum’ 16S rRNA gene sequence and environmental Uab clade sequences as query. We added sequences of BLASTn hits with sequence similarities above 97% and sequence length over 100 bp to the dataset used in the environmental 16S rRNA gene sequence analysis. Sequences in the dataset were automatically aligned in MAFFT with the G-INS-i algorithm at default settings, and were then manually aligned and trimmed using SeaView. A preliminary maximum likelihood tree was constructed by RAxML with rapid bootstrap analysis (100 replicates) under a GTR + Γ model. We removed environmental sequences that did not form a clade with ‘Ca. Uab amorphum’ from the final dataset. Sequences in the final dataset were aligned and trimmed as described above; the final alignment consisted of 189 OTUs and 1,509 sites. The ML tree was heuristically searched for using RAxML under the GTR + Γ + I model. Tree searches began with 20 randomized maximum-parsimony trees, and the tree with the highest lnL was selected as the ML tree. A nonparametric bootstrap analysis with 1,000 replicates was conducted using RAxML under the GTR + Γ + I model.
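The hit-selection step above (similarity above 97%, length over 100 bp) could be sketched in Python as follows. The text does not state which BLAST output format was used, so this sketch assumes BLAST+ tabular output (`-outfmt 6`) with its default twelve columns. It also treats the alignment-length column as a stand-in for sequence length, which is an assumption. The function name `select_hits` and all sequence IDs are placeholders, not real data.

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6) with default fields.
BLAST6_FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def select_hits(handle, min_identity=97.0, min_length=100):
    """Return sorted subject IDs with identity > min_identity and length > min_length."""
    kept = set()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < len(BLAST6_FIELDS) or row[0].startswith("#"):
            continue  # skip blank, truncated or comment lines
        hit = dict(zip(BLAST6_FIELDS, row))
        if float(hit["pident"]) > min_identity and int(hit["length"]) > min_length:
            kept.add(hit["sseqid"])
    return sorted(kept)


# Made-up example rows; only the first passes both thresholds.
example = io.StringIO(
    "uab_16S\ttara_A\t98.5\t120\t2\t0\t1\t120\t1\t120\t1e-50\t200\n"
    "uab_16S\ttara_B\t96.0\t130\t5\t0\t1\t130\t1\t130\t1e-45\t180\n"
    "uab_16S\ttara_C\t99.0\t90\t1\t0\t1\t90\t1\t90\t1e-30\t150\n"
)
selected = select_hits(example)
print(selected)
```

Collecting subject IDs in a set means a sequence hit by several queries is added to the dataset only once.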

Genome sequencing

For genome sequencing, total DNA was extracted from the monoxenic culture of ‘Ca. Uab amorphum’ that was cultivated until most prey bacteria were consumed. Long reads (157,852 reads, 1.4 Gb) were sequenced using PacBio RS II (Pacific Biosciences, Menlo Park, CA) with one cell (P6–C3 chemistry). Mean read length was 9.0 kbp. Sequencing was performed by Macrogen (Seoul, Korea).

Genome assembly and annotation

Raw long reads were error-corrected and assembled using Canu v1.4, and 49 contigs were acquired. To remove prey sequences, a homology search was performed using BLASTn against the NCBI nr database; only the longest contig was derived from ‘Ca. Uab amorphum’ (Supplementary Table 2), and it was manually circularized. No other sequences, such as plasmids, were found in ‘Ca. Uab amorphum’. The contig was polished using raw long reads with PBJelly 0.3.0 (ref. 55) and Quiver 2.1.0 (Pacific Biosciences). Gene models were constructed using the DFAST web server56 with the RefSeq database. Functional annotation was performed using the eggNOG-mapper web server57.

Molecular phylogenetic analysis using 171 proteins

Molecular phylogenetic analysis was performed using 171 highly conserved orthologues. The dataset was composed of 29 operational taxonomic units (OTUs), which contained most available genomes from planctomycetes. Four species in Verrucomicrobia or Chlamydiae were used as an outgroup. The highly conserved orthologues were searched for by reciprocal best BLAST hit with cut-offs of coverage ≥30% and similarity ≥40%; 171 proteins (52,272 amino acids) were conserved among ≥90% of the OTUs, and were used for the phylogenetic analysis. The amino acid sequences were aligned using MAFFT v7.221, and highly diversified regions were manually trimmed in MEGA 7 (ref. 58). Model testing was performed using IQ-TREE 1.4.3 (ref. 59), following the Bayesian information criterion (BIC). Phylogenetic analysis was performed as described above using RAxML version 8.2.9. A nonparametric bootstrap analysis with 200 replicates under the LG + gamma + I model was performed. A Bayesian analysis was run using MrBayes 3.2.6 with the same model for 1.0 × 10⁶ generations; lnL values and trees were sampled at 1,000-generation intervals. The initial 25% of generations, during which ASDSF values were greater than 0.01, were discarded as “burnin”. BPP and branch lengths were calculated from the remaining trees.
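As an illustration of the orthologue selection described above, the following Python sketch pairs reciprocal best hits and then keeps orthologue groups present in at least 90% of OTUs. All function names and the toy data are hypothetical. How best hits were ranked, and whether the 29 OTUs include the outgroup, are not stated in the text, so both are left to the caller.

```python
def passes_cutoffs(coverage_pct, similarity_pct,
                   min_coverage=30.0, min_similarity=40.0):
    """Apply the coverage >= 30% and similarity >= 40% cut-offs to one hit."""
    return coverage_pct >= min_coverage and similarity_pct >= min_similarity


def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    return sorted((a, b) for a, b in best_a_to_b.items()
                  if best_b_to_a.get(b) == a)


def widely_conserved(presence, n_otus, min_percent=90):
    """Keep groups found in at least min_percent of OTUs.

    presence maps an orthologue group to the set of OTUs that contain it.
    Integer arithmetic avoids floating-point rounding at the threshold.
    """
    return sorted(group for group, otus in presence.items()
                  if len(otus) * 100 >= min_percent * n_otus)


# Toy best-hit tables: a2 -> b2 is not reciprocated, so only (a1, b1) is kept.
pairs = reciprocal_best_hits({"a1": "b1", "a2": "b2"},
                             {"b1": "a1", "b2": "a3"})

# With 29 OTUs, a group must occur in at least 27 of them to reach 90%.
presence = {"grpX": set(range(27)), "grpY": set(range(26))}
kept = widely_conserved(presence, n_otus=29)
print(pairs, kept)
```

In a real pipeline the cut-offs would be applied to each hit before the best-hit tables are built, so that a weak hit can never count as a reciprocal partner.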

Inference of digestive proteins in the FV

Subcellular localization of proteins was predicted using PSORTb 3.0 (ref. 60), CELLO v.2.5 (ref. 61) and LocTree3 (ref. 62) with the organism option set to Gram-negative bacteria or bacteria. Proteins categorized as “secreted” or “extracellular” by at least one of the three prediction programmes were considered candidate FV proteins. Digestive proteins were identified using the KEGG database. Digestive proteins for DNA/RNA were searched for using GO terms (GO:0004518). Digestive proteins for peptidoglycan were searched for using the HyPe web server63.
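The "at least one of three programmes" rule above amounts to taking the union of the per-tool predictions. A minimal Python sketch follows. It assumes each tool's output has already been parsed into a plain label per protein, and the label spellings, protein IDs and the function name `candidate_fv_proteins` are illustrative rather than the tools' actual output formats.

```python
# Labels treated as evidence of secretion; matched case-insensitively.
SECRETION_LABELS = {"secreted", "extracellular"}


def candidate_fv_proteins(predictions):
    """Return proteins that any tool labels as secreted or extracellular.

    predictions maps protein ID -> {tool name: predicted localization label}.
    """
    return sorted(
        protein for protein, by_tool in predictions.items()
        if any(label.strip().lower() in SECRETION_LABELS
               for label in by_tool.values())
    )


# Hypothetical parsed predictions for three proteins.
example = {
    "prot_1": {"PSORTb": "Cytoplasmic", "CELLO": "Extracellular", "LocTree3": "cytoplasm"},
    "prot_2": {"PSORTb": "Periplasmic", "CELLO": "Periplasmic", "LocTree3": "periplasm"},
    "prot_3": {"PSORTb": "Unknown", "CELLO": "OuterMembrane", "LocTree3": "secreted"},
}
candidates = candidate_fv_proteins(example)
print(candidates)
```

A union is deliberately permissive: it favours sensitivity over specificity, which suits a first-pass list of candidates that is narrowed later by functional annotation.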

Detection of horizontal gene transfer

Genes putatively derived by HGT were identified using HGTector v0.2.1 (ref. 64) with the following cut-off values: e-value = 1e–20, identity = 30% and coverage = 40%. In this analysis, based on the phylogenetic relationships, “selfGroup” and “closeGroup” were set as ‘Candidatus Brocadiaceae’ (taxonomic ID: 1127830) and the PVC (Planctomycetes–Verrucomicrobia–Chlamydiae) group (taxonomic ID: 1783257), respectively.

In addition, we searched Uab homologues of the 347 eukaryote-specific proteins (ESPs)21 and actin-binding proteins37 in the Saccharomyces Genome Database65 by BLASTp with the cut-off: E-value < 1E–5. The Uab proteins with any hits were re-checked by BLASTp against the NCBI nr database.

Molecular phylogenetic analyses of single proteins

For molecular phylogenetic analyses of single proteins (α-amylase, actin, acyloxyacyl hydrolase (AOAH), phospholipase C (PLC), diacylglycerol acyltransferase (DGAT), carboxypeptidase, DNase I and EPT1), we screened each protein sequence by BLASTp against the NCBI nr database and constructed the datasets. Datasets were aligned by MAFFT v7.273, and were then manually edited with SeaView version 4.6 or MEGA 7. The final alignments consisted of 331 amino acid positions and 17 OTUs for α-amylase, 362 amino acid positions and 103 OTUs for actin, 534 amino acid positions and 28 OTUs for AOAH, 255 amino acid positions and 17 OTUs for PLC, 408 amino acid positions and 17 OTUs for DGAT, 206 amino acid positions and 6 OTUs for carboxypeptidase, 173 amino acid positions and 11 OTUs for DNase I and 154 amino acid positions and 10 OTUs for EPT1. ML trees were constructed using IQ-TREE 1.5.5 following the best-fit model, which was chosen in accordance with BIC (WAG + I + G4 for α-amylase, LG + G4 for actin, DGAT and DNase I, LG + I + G4 for AOAH and PLC, WAG + G4 for carboxypeptidase and mtZOA + I + G4 for EPT1) with 200 replicates of nonparametric bootstrap. A Bayesian analysis was run using MrBayes 3.2.6 with the LG + Γ model for the actin dataset. One cold and three heated Markov chain Monte Carlo chains with default temperatures were run for 1 × 10⁷ generations; lnL values and trees were sampled at 100-generation intervals. The first 7 × 10⁶ generations, during which the average standard deviation of split frequencies (ASDSF) was greater than 0.03, were discarded as “burnin”. Bayesian posterior probabilities (BPP) and branch lengths were calculated from the remaining trees.

Description of ‘Candidatus Uab amorphum’

‘Candidatus Uab amorphum’ (U.a.b masc. n. a giant of Palauan mythology. a.mor’phum. L. neut. adj. amorphum amorphous, deformed).

Marine free-living aerobic Gram-negative bacterium collected from surface seawater in the Republic of Palau (7.181386° N, 134.336947° E). Cells are flattened and round or oval in shape with granular cytoplasm (Fig. 1a–c). Flagellum is absent. Cells were 3.2–7.8 μm (4.5 ± 0.85 μm) in the long axis and 2.8–5.5 μm (4.0 ± 0.57 μm) in the short axis (n = 79). When prey bacteria were abundant in the culture, cells occasionally reached 10 μm in diameter (Supplementary Fig. 1a). Cells attach to substrate and show gliding motility with changing shape (Supplementary Movie 1). Cells reproduce by binary fission. Periplasm (or paryphoplasm) highly invaginates into the cytoplasm (or pirellulosome) and shows a reticulate pattern (Fig. 3a–c). The cytoplasm includes multiple nuclear bodies (Fig. 3a, c). The cell interior includes four types of fibrous structures: large striated fibres, small striated fibres, short bundle fibres and cylindrical structures that contain linear fibres (Fig. 5). Cells can engulf bacteria and picoeukaryotes (Bathycoccus prasinos) (Fig. 1d–o, Fig. 3d–h, Supplementary Fig. 7, Supplementary Movies 2 and 4). Engulfed bacteria and algae can be found in phagosome-like vacuoles (PVs) (Fig. 2f–h, Supplementary Fig. 7s–t). Some PVs connect to other phagosome-like structures or to the outside of the cell by narrow ducts (Supplementary Fig. 8a, b). Similarity of the 16S rRNA gene sequence (LC496071) to that of the closest species (the planctomycete ‘Candidatus Kuenenia stuttgartiensis’) is 79% (Supplementary Table 1). The genome of ‘Ca. Uab amorphum’ (AP019860) is circular and 9,503,110 bp in size. The G + C content of the genome is 39.4 mol%. A co-culture of ‘Ca. Uab amorphum’ has been deposited at the JCM as JCM 39082.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.