Confinement correlates with reduced microbial diversity

Opposing structures of UB and CB were accompanied by a significant loss (Spearman’s rank correlation rho, correlation coefficient: -0.8783, P = 0.02131; one-sided t test: n = 9, t = -3.2, df = 2.6, P = 0.03) of taxonomic diversity (Shannon–Weaver indices: CB 7.2 H’, UB 8.8 H’) (Fig. 1a and Supplementary Table 1). In contrast, the functional diversity between UB and CB remained balanced (10.8–11.1 H’ according to SEED annotations; Fig. 1b). The analysis of 16S rRNA sequences showed even clearer differences between CB (5.6 H’) and UB (7.2 H’) due to lower diversity estimates for ICU samples (3.8 H’) and a higher diversity for private houses (6.4 H’) (Supplementary Fig. 1). These differences in diversity estimates were observed in the presence of constant bacterial abundances (~ 10⁶–10⁷ 16S rRNA gene copies per m²), with a higher variability for the fraction of intact cells (~ 10³–10⁷ 16S rRNA gene copies per m²). However, diversity estimates did not correlate with the proportion of intact cells (Spearman’s rank correlation rho, correlation coefficient: 0.2, P = 0.4).

Fig. 1 Microbial diversity estimates. Calculations were executed in MEGAN according to the results of the BLASTx searches against NCBInr. Data of single reads were filtered (unassigned reads were removed) and normalized (randomly and repeatedly subsampled to the smallest sample size). Violin plots showing the kernel probability density of the data, including a box with the median and the interquartile range, were created in R. a Significant differences of Shannon diversity estimates of microbial communities on species level in CB (confined) and UB (unrestricted built environments). b Similar Shannon diversity estimates of microbial functions on highest SEED levels (individual functional gene levels, level 5) in CB (confined) and UB (unrestricted built environments)
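
The diversity estimates above combine two steps: repeated random subsampling to a common depth and calculation of the Shannon index H’ = -Σ p_i log p_i. The following Python sketch illustrates both steps under stated assumptions. The function names (`shannon_index`, `rarefy`, `rarefied_shannon`) are illustrative, the logarithm base (2) is an assumption because the text does not state it, and the code does not reproduce the MEGAN implementation.

```python
import math
import random
from collections import Counter


def shannon_index(counts, base=2.0):
    """Shannon index H' = -sum(p_i * log(p_i)) for a list of taxon counts.

    The log base is an assumption (base 2); the source does not state it.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from a {taxon: count} mapping."""
    pool = [taxon for taxon, c in counts.items() for _ in range(c)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    return Counter(rng.sample(pool, depth))


def rarefied_shannon(counts, depth, n_iter=100, seed=0):
    """Mean Shannon index over repeated random subsamples of equal depth."""
    rng = random.Random(seed)
    values = [
        shannon_index(list(rarefy(counts, depth, rng).values()))
        for _ in range(n_iter)
    ]
    return sum(values) / len(values)
```

In practice, `depth` would be set to the read count of the smallest sample, so that every sample is compared at the same sequencing effort.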

Environmental differences correlate with the microbiome

Shotgun metagenome samples from public buildings and public houses were more similar to each other than samples obtained from private houses according to Principal Coordinates Analysis (PCoA) ordinations and Unweighted Pair Group Method with Arithmetic Mean trees. Even greater dissimilarities were observed between samples from UB and CB. Moreover, 16S rRNA-based population structure indicated lower dissimilarities for UB (mean Bray–Curtis distance 0.71) than for CB environments (mean Bray–Curtis distance 0.82; Fig. 2 and Supplementary Fig. 2).
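
For orientation, the mean Bray–Curtis dissimilarity within a group of samples can be computed as sketched below. This is a minimal illustration, not the pipeline used for the analysis: the function names and the dictionary-based abundance format are assumptions, and any input tables would need to be normalized beforehand.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: abundance} profiles.

    BC = sum(|a_i - b_i|) / (sum(a_i) + sum(b_i)); 0 = identical, 1 = disjoint.
    """
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.values()) + sum(b.values())
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator


def mean_pairwise_bray_curtis(samples):
    """Mean dissimilarity over all sample pairs within one group."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("at least two samples are required")
    return sum(bray_curtis(x, y) for x, y in pairs) / len(pairs)
```

A higher group mean indicates that samples of that group differ more from one another.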

Fig. 2 Connection between different built environment types. UPGMA tree (Unweighted Pair Group Method with Arithmetic Mean tree) of sampled built environments based on different microbial communities resolved to species level. Calculations were executed with MEGAN according to the results of the BLASTx searches against NCBInr. Data of single reads were filtered (unassigned reads were removed) and normalized (randomly and repeatedly subsampled to the smallest sample size). Color code for column environment: blue (cleanroom facility); red (intensive care unit); dark green (public buildings); light green (public houses); yellow (private houses)

Different categories of sampled built environments could be characterized by distinct compositions of the metagenomic reads even on the superkingdom level (Supplementary Fig. 3). Hence, proportions of bacteria vs eukaryota (mainly sequences assigned to humans) decreased significantly (one-sided t test: n = 9, t = 3.4, df = 2.0, P = 0.04) from UB (~ 99% bacteria, ~ 1% eukaryota) towards CB (for bacteria: cleanroom ~ 69% and its gowning area ~ 85%; ICU ~ 55%). A similar pattern could be observed for archaea, although not significant (one-sided t test: n = 9, t = 1.9, df = 2.0, P = 0.1), with higher counts (~ fourfold) in CB. Differences in traces of viruses were less apparent between CB and UB, but viral sequences showed the highest relative abundances in the ICU and in the environment of public houses. Clear differences continued into higher taxonomic levels (Supplementary Fig. 4 and Supplementary Fig. 5): on the phylum level, public buildings and public houses were dominated by sequences of Actinobacteria (up to 50%) and Proteobacteria (~ 21%). In private houses, the proportion of Firmicutes rose to 55%. Likewise, the proportion of Firmicutes was also higher after masking the DNA of compromised cells with propidium monoazide (PMA). In CB, the prevalence of bacterial phyla was reduced and proportions of multicellular organisms and unassignable sequences increased (up to 62% in the cleanroom). Furthermore, Pseudomonas, Porphyromonas, Propionibacterium, and Prochlorococcus could be identified as significant discriminative features (Supplementary Fig. 6) in CB by LEfSe (linear discriminant analysis of the effect size) analysis. Besides these bacterial taxa, viral sequences (e.g., human herpes and papillomavirus) and assignments to arthropods (e.g., mites like Trombidiformes and Prostigmata) and insects (e.g., lice such as Liposcelis bostrychophila and cockroaches like Blattella germanica) were also defined as discriminative features for CB.

The core 16S rRNA gene microbial profile was visualized in a core operational taxonomic unit (OTU) network (Supplementary Fig. 7). This analysis indicated a high proportion of shared OTUs assigned to Acinetobacter and Staphylococcus as well as a bigger overlap of samples from the cleanroom facility and unrestricted buildings compared to the core of samples from the ICU environment.

To correlate microbial community composition with environmental parameters, a bioenv test with Spearman rank correlations compared to Euclidean distances was applied to the 16S rRNA gene profile. This bioenv analysis showed higher correlations of samples with latitude, longitude, and sea level (best variable combination ρw = 0.9425) than with temperature, humidity, and room variables, like the surface area, room height, or room volume (best variable combination ρw = 0.7518). These correlations were further visualized as vectors on a non-metric multidimensional scaling (NMDS) ordination of the sampled communities together with calculated ellipses per sampling category (Fig. 3). This ordination showed distinct clusters for samples obtained from the surface of tiles in private houses and for the sanitary environments in public houses and public buildings, and showed that ICU floors and ICU workplaces overlapped with samples from medical devices. However, associations of the microbiome with environmental variables like biogeography or microclimate could not be further supported or differentiated due to confounding variables (see Supplementary information).
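
The principle of a bioenv search can be sketched as follows: for every subset of environmental variables, Euclidean distances between samples are rank-correlated with the community dissimilarities, and the subset with the highest correlation is reported. The sketch below is a simplified illustration under several assumptions: it uses the plain (unweighted) Spearman coefficient rather than a weighted variant, expects environmental variables that are already standardized, takes community dissimilarities as a condensed list in `itertools.combinations` order, and uses hypothetical function names.

```python
import math
from itertools import combinations


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for constant input
    return cov / (sx * sy)


def euclidean_distances(rows, cols):
    """Condensed pairwise Euclidean distances using only the chosen columns."""
    return [
        math.sqrt(sum((rows[i][c] - rows[j][c]) ** 2 for c in cols))
        for i, j in combinations(range(len(rows)), 2)
    ]


def bioenv(community_dist, env_rows, names):
    """Exhaustive search for the variable subset with the highest correlation."""
    best_rho, best_names = -2.0, ()
    for k in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), k):
            rho = spearman(community_dist, euclidean_distances(env_rows, cols))
            if rho > best_rho:
                best_rho, best_names = rho, tuple(names[c] for c in cols)
    return best_rho, best_names
```

Because all 2ⁿ - 1 subsets are evaluated, this exhaustive form is only practical for a small number of variables.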

Fig. 3 Environmental variables associated with the microbiome of sampled built environments. NMDS of 16S rRNA gene amplicons based on Bray–Curtis distances with superimposed vectors representing Spearman correlations of measured environmental variables (bioenv) based on Euclidean distances. Color code for column environment: blue (cleanroom facility); red (intensive care unit); dark green (public buildings); light green (public houses); yellow (private houses)

In general, the composition of the microbiome was so distinct that the associated metadata categories could be predicted by supervised learning methods (random forest classification and regression models). Samples from CB or UB could be predicted with a high overall accuracy of 92%. Likewise, numerical environmental parameters such as temperature (R = 0.92, P = 4.8 × 10⁻⁵), relative humidity (R = 0.89, P = 3.3 × 10⁻⁴), longitude (R = 0.95, P = 2.8 × 10⁻⁶), and sea level (R = 0.82, P = 3.3 × 10⁻³) could be easily predicted. Microbial abundances (R = 0.63, P = 0.12) and respective room areas (R = 0.58, P = 0.24) were not suitable to build predictive models from observed features.

Changed functional capabilities were evident on genome levels

Assembled contigs and scaffolds could be binned into 125 draft genomes (8–20 bins per sample). Most binned genomes were recovered from samples of private houses, while only a few genomes could be reconstructed from the ICU dataset (Supplementary Table 2). A subset of 44 draft genomes (representing 45% of all assembled contigs) was sufficient in quality for an in-depth analysis. Annotations, replication activity, and predicted phenotypes of these binned genomes were significantly representative for CB or UB environments (Fig. 4). Hence, according to iRep, replication rates were lower in CB (two-sided two-sample Kolmogorov–Smirnov test: D = 0.68, P = 0.005) and ranged from 2 to 6 replication events for 10–75% of the sampled population. According to Phenotype Investigation with Classification Algorithms (PICA), several distinct phenotypes could be predicted (46 individual chi-square tests, Bonferroni correction P = 0.02) on genome and marker-gene levels. Therefore, significant phenotypic traits for CB covered alkane degradation, benzoate degradation by hydroxylation, trimethylamine production by choline, T4 and T6 secretion systems, and plant pathogenicity based on thaxtomins, while arsenic detoxification and facultative anaerobes were specific for UB. Overall, Gram-positive bacteria (P = 0.004) with functions associated with carbohydrate and amino acid metabolism dominated in UB. On the contrary, Gram-negative bacteria with many functions associated with virulence, disease (P = 0.008), defense (P = 5.2 × 10⁻⁵), and resistance (P = 0.08) were representative for CB (P values were calculated by Kruskal–Wallis tests; Supplementary Figs. 8–11).
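
When many individual tests are run, as for the 46 phenotype predictions, a Bonferroni correction multiplies each raw P value by the number of tests and caps the result at 1. The sketch below shows this standard adjustment only; the function name and the example P values are illustrative, and the text does not specify how the reported corrected value was derived.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted P values: min(1, p * number_of_tests)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Illustrative raw P values (not taken from the study)
raw = [0.0004, 0.01, 0.5]
adjusted = bonferroni(raw)
significant = [p <= 0.05 for p in adjusted]
```

Equivalently, each raw P value can be compared against 0.05 divided by the number of tests.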

Fig. 4 An overview of reconstructed genomes. High-quality binned genomes clustered by average nucleotide identity (ANI), resolved to highest taxonomic levels, respective built environment origins, and respective replication rates (activity). Color code for column environment: blue (cleanroom facility); red (intensive care unit); dark green (public buildings); light green (public houses); yellow (private houses)

Genomes assigned to Exiguobacterium (V = 0, P = 2.2 × 10⁻¹¹) and Macrococcus (V = 0, P = 1.0) were commonly recovered from diverse UB environments. Genomes of Arthrobacter (V = 465.5, P = 2.9 × 10⁻¹⁵) and Janibacter (V = 0, P = 0.3) were more specific for the category of public buildings and public houses. Enhydrobacter (V = 0, P = 1.0), Kocuria (V = 0, P = 8.3 × 10⁻⁴), and Pantoea (V = 225, P = 1.2 × 10⁻⁹) were found additionally in private houses together with Lactococcus (V = 9, P = 1.0) and Staphylococcus (V = 3445, P = 0.01). Leuconostoc (V = 169, P = 0.9) marked the transition from private houses to ICU. Finally, genomes assigned to Propionibacterium (V = 2697, P = 0.01), Pseudomonas (V = 133530, P = 2.9 × 10⁻¹⁵), and Stenotrophomonas (V = 97.5, P = 0.07) were characteristic of all CB environments (P values from Wilcoxon signed rank tests; Fig. 4). Representative taxonomic assignments for distinct built environments were supported by data of the single-read analysis (Supplementary Figs. 5 and 6) and 16S rRNA gene amplicons (Supplementary Fig. 12).

Genomes assigned to the genus of Acinetobacter (median completeness 94%, median contamination 20%) were highly prevalent and ubiquitous in all sampled built environments. This allowed a detailed comparison of closely related bacterial species from differently maintained built environments regarding changed functional properties on pan-genome levels. Genomes of Acinetobacter from private houses, the ICU, the cleanroom and its gowning area shared a core genome with 24–39% of all CDS (proportion of core coding DNA sequences to all coding DNA sequences in a genome). Coding genes in the recovered genome of Acinetobacter (e.g., Acetyl-CoA acetyltransferase fadA or alcohol dehydrogenase frmA) from the ICU showed the biggest overlap with this core (39%) and fewer strain-specific CDS (784) than genomes of Acinetobacter from the private houses (2857 strain-specific CDS, 24% of the core genome). Regarding all binned genomes, the ICU environment showed the greatest density (highest grade of similarity) for its core genome (0.2% core CDS) compared to all other sampled built environments (Supplementary Table 3). Differences in the pan-genome of Acinetobacter were especially striking for functions associated with virulence, disease, and defense. In CB, the number of assigned functions to these categories almost doubled compared to UB.
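
The core proportion and the number of strain-specific CDS reported above follow from simple set operations once CDS have been clustered into gene families. The sketch below assumes that this clustering has already been done and that each genome is represented as a set of gene-family identifiers; the function name and the input format are hypothetical.

```python
def core_genome_summary(genomes):
    """Summarize a pan-genome given {genome_name: set_of_gene_family_ids}.

    Returns the core set (families present in every genome) and, per genome,
    the fraction of its families that belong to the core and the number of
    families found in no other genome (strain-specific).
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    core = set.intersection(*genomes.values())
    summary = {}
    for name, genes in genomes.items():
        # Union of all gene families found in the other genomes
        others = set().union(*(g for n, g in genomes.items() if n != name))
        summary[name] = {
            "core_fraction": len(core) / len(genes) if genes else 0.0,
            "strain_specific": len(genes - others),
        }
    return core, summary
```

A genome with a high core fraction and few strain-specific families, as described for the ICU genome, carries comparatively little accessory content.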

In general, functional traits were more evenly distributed over all sampled indoor spaces compared to microbial profiles (Supplementary Figs. 13–16). Nevertheless, a detailed LEfSe analysis based on SEED annotations revealed functions associated with Gram-positive bacteria (Gram-positive cell wall components, heme and hemin uptake, and utilization in Gram positives), fatty acid metabolism (fatty acid lipids, isoprenoids, teichoic and lipoteichoic acid biosynthesis), DNA repair systems (DNA repair UvrABC system, DNA repair bacterial Rec FOR pathway, and transcription repair-coupling factor), and heat shock (heat shock dnaK gene cluster) as significant discriminative features of UB. On the contrary, functions associated with Gram-negative bacteria (Gram-negative cell wall components), iron acquisition (ferrichrome iron receptor, TonB-dependent siderophore receptor, and siderophore pyoverdine), oxidative stress, membrane transport and secretion (Ton and Tol transport systems, RND efflux system inner membrane transporter CmeB, Type III, IV, VI ESAT secretion systems), virulence (virulence disease and defense), and resistances (resistance to antibiotics and toxic compounds, multidrug resistance efflux pumps, cobalt zinc cadmium resistance protein CzcA) were identified to be representative for CB. A comparison of all annotated SEED functions with the RAST server (refs. 9, 10) revealed a high proportion of functions associated with amino acid and carbohydrate metabolism for UB (Supplementary Fig. 17). In contrast, genomes from CB indicated a shift towards other functions like virulence, disease, and defense. In particular, genomes from the cleanroom environment showed much more evenly distributed functional capabilities for all functional groups and, additionally, many functions associated with stress response.

Differences were reflected by the resistome

Due to distinct profiles and our interest in functions related to virulence and resistance, we captured the virulome (entity of virulence factors) and resistome (entity of resistances against antibiotics) of CB and UB in greater detail. Slightly more virulence genes (VFDB) were detected for genomes of CB (19) than of UB (18). Highest proportions of virulence genes were present inside the ICU, followed by public and private houses. Lowest counts were visible for the highly unrestricted environment of public buildings. Hence, chromosomally encoded bacterial virulence in CB and UB was likely associated with its distinct microbial profiles. However, differences in proportions were not significant.

Compared to the virulome, the resistome showed clearer differences for CB vs UB. Using CARD (Comprehensive Antibiotic Resistance Database), 377 different resistance features could be identified for the 42 selected high-quality binned genomes and 91 extracted plasmids. Detected resistance genes were manually curated (removal of only mutation and regulation-mediated resistances according to ref. 11) for a detailed analysis of intrinsic (124) and mobile (186) resistance features. The resistome of CB and UB as well as resistances from genomes and plasmids differed significantly (Permutational Multivariate Analysis of Variance test: n = 37, pseudo-F = 3.8, P = 0.004 and pseudo-F = 4.0, P = 0.002; Fig. 5 and Supplementary Fig. 18). UB more often showed mobile (10 vs 6%), transposable (36 vs 13%), replication (29 vs 10%) and slightly more virulence (6 vs 4%) factors or elements on their extracted plasmids than CB. Overall, interconnections of the resistome between genomes and extracted plasmids were very rare. Only a few genes encoding diverse efflux pumps (pmrA and acrA) could have been transferred between genomes and extracted plasmids of Exiguobacterium sibiricum, Streptococcaceae (both from UB), and Stenotrophomonas maltophilia (inside the cleanroom facility), respectively (Fig. 6), since they were detected in the same environment and/or recovered from similar genomes. However, the role they might have in resistance, particularly acrA, which forms part of an intrinsic tripartite Enterobacteriaceae efflux pump, remains obscure. CB showed a significantly higher abundance of elements involved in intrinsic resistance, including efflux pumps and stress-resistance determinants (e.g., as identified by LEfSe analysis, the multidrug efflux proteins mexK and mexB, and the catalase peroxidase-activating isoniazid katG in all CB environments). Besides built environment-specific profiles, species-specific patterns of the resistome were also observed; for instance, smeA in S. maltophilia (multidrug efflux) and salA in genomes of Macrococcus caseolyticus (possible resistances against lincosamides and streptogramins; Fig. 7a, b).
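
The pseudo-F statistic of a Permutational Multivariate Analysis of Variance compares the among-group to the within-group sums of squared distances, and its P value is obtained by recomputing the statistic after shuffling the group labels. The sketch below illustrates this principle for a single grouping factor. It is not the software used for the analysis; the function names, the square distance-matrix input, and the number of permutations are assumptions.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """Pseudo-F for one grouping factor from a square distance matrix.

    Assumes at least two groups and a non-zero within-group sum of squares.
    """
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    a = len(groups)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation P value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign group labels at random
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed labelling counts as one permutation
    return observed, (hits + 1) / (n_perm + 1)
```

With a Bray–Curtis matrix of resistance profiles as `dist` and the labels CB and UB, the function would return values comparable in form to the pseudo-F and P reported above.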

Fig. 5 Diversity estimates of detected resistance features. Significant differences in Shannon diversity estimates of different resistance features (highest levels, level 3) of the CARD database inside CB (confined) and UB (unrestricted built environments) as well as on binned genomes and plasmids. Data were normalized (rarefied). CARD, Comprehensive Antibiotic Resistance Database

Fig. 6 Resistance network of genomes and plasmids. Potentially transferred (edge-connected) resistance genes (CARD database) according to their presence/absence in binned genomes and plasmids inside the same built environment. Edge-weighted spring-embedded algorithms implemented in Cytoscape were used for visualizations. Filled circles represent genomes and empty circles, plasmids. Most abundant resistance genes were used for labeling and correlated to circle sizes. Colors are defined by respective built environments: blue (cleanroom facility); red (intensive care unit); dark green (public buildings); light green (public houses); yellow (private houses). CARD, Comprehensive Antibiotic Resistance Database

Fig. 7 Proportion of CARD categories and drug classes. a Higher categories of the resistome according to CARD per environment (CB and UB), nucleotide structure (binned genomes and plasmids), and for individual binned genomes (referring to individual species). b Drug classes and their conferred resistance to them according to CARD per environment (CB and UB), nucleotide structure (binned genomes and plasmids), and for individual binned genomes (referring to individual species). CARD, Comprehensive Antibiotic Resistance Database

Further differences between CB and UB were also evident in terms of potentially conferred resistances against distinct drug classes. CBs were relatively enriched in resistances against fluoroquinolones (W = 1705, P = 0.4) and triclosan (W = 1666, P = 0.02) compared to UB. In turn, UBs were more representative of resistances against aminoglycoside (W = 1842, P = 0.007), diaminopyrimidine (W = 1384.5, P = 0.7), and macrolide-based antibiotics (W = 1598.5, P = 1.0; P values from Wilcoxon signed-rank tests). Regarding their location, genes encoding beta-lactam, phenicol, and streptogramin resistance were more common in binned genomes, while extracted plasmids could mediate more resistances against fluoroquinolones, aminoglycosides, and diaminopyrimidines. Likewise, genomes of Arthrobacter arilaitensis showed many resistances against fluoroquinolones, while genomes assigned to Acinetobacter sp., Pseudomonas sp., and Sphingobium were rich in resistances against tetracyclines. Stenotrophomonas maltophilia harbored many resistances to both drug classes. On the contrary, more unspecific multidrug resistances were common for Staphylococcaceae, Macrococcus caseolyticus, and Exiguobacterium sibiricum.

The core resistome of individually binned genomes was much more coherent (100% of core resistance genes in all genomes) than the core resistome of extracted plasmids or the different built environment categories (only 20–30% of core resistance genes in all plasmids). These data agree with the concept of intrinsic resistomes as a set of resistance genes present in all the (or most) members of a given species (ref. 12). Hence, the core resistome of CB showed resistances against fluoroquinolones and aminocoumarins, while UB contained resistances to these antibiotics and additionally against tetracyclines and mupirocins.

As already shown for the composition of the microbiome, annotated resistance features were also used to build predictive models by supervised learning methods. Predictions were almost as accurate when based on resistance genes (CB vs UB: overall accuracy = 91%) as when based on microbial profiles (CB vs UB: overall accuracy = 92%). However, numerical environmental parameters like sea levels (R = 0.64, P = 3.3 × 10⁻³), temperature (R = 0.46, P = 0.09), and microbial abundance (R = 0.46, P = 0.06) could not be predicted easily and showed only low model accuracies.

Resistance genes were further investigated in their genomic context (synteny). In most cases, antibiotic resistance genes were co-localized with other resistance genes, especially on genomes retrieved from CB environments (mainly multidrug efflux transporter systems, e.g., acrA, acrB, and bepE). In contrast, genomes from UB environments more often showed transcriptional regulators (e.g., cymR and grpE) and transposases (tnpABC) in close vicinity to annotated resistance genes. Despite the high frequency of transposase genes in the vicinity of resistance genes, no integron clusters could be detected. Resistance genes of genomes from CB environments were also surrounded by a significantly higher frequency of flanking repeats (W = 12075, P = 0.02). Potentially horizontally transferred genes (HGT) in regions of genome plasticity were identified by synteny breaks and the compositional bias between genomes of CB and UB and closely related genomes available in the MaGe database (ref. 13). More potential HGT features (both mobility genes as well as tRNA hotspots) were detected in genomes from CB environments. However, higher proportions of HGT in CB were not significant.

In summary, a significant (W = 110, P = 1.3 × 10⁻⁷) reduction in microbial diversity on surfaces in CB by 50% was accompanied by a significant (W = 202.5, P = 0.01) increase of resistances by 20%, suggesting an enrichment of resistant microorganisms that displace the susceptible ones in these environments (P values from Wilcoxon signed-rank tests).