Identifying signatures on shoe and phone samples

To determine the extent to which the microbial communities of samples were driven by surface type (that is, shoe, phone, or floor) and study participant, we employed a combination of ordination and supervised learning analyses. We found that microbial community structure was determined both by surface type and participant (PERMANOVA on weighted UniFrac; Pseudo-F = 19.7 and 22.7, respectively; P < 0.0001). The relative influence of surface type and interacting individual on microbial community structure was demonstrated by the weighted (Figure 1A, B) and unweighted (Figure 1C, D) UniFrac distance between samples. In both cases, the first principal coordinate clearly demarcated sample surface while the second principal coordinate demarcated study participant. UPGMA hierarchical clustering of samples pooled by individual and surface type (Figure 1E, F) further suggested surface type as the dominant influence on microbial community structure, with phone and shoe samples forming distinct groups, which were in turn subdivided by individual. In both ordination analyses, floor samples clustered tightly with their longitudinally associated shoe samples.
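As an illustration of the statistic reported above, the following is a minimal sketch of how a PERMANOVA pseudo-F and its permutation P value can be computed from a square distance matrix. It is not the pipeline used in this study: the function names, the toy one-dimensional data, and the use of absolute differences in place of UniFrac distances are all assumptions made for the example.

```python
import random

def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, permutations=999, seed=0):
    """Return (pseudo-F, permutation P value) by shuffling the group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical toy data: six samples placed on a line, three per surface type.
toy_points = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
toy_dist = [[abs(p - q) for q in toy_points] for p in toy_points]
toy_groups = ["shoe", "shoe", "shoe", "phone", "phone", "phone"]
f_stat, p_value = permanova(toy_dist, toy_groups)
print(f_stat, p_value)
```

With only six samples the smallest attainable P value is limited by the number of distinct label arrangements, which is why real analyses of this kind rely on many more samples per group.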

Figure 1 Ordination of samples based on weighted and unweighted phylogenetic dissimilarity in community composition. (A, B) depict principal coordinate (PCoA) plots for all samples in the study based on pairwise weighted UniFrac distance between samples, with sample points colored by surface and person, respectively. (C, D) are similarly colored by surface and person but are based on unweighted UniFrac distance. (E, F) depict UPGMA clustering of pooled and evenly rarefied sample groupings based on weighted and unweighted UniFrac distance, respectively. Branches are highlighted to reflect person of origin (colors as in B and D) and group names at branch tips are colored by surface as in A and C.

The diagnostic power of microbial community profiles for predicting which of the two study participants a shoe or phone sample had been taken from was determined using random forest supervised learning. Random forest models were highly successful at determining which of the two participants’ shoes a sample was taken from, correctly classifying samples more than 50 times as effectively as one would expect by chance (Table 1), which indicates consistent differentiation in the shoe microbial communities of these two different people, even accounting for temporal variability. This is likely due to the presence of a ‘core microbiome’ on the shoes of individual study participants, which we assessed by looking at the abundances over time of the 100 taxa with the highest feature importance scores in the model (Additional file 1: Figure S1). The majority of those 100 operational taxonomic units (OTUs) were consistently detected on the shoes of one participant over the course of the time series, but not on those of the other participant.

Table 1 Summary of predictive accuracy of random forest supervised learning models

In contrast to the high error ratio of models predicting study participant, the models did no better than expected by chance in determining which of the four shoe sites a sample had been taken from, even when models were segregated by study participant. We propose that this is due to the homogenization of communities across the shoe sole over time or to rapid changes in community structure at each sampling site. A similar pattern was observed in phone samples, with the models able to classify the participant a phone sample was taken from (error ratio of 13.6) but unable to determine whether the sample had been taken from the front or back of a given phone (Table 1).
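The error ratios quoted above can be read with the help of a short sketch. The text does not define the metric, so the sketch assumes the convention that the error ratio is the baseline error (always guessing the most common class) divided by the model's observed error. The function name and the toy labels are hypothetical.

```python
from collections import Counter

def error_ratio(true_labels, predicted_labels):
    """Baseline error divided by observed classification error (assumed definition)."""
    n = len(true_labels)
    # Baseline: a classifier that always guesses the most common class.
    most_common_count = Counter(true_labels).most_common(1)[0][1]
    baseline_error = 1 - most_common_count / n
    # Observed: fraction of samples the model assigned to the wrong class.
    observed_error = sum(t != p for t, p in zip(true_labels, predicted_labels)) / n
    if observed_error == 0:
        return float("inf")  # perfect classification, ratio is unbounded
    return baseline_error / observed_error

# Hypothetical toy example: 100 samples from two participants, two misclassified.
truth = ["person1"] * 50 + ["person2"] * 50
predicted = list(truth)
predicted[0] = "person2"
predicted[99] = "person1"
print(error_ratio(truth, predicted))
```

Under this definition a ratio near 1 means the model does no better than the baseline guess, which matches the way the shoe-site and phone front/back results are described above.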

Random forest models were also used to assess which bacterial taxa were most associated with different surface types. Models were trained on a genus-level summary of the OTU table, and shoe and floor samples were merged into a single surface type based on their similarity in ordination analyses. When trained at the genus level, models were able to determine whether a sample was taken from a phone or a shoe/floor with an error ratio of 3.6. The 20 genera with the highest feature importance scores are summarized in Additional file 2: Figure S2, with skin-associated genera such as Streptococcus, Propionibacterium, and Corynebacterium highly enriched in phone samples relative to shoe samples.

Longitudinal interaction between shoe and floor communities

To determine the extent to which the floor environments a shoe has interacted with influence the sole’s microbial community and to assess whether individual shoe and floor time series could be matched based on similarity, we employed a Bayesian source tracking approach [13]. These Bayesian models predicted a dominant influence from the correct source (Figure 2), which we believe shows the similarity between shoe and floor microbial community composition and may be used to infer where someone has recently walked. On average, the models predicted that a floor sample was the source of microbes for approximately three quarters of the microbial community associated with that shoe at that time point. Strikingly, floor samples had significant predictive power despite often being taken in areas the shoe did not directly touch (that is, proximate to where the participant had actually stepped), which suggests localized homogeneity of the floor microbial community. We also formulated individual SourceTracker models for each participant, in which the floor samples of individual locations were treated as sources to the shoe samples (Additional file 3: Figure S3). These models demonstrated that bacterial taxa associated with the floor of a particular location often increased in abundance on the shoe soles of study participants while walking through that space.

Figure 2 Summary of predictive accuracy of SourceTracker models in determining which of the two study participants a sample was taken from based only on the microbial communities of the floor samples those shoes had interacted with. For the models, all four shoe samples taken by each participant at a given time point were consolidated and treated as individual sinks (N = 29 and 27 for persons 1 and 2, respectively). All floor samples from the two participants’ time series were collapsed and treated as the two possible sources to the shoe sink communities.

To determine whether changes in the microbial community of the four shoe environments tended to be similar at each hourly sampling interval, we employed Procrustes analysis of the four sets of principal coordinates (Additional file 4: Figure S4). All three pairwise comparisons for each study participant produced significant P values (P < 0.005; Additional file 5: Table S1), demonstrating that changes in the microbial communities of the four shoe environments resemble each other at each sampling interval, and thus suggesting a consistent impact from the floor microbial community. Procrustes analysis of the principal coordinates from the front and back of participants’ phones did not produce significant P values. We hypothesize that this reflects greater heterogeneity in community composition across the surface of an individual phone at a given time point than across a shoe, owing to lower overall biomass and high volatility in hand-associated microbial communities. It is also likely that microbes on the back of phones are sourced mostly from hands, while those on the front may also be sourced from the face of the owner.

To assess the speed at which the floor environment influences the shoe sole microbial community, we looked at the relationships between shoe and floor samples taken from the same time point in principal coordinate (PC) space. For both study participants, PC1 values for the floor and shoe samples at each time point were highly correlated for all four shoe environments (Figure 3A); we believe this is likely due to rapid contamination of the shoe sole by the floor microbial community. In all but one shoe environment (the right shoe heel of person 2), the correlation between shoe and floor PC1 values from the same time point was substantially higher than the correlation between samples taken one time step apart (Additional file 5: Table S2). In most cases, shoe microbial communities quickly converged on a PC space similar to that of the floor community (Figure 3B). These communities were largely segregated by the geographic location the sample was taken from and by the material of that location’s floor (wood, linoleum, etc.), further supporting the possibility of rapid microbial transfer to the shoe sole.
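The comparison between same-time-point and lagged correlations can be sketched as follows. This is an illustrative example only: it assumes Pearson correlation, interprets "one time step apart" as shoe values at time t against floor values at time t-1, and uses invented PC1 values. None of these choices is confirmed by the text.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def same_vs_lagged(shoe_pc1, floor_pc1):
    """Correlate shoe PC1 with floor PC1 at the same time and one step earlier."""
    same = pearson(shoe_pc1, floor_pc1)
    # Shoe at time t paired with the floor sampled at time t-1 (assumed lag direction).
    lagged = pearson(shoe_pc1[1:], floor_pc1[:-1])
    return same, lagged

# Hypothetical PC1 values for one shoe site and its floor samples over six time points.
floor_pc1 = [0.30, -0.20, 0.40, -0.10, 0.25, -0.35]
shoe_pc1 = [0.28, -0.15, 0.35, -0.12, 0.20, -0.30]
same_r, lagged_r = same_vs_lagged(shoe_pc1, floor_pc1)
print(same_r, lagged_r)
```

A same-time correlation that clearly exceeds the lagged one is the pattern that would be expected if the shoe sole tracks the floor it is currently on rather than the floor it touched previously.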

Figure 3 Immediate impact of floor microbial community on shoe microbial communities. (A) Correlation in the first principal coordinate values of shoe and floor samples taken at the same time point. (B) Principal coordinate plots of all shoe and floor samples, split by individual and colored by floor type and location at time of sampling.

Figure 4 Ordination of biogeographic samples based on weighted and unweighted phylogenetic dissimilarity in community composition. Panels A and B depict principal coordinate (PCoA) plots for all biogeographic samples based on pairwise weighted UniFrac distance between samples, with sample points colored by surface and location, respectively. C and D depict ordinations of shoe samples, colored by sampling location, based on weighted and unweighted UniFrac distance, respectively. E and F depict ordinations of phone samples colored by sampling location and are based on weighted and unweighted UniFrac distance, respectively.

Although our experimental design only allows us to assess the impact of the floor microbial community on that of the shoe sole, it is of course also true that shoes influence floor microbial communities by depositing microbes that have adhered to them. As participants walk, bacteria may adhere to shoes and be subsequently transferred back to the floor in a dynamic process of continual loading and unloading of microbes. A study of uptake and deposit of particles via indoor foot traffic showed that in many cases deposition of particles in the size range of bacteria from shoe to floor is greater than uptake by the shoe [14].

To assess the stability of microbial community structure across the 12 individual shoe and phone time series, we focused on weighted UniFrac distance between samples from consecutive time points and visualized community volatility as a density plot of those distances (Additional file 6: Figure S5). Phone-associated microbial communities were observed to be both less stable (higher median distance) and more variable in their rate of change over time (broader distribution) than shoe-associated communities. By contrast, little difference was observed between the four shoe environments or between the two phone environments. We hypothesize that the high volatility of phone-associated microbial communities is due to a small microbial biomass, which would be prone to rapid turnover in community composition, and to the very high volatility of hand-associated microbiota observed in previous studies [8].
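The volatility measure described above, the distance between samples from consecutive time points, can be sketched in a few lines. Weighted UniFrac requires a phylogenetic tree, so this example substitutes Bray-Curtis dissimilarity as a stand-in. The function names and the toy count vectors are hypothetical.

```python
from statistics import median

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors (stand-in for UniFrac)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator

def consecutive_distances(series):
    """Distance between each sample and the next one in a time series."""
    return [bray_curtis(series[t], series[t + 1]) for t in range(len(series) - 1)]

# Hypothetical taxon counts at four consecutive time points for each surface.
shoe_series = [[50, 30, 20], [48, 32, 20], [52, 29, 19], [49, 31, 20]]
phone_series = [[50, 30, 20], [20, 60, 20], [70, 10, 20], [30, 30, 40]]

shoe_steps = consecutive_distances(shoe_series)
phone_steps = consecutive_distances(phone_series)
# A higher median indicates a less stable community; a wider range indicates
# a more variable rate of change.
print(median(shoe_steps), max(shoe_steps) - min(shoe_steps))
print(median(phone_steps), max(phone_steps) - min(phone_steps))
```

The lists of consecutive distances are the quantities that would be passed to a density plot to compare surfaces.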

Biogeographic influence on community structure

In addition to the two time series participants, we also collected individual shoe and phone samples from volunteers at three academic conferences, one in Vancouver, BC (N = 29), one in Washington, D.C. (N = 26), and one in California (N = 34). California samples were taken from two different rooms at the same conference while Vancouver and Washington samples were all taken from the same room. We used these data both to corroborate the patterns of diversity observed in the time series with a larger number of participants and to assess the differentiation in community structure attributable to geographic segregation.

As in the time series analyses, phone and shoe microbial communities were significantly different (Figure 4; Pseudo-F = 38.2 for weighted UniFrac, P < 0.0001). The location at which samples were collected also played a significant role in shaping community similarity, especially in shoe samples (Pseudo-F = 8.8, weighted UniFrac, P < 0.0001) though also significantly in phone samples (Pseudo-F = 4.9, weighted UniFrac, P < 0.0001). Random forest models were able to determine which of the three conferences a sample was taken from significantly better than expected by chance for both the shoe and phone environments (error ratio = 11.7 and 8.0, respectively). This suggests to us that, as seen in the time series data, different sites maintain a significantly different floor microbial community, which in turn shapes the microbial assemblage structure associated with the shoe samples.