Sample kit preparation and sample collection

Sampling wipes were prepared at the Jet Propulsion Laboratory (JPL; Pasadena, CA). Briefly, each polyester wipe (9″ × 9″; ITW Texwipe, Mahwah, NJ) was folded twice, soaked in 15 mL of sterile molecular-grade water (Sigma-Aldrich, St. Louis, MO) for 30 min, and then transferred to a sterile zip lock bag [73]. The sampling kit was assembled at NASA Ames Research Center (ARC; Moffett Field, CA). The implementation team at NASA ARC delivered the kit to the Cargo Mission Contract at Johnson Space Center (Texas), from which it was transferred to Kennedy Space Center (Florida) to be loaded into the Space Exploration Technologies (SpaceX) Dragon spacecraft prior to launch. Each sampling kit was sent to the ISS onboard a SpaceX-5, -6, or -8 rocket and returned to Earth onboard the Russian vehicle (Soyuz TM-14) or a Dragon capsule (SpX-6 or -8). Eight different locations were sampled on the ISS using the polyester wipes described above (see Fig. 1 for a summary of the sampling locations). The metadata associated with the samples and collections are summarized in Additional file 6: Table S3.

The study requirements stated that there should be no cleaning for at least 4 days prior to sampling. When cleaning occurred during the weekends, it was done at the crew’s discretion without suggestions about specific locations, and therefore followed the typical routine of activities on the ISS. The disinfectant wipes that are used on the ISS contain octyl decyl dimethyl ammonium chloride (0.0399%), dioctyl dimethyl ammonium chloride (0.01995%), didecyl dimethyl ammonium chloride (0.01995%), alkyl (50% C14, 40% C12, 10% C16) dimethylbenzylammonium chloride, and dimethylbenzylammonium chloride (0.0532%). During each flight, one astronaut performed all the sampling and used each wipe to sample one square meter. A new pair of individually packed sterile gloves (KIMTEC Pure G3 White; Nitrile Clean-room Certified; Cat. HC61190) was used before sampling the next location. The crew was instructed to collect samples from the same surfaces during all three sampling sessions. A control wipe (environmental control) was taken out of its zip lock bag, unfolded, waved for 30 s, and packed inside a new sterile zip lock bag. One control wipe was included for each flight session. Similarly, an unused wipe that was flown to the ISS and brought back to Earth along with the samples served as a negative control for sterility testing. If field controls (wipes that were exposed to the ISS environment but not used in active sampling) showed any signs of microbial growth, then negative controls would be assayed for cultivable counts to check the sterility of the wipes used for sampling. However, none of the field controls showed any CFUs in any of the three flights. The samples were stored at room temperature in orbit. After collection, samples were returned to Earth after 7 days for Flight 1, 9 days for Flight 2, and 6 days for Flight 3. The kits were delivered to JPL at 4 °C immediately after arrival on Earth, with processing at JPL commencing within 2 h of receipt.

Sample processing

Sample processing took place in an ISO 7 (10K class) cleanroom at JPL. In a certified biosafety cabinet, each wipe was aseptically removed from the zip lock bag and transferred to a 500 mL bottle containing 200 mL of sterile phosphate-buffered saline (PBS; pH 7.4). The bottle with the wipe was shaken for 2 min, and the suspension was then concentrated with a Concentrating Pipette (Innova Prep, Drexel, MO) using 0.22 μm Hollow Fiber Polysulfone tips (Cat #: CC08022). Each sample was concentrated to 4 mL with PBS elution fluid (Cat #). Then, 3 mL of this concentrated sample was split into two 1.5 mL aliquots. One aliquot was treated with PMA (18.25 μL of 2 mM PMA, resulting in a final concentration of 25 μM) to assess cells that were viable or had an intact cell membrane [24], while the second aliquot was handled in the same manner but without the addition of PMA. The PMA-treated and non-PMA-treated aliquots were incubated in the dark at room temperature (RT) for 5 min, followed by 15 min of photoactivation using the PMA-Lite™ LED Photolysis Device, which is specifically designed for photoactivation of PMA (Biotium, Hayward, CA). Each PMA-treated and non-PMA-treated aliquot was then split into two 0.75 mL aliquots. One of these was transferred to a bead beating tube containing Lysing Matrix E (MP Biomedicals, Santa Ana, CA), followed by bead beating for 60 s using the vortex sample holder (MO Bio, Carlsbad, CA). The bead-beaten aliquot and the aliquot without bead beating were then recombined for each PMA-treated and non-treated sample. DNA extraction was performed with the Maxwell 16 automated system (Promega, Madison, WI), in accordance with the manufacturer’s instructions, using the Maxwell 16 Tissue LEV Total RNA purification kit. A Maxwell control (MC) without any sample added to its cartridge was run concurrently with each flight sample. The extracted DNA was eluted in 50 μL of water and stored at − 20 °C until further analysis.
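The PMA dose quoted above follows from the usual dilution relation, final concentration = stock concentration × stock volume / (sample volume + stock volume). The following is a minimal sketch of that arithmetic; the function and variable names are hypothetical and are not part of the study's workflow. With the stated volumes the result is roughly 24 μM, close to the nominal 25 μM given in the text.

```python
def final_concentration_um(stock_mm, stock_ul, sample_ul):
    """Final concentration (in uM) after adding a stock solution to a sample.

    stock_mm  : stock concentration in mM
    stock_ul  : volume of stock added, in uL
    sample_ul : volume of the sample before addition, in uL
    """
    total_ul = stock_ul + sample_ul
    # Convert mM to uM (x1000), then scale by the dilution factor.
    return stock_mm * 1000.0 * stock_ul / total_ul


# Values taken from the text: 18.25 uL of 2 mM PMA into a 1.5 mL aliquot.
pma_um = final_concentration_um(stock_mm=2.0, stock_ul=18.25, sample_ul=1500.0)
print(f"Final PMA concentration: {pma_um:.1f} uM")
```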

Estimation and identification of cultivable microbial population

The concentrated samples were diluted in PBS (up to 10⁻⁶ of each original sample), and 100 μL of each dilution was plated (in duplicate) on Reasoner’s 2A agar (R2A; for environmental bacteria), Potato Dextrose Agar with chloramphenicol (100 μg/mL; PDA; for fungi), and blood agar (BA; for human commensals; Hardy Diagnostics, Santa Maria, CA). R2A and PDA plates were incubated at 25 °C for 7 days and BA plates at 35 °C for 2 days, at which time colony forming units (CFU) were counted. Whenever possible, a minimum of five isolates of distinct morphologies were picked from each plate, from each ISS sampling location. The isolates were then archived in semisolid R2A or PDA slants (agar media diluted 1:10) and stored at room temperature. Once a culture was confirmed to be pure, two cryobead stocks (Copan Diagnostics, Murrieta, CA) were prepared for each isolate and stored at − 80 °C. A loopful of purified microbial culture was directly subjected to PCR and the targeted fragment was amplified (colony PCR), or DNA was extracted with the UltraClean DNA kit (MO Bio, Carlsbad, CA) or Maxwell Automated System (Promega, Madison, WI). The extracted DNA was used for PCR to amplify the 1.5 kb 16S rRNA gene in order to identify bacterial strains. The following primers were used for the 16S rRNA gene amplification: the forward primer, 27F (5′-AGA GTT TGA TCC TGG CTC AG-3′) and the reverse primer, 1492R (5′-GGT TAC CTT GTT ACG ACT T-3′) [74, 75]. The PCR conditions were as follows: denaturation at 95 °C for 5 min, followed by 35 cycles consisting of denaturation at 95 °C for 50 s, annealing at 55 °C for 50 s, and extension at 72 °C for 1 min 30 s, and a final extension at 72 °C for 10 min. The ITS region was amplified using the forward primer ITS1F (5′-TTG GTC ATT TAG AGG AAG TAA-3′) [76] and reverse primer Tw13 (5′-GGT CCG TGT TTC AAG ACG-3′) [77] to obtain a ~ 1.2 kb product. The ITS PCR conditions were as follows: initial denaturation at 95 °C for 3 min followed by 25 cycles of 95 °C for 50 s, annealing at 58 °C for 30 s, and extension at 72 °C for 2 min, followed by a final extension at 72 °C for 10 min. The amplicons were inspected on a 1% agarose gel. When bands for products were visible, amplification products were treated with Antarctic phosphatase and exonuclease (New England Biolabs, Ipswich, MA) to remove 5′- and 3′-phosphates from unused dNTPs before sequencing. The sequencing was performed by Macrogen (Rockville, MD) using the 27F and 1492R primers for bacteria, and the ITS1F and Tw13 primers for fungi. The sequences were assembled using SeqMan Pro from the DNAStar Lasergene Package (DNASTAR Inc., Madison, WI). The bacterial sequences were searched against the EzTaxon-e database [78] and the fungal sequences against the UNITE database [79]. The identification was based on the closest percentage similarity (> 97%) to previously identified microbial type strains.

qPCR assay

Following DNA extraction with the Maxwell Automated system, quantitative polymerase chain reaction (qPCR), targeting the partial 16S rRNA gene (bacteria) or partial ITS region (fungi), was performed with a SmartCycler (Cepheid, Sunnyvale, CA) to quantify microbial abundance. Primers targeting the partial 16S rRNA gene were 1369F (5′-CGG TGA ATA CGT TCY CGG-3′) and modified 1492R (5′-GGW TAC CTT GTT ACG ACT T-3′) [80]. Primers targeting the ITS region were NS91 (5′-GTC CCT GCC CTT TGT ACA CAC-3′) and ITS51 (5′-ACC TTG TTA CGA CTT TTA CTT CCT C-3′) [81]. Each 25-μL reaction consisted of 12.5 μL of 2X iQ SYBR Green Supermix (BioRad, Hercules, CA), 1 μL each of forward and reverse oligonucleotide primers (10 μM each), and 1 μL of template DNA (PMA-treated and non-treated samples). Each sample was run in triplicate; the average and standard deviation were calculated based on these results. Purified DNA from a model microbial community [82] served as the positive control and DNase/RNase-free molecular-grade distilled water (Promega, Madison, WI) was used as the negative control in each run. The reaction conditions were as follows: a 3-min denaturation at 95 °C, followed by 40 cycles of denaturation at 95 °C for 15 s, and a combined annealing and extension at 55 °C for 35 s. The number of gene copies in the samples was determined from a standard curve, which was generated using serial dilutions (10⁸–10²) of the Bacillus pumilus SAFR-032 16S rRNA gene as described previously [2]. The qPCR efficiency was ~ 98% for each run. The negative control values were not subtracted because they were ~ 100 copies regardless of template volume and therefore did not scale (the same result was obtained with 1 μL and 10 μL of DNA template).
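The standard-curve quantification described above can be illustrated with a short sketch. It assumes the common convention that the threshold cycle (Ct) is linear in log10 of the starting copy number and that efficiency is computed as 10^(−1/slope) − 1. The Ct values below are synthetic, chosen only so that the efficiency comes out near the ~ 98% reported, and the function names are hypothetical.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency as a fraction (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate starting copy number."""
    return 10 ** ((ct - intercept) / slope)


# Synthetic standards spanning 10^8 down to 10^2 copies (illustrative only).
log10_std = [8, 7, 6, 5, 4, 3, 2]
ct_std = [10.0 + 3.37 * (8 - x) for x in log10_std]

slope, intercept = fit_standard_curve(log10_std, ct_std)
efficiency = amplification_efficiency(slope)
print(f"slope = {slope:.2f}, efficiency = {efficiency:.1%}")
```

An unknown sample's Ct would then be passed to `copies_from_ct` together with the fitted slope and intercept.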

Illumina sequencing - Bacteria

Flight sampling 1 and 2

Bacterial diversity was assessed by analyzing the V4 hypervariable region of the 16S rRNA gene coding sequence. Amplification was performed with the following primer pair: the forward primer A519F (new nomenclature: S-D-Arch-0519-a-S-15), 5′-CAG CMG CCG CGG TAA-3′, and the reverse primer 802R (new nomenclature: S-D-Bact-0785-b-A-18), 5′-TAC NVG GGT ATC TAA TCC-3′ [83]. The expected amplicon size is 283 bp for Bacteria, as estimated for 16S rRNA gene sequences deposited in the Silva SEED Reference Database [84].

Fungal diversity was assessed by analyzing the ITS1 region between 18S and 5.8S rRNA coding sequences. Amplification primers were ITS1-F_KYO2 (5′-TAG AGG AAG TAA AAG TCG TAA-3′) and ITS2_KYO2 (5′-TTY RCT RCG TTC TTC ATC-3′) [85]. Expected amplicon length distribution is 271 ± 90 bp for Ascomycota, 284 ± 42 bp for Basidiomycota, and 216 ± 94 bp for non-Dikarya species [86].

PCR synthesis of SSU-V4 and ITS1 amplicons was performed using Q5 High-Fidelity PCR Kit (New England Biolabs, Ipswich, MA) according to the manufacturer’s instructions. The 40-μL reaction mixtures were incubated under the following conditions: initial denaturation at 94 °C for 3 min followed by 35 cycles of 94 °C for 30 s, 47 °C for 30 s, and 72 °C for 90 s, with a final extension at 72 °C for 5 min. Afterwards, each reaction mixture was fractionated by electrophoresis on 2% agarose gel, recovering all PCR products in the size range of 200 to 400 bp. The amplicons were isolated from gel slices using silica spin-columns [87], and eluted with nano-pure water. The purified amplicons were tagged with barcoded Illumina adapters using TruSeq DNA PCR-Free Library Prep Kit LT (Illumina, San Diego, CA) according to the manufacturer’s instructions. The libraries were quantified on a TBS-380 Fluorimeter (Turner BioSystems, Sunnyvale, CA) using PicoGreen dye (Invitrogen, Carlsbad, CA) as a dsDNA-binding fluorogenic reagent. The dsDNA length distribution in individual library preps was assessed by analysis on a 2100 Bioanalyzer with High Sensitivity DNA chip (Agilent Technologies, Santa Clara, CA). The libraries were pooled to be present at equimolar concentrations in each mixed sample with total concentration of 10 nM. The first mixed sample contained 20 16S rRNA-V4 libraries and 17 ITS1 libraries representing the first ISS sampling session together with corresponding controls. The second mixed sample contained 21 16S rRNA-V4 libraries and 20 ITS1 libraries representing the second ISS sampling session and corresponding controls. The two sample sets were sequenced on a NextSeq 500 Sequencing System (Illumina, San Diego, CA) with NextSeq 500/550 Mid-Output v2 Kit for 300 main and 6 index cycles.
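Equimolar pooling of libraries, as described above, requires converting each library's mass concentration and mean fragment length into a molar concentration. The sketch below uses the widely used approximation of 660 g/mol per base pair of dsDNA. That constant, the function names, and the example concentrations are assumptions for illustration and are not taken from the study.

```python
def library_molarity_nm(conc_ng_per_ul, mean_length_bp):
    """Approximate dsDNA molarity in nM from ng/uL and mean fragment length.

    Assumes an average mass of 660 g/mol per base pair.
    """
    return conc_ng_per_ul * 1e6 / (660.0 * mean_length_bp)


def equimolar_volumes(libraries, fmol_each):
    """Volume (uL) of each library that contributes `fmol_each` femtomoles.

    libraries : dict mapping name -> (concentration in ng/uL, mean length in bp)
    """
    volumes = {}
    for name, (conc, length) in libraries.items():
        nm = library_molarity_nm(conc, length)  # 1 nM equals 1 fmol/uL
        volumes[name] = fmol_each / nm
    return volumes


# Hypothetical libraries; values are invented for the example.
example_libraries = {"lib_A": (2.64, 400), "lib_B": (1.32, 400)}
pool_volumes = equimolar_volumes(example_libraries, fmol_each=20.0)
for name, vol in pool_volumes.items():
    print(f"{name}: {vol:.2f} uL")
```

The pooled mixture would then be diluted or concentrated to the desired total concentration (10 nM in the text).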

Flight sampling 3

DNA from these samples was amplified in triplicate 25 μL reactions, each containing 1 μL of gDNA, using Platinum Hot Start PCR master mix (Thermo Fisher cat# 13000012) and custom Golay-barcoded primers for the 16S V4 region, 515fB (5′-GTG YCA GCM GCC GCG GTA A-3′) and 806rB (5′-GGA CTA CNV GGG TWT CTA AT-3′) (expected amplicon size ~ 291 bp), as described by the Earth Microbiome Project (http://www.earthmicrobiome.org). Cycling conditions were 94 °C for 3 min; 35 cycles of 94 °C for 45 s, 50 °C for 60 s, and 72 °C for 90 s; and a final extension at 72 °C for 10 min, followed by a hold at 4 °C. Triplicate reactions were then pooled into a single tube and quality assessed. The amplicons were run on a 2% agarose gel and quantified using PicoGreen to assess quality and relative quantity. All samples were pooled in equal volume into a single tube and then processed through the MoBio PCR cleanup kit to remove adaptors and primers. Final cleaned pools were then sequenced on a HiSeq 2500 2 × 150 bp Rapid Run.

Illumina sequence processing—Bacteria (flight 1, 2, and 3)

For F1 and F2 samples, the forward reads were de-multiplexed using fastq-multx v. 1.02.772, a tool from the ea-utils software package [88], with the forward amplification primers for prokaryotes as search targets. The reads were further processed to remove all remaining sequences of the amplification primers and the Illumina TruSeq adapters from their 3′-ends. This was done consecutively with the fastq-mcf v. 1.04.807 program [88] for exact sequence search, and with the agrep (http://www.tgries.de/agrep/) and treagrep (0.8.0: https://github.com/laurikari/tre/) programs for searches allowing up to three mismatches between the primers/adapters and the reads to accommodate sequencing errors. The F3 reads were demultiplexed and adaptors were removed in Qiita (http://qiita.ucsd.edu) with the parameters max_barcode_errors: 1.5; barcode_type: golay_12; and phred_quality_threshold: 3.
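The mismatch-tolerant trimming step above can be sketched in a few lines. This is a simplified illustration, not the behavior of agrep or treagrep: it counts substitutions only (Hamming distance) and ignores insertions, deletions, and degenerate bases. The function names and the example sequences are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def trim_at_approximate_match(read, pattern, max_mismatches=3):
    """Remove `pattern` and everything after it from the 3' side of `read`.

    The read is cut at the first window that matches `pattern` with at most
    `max_mismatches` substitutions; if no window qualifies, the read is
    returned unchanged.
    """
    k = len(pattern)
    for start in range(len(read) - k + 1):
        if hamming(read[start:start + k], pattern) <= max_mismatches:
            return read[:start]
    return read


# Invented example: a 15-nt insert followed by an adapter with one error.
adapter = "GGATTAGATACCC"
example_read = "T" * 15 + "GGATTCGATACCC"
trimmed = trim_at_approximate_match(example_read, adapter)
print(trimmed)
```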

The demultiplexed reads for F1, F2, and F3 were then processed using the DADA2 pipeline (https://benjjneb.github.io/dada2/index.html). The 3′ ends of the forward reads were trimmed to a length of 130 bp, and the filter parameters were set to maxN = 0, maxEE = 2, truncQ = 2, and rm.phix = TRUE. The pipeline was followed to obtain an amplicon sequence variant (ASV) table, a “higher resolution analogue of the ubiquitous OTU table”. Taxonomy was assigned using the SILVA reference database.
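The maxEE filter mentioned above is based on the expected number of errors in a read, computed from its Phred quality scores as the sum of 10^(−Q/10) over all positions. The following sketch shows that calculation together with the length truncation and the maxN check. It is an illustration of the idea only, not DADA2's implementation; it omits quality-based truncation and PhiX removal, and the function names are hypothetical.

```python
def expected_errors(phred_scores):
    """Expected number of erroneous bases given per-base Phred scores."""
    return sum(10 ** (-q / 10.0) for q in phred_scores)


def passes_filter(seq, phred_scores, trunc_len=130, max_ee=2.0, max_n=0):
    """Truncate a read to `trunc_len` and apply maxN and maxEE checks.

    Reads shorter than `trunc_len` are rejected.
    """
    if len(seq) < trunc_len:
        return False
    seq = seq[:trunc_len]
    quals = phred_scores[:trunc_len]
    if seq.count("N") > max_n:
        return False
    return expected_errors(quals) <= max_ee


# Invented reads: one of uniformly high quality and one of low quality.
good_read = ("ACGT" * 40, [30] * 160)
poor_read = ("ACGT" * 40, [10] * 160)
print(passes_filter(*good_read), passes_filter(*poor_read))
```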

Illumina sequence processing—Fungi (Flight sampling 1 and 2)

The forward reads were de-multiplexed using fastq-multx v. 1.02.772, a tool from the ea-utils software package [88], with the forward amplification primers for fungi as search targets.

The 5′-ends of the sorted reads were trimmed for a predetermined length based on the length of the corresponding amplification primer for each dataset. The reads were further processed to remove all remaining sequences of the amplification primers and the Illumina TruSeq adapters from their 3′-ends using consecutively fastq-mcf v. 1.04.807 program [88] for exact sequence search, and agrep (http://www.tgries.de/agrep/) and treagrep (0.8.0: https://github.com/laurikari/tre/) programs for search allowing up to three mismatches between the primers/adapters and the reads to accommodate for sequencing errors. After primers/adapters were removed, the processed reads exhibited multimodal length distribution. The reads from the fungal datasets formed three groups of 184–223 bp, 224–246 bp, and 246–282 bp length. This correlates well with known length variability of ITS sequences from different fungal phyla [89]. Each of the three groups was separately subjected to the OTU clustering and taxonomy assignment procedures, and the results were merged together for further statistical treatment and visualization. ITS1 sequence clustering and taxonomy assignment were performed using USEARCH version 8.1.1756 [90]. For each collection of the related datasets, the OTUs were established by selecting high-quality reads with an expected error rate not exceeding 0.5%. The selected reads were further de-replicated, sorted, clustered at the default 3% difference, and de-chimerized against the UCHIME reference dataset distributed by UNITE [79]. Then, the reads from individual samples were filtered to exclude those with the expected error rate above 6%, and mapped to the OTUs. Taxonomy was assigned using the Warcup training dataset V1 (http://drive5.com/utax/data/utax_warcup_trainset1.tar.gz), with a bootstrap threshold of 50%.

ITS-targeted amplification of the Flight 3 samples did not yield any product for sequencing, which might be due to the low fungal biomass of the samples.

Statistical analysis

Bar graphs and strip charts of CFU and qPCR data were plotted using Prism (GraphPad Software, version 5.0a; Irvine, CA). Significance (P < 0.05) between groups was tested with the Kruskal-Wallis test followed by Dunn’s post-hoc test.

Amplicon sequence analysis

Bacterial ASV sequences and fungal OTUs were summarized to the family and/or genus level using QIIME [91]. The ALDEx2 R package [70] was used to statistically compare the relative abundances of bacterial family-level taxa between the different flights and locations based on the expected values of 128 Dirichlet Monte Carlo instances of centered log ratio (clr) transformed data [71]. A clr value of zero indicates that the abundance of an organism was equal to the geometric mean abundance. Thus, organisms more abundant than the mean have positive values, and those less abundant than the mean have negative values. Significance was based on the Benjamini-Hochberg corrected P value of the Kruskal-Wallis statistical test (significance threshold P < 0.05). ALDEx2 was also used to compare fungal genus-level taxa between flights and to identify differential ASVs and OTUs between samples and controls.
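The clr transformation referred to above subtracts the mean log abundance (the log of the geometric mean) from each taxon's log abundance. A minimal sketch follows. It performs a single deterministic transform with a pseudocount to avoid log(0), whereas ALDEx2 averages over Monte Carlo instances; the pseudocount value, the log base, and the function name are assumptions made for the example.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    Each value is log2(count) minus the mean of log2(counts), so zero
    corresponds to the geometric mean abundance.
    """
    logs = [math.log2(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Invented counts for four taxa in one sample.
example_counts = [120, 30, 0, 5]
example_clr = clr(example_counts)
print([round(v, 2) for v in example_clr])
```

Taxa with positive values are more abundant than the geometric mean of the sample, and taxa with negative values are less abundant.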

The SourceTracker R script (version 0.9.1), a contamination prediction tool, was used to assess contamination of the samples [92]. ISS surface wipes were designated as sinks and the field and Maxwell negative controls as sources. Samples were rarefied to 1000 reads.
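Rarefaction to a fixed depth, as used above, amounts to drawing reads at random without replacement until the target depth is reached. The sketch below illustrates the idea; it is not the code used by SourceTracker, and the function name, seed handling, and example counts are hypothetical.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a sample's reads to `depth` without replacement.

    counts : dict mapping taxon -> read count
    Returns a dict of subsampled counts, or None if the sample has fewer
    than `depth` reads and would therefore be dropped.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand counts into one entry per read, then draw `depth` of them.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


# Invented sample with 5000 reads across three taxa.
example_sample = {"taxon_A": 3000, "taxon_B": 1500, "taxon_C": 500}
rarefied = rarefy(example_sample, depth=1000)
print(rarefied)
```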

QIIME was also used to calculate Shannon’s diversity and taxa richness. Statistical analysis of Shannon’s diversity and taxa richness was performed in Prism using the non-parametric Kruskal-Wallis test with the Benjamini-Hochberg FDR multiple test correction.
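For reference, Shannon's diversity is H = −Σ p_i log(p_i), where p_i is the proportion of reads assigned to taxon i, and richness is the number of taxa observed. The sketch below computes both; the choice of log base 2 is an assumption for the example, and the function names are hypothetical.

```python
import math


def shannon_diversity(counts, base=2):
    """Shannon index H = -sum(p * log(p)) over taxa with nonzero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in proportions)


def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for c in counts if c > 0)


# Invented counts: four equally abundant taxa and one unobserved taxon.
example_counts = [10, 10, 10, 10, 0]
print(shannon_diversity(example_counts), richness(example_counts))
```

A perfectly even community of four taxa has H = 2 bits, while a community dominated by one taxon approaches H = 0.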

Genus-level counts were clr transformed using the “compositions” package in R [93] and visualized with a heat map created with the “gplots” package in R. Barplots, boxplots, CCA plots, and pie charts were all created in R.

Comparison of ISS environmental microbiome with Earth microbiome

The ISS environmental microbiome data were processed by Deblur 1.0.4 [94], trimming at 90 nt with default settings except for --min-reads 1, which was set to avoid filtering sequences across samples prior to merging sample sets. The published Earth Microbiome Project 90 nt BIOM table [27] was obtained from ftp://ftp.microbio.me. Deblur 1.0.4 90 nt BIOM tables of the Hospital Microbiome Project (Qiita study 10172) and the Office Succession Study (Qiita study 10423) were obtained from Gonzalez et al. [95] using redbiom analysis (https://github.com/biocore/redbiom). Only the reference-hit sOTUs were used across all studies, including the ISS microbiome datasets. All studies were merged using the BIOM Table Python application programming interface (API). Using the API, sOTUs with fewer than 25 total observed sequences were filtered out, as was previously performed [96], and samples were rarefied to 1000 sequences per sample. The data were then imported into QIIME2 2018.11 [97] and unique sOTUs were inserted into Greengenes 13_8 [98] using SEPP [99] via the QIIME2 fragment-insertion plugin [100]. Fragment insertion was used for UniFrac because it was previously shown to ameliorate primer biases [100]. Unweighted UniFrac was computed using Striped UniFrac [101] through QIIME2’s diversity plugin with --p-bypass-tips, principal coordinates were computed using FSVD [102] as used elsewhere [101], and the coordinates were visualized using the EMPeror [103] plugin in QIIME2. Unique sOTUs were assessed in a Jupyter Notebook [104] using the BIOM Table API.
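The minimum-abundance filter applied to the merged table can be illustrated with plain Python dictionaries. This sketch deliberately does not use the BIOM Table API; it only shows the logic of dropping sOTUs whose summed count across all samples falls below a threshold. The data layout, function name, and example values are hypothetical.

```python
def filter_rare_features(table, min_total=25):
    """Drop features whose total count across all samples is below `min_total`.

    table : dict mapping sample ID -> dict mapping feature ID -> count
    """
    totals = {}
    for sample_counts in table.values():
        for feature, n in sample_counts.items():
            totals[feature] = totals.get(feature, 0) + n
    keep = {feature for feature, total in totals.items() if total >= min_total}
    return {
        sample: {f: n for f, n in counts.items() if f in keep}
        for sample, counts in table.items()
    }


# Invented merged table with two samples and two sOTUs.
merged = {
    "sample_1": {"sOTU_a": 20, "sOTU_b": 3},
    "sample_2": {"sOTU_a": 10, "sOTU_b": 4},
}
filtered = filter_rare_features(merged)
print(filtered)
```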

Controls and nomenclature of the samples

Controls were taken in all steps of the study for all three flight sessions. There was a field control (“CTL”), which was a wipe that was opened to the ISS environment but was not used for active sampling, and a Maxwell control (“DNACTL”), which was water that was used during the DNA extraction steps instead of surface or control wipe samples and acted as a DNA extraction reagent control. The field controls were either treated with PMA (“CTL_P”) or left untreated (“CTL”). In total, ten controls were analyzed during bacterial qPCR and Illumina amplicon sequencing. Likewise, for fungal analysis, the same controls were collected; however, no amplicons were generated for “DNACTL” for either flight or for “CTL_P” for Flight 1 during qPCR or Illumina library prep, and thus these controls were not sent for amplicon sequencing. Similarly, for qPCR and Illumina sequencing, the required reagent controls were tested. The samples in this study were designated with the flight session number followed by the location number (sampling site). For example, sample number “F1_3” denotes that surface materials were taken during the first flight at location 3 and that the sample was not treated with PMA, whereas “F1_3P” denotes that the same sample was treated with PMA.