Study Design and Tissue Collection

This cross-sectional study was designed as described previously2 and was approved by the Institutional Review Board of Washington University School of Medicine in St. Louis, MO (IRB ID 201012734). Informed consent was obtained from all subjects included in this study. All methods were performed in accordance with the guidelines and regulations of Washington University School of Medicine. Specimens were derived from term subjects because we were primarily interested in the spatial diversity and abundance of microbes under normal conditions. Our cohort included 57 term women, 34 of whom underwent Cesarean section and 23 of whom delivered vaginally. Detailed characteristics of term cases are presented in Table S1. Placentas from the 57 term mothers were biopsied ≤12 hours after delivery. Trained research assistants used sterile technique to harvest 5-8 mm samples from the placental villus, fetal membrane, and basal plate, as described2. Samples were placed in sterile cryovials, snap frozen in liquid nitrogen, and stored at −80 °C until DNA extraction.

Extraction and Purification of Bacterial Genomic DNA from Term Placental Tissue

The basal plate, placental villus, and fetal membrane of the placenta were processed, and genomic DNA (gDNA) was isolated from each location. Extraction and purification of samples were performed using autoclaved tools in a laminar-flow hood that had been wiped with bleach and ethanol. DNA extraction was performed according to the DNeasy Blood and Tissue Kit (QIAGEN) protocol. Briefly, frozen tissue samples were suspended in tissue lysis buffer ATL. Tissues were homogenized with sterilized magnetic beads at a frequency of 30 Hz (1/s) using the TissueLyser II (QIAGEN). Homogenized samples were centrifuged using the Centrifuge 5417 R (Eppendorf). Proteinase K (20 μL) was added to each sample, and samples were incubated in a 56 °C shaker for 3 hours. gDNA was eluted into 200 μL of molecular biology grade water (CORNING). Because human gDNA was the predominant component relative to bacterial DNA, bacterial gDNA was further purified using the QIAquick PCR Purification Kit (QIAGEN). Negative control blanks, which included ‘no tissue’ and molecular biology grade water, were subjected to all steps of the DNA extraction and purification procedure, including the proteinase K and homogenization steps. Samples were quantified using a NanoDrop Spectrophotometer (Thermo Scientific). Approximately 10 ng/μL of each purified (QIAquick PCR Purification Kit, QIAGEN) DNA sample, blank, and molecular biology grade water was loaded into autoclaved 96-well PCR plates (BIO-RAD) and sealed with Microseal B Adhesive Seals (BIO-RAD). Samples were sent to the Genome Technology Access Center (GTAC) for 16S PCR amplification, sequencing, and sequencing data analysis.

Amplicon Generation and Sequencing of Bacterial 16S rRNA Genes

Purified bacterial gDNA was then used for amplicon generation and next-generation sequencing on the MiSeq platform using bacterial ribosomal 16S gene primers (Table S2). Primers targeting variable regions V1–V9 were used. The Fluidigm Access Array System was used to construct 14 PCR amplicons representing all nine 16S variable regions. Each sample inlet was loaded with 1X High Fidelity FastStart Reaction Buffer without MgCl2, 4.5 nM MgCl2, 5% DMSO, 200 μM PCR Grade Nucleotide Mix, 0.05 U/μL FastStart High Fidelity Enzyme Blend (from a 5 U/μL stock), 1X Access Array Loading Reagent (Fluidigm), 1 μL of DNA at a concentration of 5 ng/μL, and water. For the assay inlets, 200 nM forward and reverse primers were combined with 1X Access Array Loading Reagent. PCR amplification of harvested inlet samples was performed on the Fluidigm Biomark. Each sample was then harvested and indexed with a unique 10 base pair sequence, using 14 rounds of PCR to incorporate each index sequence. All samples were pooled into 48-sample libraries and cleaned using bead purification. Samples were loaded and sequenced using the Illumina MiSeq platform. Primers used for 16S amplicon generation are shown in Table S2.

Sequence Filtering, OTU Clustering, and Alignment

Downstream analyses were performed on the PCR amplicons. Fastq-join was used to join paired-end reads12. Sequences that corresponded to a particular amplicon were identified using their corresponding primer sequences. 16S rRNA sequences were uploaded to the NCBI Sequence Read Archive (BioProject PRJNA395716). The Quantitative Insights Into Microbial Ecology (QIIME) pipeline version 1.9.0 was used for read analysis14. In the QIIME analysis, each amplicon and sample pair was defined as a separate sample. Open-reference operational taxonomic units (OTUs) were called using the Greengenes May 2013 release as the reference database15. Reads were clustered into OTUs by QIIME using UCLUST16 at a threshold of 97% similarity. Representative sequences for each OTU were classified taxonomically with the UCLUST consensus taxonomy assigner in QIIME using a sequence similarity of 0.9. To ensure that the de novo OTUs aligned to 16S sequences and were not randomly constructed, we used the no_pynast_failures file, which excludes OTUs that PyNAST failed to align to the Greengenes core alignment.

16S rRNA Quantitative Real-Time PCR

We performed qPCR targeting the V4 region of the 16S gene (primers F515: 5′ GTGCCAGCMGCCGCGGTAA 3′ and R806: 5′ GGACTACHVGGGTWTCTAAT 3′) in samples that had been filtered by removing OTUs observed once and rarefied to a depth of 300 reads. Purified extraction and purification blanks (that had been sequenced) and fresh molecular biology grade water (CORNING) were used as negative controls.

To calculate the 16S copy number per sample, a standard curve was generated using E. coli DNA. A laboratory stock of E. coli was streaked onto Luria Bertani (LB) agar plates and grown at 37 °C overnight. A colony was selected and inoculated into LB broth, grown overnight at 37 °C, and centrifuged, and the pellet was re-suspended in 10 mL of PBS. One milliliter was used for DNA extraction as described above. The E. coli 16S rRNA gene exists as 7 copies35 in each cell, with a genome size of 5.18 Mb36. Using this information and the concentration of the E. coli DNA stock, we calculated the quantity of 16S copies per μL of this stock. To generate the standard curve, a series of five 10-fold dilutions was generated from this initial stock.
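
The copy-number calculation described above can be sketched as follows. This is a minimal illustration rather than the script used in the study: the 7 copies per genome and the 5.18 Mb genome size come from the text, whereas the average mass of 650 g/mol per base pair, the example stock concentration of 10 ng/μL, and the function names are assumptions made for the example.

```python
AVOGADRO = 6.022e23         # molecules per mole
BP_MASS_G_PER_MOL = 650.0   # ASSUMED average mass of one base pair (not from the text)

def copies_16s_per_ul(conc_ng_per_ul, genome_size_bp=5.18e6, copies_per_genome=7):
    """Estimate 16S gene copies per uL of a genomic DNA stock."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    genome_mass_g_per_mol = genome_size_bp * BP_MASS_G_PER_MOL
    genomes_per_ul = grams_per_ul * AVOGADRO / genome_mass_g_per_mol
    return genomes_per_ul * copies_per_genome

def dilution_series(stock_copies, n_dilutions=5, factor=10):
    """Copies per uL for the stock followed by n serial dilutions."""
    return [stock_copies / factor ** i for i in range(n_dilutions + 1)]

stock = copies_16s_per_ul(10.0)      # hypothetical 10 ng/uL stock
standards = dilution_series(stock)   # stock plus five 10-fold dilutions
```

Under these assumptions, the stock plus five dilutions would give six points for the standard curve.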

Each quantitative polymerase chain reaction (qPCR) run included 6 standards and the selected placental samples, each run in triplicate. Each 20 μL qPCR reaction consisted of 0.5 μL each of 1X forward (71.4 ng/μL) and reverse (61.1 ng/μL) primers, 10 μL of Sso Advanced Universal SYBR (BIO-RAD), 5 μL of molecular biology grade water, and 1.0 μL of DNA template (at ~10 ng/μL) diluted in 3 μL of molecular biology grade water. Samples were loaded into Hard-Shell 96-Well PCR Plates (BIO-RAD) and sealed with Microseal B Adhesive Seals (BIO-RAD). These steps were performed in a laminar-flow hood.

Quantitative PCR was performed using the C1000 Touch ThermoCycler (BIO-RAD) and CFX Manager Software Version 3.1 (BIO-RAD). Cycling conditions were as follows: 94 °C for 10 minutes, followed by 40 cycles of 95 °C for 15 seconds and 60 °C for 1 minute. We prepared a logarithmic standard curve based on the calculated copy number of each standard and the corresponding average Cq value. Based on the equation of the standard curve, we calculated the copy number of each sample using its average Cq37.
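
As an illustration of how a copy number can be read off such a standard curve, the sketch below fits Cq against log10(copy number) by ordinary least squares and inverts the fitted line for a sample's average Cq. It is a hedged example only: the standard and sample Cq values are invented, the function names are hypothetical, and the fitting procedure actually used in the study may differ.

```python
import math

def fit_standard_curve(copies, cq_values):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(cq_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate copy number from a Cq value."""
    return 10 ** ((cq - intercept) / slope)

def mean_cq(replicates):
    """Average Cq across technical replicates (triplicates in the text)."""
    return sum(replicates) / len(replicates)

# Hypothetical values for six standards and one sample run in triplicate.
standard_copies = [1e7, 1e6, 1e5, 1e4, 1e3, 1e2]
standard_cq = [12.1, 15.5, 18.9, 22.2, 25.6, 29.0]
slope, intercept = fit_standard_curve(standard_copies, standard_cq)
sample_copies = copies_from_cq(mean_cq([24.8, 25.0, 24.9]), slope, intercept)
```

Because Cq falls as template increases, the fitted slope is negative, and a sample whose Cq lies between two standards is assigned a copy number between them.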

Ralstonia insidiosa-specific qPCR

We performed qPCR using Ralstonia insidiosa (R.i)-specific primers designed previously (Rp-F1 5′ ATGATCTAGCTTGCTAGATTGAT 3′ and R38R1 5′ CACACCTAATATTAGTAAGTGCG 3′)38 (Integrated DNA Technologies) to confirm the presence of R.i in samples that were positive for R.i by sequencing (BP, N = 9) and its absence in samples in which this taxon was not detected during sequencing analysis (FM, N = 3; BP, N = 3). Blanks (N = 3), water (N = 1), and E. coli (N = 1) were used as additional negative controls. qPCR cycling conditions were as described previously. The qPCR product was subsequently run on a 2% agarose gel comprising 2.0 g of Standard Agarose (LAMDA BIOTECH), 100 mL of TAE buffer, and 1 μL of ethidium bromide 1% solution (Fisher Scientific). Two microliters of 5x GelPilot Loading Dye (QIAGEN) was added to 20 μL of qPCR product, and 10 μL of each sample was loaded into each well. Samples were run using the EC-105 Compact Power Supply (E-C Apparatus Corporation) with a 100 base-pair DNA ladder (LAMDA BIOTECH) at approximately 100 volts for 30 minutes. The agarose gel was photographed under ultraviolet light using the AlphaImager 2200 (Alpha Innotech) with Alpha Ease FC software Version 3.2.1.

α-Diversity Analysis

To compare the α-diversity (diversity within samples) of the basal plate, placental villus, and fetal membranes, the QIIME commands alpha_diversity.py and collate_alpha.py were run to generate the estimated Shannon diversity indices for validated samples. Consistent with the sample validation pipeline, OTUs observed once were not considered in this analysis, and samples were analyzed at a read depth of 300. At this depth, the estimated Shannon diversity for each sample was averaged over 10 iterations. Samples were grouped based on the site from which they were biopsied. The median Shannon diversity was statistically compared between sites.
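
The rarefied Shannon estimate described above can be approximated with the short sketch below. It is not the QIIME implementation: the input is assumed to be a per-sample vector of OTU counts from which singleton OTUs have already been removed, the logarithm base of 2 is an assumption that should be checked against the QIIME version in use, and the function names are hypothetical.

```python
import math
import random

def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a list of OTU counts."""
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    rarefied = [0] * len(counts)
    for otu in rng.sample(pool, depth):
        rarefied[otu] += 1
    return rarefied

def shannon(counts, base=2):
    """Shannon diversity H = -sum(p * log(p)); the base is an assumption."""
    total = sum(counts)
    h = 0.0
    for n in counts:
        if n > 0:
            p = n / total
            h -= p * math.log(p, base)
    return h

def mean_rarefied_shannon(counts, depth=300, iterations=10, seed=0):
    """Average Shannon diversity over repeated rarefactions to a fixed depth."""
    rng = random.Random(seed)
    values = [shannon(rarefy(counts, depth, rng)) for _ in range(iterations)]
    return sum(values) / iterations
```

Averaging over several random subsamples reduces the dependence of the estimate on any single draw of 300 reads.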

Using the QIIME command single_rarefaction.py, we rarefied the OTU biom files of validated samples to a read depth of 300. This biom file and rep_set.tre, a phylogenetic tree of OTUs within samples generated by QIIME, were imported into R, and the package “Phyloseq” was used to visualize the relative abundance of bacterial phyla in each sample, based on the taxonomic classification of 16S sequences called using Greengenes. Samples were grouped based on the site from which they were biopsied.

β-Diversity Analysis: Average Pairwise-Distances and Multidimensional Scaling Analysis

Using the phylogenetic tree generated by QIIME (rep_set.tre) and the QIIME outputs of beta_diversity.py and make_distance_plots.py, we generated a pairwise dissimilarity matrix table of validated samples grouped by site38, 39. These dissimilarity matrices were generated using the following community distance metrics: unweighted UniFrac, weighted UniFrac, and Bray-Curtis. Pairwise distance values using unweighted UniFrac were calculated by comparing the fraction of total branch length that is unshared between the OTUs in two samples38. To generate weighted UniFrac distance measurements, the phylogenetic profile and relative abundance of OTUs were compared between pairs of samples38. Finally, Bray-Curtis distance values involve comparing the relative abundance of a given OTU between pairs of samples40. To generate a site-specific ‘average distance value’, we averaged the pairwise distances between a particular sample and the other samples from the same sampling location. We then statistically compared the median ‘average distance value’ between sites. The biom file of validated samples and rep_set.tre (or the “Phyloseq”-generated random_tree) were used to perform multidimensional scaling analysis based on the V4 dataset using R “Phyloseq” (Figs 2B, S5). In this plot, principal components 1 (PC1) and 2 (PC2) for the dataset were the components that best explained the variation between samples20.
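
The ‘average distance value’ can be illustrated with Bray-Curtis, the simplest of the three metrics because it needs no phylogenetic tree. The sketch below is an assumption-laden example, not the QIIME code: the sample IDs and count vectors are invented, and the function names are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length OTU count vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0

def average_within_site_distance(samples, sites):
    """For each sample, average its distance to the other samples from the same site.

    samples: dict of sample ID -> OTU count vector
    sites:   dict of sample ID -> site label (e.g. "BP", "PV", "FM")
    """
    result = {}
    for sid, counts in samples.items():
        others = [o for o in samples if o != sid and sites[o] == sites[sid]]
        if not others:
            continue  # a lone sample has no within-site comparison
        dists = [bray_curtis(counts, samples[o]) for o in others]
        result[sid] = sum(dists) / len(dists)
    return result

# Hypothetical rarefied counts for three basal plate samples and one fetal membrane sample.
samples = {"BP1": [10, 0, 5], "BP2": [8, 2, 5], "BP3": [0, 10, 5], "FM1": [1, 1, 1]}
sites = {"BP1": "BP", "BP2": "BP", "BP3": "BP", "FM1": "FM"}
avg = average_within_site_distance(samples, sites)
```

Each sample thus contributes one value per site, and the per-site distributions of these values are what the between-site comparison operates on.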

Species Identification of Prevalent OTU IDs using Multiple Variable Regions

Top OTU IDs were selected based on (1) the prevalence of the OTU across samples within a group (BP, PV, or FM), (2) the highest total number of reads for that OTU across samples within each group, and (3) absence from the negative controls. The reference sequence was identified for each selected OTU. For each validated sample, the amplicon nucleotide sequences associated with the detectable variable regions were identified. For each location and OTU ID, the samples with the highest number of detectable variable regions were selected. For these samples, each amplicon was queried against its associated OTU reference to confirm sequence identity at ~97% using the NCBI Basic Local Alignment Search Tool (BLAST)21. For the selected samples, the sequences of the detectable variable regions were input together into BLAST and queried against the 16S Ribosomal RNA Sequences (Bacteria and Archaea) database. The top BLAST hit was documented.

Bioinformatics and Statistical Analyses

To compare the median Shannon diversity or 16S copy number/μL between two sample groups, we used the non-parametric Wilcoxon-Mann-Whitney test because the data in each group did not meet the normality assumption. We followed the same procedure in the beta-diversity analysis to compare the average pairwise distances between locations.
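
For readers unfamiliar with the test, the sketch below computes the Mann-Whitney U statistic from tie-averaged ranks and a two-sided p-value from the normal approximation. It is a simplified illustration, not the routine used in the study: no tie or continuity correction is applied, so statistical software may report somewhat different p-values, particularly for small groups. The function names are hypothetical.

```python
import math

def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided Wilcoxon-Mann-Whitney test using the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # simplification: no tie correction
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))            # two-sided tail probability
    return u, p
```

Because the test uses only ranks, it compares groups without assuming that the underlying values are normally distributed.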

Two-dimensional MDS plots and sample-specific relative abundance graphics were generated using “R” version 3.3.1. The following “R” packages were used: phyloseq (v. 1.16.2 and 1.19.1), ggplot2 (v. 2.1.0 and 2.2.1), vegan (v. 2.4.0 and 2.4.3), plyr (v. 1.8.4), devtools (v. 1.12.0), cluster (v. 2.0.4), igraph (v. 1.0.1), gridExtra (v. 2.2.1), and ape (v. 3.5)20, 41. PERMANOVA was performed to test the null hypothesis that all three locations have the same centroid, or average center, given the pairwise beta diversities. The same analysis was performed to test the null hypothesis that delivery method did not affect the centroid locations. PERMANOVA was performed for each of the three beta diversity measures: Bray-Curtis, unweighted UniFrac, and weighted UniFrac.
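
The logic of a one-factor PERMANOVA on a distance matrix can be sketched as follows. This is an illustrative implementation of the standard pseudo-F statistic with label permutation, written under stated assumptions: the text does not name the function that was used, and the number of permutations (999), the seed, and the function names here are arbitrary choices for the example.

```python
import random

def pseudo_f(dist, labels):
    """Pseudo-F for one grouping factor from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares, accumulated group by group.
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += ss / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, labels, permutations=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any association between labels and samples
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

A small p-value indicates that the observed separation of group centroids is larger than would be expected if group labels were assigned to samples at random.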