Ethics

It was confirmed that this and similar studies using human sewage in accordance with the Danish Act on scientific ethical treatment of health research (Journal no.: H-14013582) do not require preapproval from ethical committees.

Collection of urban sewage samples

The National Food Institute, Technical University of Denmark (DTU Food) launched an open invitation 25 September 2015 seeking potential collaborators to participate in the pilot study of the Global Sewage Surveillance Project (GSSP). The invitation was sent by electronic mail to the following networks and individual research collaborators for further dissemination: WHO Global Foodborne Infections Network (GFN) [http://www.who.int/gfn/en/], WHO Advisory Group on Integrated Surveillance of Antimicrobial Resistance (AGISAR) [http://www.who.int/foodsafety/areas_work/antimicrobial-resistance/agisar/en/], European Union Reference Laboratory for AMR network [www.eurl-ar.eu/], and to the European Food- and Waterborne Diseases and Zoonoses Network (FWD-Net) [https://ecdc.europa.eu/en/about-us/partnerships-and-networks/disease-and-laboratory-networks/fwd-net].

Participation was cost neutral and managed by online registration that also included requests for proper packaging and sample containers and the need to sign a material transfer agreement or similar approval/agreement to protect any intellectual property rights. All expenses related to the shipments were paid by DTU Food, and the shipments complied with the IATA regulation as per SP A197 because the content of the parcels was identified as UN3082 “Environmentally hazardous substance, liquid, not otherwise specified” but did not exceed 5 L of sewage.

A protocol that instructed how to collect the urban sewage samples as well as epidemiological, demographic, and geographical information was provided to each participant in the study (Supplementary Data 8). Participants were requested in conjunction with collecting the samples to submit via an online survey the captured information related to epidemiological, demographic, and geographical information as well as a digital image of the sampling site, if possible the GPS coordinates of the sampling location, the temperature of the sample at the time of sampling, pH of the sewage, and storage temperature of sample. From each location, on 2 consecutive days between 25 January and 5 February 2016, one representative, non-processed, unfiltered urban sewage sample of 2 L was collected from the respective main sewage pipeline(s) prior to the inlet of the wastewater treatment plant or from the main outlet to rivers or similar according to the protocol provided. Flow proportion sampling over 24 h was preferred. Alternatively, three crude point samples were collected in a short time interval, i.e., at least 5 min between each individual sample, to ensure as much randomness as possible. The collected urban sewage was not treated, neither with additives nor DNA stabilizers and was recommended to be stored at −80 °C for at least 48 h prior to shipment to avoid the use of dry ice. The sample-specific data are provided in Supplementary Data 1 and an explanation of the content of this file provided as Supplementary Note 1.

Sample handling and DNA extraction

At DTU Food, a photograph of each sample container was taken upon arrival combined with a short remark describing the condition of the sample, e.g., solid frozen, thawed, coloration, etc. The samples of the first of the consecutive days were thawed for 12 h at approximately 20 °C before processing. After thawing, 250 mL of each sample were pelleted in a centrifuge at 10,000 × g for 10 min. The pellet was stored at −20 °C or −80 °C before DNA extraction and metagenomics analysis, and the supernatant was stored at −80 °C for subsequent antimicrobial residue and virus analysis. DNA was extracted from the sewage pellets according to an optimized protocol using the QIAamp Fast DNA Stool Mini Kit including twice the input material and initial bead beating32. For each batch of DNA extractions, a DNA extraction blank control was processed in parallel with the sewage samples to monitor background DNA.

Whole-community sequencing

DNA was shipped on dry ice for library preparation and sequencing to the Oklahoma Medical Research Foundation (OMRF). DNA from all samples was mechanically sheared to a targeted fragment size of 300 bp using ultrasonication (Covaris E220evolution). Library preparation was performed with the NEXTflex PCR-free Library Preparation Kit (Bioo Scientific). The Bioo NEXTflex-96 adapter set (Bioo Scientific) was used, and in batches of roughly 60 samples, the libraries were multiplexed and sequenced on the HiSeq3000 platform (Illumina), using 2 × 150-bp paired-end sequencing per flow cell. The raw sequencing data have been deposited at the European Nucleotide Archive under accession number ERP015409.

Trimming and mapping of sequencing reads

The reads were trimmed, including adaptor removal, using BBduk [BBMap—Bushnell B.—https://sourceforge.net/projects/bbmap/] with a quality threshold at 20 and minimum length of 50 bp. Trimmed reads were used as input to the reference-based mapping and taxonomy-assignment tool MGmapper17, which is based on BWA-MEM33 version 0.7.12 and SAMtools34 version 1.6. Reads were aligned against reference sequence databases for the best hit (Bestmode, i.e., a read can only map to 1 reference sequence). An acquired AMR gene database (ResFinder)35 was used to annotate properly paired reads (MGmapper option fullmode) where each read pair had sequence coverage of at least 80% compared with the length of the trimmed reads. This was the only filter that was applied to discard a read pair. The AMR genes were of bacterial origin and could therefore align to both bacteria databases and the Resfinder database. To enable multiple database hits, AMR genes were mapped using the Fullmode option in MGmapper for the most optimal abundance calculation. Genomic annotation was performed by identifying the best hit (MGmapper option bestmode) for a pair of reads when aligned against a range of reference sequence databases. Databases were primarily downloaded via NCBI genbank clade specific assembly_summary.txt files unless another ftp site is provided below. The list of databases used by MGmapper includes: Human (GRCh38.p3), bacteria (closed genomes), MetaHitAssembly (PRJEB674—PRJEB1046), HumanMicrobiome (genomes assemblies), bacteria_draft, plasmid, archaea, virus, fungi, protozoa, vertebrates_mammals, vertebrates_other, invertebrates, plant, and nt. For the bacteria and bacteria_draft databases, sequences were selected from the assembly_summary.txt file, when annotated with the tags version_status=‘latest’ and genome_rep=‘Full’. Furthermore, assembly_level= ‘Complete genome’ or ‘Chromosome’ were required for entries in the bacteria database and refseq_category=‘representative genome’ for entries in the bacteria_draft database. The plasmid database was constructed as the subset of bacteria and bacteria_draft sequences having the word ‘plasmid’ in the fasta entry header line.

The total bacteria read count for a sample was calculated as the sum of read counts from each of the bacteria-related databases (bacteria, bacteria_draft, MetaHitAssembly, and HumanMicrobiome). The total fraction of unmapped reads for all sample sites were used to translate the percentage of unmapped reads into Z-scores, i.e., the number of standard deviations from the mean. Setting an absolute Z-score threshold at 3 retained data from 79 sample sites. Data from Chad were excluded based on the Z-score threshold together with data from Gambia with suspiciously low resistance read counts; i.e., lower than those observed from DNA extraction control samples.

Bacterial and AMR gene distribution

Inspection of the count tables and the mapped reads distribution on the genomes revealed an overestimation of some genomes due to plasmids with high copy numbers. The issue only occurred in the included draft genomes, as the plasmid DNA were left out of the database with complete bacterial genomes. The distribution of mapped reads across contigs was expected to be evenly distributed, with some variation. The plasmids were revealed by large hit counts to specific contigs compared with the associated contigs in a draft genome. For each draft genome, the hits to each contig was normalized with respect to contig size. The median of the contig hits was found and the third quartile and the interquartile range was calculated. If a hit count was above the third quartile plus 1.5 times the interquartile range, then the hit count was interpreted as an overestimation and adjusted by replacing the hit count with the median.

Relative abundance of AMR genes and bacterial genera were calculated as FPKM. This was done to account for both sample-wise sequencing depth differences and size-dependent probability of observing a reference. For bacterial genome assemblies, FPKM was calculated based on the adjusted sum of fragments assigned to a genome assembly, whether or not the genome was closed. For AMR genes, FPKM was calculated on each individual ResFinder reference sequence. FPKMs were subsequently summed up across categories to bacterial genera (NCBI taxid), drug class level (NCBI tax ID), and highly homologous AMR gene groups (CD-HIT-EST, 90% similarity)36.

Within-site reproducibility

In order to test reproducibility of sewage samples, a dendrogram of resistome composition from all sewage samples including samples from eight sites that were double-sampled sewage was generated using Bray–Curtis (BC) dissimilarities and hierarchical clustering (Supplementary Fig. 2). The replicated samples were taken by 2 days apart. The day 2 samples were taken twice. The day 2 samples were kept in freezer for 2 years prior sequencing. Samples from day 1 and day 2 were sequenced using the same DNA extraction and sequencing protocols. The replicated samples had higher resistome similarity to its own replicates and all eight sets of replicate samples were clustered with their own replicate. This result shows a strong reproducibility of sewage samples despite having different day of sampling and 2 years of storage. The repeats were not included in subsequent analyses.

Within-country representativeness

To assess whether samples from individual sites are representative of other sites in that country, we compared the BC dissimilarities for pairs of sites within the same country and in different countries for both resistome and bacteriome compositions. We assessed the significance of these differences using permutation tests. We permuted the country labels for each sample and reassessed the dissimilarities for pairs of sites within the same country and in different countries with permuted labels. We repeated this procedure 106 times to build up a null distribution of the differences in dissimilarity within and among countries. We found that resistome dissimilarities were on average 34% higher for pairs of sites in different countries than for pairs of sites within the same country (permutation, p < 0.0001, Supplementary Fig. 3A and B), while bacteriome dissimilarities were 46% higher for pairs of sites in different countries than for pairs of sites within the same country (p < 0.0001, Supplementary Fig. 3C and D). These results show that there is less variance across sites within countries than across sites within different countries. Thus individual sites in this study are representative of other sites in that country.

Sample composition comparison

Metagenomes with the following accession numbers were downloaded from SRA (December 2017): ERR011089, ERR011114, ERR011344, ERR1104480, ERR1104481, ERR1121453, ERR1121455, ERR1121556, ERR1135427, ERR1135431, ERR1135437, ERR1135693, ERR1278103, ERR1278104, ERR1278105, ERR1414230, ERR1414260, ERR1527239, ERR1527247, ERR1558700, ERR1559789, ERR1560016, ERR1560024, ERR1560100, ERR1655116, ERR1682090, ERR1682101, ERR1698980, ERR186217, ERR1950597, ERR1950599, ERR1950601, ERR469632, ERR469644, ERR469650, ERR675524, ERR675555, ERR675557, ERR675560, SRR1182511, SRR1202091, SRR1267595, SRR2175658, SRR2175725, SRR2175750, SRR2751194, SRR2891615, SRR2891618, SRR605600, SRR605634, SRR873603, SRR873608, and SRR924749. The quality of the data was assessed and if necessary adapters were removed and trimmed with a quality threshold at 20 and minimum length of 50 bp. All pairwise Jaccard distances of the aforementioned metagenomes, all global sewage samples, and the control samples were calculated with Mash37. The heat map was drawn in R 3.4.4 (pheatmap).

Sample dissimilarities

Matrices with relative abundances (FPKM) of AMR genes and bacterial genera were Hellinger-transformed using the decostand function in vegan. The BC dissimilarity matrices were then calculated using the vegdist function, also in vegan (Supplementary Data 6).

Heat maps

The relative abundances (FPKM) of AMR genes were log-transformed and visualized in a heat map (pheatmap), showing the 50 most abundant genes. The gene-wise dendrogram is based on Pearson product moment correlation coefficients (PPMC), while the sample dendrogram is based on the aforementioned BC dissimilarity matrices, not just for the visualized genes, but all genes. Both dendrograms are the result of complete linkage clustering. The bacterial genera heat map was produced in the same way, while the AMR class-level heat map differs by using PPMC for hierarchical clustering of both AMR classes and samples. Continent association was included as metadata for all heat maps. For the AMR genes, we included World Bank income levels [http://apps.who.int/gho/data/node.metadata.COUNTRY?lang=en], GEMS cluster (identifies countries with similar dietary intake)38, the HDI39, and GBD 2015 super-regions40.

Sample ordination

Dissimilarity matrices were subject to classical multidimensional scaling (PCoA) to obtain the first two principal coordinates as well as the variance explained by each, ignoring negative eigenvectors. This was done using the cmdscale command in R.

Testing of sample dissimilarities

The BC dissimilarity matrices used for PCoA were also used for permutational multivariate analysis of variance (adonis2 function in vegan). The geographical group assigned to each sample was used as a predictor for dissimilarity.

Alpha diversity

Diversity and richness were calculated on rarified versions of the resistance and bacterial count matrices. For resistance genes, the two samples with <1000 read pairs were excluded. Subsequently, count matrices were subsampled to the lowest samples’ depth, using the Vegan rarefy function. The Simpson diversity index (1-D), Pielou’s evenness, and the Chao1 richness estimates were calculated using the diversity function in the vegan package.

Procrustes analysis

The vegan package was used for comparing the resistome dissimilarities with the bacteriome dissimilarities. The protest function was used to scale and rotate the principal coordinates of the bacterial PCoA onto the principal coordinates of the resistome PCoA and testing the strength of the association.

Graphics and statistics

All plots and statistical analyses were carried out in Microsoft R Open 3.3.2.

Correlation between AMU and external explanatory variables

A multilevel Poisson model was developed to investigate the sources of variance for the abundance of AMR genes and the relationship between AMU and abundance of AMR genes found in the samples. The counts of the individual AMR genes in each of the samples (see “Collection of urban sewage samples” and “Whole-community sequencing“ sections) aggregated at the antimicrobial class level was used as the dependent variable.

An observation-level random effect was used to model the over dispersion inherent to count data41. Because several samples were sequenced more than once on separate sequencing runs, we were able to estimate and correct for the noise in mean AMR gene abundance owing to the sampling process. Therefore, a categorical variable indicating which sample was used was included as a random effect (sample). A categorical variable identifying each location to allow for the estimation of variation in abundance of resistance genes between the different sampling locations (location) was included. Furthermore, a variable identifying the resistance class a resistance gene is a member of was included to estimate the variance due to differences in abundance between antimicrobial classes (class).

For the fixed effects, we included a measure of AMU at the country level. Data from the ECDC database (Supplementary Data 1) and data from the IMS database (Supplementary Data 1) were used to calculate a new variable from the two data sets. To this end, we used the data from the ECDC data set where available to predict missing values from the IMS data set by using a linear regression model. Because the data from the two data sets are highly correlated (r = 0.97, p < 0.01, Pearson’s product moment correlation), we could infer the (approximate) corresponding ECDC value from the IMS value for that particular country. The antimicrobial usage data were then log-transformed. Because it has been argued that many antimicrobial classes show cross-resistance, where resistance to one antimicrobial class also confers resistance to another class, we accounted for the potential effects of cross-resistance by fitting effects of both usage of the antimicrobial class that a resistance gene primarily confers resistance to (direct selection for resistance) and total AMU (indirect selection via cross-resistance). We also included the total number of passengers arriving in a country in 2015 as a fixed effect in the model. Data on the number of passengers were extracted from the ICAO international flight database [https://portal.icao.int, downloaded April 2016] and log-transformed. Lastly, we included the United Nations HDI as a fixed effect. HDI data from 2015 were extracted for the United Nations Development Programme website39 and log-transformed and scaled before including them in the final model. The same model set-up was used to investigate the association between drug residue levels and AMR gene abundance. In this model, data on the drug residues in the samples was included as a fixed effect instead of the AMU data. All other effects (random and fixed) were kept the same. Others models including temperature at the collection site at the day of sampling and Gross Domestic Product showed no significant association with AMR genes abundance (glmm’s, all p > 0.05).

Resistance prediction

World Bank Health, Nutrition and Population as well as Development indicator data sets collected between the years 2000 and 2016 for 259 countries and territories were downloaded from http://databank.worldbank.org/data/home.aspx in October 2017 and used to formulate AMR predictive models. Imputations of missing data were conducted using the missForest R package, which is a random forest-based technique that is highly computationally efficient for high-dimensional data consisting of both categorical and continuous predictors42. The final data set consisted of 1503 economic and health indicator variables (Supplementary Data 7). The most important variables predicting total resistance (FPKM) were selected from the World Bank data set and a recursive feature elimination method from the R library caret (Supplementary Table 3). The model was also run on all 1503 variables for comparison (Supplementary Table 4) and a good correlation was observed. The machine learning algorithms Support Vector Machine and random forest were compared for accuracy in predictions based on their R2 and Root Mean Square Error43,44. Random forest was the best choice of model (Supplementary Table 2). Random forest is suitable for data sets with many features, especially where each of the features contributes little information45. The prediction model was trained for the 60 countries where resistance data were available from the current project followed by global predictions of resistance for all 259 countries and territories (Supplementary Data 5).

Overfitting remains a major hurdle when applying predictive models especially involving many predictors. Breiman45 proved that random forest is protected from overfitting by the incorporation of out-of-bag (OOB) estimates and from the law of large numbers. This followed from earlier proposals on the use of OOB estimates as a key part of estimation of generalization error46,47. In the study of error estimates for bagged classifiers, Breiman48,49 provided empirical evidence demonstrating same accuracy from using the OOB estimate as using a test set of the same size as the training set and further indicated that the use of the OOB error estimate removes the need for a set aside test set. OOB is the mean prediction error on each training sample x i , using only the trees that did not have x i in their bootstrap sample50. During random forest training, approximately one third of the instances are left out from each bootstrap training set. This means that the OOB estimates are computed from only about one third as many classifiers as in the ongoing main combination. However, the error rate decreases as the number of combinations increases, which means the OOB estimates will tend to overestimate the current error rate. To get unbiased OOB estimates, random forests are run past the point where the test set error converges. This has an added advantage that, unlike cross-validation, where bias is present but its extent unknown, the OOB estimates are unbiased51.

Random forests also aggregate many decision trees to limit overfitting as well as error due to bias because of the law of large numbers. Random forests limit overfitting without substantially increasing error due to bias due to their ability to mitigate the problems of high variance and high bias45.

However, a ten-fold cross-validation was included during our model building by randomly partitioning model input samples into ten sets of roughly equal size followed by estimation of accuracy based on held-out samples. This held-out sample was each time returned to the training set and the procedure was repeated with the second subset held out and so forth. Cross-validation and checking for valid accuracy were also performed with an implication that accuracy scores reduce if there is overfitting and only “valid accuracy” is finally used.

Creation of global figures and maps

QGIS 2.18.11 using the cartogram package was used to create colored and distorted maps.

Analysis of tetracyclines, sulfonamides, macrolides, and quinolone

The analysis of tetracyclines, sulfonamides, macrolides, and quinolones in the sewage samples were performed as in Beredsen et al.52, with adaptations as summarized in the following text. For pretreatment, two 10-mL portions of each sample of sewage supernatant were weighed into separate 50-mL tubes, after which internal standards were added. To one of these portions, antimicrobials were added at a level of 25 ng/L for the sulfonamides and 100 ng/L for the tetracyclines, quinolones, and macrolides. Four mL of EDTA-McIlvain buffer (0.1 M, pH 4.0) were added, after which the samples were shaken for 5 min head-over-head. The residue was taken up in 100 μL MeOH, after which 400 μL of water was added. For liquid chromatography tandem mass spectrometry (LC-MS/MS), the following gradient was applied: 0–0.5 min 1% B; 0.5–2.5 min linear increase to 25% B; 2.5–5.4 min linear increase to 70% B; 5.4–5.5 min linear increase to 100% B, with a final hold of 1.0 min. The injection volume was 5 μL. Detection was carried out by an AB Sciex (Ramingham, MA, USA) Q-Trap 5500 or a Q-Trap 6500 mass spectrometer in positive electrospray ionization (ESI). The parameters used for the Q-Trap 5500 and the Q-Trap 6500 were: capillary voltage, −4.0 kV; declustering potential, 10 V; source temperature, 450 °C; GAS 1 and 2, 50 (arbitrary units).

Analysis of aminoglycosides

The analysis of aminoglycosides in the sewage samples was performed as in Bello53, with adaptations as summarized in the following. For sample pretreatment, two 10-mL portions of each sample of sewage were weighed into separate 50-mL tubes, after which internal standards were added. To one of these portions, aminoglycosides were added at a level of 50 μg/L. Twenty mL of extraction liquid (10 mM KH 2 PO 4 with 0.4 mM EDTA and 2% TCA) were added, and samples were mixed by means of a vortex and shaken head-over-head for 10 min. The extract was then brought to pH 7.6–7.9 and centrifuged (15 min, 3600 × g). The complete extract was transferred to a conditioned CBX cartridge, followed by washing with 4 mL of water and drying. The aminoglycosides were eluted with 3 mL of acetic acid (10% in MeOH). The eluate was dried at 60 °C, evaporated under N 2 and taken up in 400 μL of HFBA (0.065%). For LC-MS/MS, the following gradient was applied: 0–0.5 min, 0% B; 0.5–5 min, linear increase to 45% B; 5–8 min, linear increase to 60% B; 8–10 min, linear increase to 100% B. The injection volume was 40 μL. Detection was carried out by a Waters (Milford, MA, USA) Quattro Ultima mass spectrometer in positive ESI mode. The parameters used were: capillary voltage, 2.7 kV; desolvation temperature, 500 °C; source temperature, 120 °C; cone gas, 150 L/h; and desolvation gas 550 L/h.

Analysis of β-lactams

The analysis of β-lactams in the sewage samples was performed as in Beredsen et al.54, with adaptations as summarized in the following. For sample pretreatment, two 10-mL portions of each sample of sewage were weighed into separate 50-mL tubes, after which internal standards were added. To one of these portions, β-lactams were added at a level of 50 μg/L for the penicillins and 500 μg/L for the cephalosporins and carbapenems. Detection was carried out by a Waters model Xevo TQS or an AB Sciex (Ramingham, MA, USA) Q-Trap 6500 mass spectrometer in positive ESI mode. The parameters used for the QTrap 6500 were: capillary voltage, 2.0 kV; cone voltage, 25 V; source offset, 20 V; source temperature, 150 °C; desolvation temperature, 550 °C; cone gas flow, 150 L/h; and desolvation gas, 600 L/h.

Calculation of defined daily dosages (DDDs) based on residues

After consumption, antimicrobials undergo (1) metabolization in the body, (2) are eliminated from the body with urine and/or feces, and might (3) further degrade in the sewer. A proper calculation of DDD/person/day from concentrations in sewage therefore requires information on elimination rates and a precise estimate of the number of persons connected to a sewer and the total water flow at this specific location (or the average water consumption per person as surrogate). Here information on elimination rates and water consumption were not included. In addition, we can only calculate DDD for human drugs used for systematic infections. As seen from Supplementary Data 1, the amount of dihysdrostreptomycin is almost absent, the use of Dapson very low, and the contribution of enrofloxacin and tylosin 0.1% and 0.3% of the residues of quinolones and macrolides, respectively. For the sulfonamides, the excluded drugs constitute 11%.

The concentrations of antimicrobial residues were transformed into DDDs of antimicrobials for systemic use using the WHO DDD database [www.whocc.no/atc_ddd_index]. Some residues were excluded as their predominant usage was not systemic. This was the case for Dapson (main treatment indication: Lepra treatment), sulfacetamide (main indication: acne), sulfadoxine (main indication: malaria), sulfathiazole (main indication: topical application, with varying dosages), enrofloxacin (animal use—no WHO/ATCC DDD available), tylosin (animal use—no WHO/ATCC DDD available), and dihydrostreptomycin (animal use—no WHO/ATCC DDD available). The DDD were summed over the respective antimicrobial classes. Calculation of the sum of DDD across classes was not deemed meaningful, because excretion rates differ largely between antimicrobial classes and also within antimicrobial classes.