No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

Ethical statement

This work was evaluated and approved by the relevant Institutional Review Boards (IRBs) or Ethics Review Committees at The Scripps Research Institute (TSRI) and the US Army Medical Research Institute of Infectious Diseases (USAMRIID) Office of Human Use and Ethics. This work was conducted as part of the public health response in Florida and samples were collected under a waiver of consent granted by the Florida Department of Health (DOH) Human Research Protection Program. The work received a non-human subjects research designation (category 4 exemption) by the Florida DOH because this research was performed with leftover clinical diagnostic samples involving no more than minimal risk. All samples were de-identified before receipt by the study investigators.

Florida Zika virus case data

Weekly reports of international travel-associated and locally acquired ZIKV infections diagnosed in Florida were obtained from the Florida DOH mosquito-borne disease surveillance system4. Dates of symptom onset from the Miami transmission zones (Wynwood, Miami Beach, and Little River) determined by the Florida DOH investigation process were obtained from the ZIKV resource website34 and daily updates35. International travel-associated ZIKV case counts in the United States (outside Florida) were obtained from the CDC36. The local and travel-associated ZIKV case numbers for Florida were obtained from the Florida DOH. The one local ZIKV infection diagnosed in Duval County was believed to have originated elsewhere in Florida. Therefore, this case is listed as ‘unknown origin’ in Fig. 1b. In Fig. 3e, only the countries visited five or more times by ZIKV-infected travellers diagnosed in Florida are shown. Countries with fewer than five visits were aggregated into an “other” category by region (that is, Caribbean, South America, or Central America).

Clinical sample collection and RNA extraction

Clinical samples from locally acquired ZIKV infections were collected between 22 June and 11 October 2016. The Florida DOH identified persons with compatible illness and clinical samples were shipped to the Bureau of Public Health Laboratories for confirmation by qRT–PCR and antibody tests following interim guidelines3,37,38,39. Clinical specimens (whole blood, serum, saliva, or urine) submitted for analysis were refrigerated or frozen at or below −70 °C until RNA was extracted. RNA was extracted using the RNAeasy kit (QIAGEN), MagMAX for Microarrays Total RNA Isolation Kit (Ambion), or MagNA Pure LC 2.0 or 96 Systems (Roche Diagnostics). Purified RNA was eluted into 50–100 μl using the supplied elution buffers, immediately frozen at or below −70 °C, and transported on dry ice. The Florida DOH also provided investigation data for these samples, including symptom onset dates and, when available, assignments to the zone where infection was likely to have occurred (Supplementary Table 1).

Mosquito collection, RNA extraction, and entomological data analysis

24,351 A. aegypti and Aedes albopictus mosquitoes (sorted into 2,596 pools) were collected throughout Miami-Dade County during June–November 2016 using BG-Sentinel mosquito traps (Biogents AG). Up to 50 mosquitoes of the same species and sex were pooled per trap. The pooled mosquitoes were stored in RNAlater (Invitrogen), RNA was extracted using either the RNAeasy kit (QIAGEN) or MagMAX for Microarrays Total RNA Isolation Kit (Ambion), and ZIKV RNA was detected by qRT–PCR targeting the envelope protein coding region39 or the Trioplex qRT–PCR kit40. ZIKV infection rates were calculated per 1,000 female A. aegypti mosquitoes using the bias-corrected maximum likelihood estimate (MLE)41. Days of insecticide usage by the Miami-Dade Mosquito Control were inferred from the zone-specific ZIKV activities timelines published by the Florida DOH34.

Relative monthly Aedes aegypti abundance

For the purpose of this study we used A. aegypti suitability maps from ref. 14 and derived monthly estimates based on the statistical relationships between mosquito presence and environmental correlates42. Following ref. 43, we used a simple mathematical formula to transform the probability of detection maps into mosquito abundance maps. We assumed P(Y = 1) where Y is a binary variable (presence/absence). Using a Poisson distribution X() to govern the abundance of mosquitoes, the probability of not observing any mosquitoes can be related to the probability of absence as: P(X = 0) = P(Y = 0). We used the following transformation to generate abundance (λ) estimates per county in Florida:

We did not consider A. albopictus abundance in this study because 99.8% of mosquitoes collected in Miami-Dade County were A. aegypti. Relative A. aegypti abundance in major US cities presented in Extended Data Fig. 8 was estimated as previously described22.

Zika virus quantification

ZIKV genome equivalents (GE) were quantified by qRT–PCR. At TSRI, ZIKV qRT–PCR was performed as follows: ZIKV RNA standards were transcribed from the ZIKV NS5 region (nucleotides (nt) 8,651–9,498) using the T7 forward primer (5′-TAATACGACTCACTATAGGGAGATCAGGCTCCTGTCAAAACCC-3′), reverse primer (5′-AGTGACAACTTGTCCGCTCC-3′), and the T7 Megascript kit (Ambion). For qRT–PCR, primers and a probe targeting the NS5 region (nt 9,014–9,123) were designed using the ZIKV isolate PRVABC59 (GenBank: KU501215): forward primer (5′-AGTGCCAGAGCTGTGTGTAC-3′), reverse primer (5′-TCTAGCCCCTAGCCACATGT-3′), and FAM-fluorescent probe (5′-GGCAGCCGCGCCATCTGGT-3′). The qRT–PCR assays were performed in 25-μl reactions using the iScript One-step RT–PCR Kit for probes (Bio-Rad Laboratories Inc.) and 2 μl of sample RNA. Amplification was performed at 50 °C for 20 min, 95 °C for 3 min, and 40 cycles of 95 °C for 10 s and 57 °C for 10 s. Fluorescence was read at the end of the 57 °C annealing–extension step. Tenfold dilutions of the ZIKV RNA transcripts (2 μl per reaction) were used to create a standard curve for quantification of ZIKV GE per μl RNA. The lower limits of quantification are 4 GE per μl RNA, or at a cycle threshold of ~36.

ZIKV GE were quantified at USAMRIID using the University of Bonn ZIKV envelope protein (Bonn E) qRT–PCR assay44. RNA standards were transcribed using an amplicon generated from a ZIKV plasmid containing T7 promoter at the start of the 5′ untranslated region (UTR). The plasmid was designed using the ZIKV isolate BeH819015 (GenBank: KU365778.1) and the amplicon included nt 1–4,348, which covers the 5′ UTR, C, prM, M, E, NS1, and NS2 regions. The qRT–PCR assays were performed in 25-μl reactions using the SuperScript III platinum One-step qRT–PCR Kit (ThermoFisher) and 2 μl of sample RNA was used. Amplification was performed following conditions as previously described44. Tenfold dilutions of the ZIKV RNA transcripts (5 μl per reaction) were used to create a standard curve for quantification of ZIKV GE per μl RNA.

Amplicon-based Zika virus sequencing

ZIKV sequencing at TSRI was performed using an amplicon-based approach using the ZikaAsian V1 scheme, as described24. This approach is similar to ‘RNA jackhammering’ to sequence low-quality viral samples described in ref. 45. In brief, cDNA was reverse-transcribed from 5 μl RNA using SuperScript IV (Invitrogen). ZIKV cDNA (2.5 μl per reaction) was amplified in 35 × 400-bp fragments from two multiplexed PCR reactions using Q5 DNA High-fidelity Polymerase (New England Biolabs). The amplified ZIKV cDNA fragments (50 ng) were prepared for sequencing using the Kapa Hyper prep kit (Kapa Biosystems) and SureSelect XT2 indexes (Agilent). Agencourt AMPure XP beads (Beckman Coulter) were used for all purification steps. Paired-end 251-nt reads were generated on the MiSeq using the V2 500 cycle or V3 600 cycle kits (Illumina).

Trimmomatic was used to remove primer sequences (first 22 nt from the 5′ end of the reads, which is the maximum length of the primers used for the multiplexed PCR) and bases at both ends with Phred quality score <20 (ref. 46). The reads were then aligned to the complete genome of a ZIKV isolate from the Dominican Republic, 2016 (GenBank: KU853012) using Novoalign v3.04.04 (www.novocraft.com). Samtools was used to sort the aligned BAM files and to generate alignment statistics47. Snakemake was used as the workflow management system48. The code and reference indexes for the pipeline can be found at https://github.com/andersen-lab/zika-pipeline. ZIKV-aligned reads were visually inspected using Geneious v9.1.5 (ref. 49) before generating consensus sequences. A minimum of 3 × read-depth coverage, in support of the consensus, was required to make a base call.

Enrichment-based Zika virus sequencing

ZIKV sequencing at USAMRIID was performed using a targeted enrichment approach. Sequencing libraries were prepared using the TruSeq RNA Access Library Prep kit (Illumina) with custom ZIKV probes. The set included 866 unique probes each of which was 80 nt in length (Supplementary Table 2a). The probes were designed to cover the entire ZIKV genome and to encompass the genetic diversity present on GenBank on 14 January 2016. In total, 26 ZIKV sequences were used during probe design (Supplementary Table 2b). Extracted RNA was fragmented at 94 °C for 0–60 s and each sample was enriched separately using a quarter of the reagents specified in the manufacturer’s protocol. Samples were barcoded, pooled and sequenced using the MiSeq Reagent kit v3 (Illumina) on an Illumina MiSeq with a minimum of 2 × 151-bp reads. Dual indexing, with no overlapping indices, was used.

The random hexamer associated with read one and the Illumina adaptors were removed from the sequencing reads using Cutadapt v1.9.dev1 (ref.50), and low-quality reads or bases were filtered using Prinseq-lite v0.20.3 (ref.51). Reads were aligned to a reference genome (GenBank: KX197192.1) using Bowtie2 v2.0.6 (ref.52), duplicates were removed with Picard (http://broadinstitute.github.io/picard), and a new consensus was generated using a combination of Samtools v0.1.18 (ref.47) and custom scripts (https://github.com/jtladner/Scripts/blob/master/reference-based_assembly/consensus_fasta.py). Only bases with Phred quality score ≥20 were used in consensus calling, and a minimum of 3 × read-depth coverage, in support of the consensus, was required to make a call; positions lacking this depth of coverage were treated as missing (that is, called as ‘N’).

Validation and comparison of sequencing methods

The consensus ZIKV sequences from FL01M and FL03M generated by sequencing 35 × 400-bp amplicons on the MiSeq were validated using the following approaches: (1) sequencing the 35 × 400-bp amplicons on the Ion S5 platform (ThermoFisher); (2) sequencing amplicons generated using an Ion AmpliSeq (ThermoFisher) panel custom-targeted towards ZIKV on the Ion S5 platform; and (3) sequencing 5 × 2,150–2,400-bp ZIKV amplicons on the MiSeq. For Ion library preparation, cDNA was synthesized using the SuperScript VILO kit (ThermoFisher). ThermoFisher designed 875 custom ZIKV primers to produce 75 amplicons of ~200 bp in two PCR reactions for use with their Ion AmpliSeq Library Kit 2.0. The reagent FuPa was used to digest the modified primer sequences after amplification. The DNA templates were loaded onto Ion 520 chips using the Ion Chef and sequenced on the Ion S5 with the 200-bp output (ThermoFisher). The 35 × 400-bp amplicons generated for the MiSeq as described above were introduced into the Ion workflow using the Ion AmpliSeq Library Kit 2.0, but without fragmentation. Primers to amplify 2,150–2,400-bp ZIKV fragments (Supplementary Table 2c) were kindly provided by S. O’Connor, D. Dudly, D. O’Connor, and D. Gellerup (AIDS Vaccine Research Laboratory, University of Wisconsin, Madison). Each fragment was amplified individually by PCR using the cDNA generated above, Q5 DNA High-fidelity Polymerase, and the following thermocycle conditions: 55 °C for 30 min, 94 °C for 2 min, 35 cycles of 94 °C for 15 s, 56 °C for 30 s, and 68 °C for 3.5 min, 68 °C for 10 min, and held at 4 °C until use. Each PCR product was purified using Agencourt AMPure XP beads, sheared to 300–400-nt fragments using the Covaris S2 sonicator, indexed and prepared for sequencing as described above, and sequenced using the MiSeq V2 500 cycle kit (paired-end 251-nt reads). Compared to the consensus sequences generated using 35 × 400-bp amplicons on the MiSeq, there were no consensus-level mismatches in the coding sequence using any of the other three approaches (Extended Data Table 2). There were, however, some mismatches in the 5′ and 3′ UTRs (where the genomic RNA is heavily structured), probably as a result of PCR bias and decreased coverage depth.

At least 95% of the ZIKV genome was covered from samples with as low as 4 and 9 GE per μl RNA from the amplicon and enrichment approaches, respectively. These results are similar to our previously determined clinical range of 10–16 ZIKV GE per μl RNA to achieve at least 95% genome coverage using our amplicon-based approach24. On average, the amplicon-based sequencing approach covered 97% of the ZIKV genome (≥3 × read-depth) and the targeted enrichment approach covered 82% of the ZIKV genome from clinical samples (Supplementary Table 2d).

Phylogenetic analyses

All published and available complete ZIKV genomes of the Asian genotype from the Pacific and the Americas were retrieved from GenBank public database in December 2016. Public sequences (n = 65) were codon-aligned together with ZIKV genomes generated in this study (n = 39) using MAFFT53 and inspected manually. The multiple alignment contained 104 ZIKV sequences collected between 2013 and 2016, from the Pacific (American Samoa, French Polynesia, and Tonga), Brazil, other South and Central Americas (Guatemala, Mexico, Suriname, and Venezuela), the Caribbean (Dominican Republic, Guadeloupe, Haiti, Martinique, and Puerto Rico), and the United States (Supplementary File 1).

To determine the temporal signal of the sequence dataset, a maximum likelihood (ML) phylogeny was first reconstructed with PhyML54 using the general time-reversible (GTR) nucleotide substitution model and gamma distributed rates amongst sites55(Supplementary File 1), which was identified as the best fitting model for ML inference by jModelTest256. Then, a correlation between root-to-tip genetic divergence and date of sampling was conducted in TempEst57.

Bayesian phylogenetic analyses were performed using BEAST v.1.8.4 (ref.25) to infer time-structured phylogenies. We used an SDR06 nucleotide substitution model58 with a non-informative continuous time Markov chain reference prior (CTMC)59 on the molecular clock rate. Replicate analyses using multiple combinations of molecular clock and coalescent models were explored to select the best fitting model by marginal likelihood comparison using path-sampling and stepping-stone estimation approaches60,61,62(Extended Data Table 1b). The best fit model was a relaxed molecular clock along with a Bayesian Skyline model63. All the Bayesian analyses were run for 30 million Markov chain Monte Carlo steps, sampling parameters and trees every 3,000 generations (BEAST XML file and MCC tree available in Supplementary File 1). Support values for all nodes are embedded in the phylogenetic tree files (Supplementary File 1). Tree visualizations were generated with baltic (https://github.com/blab/baltic).

The travel-associated ZIKV genomes add to the Caribbean dataset, but do not directly influence our conclusions about the source of ZIKV introductions into Florida.

Expected number and distribution of local cases from Zika virus importations

We used branching process theory64,65to generate the offspring distribution (subsequent local cases) that is expected from a single introduction. The offspring distribution L is modelled with a negative binomial distribution with mean R 0 and over-dispersion parameter k. The total number of cases j that is caused by a single importation (including the index case) after an infinite time66 has the following form:

The parameter k represents the variation in the number of secondary cases generated by each case of ZIKV64. In the case of vector-borne diseases, local heterogeneity is high owing to a variety of factors such as mosquito population abundance, human–mosquito interactions, and control interventions67,68,69,70,71,72. Here, we assumed high heterogeneity (k = 0.1) following previous estimates for vector-borne diseases65. This distribution L is plotted in Extended Data Fig. 4a. For the following, we took a forward simulation approach, drawing random samples from this distribution. All estimates were based on 100,000 random simulations.

We used this formula to estimate the probability of observing 241 local cases in Miami-Dade County alongside 320 travel-associated cases. We approached this by sampling 320 introduction events from L and calculating the total number of local cases in the resulting outbreak (Extended Data Fig. 4b). We also calculated the likelihood of observing 241 local cases in the total outbreak (Extended Data Fig. 4c), finding that the MLE of R 0 lies between 0.35 and 0.55. As a sensitivity analysis, we additionally modelled introductions with the assumption that only 50% of travellers were infectious at time of arrival into Miami-Dade County, resulting in an MLE of R 0 of 0.45–0.8.

We further used this formula to address the probability of observing 3 distinct genetic clusters (F1, F2 and F3) representing three introduction events in a sample of 27 ZIKV genomes from Miami-Dade County. We approached this by sampling introduction events until we accumulated 241 local cases according to L, arriving at N introduction events with case counts (j 1 , j 2 , … j N ). We then sampled 27 cases without replacement from (j 1 , j 2 , … j N ) following a hypergeometric distribution and recorded the number of distinct clusters drawn in the sample. We found that higher values of R 0 resulted in fewer distinct clusters within the sample of 27 genomes (Extended Data Fig. 4d). We additionally calculated the likelihood of sampling three distinct genetic clusters in 27 genomes (Extended Data Fig. 4e), finding an MLE estimate of R 0 of 0.7–0.9. Additionally, as a sensitivity analysis we modelled a preferential sampling process in which larger clusters are more likely to be drawn from than smaller clusters. Here, we used a parameter α that enriches the hypergeometric distribution following . In this case, we found an MLE estimate of R 0 of 0.5–0.9.

Using the overlap of estimates of R 0 from local case counts (0.35–0.8) and genetic clusters (0.5–0.9), we arrived at a 95% uncertainty range of R 0 of 0.5–0.8. As an additional sensitivity analysis, we incorporated under-reporting in which either 50% of travel-associated cases and 25% of local cases are reported or in which 10% of travel-associated cases and 5% of local cases are reported. We found that differential reporting of travel and local cases resulted in increased mean R 0 estimates when comparing counts of travel-associated to local cases (Extended Data Fig. 4f, g). Additionally, under-reporting increased estimates of R 0 from the sampling analysis (Extended Data Fig. 4h, i). Thus, moderate under-reporting is consistent with R 0 estimates of ~0.8.

We additionally performed birth–death stochastic simulations assuming a serial interval with mean of 20 days15. We recorded the number of stochastic simulations still persisting after a particular number of days for different values of R 0 (Fig. 2c).

Zika virus incidence rates

Weekly suspected and confirmed ZIKV case counts from countries and territories within the Americas with local transmission (1 January to 18 September 2016) were obtained from the Pan American Health Organization (PAHO)30. In most cases, the weekly case numbers per country were reported only in bar graphs. We contacted PAHO multiple times with the hope of gaining access to the raw data included in the bar graphs, but our requests were unfortunately denied. Therefore we used WebPlotDigitizer v3.10 (http://arohatgi.info/WebPlotDigitizer) to estimate the numbers. We compared the actual ZIKV case numbers reported in Ecuador73(the only country with available raw data and reported cases over 10 per week) to our estimates from the PAHO bar graphs and found that the WebPlotDigitizer was ~99% accurate (Extended Data Fig. 5a, b).

Country and territory total population sizes to calculate weekly and monthly ZIKV incidence rates were also obtained from PAHO74. Incidence rates calculated from countries and territories in the Americas during January–June 2016 (based on the earliest introduction time estimates until the first known cases) were used as an estimate for infection likelihood to investigate sources of ZIKV introductions.

Airline and cruise ship traffic

To investigate whether the transmission of ZIKV in Florida coincided with travel patterns from ZIKV endemic regions, we obtained the number of passengers arriving at airports in Florida via commercial air travel. We collated data for flights arriving at all commercial airports in Florida from countries and territories in the Americas with local ZIKV transmission between January and June 2016 (based on the earliest introduction time estimates until the first known cases, Supplementary Table 1b). The data were obtained from the International Air Transportation Association, which collects data on an estimated 90% of all passenger trips worldwide. Nelson et al.28 previously reported flight data from 33 countries with ZIKV transmission entering major US airports from October 2014 to September 2015, which we used to assess the potential for ZIKV introductions outside of Florida.

Schedules for cruise ships visiting Miami, Port Canaveral, Port Everglades, Fort Lauderdale, Key West, Jacksonville (all in Florida), Houston, Galveston (both in Texas), Charleston (South Carolina) and New Orleans (Louisiana) ports in the year 2016 were collated from www.cruisett.com and confirmed by cross-referencing ship logs reported by Port of Miami and reported ship schedules from www.miamidade.gov/portmiami/. Scheduled cruise ship capacities were extracted from www.cruisemapper.com. Every country or territory with ZIKV transmission visited by a cruise ship 10 days (the approximate mean time to ZIKV clearance in human blood (that is, the infectious period))75 before arrival was counted as contributing the ship’s capacity worth of passengers to Miami to the month of arrival (Supplementary Table 1b). While the air traffic was based on the reported number of travellers, we estimated the sea traffic by ship capacity. Lee and Ramdeen76 reported that the average occupancy of cruise ships travelling to the Caribbean Islands exceeded 100% in 2011, and according to the Florida-Caribbean Cruise Association77, it remained >100% in 2015. Occupancy data for 2016 was not available at the time of publication, but we assumed that it was also near 100%.

Expected number of travellers infected with Zika virus

We estimated the expected number of travellers entering Miami who were infected with ZIKV (λ) by using the total travel capacity (C) and the likelihood of ZIKV infection (infections (I) per person (N)) from each country/territory (i):

We summed the number of expected infected travellers from each country or territory with ZIKV transmission by region and travel method (flights or cruises). The number of ZIKV cases reported by each country are likely to be underestimates, in part because the majority of ZIKV infections are asymptomatic2,31. We normalized some of the potential reporting variances between countries by reporting the data as the relative proportion of infected travellers (Fig. 3c, Extended Data Fig. 7a) and as the absolute number of infected travellers (Fig. 3d, Extended Data Fig. 7b, Supplementary Table 1b) from each region. We also accounted for potential reporting biases with incidence rates by using ZIKV attack rates (that is, proportion infected before epidemic burnout) to estimate peak transmission intensity. Attack rates were calculated using a susceptible–infected–recovered (SIR) transmission model derived from seroprevalence studies and environmental factors as described78. Using attack rates as an estimate of infection likelihood, we predict that ~60% of the infected travellers entering Miami came from the Caribbean (Extended Data 7b), which is in agreement with our methods using incidence rates of ~60–70% (Fig. 3c). A list of countries and territories used in these analyses can be found in Supplementary Table 1b.

Maps

The maps presented in our figures were generated using Matplotlib79 and ESRI basemaps (www.esri.com/data/basemaps). The software and basemaps are open source and freely available to anyone.

Data availability

All ZIKV sequencing data are available under NCBI BioProjects PRJNA342539 and PRJNA356429. Individual sample GenBank access numbers are listed in Supplementary Table 1a. All other data are available in the Extended Data or Supplementary Information, or upon request.