Zika virus genomes from Brazil

The Zika virus outbreak is a major cause for concern in Brazil, where it has been linked with increased reports of otherwise rare birth defects and neuropathology. In a phylogenetic analysis, Faria et al. inferred a single introduction of Zika virus to the Americas and estimated that it occurred between about May and December 2013, more than 12 months before the virus was first reported there. This timing correlates with major events in the Brazilian cultural calendar associated with increased traveler numbers from areas where Zika virus has been circulating. Reports of suspected microcephaly also correlated best with Zika virus incidence around week 17 of pregnancy. Science, this issue p. 345

Abstract Brazil has experienced an unprecedented epidemic of Zika virus (ZIKV), with ~30,000 cases reported to date. ZIKV was first detected in Brazil in May 2015, and cases of microcephaly potentially associated with ZIKV infection were identified in November 2015. We performed next-generation sequencing to generate seven Brazilian ZIKV genomes sampled from four self-limited cases, one blood donor, one fatal adult case, and one newborn with microcephaly and congenital malformations. Results of phylogenetic and molecular clock analyses show a single introduction of ZIKV into the Americas, which we estimated to have occurred between May and December 2013, more than 12 months before the detection of ZIKV in Brazil. The estimated date of origin coincides with an increase in air passengers to Brazil from ZIKV-endemic areas, as well as with reported outbreaks in the Pacific Islands. ZIKV genomes from Brazil are phylogenetically interspersed with those from other South American and Caribbean countries. Mapping mutations onto existing structural models revealed the context of viral amino acid changes present in the outbreak lineage; however, no shared amino acid changes were found among the three currently available virus genomes from microcephaly cases. Municipality-level incidence data indicate that reports of suspected microcephaly in Brazil best correlate with ZIKV incidence around week 17 of pregnancy, although this correlation does not demonstrate causation. Our genetic description and analysis of ZIKV isolates in Brazil provide a baseline for future studies of the evolution and molecular epidemiology of this emerging virus in the Americas.

Zika virus (ZIKV) is a single-stranded, positive-sense RNA virus with a 10.7-kb genome encoding a single polyprotein that is cleaved into three structural proteins (C, prM/M, E) and seven nonstructural proteins (NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5) (1). ZIKV is a member of the family Flaviviridae, genus Flavivirus, and is transmitted among humans by Aedes mosquito species such as A. aegypti, A. albopictus, and A. africanus. The virus was first isolated in 1947 from a sentinel rhesus monkey in the Zika forest in Uganda (2) and is classified by sequence analysis into two genotypes, African and Asian (3). In humans, ZIKV infection typically causes a mild and self-limiting illness known as Zika fever (4), which is accompanied by maculopapular rash, headache, conjunctivitis, and myalgia. In April 2007, a large epidemic of Asian genotype ZIKV was reported in Yap Island and Guam, Micronesia (5, 6). Between 2013 and 2014, the ZIKV Asian genotype caused epidemics reported in several Pacific Islands, including French Polynesia (7), New Caledonia (8), the Cook Islands (9), Tahiti (10), and Easter Island (11).

By May 2015, ZIKV was reported in Brazil (12) and, subsequently, in several Central and South American countries and in the Caribbean. In Brazil, nearly 30,000 cases of ZIKV infection had been reported by 30 January 2016 (supplementary materials section 1.4). These occurrences point to an epidemic peak in mid-July 2015 (Fig. 1A), and most Brazilian ZIKV cases (93%) were reported in Bahia state (Fig. 1B). Surveillance of ZIKV in Brazil began after the country’s first reported case and is conducted through the national Notifiable Diseases Information System (SINAN), which currently relies on passive case detection and reporting and therefore underestimates incidence (13). ZIKV is now widespread in Brazil, with autochthonous transmission and high incidence notified in 22 of 27 administrative states (14). ZIKV infection during pregnancy has been hypothesized to cause microcephaly and congenital abnormalities (15–20). The detection of ZIKV in fetal brain tissue (17, 20) and amniotic fluid (21) supports the hypothesis that the virus is transmitted from mother to child (22); further, the virus infects neural progenitor cells in vitro (23). In Brazil, between November 2015 and 30 January 2016, 4783 suspected cases of microcephaly were reported electronically to the RESP database (www.resp.saude.gov.br; Ministry of Health, Brazil) (supplementary materials section 1.4) (Fig. 1C), although most suspected cases are still under investigation and a substantial proportion may represent misdiagnosis and overreporting (24). Using the 4 March 2016 World Health Organization guidelines for microcephaly diagnosis (25), we identified a total of 1118 suspected microcephaly cases suitable for analysis. The relation between total per-capita ZIKV incidence (Fig. 1B) and per-capita suspected microcephaly cases (Fig. 1C) in each state is weak and only significant under nonparametric correlation (P < 0.01) (fig. S1A); noise and uncertainty probably affect both variables. 
However, the relation is strengthened if suspected microcephaly cases are measured per pregnancy (fig. S1B). For municipalities with reported ZIKV incidence and cases of suspected microcephaly, we used a simple linear model to describe suspected microcephaly cases as a function of past ZIKV incidence (supplementary materials section 1.5). On average, suspected microcephaly cases are best predicted by ZIKV incidence during week 17 of pregnancy (95% confidence interval of mean = ±0.11 weeks) or week 14 for suspected severe microcephaly cases (±0.08 weeks). These findings are in general agreement with individual reports of the timing of ZIKV symptoms in mothers of infants with microcephaly (16, 19, 21). We stress that these results quantify only the correlation between ZIKV and suspected microcephaly and do not demonstrate a causal link. Ongoing studies are aiming to establish whether ZIKV is a causal factor in microcephaly and other conditions (15–17, 23, 26).
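The idea of relating current microcephaly reports to past ZIKV incidence can be illustrated with a simple lag scan. The following is a minimal sketch only, not the model described in supplementary materials section 1.5: the function name `best_lag`, the weekly series, and the 23-week lag built into the synthetic data are all hypothetical. Converting a lag in weeks into a week of pregnancy would additionally require an assumption about gestational age at the time of reporting, which is not attempted here.

```python
import numpy as np

def best_lag(zikv_weekly, micro_weekly, max_lag):
    """Scan lags (in weeks) and return (lag, r2) for the lag at which past
    ZIKV incidence is most strongly linearly related to microcephaly reports."""
    zikv = np.asarray(zikv_weekly, dtype=float)
    micro = np.asarray(micro_weekly, dtype=float)
    best = (None, -np.inf)
    for lag in range(1, max_lag + 1):
        x = zikv[:-lag]   # incidence `lag` weeks before each report week
        y = micro[lag:]   # reports in the corresponding later week
        r = np.corrcoef(x, y)[0, 1]
        if r * r > best[1]:
            best = (lag, r * r)
    return best

# Synthetic, noise-free illustration (NOT real surveillance data):
# a single epidemic wave, followed 23 weeks later by a proportional wave of reports.
weeks = np.arange(80)
zikv = 100.0 * np.exp(-0.5 * ((weeks - 30) / 5.0) ** 2)
TRUE_LAG = 23  # hypothetical value chosen for the demonstration
micro = np.zeros_like(zikv)
micro[TRUE_LAG:] = 0.05 * zikv[:-TRUE_LAG]

lag, r2 = best_lag(zikv, micro, max_lag=40)
print(lag, round(r2, 3))
```

With real municipality-level data, noise in both series would make the maximum much less sharp, and a high correlation at a given lag would still not demonstrate causation.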

Fig. 1 Time series and cartography of reported Zika virus and microcephaly cases in Brazil. (A) Number of suspected cases of ZIKV per week in 5596 municipalities in Brazil. The epidemic peaked from 12 to 18 July 2015 (n = 2791 cases). Letters indicate months. (B) Total incidence of ZIKV cases per 100,000 people in each federal state. Triangles indicate sampling locations of the sequences reported here; circles indicate locations of other genomes from Brazil [municipality of Natal in Rio Grande do Norte state (16) and an unknown municipality in Paraiba state (21)]. Red symbols indicate ZIKV genomes isolated from microcephaly cases. Federal states are indicated by two-letter codes: PA, Pará; MA, Maranhão; CE, Ceará; RN, Rio Grande do Norte; PB, Paraíba; SP, São Paulo. Per-capita incidences in each state were calculated using high-resolution gridded human population–size data sets for Brazil (45). (C) Incidence of suspected microcephaly cases per 100,000 people in each federal state. Per-capita incidences for each state were calculated as described for (B).

We used phylogenetic, epidemiological, and mobility data to quantify ZIKV evolution and explore the introduction of the virus to the Americas. As part of ongoing surveillance by the Brazilian Ministry of Health, national laboratories, and other institutions, we used next-generation sequencing to generate seven complete ZIKV coding region sequences from samples collected during the outbreak. Our samples include one from a deceased newborn with microcephaly and congenital malformations collected in Ceará and one from a fatal adult case with lupus and rheumatoid disease from Maranhão state (Fig. 1B). None of the Brazilian patients reported overseas travel (information unavailable in one case), and one individual was a blood donor (supplementary materials section 2). A comparison of our genomes with other available Brazilian strains reveals that Brazilian ZIKV isolates differ at multiple nucleotide sites across the 10.3-kb coding region. The ZIKV genome recovered from isolate ZIKSP, from São Paulo, had 32 nucleotide changes compared with the microcephaly case (BeH823339) and 34 compared with the fatal case from Maranhão (BeH818305). Isolates BeH819966 (from Belém), BeH815744 (from Paraíba), and BeH18995 (from Belém) had a maximum of five nucleotide changes.
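Counts of nucleotide changes between isolates, such as those quoted above, can be obtained from aligned sequences by counting the sites at which two sequences differ. The sketch below is illustrative only: the helper name `nucleotide_differences` and the two short toy sequences are hypothetical, and skipping gaps and ambiguous bases is an assumption about how such sites might be handled, not a description of the procedure used for the genomes reported here.

```python
def nucleotide_differences(seq_a, seq_b):
    """Count sites at which two aligned sequences differ.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (an assumption made for this sketch)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")  # sequence databases typically write RNA genomes with T
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )

# Toy aligned fragments (hypothetical, not taken from any ZIKV genome).
toy_a = "ATGAAAAACCCAAAG"
toy_b = "ATGAAGAACCC-AAA"
n_diff = nucleotide_differences(toy_a, toy_b)
print(n_diff)  # the gapped site is ignored, leaving 2 differences
```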

Maximum likelihood analysis of complete coding regions from our and other ZIKV genome sequences revealed that all viruses sampled in the Americas, including those from Brazil, form a robust monophyletic cluster (bootstrap score = 94%) within the Asian genotype (Fig. 2 and fig. S2) and share a common ancestor with the ZIKV strain that circulated in French Polynesia in November 2013 (Fig. 3). Previous analyses of outbreaks of related flaviviruses (27, 28) suggest that, to be informative, molecular epidemiological studies of the current ZIKV epidemic should use full or near-complete coding region sequences.

Fig. 2 Maximum likelihood phylogeny of ZIKV complete coding region sequences. Bootstrap scores are shown next to well-supported nodes, and the phylogeny was midpoint rooted. A fully annotated tree is provided in fig. S2. The American ZIKV outbreak clade is drawn as a narrow white triangle and is shown in detail in Fig. 3. Asterisks highlight the four internal branches that are ancestral to the American ZIKV lineage (see main text and fig. S3). There is a correlation between the sampling date of each sequence and the genetic distance of that sequence from the root of a maximum likelihood phylogeny of the Asian genotype (correlation coefficient R² = 0.997). A molecular clock phylogeny of these data is shown in Fig. 3. The Malaysian strain (HQ234499) sampled in 1966 is the oldest representative of the Asian genotype and falls on the regression line, indicating that it does not appear to be unusually divergent for its age. A similar analysis with the HQ234499 strain excluded is shown in fig. S5C.

Fig. 3 Time scale of the introduction of ZIKV to the Americas. (A) Molecular clock phylogeny of the ZIKV outbreak lineage estimated from complete coding region sequences, plus six sequences (KJ634273, KU312315, KU312314, KU212313, KU646828, and KU646827) longer than 1500 nucleotides (available data as of 7 March 2016). For visual clarity, three basal sequences, namely HQ234499 (Malaysia, 1966), EU545988 (Micronesia, 2007), and JN860885 (Cambodia, 2010), are not displayed here (see fig. S3). Gray horizontal bars represent 95% Bayesian credible intervals for divergence dates. Letters A and B denote clades discussed in the main text; numbers under the clade letters denote posterior probabilities. Diamond sizes represent, at each node, the posterior probability support of that node. Taxa are labeled with accession number, sampling location, and sampling date. Names of sequences generated in this study are underlined. (B) Posterior distributions of the estimated ages (TMRCAs) of clades A and B, estimated with BEAST software using the best-fitting evolutionary model (table S2). The time and duration of the three events (i to iii) discussed in the main text are shown. (C) Number of airline passengers from specific countries arriving in Brazil per month versus number of suspected cases of ZIKV in French Polynesia. The blue curve (left y axis) shows a polynomial fitting of the number of travelers (blue points) from countries with recorded ZIKV outbreaks between 2012 and 2015 (French Polynesia, Thailand, Indonesia, Malaysia, Cambodia, and New Caledonia) (supplementary materials section 6), aggregated across 20 Brazilian national airports. The purple bars represent weekly numbers of suspected ZIKV cases (right y axis) in French Polynesia (FP) from 30 October 2013 to 14 February 2014 (4).

We used a phylogenetic molecular clock approach to further explore the molecular epidemiology of ZIKV in the Americas. A strong correlation between genetic divergence and sampling time within the outbreak lineage (Fig. 2, inset) shows that our approach is appropriate, provided that whole genomes are used. The estimated time-scaled phylogeny (Fig. 3A) again contains a well-supported clade of American ZIKV strains [denoted B; posterior probability (PP) = 1.00] that share a common ancestor (denoted A) with the French Polynesian lineage (PP = 0.92). Within the American ZIKV lineage (clade B), Brazilian isolates are interspersed among isolates from elsewhere in the Americas. The mingling of ZIKV genomes from different countries reveals ZIKV movement within the Americas since its introduction to the continent. Two observations suggest that the common ancestor of the American ZIKV lineage existed in Brazil. First, Brazil was the first country in the Americas to detect ZIKV (29), and second, Brazilian strains are phylogenetically more diverse within clade B than those from elsewhere. However, these observations may reflect differences in surveillance intensity among countries, and more data are required before we can exclude the scenario that ZIKV was introduced to Brazil multiple times from other locations. Although two of three ZIKV-associated microcephaly isolates group together in the phylogeny, there is no reason to posit that this lineage is associated with increased disease severity.
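The correlation between genetic divergence and sampling time mentioned above is commonly checked with a root-to-tip regression: the slope approximates the evolutionary rate and the x intercept approximates the date of the root. The sketch below shows only this simple diagnostic, not the Bayesian molecular clock analysis used for Fig. 3. The function name, the sampling dates, and the distances are hypothetical; the distances were generated without noise from an assumed rate of 1 × 10⁻³ substitutions per site per year and an assumed root date of 2013.4.

```python
import numpy as np

def root_to_tip_regression(sampling_dates, root_to_tip_distances):
    """Regress root-to-tip distance on sampling date.

    Returns (rate, root_date, r2): the slope in substitutions per site per
    year, the date at which the fitted line crosses zero divergence, and the
    squared correlation coefficient."""
    dates = np.asarray(sampling_dates, dtype=float)
    dist = np.asarray(root_to_tip_distances, dtype=float)
    t0 = dates.mean()  # centre the dates to keep the fit well conditioned
    slope, intercept = np.polyfit(dates - t0, dist, 1)
    r = np.corrcoef(dates, dist)[0, 1]
    return slope, t0 - intercept / slope, r * r

# Synthetic tips (NOT real ZIKV data): decimal sampling dates and exact distances.
ASSUMED_RATE = 1.0e-3       # hypothetical
ASSUMED_ROOT_DATE = 2013.4  # hypothetical
tip_dates = np.array([2013.9, 2014.2, 2015.2, 2015.5, 2015.9, 2016.0])
tip_distances = ASSUMED_RATE * (tip_dates - ASSUMED_ROOT_DATE)

rate, root_date, r2_clock = root_to_tip_regression(tip_dates, tip_distances)
print(f"rate = {rate:.2e}, root date = {root_date:.1f}, R2 = {r2_clock:.3f}")
```

Because the synthetic distances lie exactly on a line, the regression recovers the assumed values; real data scatter around the line, and the regression does not provide credible intervals.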

Estimated rates of ZIKV molecular evolution are consistent among different evolutionary models and vary from 0.98 × 10⁻³ to 1.06 × 10⁻³ nucleotide substitutions per site per year (table S3). Although these rates are high compared with whole-genome rates for other flaviviruses (28), they are consistent with retrospective analyses of previous epidemics, which show that evolutionary rate estimates decline as the epidemic progresses (30, 31). Hence, this result should not be interpreted as implying that ZIKV in the Americas is unusually mutable. We estimate that the date of the most recent common ancestor (TMRCA) of all Brazilian genomes (clade B) is August 2013 to April 2014 [95% Bayesian credible intervals (BCIs); point estimate = mid-December 2013] (Fig. 3B). The common ancestor of the French Polynesian and American lineages (clade A) was dated from December 2012 to September 2013 (BCIs; point estimate = late May 2013) (Fig. 3B). The posterior distribution for the age of clade B encompasses the recorded duration of the ZIKV outbreak in three of five island groups of French Polynesia (4) (Fig. 3C). Divergence date estimates are robust among different combinations of prior distributions, molecular clock models, and coalescent models (supplementary materials sections 4 and 5) and are more likely to shift into the past than toward the present as virus genomes accumulate through time (30).
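To give the per-site rates a more tangible scale, they can be multiplied by the length of the coding region to obtain an expected number of substitutions per genome per year. The short calculation below is a back-of-the-envelope sketch using the 10.3-kb coding region length and the rate range quoted in the text; the function name is hypothetical and the calculation ignores rate variation among sites and lineages.

```python
CODING_REGION_SITES = 10_300  # 10.3-kb coding region, as quoted in the text

def expected_substitutions(rate_per_site_per_year, n_sites, years):
    """Expected number of substitutions along a lineage under a constant rate."""
    return rate_per_site_per_year * n_sites * years

low = expected_substitutions(0.98e-3, CODING_REGION_SITES, 1.0)
high = expected_substitutions(1.06e-3, CODING_REGION_SITES, 1.0)
print(f"{low:.1f} to {high:.1f} substitutions per coding region per year")
# prints "10.1 to 10.9 substitutions per coding region per year"
```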

To explore the possible routes of entry of ZIKV in Brazil, we collated airline flight data from all countries with reported ZIKV outbreaks between 2012 and the end of 2014. From late 2012, we find an increase in the number of travelers arriving in Brazil from these countries, rising from 3775 passengers per month in early 2013 to 5754 passengers per month a year later (Fig. 3C). This increase in visitors to Brazil from ZIKV-affected countries coincides with the period during which ZIKV is estimated to have entered the Americas (i.e., between TMRCAs of clades A and B) (Fig. 3B and supplementary materials section 5). If the ZIKV epidemic in Brazil did indeed arise from a single introduction, then the virus must have circulated in the country for at least 12 months before the first case was reported in May 2015. ZIKV clinical symptoms may be confused with those caused by dengue and chikungunya viruses, two endemic and epidemic viruses that cocirculate and share mosquito vectors with ZIKV in Brazil (27, 32, 33). Reliable differential diagnosis is possible only by using improved surveillance and laboratory diagnostics, which are now being implemented throughout the country.

There are two published hypotheses for how ZIKV came to be introduced into Brazil: during (i) the 2014 World Cup soccer tournament (12 June to 13 July 2014) (29) or (ii) the Va’a canoe event held in Rio de Janeiro between 12 and 17 August 2014 (34). Alternatively, introduction could have occurred during (iii) the 2013 Confederations Cup soccer tournament (15 to 30 June 2013). Notably, events (ii) and (iii) included competitors from French Polynesia. Our results suggest that the introduction of ZIKV to the Americas predated events (i) and (ii). Although the molecular clock dates are more consistent with the Confederations Cup tournament, that event ended before ZIKV cases were first reported in French Polynesia (4). Consequently, we believe that large-scale patterns in human mobility will provide more useful and testable hypotheses about viral introduction and emergence (33, 35, 36) than ad hoc hypotheses focused on specific events.

The ZIKV genome we obtained from a microcephaly case in Ceará, Brazil, contains eight amino acid changes not observed in any other complete genome in our data set. However, none of these mutations are shared with either of two recently published genomes from microcephaly cases (16, 21). Thus, if a causal link between Asian lineage ZIKV and microcephaly is confirmed, it is possible that putative viral genetic determinants of disease will be found among the amino acid changes that occur on the ZIKV phylogeny branches ancestral to the French Polynesian and American ZIKV lineages (i.e., the two lineages associated with reports of microcephaly, Guillain-Barré syndrome, and congenital abnormalities) (37). Phylogenetic character mapping using parsimony reveals 11 amino acid changes on the four internal branches (Fig. 2, asterisks; and fig. S3) leading to these two lineages. We identified the structures of homologous proteins most closely related to the ZIKV proteins (supplementary materials section 7) and used them to map 7 of the 11 amino acid changes, in a structural context, to five proteins: the pr-peptide region of prM [changes Val123→Ala123 (V123A) and S139N (S, Ser; N, Asn)], NS1 (A982V), the RNA helicase [NS3; N1902H and Y2086H (H, His; Y, Tyr)], the FtsJ-like methyl transferase domain [NS5; M2634V (M, Met)], and the thumb domain of RNA-directed RNA polymerase (NS5; M3392V) (fig. S7). None of these mutations are predicted to substantially affect the physicochemical properties of the protein environment, except possibly Y2086H (in the helicase; fig. S8), which may increase the hydrophilicity of the region. The remaining four amino acid changes could not be accurately mapped due to the absence of suitable related x-ray structures (supplementary materials section 7). Notably, none of the observed changes map to the E glycoprotein ectodomain, the primary target of humoral immune responses against flaviviruses (38, 39). 
Factors other than viral genetic differences may be important for the proposed pathogenesis of ZIKV; hypothesized factors include coinfection with chikungunya virus (40), previous infection with dengue virus (41), or differences in human genetic predisposition to disease.
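Parsimony character mapping of the kind mentioned above assigns ancestral states to internal nodes so that the number of changes on the tree is minimized, and then reads off the branches on which a change occurs. The sketch below uses the standard Fitch algorithm for a single site on a small rooted binary tree. It is an illustration only: the text does not specify which parsimony procedure or software was used, and the tree, the tip names, and the residue assignments are hypothetical (the S and N residues merely echo the S139N notation).

```python
def fitch_changes(tree, tip_states):
    """Map one character onto a rooted binary tree by Fitch parsimony.

    `tree` is a nested tuple of tip names; `tip_states` maps tip name to state.
    Returns a list of (parent_state, child_state, child_subtree) tuples, one
    per branch on which a change is reconstructed. Ties are broken
    alphabetically, so only one of possibly several equally parsimonious
    reconstructions is reported."""
    sets = {}

    def down(node):
        # Post-order pass: intersection of child sets if non-empty, else union.
        if isinstance(node, str):
            options = {tip_states[node]}
        else:
            left, right = (down(child) for child in node)
            options = (left & right) or (left | right)
        sets[node] = options
        return options

    changes = []

    def up(node, parent_state):
        # Pre-order pass: keep the parent's state whenever it is allowed.
        options = sets[node]
        state = parent_state if parent_state in options else min(options)
        if parent_state is not None and state != parent_state:
            changes.append((parent_state, state, node))
        if not isinstance(node, str):
            for child in node:
                up(child, state)

    down(tree)
    up(tree, None)
    return changes

# Hypothetical five-tip tree and residues at one amino acid site.
toy_tree = (("old_1", "old_2"), ("mid_1", ("outbreak_1", "outbreak_2")))
toy_states = {"old_1": "S", "old_2": "S", "mid_1": "S",
              "outbreak_1": "N", "outbreak_2": "N"}
changes = fitch_changes(toy_tree, toy_states)
print(changes)  # a single S-to-N change on the branch leading to the outbreak pair
```

In this toy case the single reconstructed change falls on an internal branch ancestral to the two "outbreak" tips, which is the kind of branch highlighted by asterisks in Fig. 2.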

Besides vector-borne and mother-to-child transmission, Zika virus may also spread via sexual contact (42, 43) and blood transfusion (44). The evidence of ZIKV in blood donors raises the possibility of ZIKV transmission through transfusion and indicates that it may be prudent to consider the screening of blood donors.

Supplementary Materials www.sciencemag.org/content/352/6283/345/suppl/DC1 Materials and Methods Supplementary Text Figs. S1 to S8 Tables S1 to S5 References (46–78)

Acknowledgments: We thank X. de Lamballerie and J. Lednicky for permission to include their unpublished ZIKV genomes in our analysis. We thank the Death Verification Service (SVO); Central Laboratories of Public Health (LACEN); and health departments of the Ceará State and Maranhão State, Brazil, for collaboration. O.G.P. is supported by the European Research Council (ERC) under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC grant 614725-PATHPHYLODYN. J.L. is supported by the ERC under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC grant 268904-DIVERSITY. O.G.P. received consulting fees from Metabiota between 2015 and 2016. This study is made possible in part by the generous support of the American people through the United States Agency for International Development (USAID) Emerging Pandemic Threats Program - PREDICT. The contents are the responsibility of the authors and do not necessarily reflect the views of USAID or the U.S. government. S.I.H. is funded by a Senior Research Fellowship from the Wellcome Trust (095066) and grants from the Bill and Melinda Gates Foundation (OPP1119467, OPP1093011, OPP1106023, and OPP1132415). M.R.T.N. is funded as an associated researcher in public health by the Evandro Chagas Institute, Brazilian Ministry of Health, and as a researcher in scientific productivity by CNPq (Brazilian National Council for Scientific and Technological Development) grants 302032/2011-8 and 200024/2015-9 and is also supported in part by the National Institute of Science and Technology for Viral Hemorrhagic Fevers. R.T. is funded by grant R24 AT 120942 from the U.S. NIH. S.C.H. is supported by a Wellcome Trust grant (102427). T.A.B. and I.R. are supported by grants from the UK Medical Research Council (MR/L009528/1) and the Wellcome Trust (090532/Z/09/Z). P.F.C.V. is supported by CNPq–National Agency for Scientific and Technologic Development (grants 573739/2008-0, 301641/2010-2, and 457664/2013-4). 
All samples were obtained by Brazilian Ministry of Health personnel from persons who were visiting local clinics or were hospitalized, as part of dengue, chikungunya, and Zika fever surveillance activities. In these cases, patient consent was oral and not recorded. The study was authorized by the Coordination of the National Program for Dengue, Chikungunya, and Zika Control of Brazil’s Ministry of Health. The data are available at DRYAD (DOI: 10.5061/dryad.6kn23). The ZIKV genomes reported in this study are deposited in GenBank under accession numbers KU321639, KU365777 to KU365780, KU729217, and KU729218.