Significance The indigenous populations of the Brazilian coast were decimated by European conquerors and declared extinct by the 18th century. The disappearance of these populations created a gap in the understanding of South American settlement. The present study rescues the genome of an extinct coastal lineage of the Tupí branch through the examination of a small, admixed, self-reported Native American community. Our results suggest that genetic lineages representative of the Tupí peoples who inhabited the coast survived in this specific extant population. We also show the relationships among Coastal, Amazonian, and ancient Brazilian populations and elucidate the putative migratory routes used by Amazonian peoples between the Amazon and the Atlantic coast ∼2,000 y ago.

Abstract In the 15th century, ∼900,000 Native Americans, mostly Tupí speakers, lived on the Brazilian coast. By the end of the 18th century, the coastal native populations were declared extinct. The Tupí arrived on the east coast after leaving the Amazonian basin ∼2,000 y before present; however, there is no consensus on how this migration occurred: toward the northern Amazon and then directly to the Atlantic coast, or heading south into the continent and then migrating to the coast. Here we leveraged genomic data from one of the last remaining putative representatives of the Tupí coastal branch, a small, admixed, self-reported Tupiniquim community, as well as data from a Guaraní Mbyá native population from Southern Brazil and from three other native populations from the Amazonian region. We demonstrated that the Tupiniquim Native American ancestry is not related to any extant Brazilian Native American population already studied, and thus they could be considered the only living representatives of the extinct Tupí branch that used to settle the Atlantic Coast of Brazil. Furthermore, these data show evidence of a direct migration from the Amazon to the Northeast Coast in pre-Columbian times, giving rise to the Tupí Coastal populations, and a single distinct migration southward that originated the Guaraní people from Brazil and Paraguay. This study elucidates the population dynamics and diversification of the Brazilian natives at a genomic level, which was made possible by recovering data from the Brazilian coastal population through the genomes of mestizo individuals.

In the 15th century, the Brazilian coast was densely populated by Native American populations. At that time, a total of 3 million indigenous individuals lived in the territory currently corresponding to Brazil, with about a third inhabiting its coast (1). The conquest of the Brazilian territory by the Portuguese (circa 1500) led to a rapid decline of the coastal native populations, culminating in their extinction by the end of the 18th century (2). This massive depopulation completely changed the distribution of the Native American populations within Brazil, delimiting their territory to the Amazon region and the inland. At present there are just two small admixed communities self-reported as coastal Tupí (Tupiniquim and Tupinambá) living in Brazil; however, they do not speak any indigenous language.

When the Portuguese first arrived in South America, the Tupiniquim and Tupinambá, both originally Tupí speakers, were the dominant groups on the Brazilian Atlantic coast (2). It is not clear how the Tupí speakers arrived on the east coast after they left the Amazonian basin. The origins of the Proto-Tupí (the ancestors of the Amazonian, southern, and coastal Tupí) possibly date back 5,000 y before present (YBP) in the Northwest Amazon (ref. 3 and references therein). More than 2,000 YBP, different Tupí populations expanded from this region over 4,000 km eastward and southward, respectively peopling the Atlantic coast and the western Brazilian inland. They expanded to most of the South American lowlands during the late Holocene epoch, becoming one of the most populous and diverse linguistic families (with >35 languages still spoken). The Tupí expansion is comparable in importance to the Bantu expansion in Africa; however, relatively little is known about the event. There is no consensus in the literature regarding linguistic expansion models for the Tupí family (4, 5). Genetic studies based on uniparental markers are consistent with linguistic data indicating that the northwestern Amazon was the center of diversification of the Tupí (3, 6), but they do not define any clear route of expansion, mainly due to the lack of data from coastal populations. The causes of expansion are also unknown, and could have involved ecological adaptation or cultural issues (7). The Tupí-Guaraní branch (which includes coastal and southern Tupí groups) has assumed an expansionist character over the last 2,000 to 3,000 y, populating the Brazilian southwest, northeast, and entire coast, distinguishing it from the other Tupí speakers. On the basis primarily of archeological and linguistic evidence (2, 8), two main broad and contrasting hypotheses regarding the settlement of the Brazilian coast by the Tupí groups can be distinguished in the literature (Fig. 1).
The first proposes that the Tupí from the Brazilian coast reached this region after coming from southwest Brazil, deriving from the same Tupí-Guaraní branch of Guaraní populations (9, 10) (blue arrow in Fig. 1). This hypothesis (10, 11) is based on archaeological data, linguistic analysis, and paleoenvironmental data, and associates the Tupí expansion with forest reductions that would have occurred during the Holocene. In this context, changes in vegetation would have forced these nonceramicist, preagriculturalist populations to seek new subsistence niches. Although these forest refuges were located both to the south and the east, linguistic data suggest that the most likely migration route to the Atlantic coast would have been through Brazil’s western border, and then to the east shore. The alternative hypothesis assumes that one branch of Tupí moved first eastward, reaching the coast, and then southward along the coast, originating the coastal Tupí, whereas the other branch went southward, originating the Guaraní people (12) (red arrows in Fig. 1). According to this interpretation, the Proto-Tupí were already agriculturalists and ceramists, and the reason for their expansion was likely the demographic pressure caused by a continuous increase in population, which forced them to disperse in search of new lands to cultivate. This proposition (12) is motivated by the independent mode and evolution of Guaraní and Tupinambá pottery from the Amazonian Polychrome Tradition of Proto-Tupí speakers (characterized by the use of red and black paint on a white engobe). Tupinambá pottery is only found in the northeast Amazon and along the Brazilian coast to the Tropic of Capricorn, while Guaraní pottery has been found from the southern Amazon to northern Argentina, Paraguay, and southern Brazil.

Fig. 1. Tupí Expansion hypotheses. Two main contrasting broad hypotheses that try to explain the Tupí Expansion can be recognized in the literature (2, 8). In hypothesis 1, the coastal Tupí would have derived from Guaraní populations in the south, which would have arrived there by expanding southward from the Amazon Basin, here represented by the blue arrow. Conversely, hypothesis 2 postulates that the coastal Tupí and the Guaraní populations would have originated in two separate expansions, with the former expanding eastward along the coast from the Amazon River mouth and the latter southward from the Amazon, here indicated by the two red arrows.

To reconstruct the history of the Tupí, we generated genomic data for the last remaining putative representatives of the Tupí coastal branch, a small admixed self-reported community of Tupiniquim people; for a Guaraní native population from Southern Brazil; and for three native populations from the Amazonian region. We investigated their genetic origins and demonstrated that the Tupiniquim Native American ancestry is not related to any extant Brazilian Native American population for which genetic data have been generated to date. Therefore, we infer that the Tupiniquim are the only living representatives of this extinct Tupí branch that was settled along the Brazilian Atlantic Coast at the arrival of the Europeans. Leveraging genomic information of the coastal Tupí branch retrieved from these admixed individuals, we elucidated the pre-Columbian dispersion of the Tupí stock from the Amazon to Southern Brazil and to the coast, finding evidence of two migrations: a direct migration from the Amazon to the coast, which originated the Tupí coastal populations, and a single distinct migration to the south, which originated the Guaraní people from Brazil and Paraguay. We further showed the existence of genetic continuity within Brazil when comparing ancient and modern individuals. The intensity of this continuity changed when the linguistic groups split and became structured, around 6,000 YBP, producing specific patterns of shared ancestry.

Materials and Methods To investigate the admixture events, we applied Rolloff (19) and TRACTS software (20, 21). Using AdmixTools (19) we computed F3, outgroup-F3, D-statistics, and F4, clustering the Tupiniquim as a population and treating them as separate individuals in the calculation in some of the analyses. For these analyses, datasets v [47 Tupiniquim (masked) + 48 Guaraní Mbyá + 48 Native Americans (13) + 7 newly genotyped Native Americans + HGDP], vi [1 Tupiniquim (ID: 2004) + 4 Guaraní Mbyá (IDs: 3001, 3036, 3038, 3051) + 48 Native Americans (13) + 7 newly genotyped Native Americans + HGDP], ix [47 Tupiniquim (masked) + 48 Guaraní Mbyá + 48 Native Americans (13) + 7 newly genotyped Native Americans + HGDP + 15 Ancient DNA samples (15)], and x [1 Tupiniquim (ID: 2004) + 4 Guaraní Mbyá (IDs: 3001, 3036, 3038, 3051) + 48 Native Americans (13) + 7 newly genotyped Native Americans + HGDP + 15 Ancient DNA samples (15) + Anzick-1 Clovis Culture associated ancient DNA (14); SI Appendix, Table S3] were used. We calculated FST and F2 for all pairs of populations (SI Appendix, Table S3: datasets v and vi), to shed light on the relations between these populations and to pinpoint where the Tupiniquim fit within the Native American groups. Matrices containing pairwise genetic distance values were produced using R scripts (https://github.com/BenjaminPeter/cph_course/blob/master/scripts/analysis.R) and plotted as Neighbor-Joining trees using R packages ape and ggtree (30, 31) to provide models for the history of population splits between these populations. We also used Treemix (27) to estimate the Maximum Likelihood tree and fit putative admixture events. For a subset of populations that included all Tupí (SI Appendix, Table S3: datasets v and vi), we tested the fit between empirical data and the pairwise FST and F2 NJ trees, along with the Maximum Likelihood trees produced with Treemix, using AdmixTools (19).
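For readers unfamiliar with the F-statistics named above, the following Python sketch shows the basic definitions of F2, F3, and F4 in terms of population allele frequencies, together with a simple delete-one-block jackknife Z-score. It is an illustration only and not the AdmixTools implementation used in this study: the function names are hypothetical, the estimators omit the finite-sample bias corrections that AdmixTools applies, and the jackknife assumes equally weighted contiguous SNP blocks.

```python
import numpy as np


def f2(a, b):
    """Naive F2(A, B): mean squared allele-frequency difference."""
    return float(np.mean((a - b) ** 2))


def f3(c, a, b):
    """Naive F3(C; A, B): mean of (c - a)(c - b) across SNPs.

    With C as an outgroup, this is the outgroup-F3 form (shared drift
    between A and B); with C as a target, negative values suggest admixture.
    """
    return float(np.mean((c - a) * (c - b)))


def f4(a, b, c, d):
    """Naive F4(A, B; C, D): mean of (a - b)(c - d) across SNPs."""
    return float(np.mean((a - b) * (c - d)))


def block_jackknife_z(per_snp, n_blocks=10):
    """Z-score of the mean of per-SNP values (delete-one-block jackknife).

    Assumption: SNPs are ordered along the genome and split into
    n_blocks contiguous blocks of roughly equal size.
    """
    blocks = np.array_split(np.asarray(per_snp, dtype=float), n_blocks)
    total = sum(block.sum() for block in blocks)
    n = sum(len(block) for block in blocks)
    estimate = total / n
    # Estimate recomputed with each block left out in turn.
    leave_one_out = np.array(
        [(total - block.sum()) / (n - len(block)) for block in blocks]
    )
    variance = (n_blocks - 1) / n_blocks * np.sum(
        (leave_one_out - leave_one_out.mean()) ** 2
    )
    return estimate / np.sqrt(variance)


if __name__ == "__main__":
    # Simulated allele frequencies for four hypothetical populations.
    rng = np.random.default_rng(1)
    base = rng.uniform(0.05, 0.95, size=5000)
    pops = [np.clip(base + rng.normal(0, 0.05, base.size), 0, 1) for _ in range(4)]
    a, b, c, d = pops
    print("F2(A, B):", f2(a, b))
    print("F4(A, B; C, D) Z-score:", block_jackknife_z((a - b) * (c - d)))
```

In this simplified form, the Z-score is the statistic divided by its jackknife standard error, which is the quantity typically compared against a threshold when judging whether a statistic differs from zero.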
Finally, we tried to explicitly model the two main Tupí Expansion hypotheses (2, 8), producing several models for each hypothesis with different populations, repositioning the Tupiniquim in the trees (again using datasets v and vi; SI Appendix, Table S3). Model fit was assessed by the differences between estimated and expected F-statistics values. Models with |Z| < 3 for all (or almost all) differences were considered to present a good fit to the data. Ancestry-specific Effective Population Size (Ne) history was reconstructed for both the Tupiniquim and the Guaraní Mbyá (SI Appendix, Table S3; datasets vii [47 Tupiniquim (unmasked) + 48 Guaraní Mbyá + Sub-Saharan Africans, Europeans, and East Asians (1000 Genomes Project)] and xi [48 Guaraní Mbyá + Peruvians from Lima, Sub-Saharan Africans and Europeans (1000 Genomes Project)], respectively). First, phasing was done with Beagle v.5 (32), IBD segments were estimated with RefinedIBD (33), and Local Ancestry Inference was performed with RFMix (34). Finally, IBDNe (25) was used to estimate ancestry-specific Ne from the estimated IBD segments and the ancestry blocks identified through the Local Ancestry Inference. Runs of homozygosity (ROH) were identified using PLINK v1.9 (35) with a minimum length of 500 Kb, using a sliding window of 50 SNPs, a maximum gap of 100 Kb between consecutive SNPs, a proportion of 5% overlapping windows forming homozygous segments, and an SNP density of at least one per 50 Kb. A complete description of sampling, genotyping strategies, dataset assembly, quality control procedures, and methods is included in the SI Appendix. Ethical approval for sample collection was provided by the Brazilian National Ethics Commission (CONEP Resolution no. 123 and 4599). CONEP also approved the oral consent procedure and the use of these samples in studies of population history and human evolution. Individual and/or tribal informed oral consents were obtained from participants who were not able to read or write.
All sampling was coordinated by coauthors of this study (F.M.S. and J.G.M.) and their collaborators, in a manner consistent with the Helsinki Declaration and Brazilian laws and regulations applicable at the time of sampling. Logistical support for the sample collection was provided by the Fundação Nacional do Índio. The results of this study were discussed with the participating communities. Our dataset has been deposited at the European Genome-phenome Archive, which is hosted by the European Bioinformatics Institute (EBI) and the Centre for Genomic Regulation (CRG), under accession number EGAS00001004036. The informed consent associated with these samples is restricted to population history/evolutionary analyses. The data will be available to researchers who sign the Data Access Agreement with the Data Access Committee on the European Genome-phenome Archive website.
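As an illustration of the ROH settings listed in the Materials and Methods, the Python sketch below assembles a PLINK v1.9 command line with the corresponding `--homozyg` options. It is a plausible reading of the stated parameters rather than the exact command used in the study: the helper name, the input prefix, and the output prefix are hypothetical, and mapping the "proportion of 5% overlapping windows" to `--homozyg-window-threshold 0.05` is an assumption.

```python
import subprocess


def build_roh_command(bfile, out, plink="plink"):
    """Return a PLINK v1.9 argument list for ROH detection.

    bfile and out are hypothetical file prefixes supplied by the caller.
    """
    return [
        plink,
        "--bfile", bfile,                      # binary PLINK fileset prefix
        "--homozyg",                           # enable ROH detection
        "--homozyg-kb", "500",                 # minimum ROH length: 500 Kb
        "--homozyg-window-snp", "50",          # sliding window of 50 SNPs
        "--homozyg-gap", "100",                # max gap between SNPs: 100 Kb
        "--homozyg-window-threshold", "0.05",  # assumed mapping of the 5% proportion
        "--homozyg-density", "50",             # at least one SNP per 50 Kb
        "--out", out,
    ]


def run_roh(bfile, out, plink="plink"):
    """Run PLINK with the settings above (requires PLINK to be installed)."""
    subprocess.run(build_roh_command(bfile, out, plink), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so no PLINK install is needed.
    print(" ".join(build_roh_command("data/example_cohort", "results/example_roh")))
```

Building the argument list separately from running it makes the parameter choices easy to inspect and record alongside the results.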

Acknowledgments We thank Rui Sérgio Sereni Murrieta and André Menezes Strauss for their helpful comments on the historical and archeological data. We are also grateful to Regina Cália Mingroni Netto and Lilian Kimura for laboratory assistance and technical support. Finally, we would like to thank all the native communities who participated in the study, without whom this work would not have been possible. M.A.C.e.S. was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) (2018/013716; 2015/26875-9) and K.N. was funded by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) (PNPD/1645581); NIH (R01 GM075091).

Footnotes Author contributions: T.H. designed research; M.A.C.e.S., D.C., and T.H. performed research; A.M.-S., J.E.K., J.G.M., F.M.S., M.C.B., A.d.C.P., and T.H. contributed new reagents/analytic tools; J.G.M. and F.M.S. collected the biological data; M.A.C.e.S., K.N., and R.B.L. analyzed data; and M.A.C.e.S. and T.H. wrote the paper with contributions from C.E.G.A., M.C.B., A.d.C.P., and D.C.

The authors declare no competing interest.

This article is a PNAS Direct Submission.

Data deposition: The newly genotyped datasets reported in this paper have been deposited in the European Genome-phenome Archive and are available for download under accession no. EGAS00001004036.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1909075117/-/DCSupplemental.