We developed probes targeting 605 putative single-copy nuclear regions of Ipomoea (see Data S1) by comparing genomic data from I. lacunosa with the coding sequences (CDS) of Solanum tuberosum. We retained regions with a one-to-one match between Ipomoea and Solanum at 70% identity or higher along at least half the length of a Solanum CDS. We then filtered these to keep only Ipomoea loci of at least 1,000 bp. Along these loci, MycroArray (Ann Arbor, MI) designed 100 bp RNA probes, excluding probes with GC content below 25%. We also obtained the whole chloroplast genome of all specimens.
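The locus-selection and probe-design criteria above can be sketched in Python. This is a minimal illustration, not the pipeline actually used. The `Hit` record, its fields and the function names are hypothetical. The tiling of non-overlapping 100 bp windows is also an assumption, because probe density is not stated here.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One Ipomoea-locus vs Solanum-CDS match (hypothetical record layout)."""
    locus: str         # I. lacunosa locus ID
    cds: str           # S. tuberosum CDS ID
    identity: float    # percent identity of the match
    aligned_len: int   # length of the match along the Solanum CDS (bp)
    cds_len: int       # full length of the Solanum CDS (bp)
    locus_len: int     # length of the Ipomoea locus (bp)


def select_loci(hits, min_identity=70.0, min_cds_frac=0.5, min_locus_len=1000):
    """Keep loci whose passing match is one-to-one and that are long enough."""
    good = [h for h in hits
            if h.identity >= min_identity and h.aligned_len >= min_cds_frac * h.cds_len]
    loci_per_cds, cds_per_locus = {}, {}
    for h in good:
        loci_per_cds.setdefault(h.cds, set()).add(h.locus)
        cds_per_locus.setdefault(h.locus, set()).add(h.cds)
    return sorted({h.locus for h in good
                   if len(loci_per_cds[h.cds]) == 1          # CDS hit by one locus only
                   and len(cds_per_locus[h.locus]) == 1      # locus hits one CDS only
                   and h.locus_len >= min_locus_len})


def gc_fraction(seq):
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def tile_probes(seq, probe_len=100, step=100, min_gc=0.25):
    """Tile probes along a locus (assumed step) and drop those below min_gc."""
    return [(i, seq[i:i + probe_len])
            for i in range(0, len(seq) - probe_len + 1, step)
            if gc_fraction(seq[i:i + probe_len]) >= min_gc]
```

For example, two loci that both match the same CDS are rejected as not one-to-one, and a 100 bp window of pure A/T is dropped by the GC filter.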

We extracted DNA from fresh material using the CTAB method [], and from herbarium specimens using the Plant Tissue Mini protocol of the QIAGEN DNeasy Plant Mini Kit (QIAGEN). We prepared genomic libraries using the NEBNext Ultra DNA Library Prep Kit for Illumina v3.0 (New England BioLabs).

The Banks & Solander specimen was sequenced on an Illumina MiSeq with 25 bp paired-end reads, rather than by target enrichment. We evaluated the degree of DNA damage in this specimen using mapDamage 2.0 [] and found no signs of damage beyond the levels observed in other herbarium specimens (see Data S2).
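One signal that mapDamage 2.0 reports is the frequency of C→T misincorporations near the 5′ ends of reads. This frequency rises in degraded DNA because of cytosine deamination. The sketch below computes only that summary. It assumes the reads are already aligned and reduced to gap-free, forward-strand read/reference pairs, which is a simplification. It is not mapDamage's implementation; mapDamage also reports G→A changes toward 3′ ends and base composition around fragment ends.

```python
def c_to_t_by_position(pairs, n_positions=5):
    """Fraction of reference C read as T at each of the first n read positions.

    pairs: iterable of (reference_segment, read_sequence), both gap-free
    and oriented 5'->3' (an assumed, simplified input format).
    """
    c_total = [0] * n_positions
    c_to_t = [0] * n_positions
    for ref, read in pairs:
        for i in range(min(n_positions, len(ref), len(read))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [c_to_t[i] / c_total[i] if c_total[i] else 0.0
            for i in range(n_positions)]
```

In damaged samples these values are typically elevated at position 0 and decay with distance from the read end. A flat profile similar to that of other herbarium specimens is consistent with the observation above.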

We implemented target enrichment using MYbaits [] to capture the nuclear regions of interest, following the protocol described in [] and using Beckman Coulter Agencourt AMPure XP for product purification. We sequenced a 1:1 mixture of target-enriched and unenriched libraries so that the chloroplast genome and the nuclear ribosomal Internal Transcribed Spacer (rDNA ITS) region could also be recovered by genome skimming []. Sequencing was conducted on an Illumina HiSeq 3000 at the Center for Genome Research and Biocomputing, Oregon State University (Corvallis, United States), producing 100 bp paired-end reads. Reads were trimmed of Illumina adapters and quality-trimmed at Q15 from the left (5′) end and Q10 from the right (3′) end.
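The source does not name the trimming software. As a hedged illustration, the end-trimming rule above (Q15 at the 5′ end, Q10 at the 3′ end) can be sketched as follows, assuming Phred+33 quality encoding. It behaves like Trimmomatic's LEADING and TRAILING steps and does not remove adapters.

```python
def trim_read(seq, qual, left_q=15, right_q=10, offset=33):
    """Remove low-quality bases from both ends of a read.

    Bases are removed from the 5' end while their Phred score is below
    left_q, and from the 3' end while it is below right_q.
    """
    scores = [ord(c) - offset for c in qual]   # decode Phred+33 characters
    start, end = 0, len(seq)
    while start < end and scores[start] < left_q:
        start += 1
    while end > start and scores[end - 1] < right_q:
        end -= 1
    return seq[start:end], qual[start:end]
```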

We used a three-stage assembly process. First, we generated draft gene assemblies with YASRA [], which served as target regions in a second assembly run using PRICE []. Finally, we used SSPACE [] to extend the gene assemblies. The final assembled contigs were aligned back to the reference sequences using BLASTN [] to assign each contig to its target region.
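The source does not state the exact assignment criterion. Assigning contigs to targets from BLASTN output could look like the following. The sketch assumes tabular output (`-outfmt 6`, whose 12 default columns end with e-value and bit score) and a best-hit-by-bit-score rule. The e-value cutoff is a hypothetical parameter.

```python
def assign_contigs(blast_tab_lines, max_evalue=1e-10):
    """Map each contig (query) to the target with its highest-scoring hit."""
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        contig, target = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # discard weak hits (hypothetical threshold)
        if contig not in best or bitscore > best[contig][1]:
            best[contig] = (target, bitscore)
    return {contig: target for contig, (target, _) in best.items()}
```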

We collected information on the ploidy levels of the species from the literature and from the International Potato Center (CIP). We aligned the raw nuclear reads back to the assembled contigs using Bowtie []. From this alignment, we created a variant call file describing the SNPs found within the alignment. We then ran HapCompass [] to divide each assembled contig into haplotypes based on SNP phasing, and separated contigs with haplotype-defining SNPs into distinct contigs for downstream analysis. A coalescent analysis in ASTRAL-II [], treating alleles as independent for all genes and samples, found no significant intra-specimen variation (Figure 2B). We therefore conducted all subsequent phylogenetic analyses using consensus sequences.
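The final splitting step can be sketched as follows. The sketch assumes phasing has already been done (for example by HapCompass) and is supplied as hypothetical `(position, alleles)` tuples, with one allele per haplotype and 0-based positions. Identical haplotypes are collapsed, so only distinct ones become separate contigs. This mirrors the step in spirit and does not reproduce HapCompass's phasing algorithm.

```python
def split_haplotypes(contig, phased_snps, ploidy):
    """Return the distinct haplotype sequences implied by phased SNPs."""
    haplotypes = [list(contig) for _ in range(ploidy)]
    for pos, alleles in phased_snps:
        if len(alleles) != ploidy:
            raise ValueError(f"expected {ploidy} alleles at position {pos}")
        for hap, allele in zip(haplotypes, alleles):
            hap[pos] = allele
    distinct = []
    for seq in ("".join(h) for h in haplotypes):
        if seq not in distinct:  # collapse identical haplotypes
            distinct.append(seq)
    return distinct
```

With a tetraploid example carrying two phased SNPs, three distinct haplotypes remain, because two of the four copies are identical.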

We assembled the chloroplast genomes and the ITS region using the SPAdes genome assembler []. As references, we used the chloroplast genome of Ipomoea batatas cultivar Xushu18 [] and the full ITS fragment (including the 5.8S region) of an I. batatas herbarium specimen (C. Whitefoord 71) previously sequenced by Sanger sequencing. The chloroplast genomes show the typical angiosperm structure, with one large single-copy region, one small single-copy region and two inverted repeats. Chloroplast genome size ranges from 160,382 to 174,715 bp. The exception is Ipomoea lactifera, whose plastome contains several large deletions (150,628 bp).

The MiSeq reads allowed us to recover several fragments across the nuclear regions (1,016 reads mapped). We assembled into contigs only those read pairs in which both reads matched the reference sequence at approximately the expected distance, or positions covered by at least three reads. We then aligned these fragments to all other specimens in this study. We discarded all sites with ambiguous nucleotides, as well as all sites where only the Banks & Solander specimen contained an indel. We finally retained 12,905 sites, 5,735 of which were variable. We further explored DNA degradation in this specimen by calculating base percentages at these variable positions and found no differences compared with more recent material (see Data S1).
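The site-filtering rules for the Banks & Solander alignment can be sketched as below. Several assumptions apply:

- the alignment is a dictionary of equal-length sequences, with gaps written as "-";
- ambiguous nucleotides are IUPAC codes, plus "N" and "?";
- "only the focal specimen contained an indel" means a column where the focal specimen's gap state differs from the gap state shared by all other specimens.

The function names are hypothetical.

```python
AMBIGUOUS = set("RYSWKMBDHVN?")


def filter_sites(alignment, focal):
    """Indices of columns kept after removing ambiguous and focal-only indel sites."""
    others = [name for name in alignment if name != focal]
    kept = []
    for i in range(len(alignment[focal])):
        column = {name: seq[i].upper() for name, seq in alignment.items()}
        if any(c in AMBIGUOUS for c in column.values()):
            continue
        other_gap_states = {column[name] == "-" for name in others}
        if len(other_gap_states) == 1 and (column[focal] == "-") not in other_gap_states:
            continue  # indel state unique to the focal specimen
        kept.append(i)
    return kept


def variable_sites(alignment, sites):
    """Kept columns with more than one nucleotide (gaps ignored)."""
    return [i for i in sites
            if len({seq[i].upper() for seq in alignment.values()} - {"-"}) > 1]


def base_percentages(seq, sites):
    """Percentage of A, C, G and T in one sequence at the given columns."""
    counts = {b: 0 for b in "ACGT"}
    for i in sites:
        base = seq[i].upper()
        if base in counts:
            counts[base] += 1
    total = sum(counts.values())
    return {b: 100 * c / total if total else 0.0 for b, c in counts.items()}
```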

We aligned every nuclear region individually using the L-INS-i strategy in MAFFT v7.271 [] (gap opening penalty = 1.53), and used Gblocks [] with default parameters to remove poorly aligned positions. We estimated evolutionary models for each region using jModelTest 2 [] and inferred individual gene trees using FastTree 2.1.9 [] with default parameters. In a dataset this large, intralocus recombination, incomplete lineage sorting (ILS) and reticulation cannot be discounted []. We therefore ran multiple analyses to evaluate the effect of these processes.

First, to reduce the possible effect of recombination, we ran the PHI test [] to identify regions likely to contain recombination (see Data S1). We ran all subsequent analyses on two datasets in parallel: one including all 605 regions, and another including only the 307 regions that showed no evidence of recombination according to the PHI test.

Second, to explore the effect of ILS, we ran both coalescent-based and concatenation-based phylogenetic analyses. We used the gene trees as input to infer the species tree with ASTRAL-II []. Using the concatenated alignments, we conducted approximate maximum likelihood analysis as implemented in FastTree 2.1.9 []. We also ran SVDQuartets [], a coalescent-based method available in PAUP 4.0 [] (800,000,000 random quartets). We ran the FastTree analysis on the CIPRES Science Gateway [] and SVDQuartets on the supercomputer of the University of Oxford Advanced Research Computing facility.
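The sketch below shows one way the concatenated supermatrix and the two parallel datasets could be assembled. It is hedged and makes several assumptions: one alignment dictionary per region, taxa missing from a region padded with gaps, and 1-based partition boundaries. The 0.05 significance threshold for the PHI p-values is hypothetical, since the threshold is not stated here.

```python
def concatenate(alignments, taxa):
    """Concatenate per-region alignments; taxa absent from a region get gaps."""
    parts = {t: [] for t in taxa}
    partitions = []
    start = 1
    for aln in alignments:
        length = len(next(iter(aln.values())))
        for t in taxa:
            parts[t].append(aln.get(t, "-" * length))
        partitions.append((start, start + length - 1))  # 1-based, inclusive
        start += length
    return {t: "".join(p) for t, p in parts.items()}, partitions


def non_recombining(phi_pvalues, alpha=0.05):
    """Regions whose PHI test gives no significant evidence of recombination."""
    return [region for region, p in phi_pvalues.items() if p >= alpha]
```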

We generated three phylogenetic networks: one including all Ipomoea batatas and I. trifida specimens, a second including all species in the group, and a third including all I. batatas specimens plus the Banks & Solander specimen (675, 1,051 and 522 segregating sites, respectively). We used the Integer Neighbor-Joining method implemented in PopART (ε = 1) []. To further confirm our results, we ran independent phylogenetic analyses of the most variable regions of the chloroplast [] (Figure S5B). We also estimated pairwise distances (p-distance) between all sweet potato accessions using MEGA 6.0 [].
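The p-distance is the proportion of compared sites at which two sequences differ. A segregating site is a column carrying more than one nucleotide. The sketch below computes both and treats gaps and ambiguous bases by pairwise deletion. That treatment is an assumption, since MEGA offers several missing-data options and the one used is not stated here.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites among sites where both have A/C/G/T."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    return diffs / compared if compared else float("nan")


def pairwise_p(seqs):
    """p-distance for every unordered pair of accessions."""
    return {(i, j): p_distance(seqs[i], seqs[j]) for i, j in combinations(sorted(seqs), 2)}


def segregating_sites(seqs):
    """Number of columns with more than one nucleotide (gaps/ambiguities ignored)."""
    columns = zip(*(s.upper() for s in seqs.values()))
    return sum(1 for col in columns if len({c for c in col if c in "ACGT"}) > 1)
```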

To evaluate the robustness of the topology showing two sweet potato gene pools, we also produced an alternative topology constraining sweet potato to be monophyletic using RAxML []. We compared both topologies using the approximately unbiased (AU) test [] as implemented in IQ-TREE 1.5.0a [] (see Data S3).

We aligned the chloroplast genomes using the FFT-NS-2 strategy in MAFFT [] (gap opening penalty = 1.53). The alignment was checked visually, and minimal manual corrections were made in poly-A and poly-T regions, solely to reduce spurious alignment of these regions. We then used Gblocks [] to remove poorly aligned positions and jModelTest 2.1.7 [] to select the best substitution model for this alignment (GTR+I+G). We conducted maximum likelihood analysis using RAxML 8.0 [] as implemented in CIPRES [] (1,000 bootstrap replicates). We conducted parsimony analysis using PAUP 4.0 [] (1,000,000 trees based on 1,294 parsimony-informative characters; best tree = 2,631 steps). We also performed a parsimony analysis of 282 parsimony-informative indels in PAUP (100,000 trees; best tree = 975 steps), coding them as presence/absence characters [] using SeqState 1.4.1 [].
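SeqState implements several indel-coding schemes, and which one was used is not stated here. As an assumption, the sketch below shows a simplified version of the simple indel coding scheme:

- each distinct gap (unique start and end) becomes a presence/absence character;
- a taxon whose longer gap fully spans a shorter one is coded as missing ("?") for that shorter gap.

The sketch also includes the usual parsimony-informative check: at least two states, each present in at least two taxa.

```python
import re


def gaps(seq):
    """Set of (start, end) half-open intervals of gap runs in an aligned sequence."""
    return {(m.start(), m.end()) for m in re.finditer(r"-+", seq)}


def simple_indel_coding(alignment):
    """Return the gap characters and a presence/absence matrix (0, 1, ?)."""
    taxon_gaps = {t: gaps(s) for t, s in alignment.items()}
    characters = sorted(set().union(*taxon_gaps.values()))
    matrix = {}
    for taxon, tg in taxon_gaps.items():
        row = []
        for start, end in characters:
            if (start, end) in tg:
                row.append("1")
            elif any(s <= start and end <= e for s, e in tg):
                row.append("?")  # lies inside a longer gap in this taxon
            else:
                row.append("0")
        matrix[taxon] = "".join(row)
    return characters, matrix


def parsimony_informative(matrix, j):
    """True if character j has at least two states each shared by two or more taxa."""
    states = [row[j] for row in matrix.values() if row[j] != "?"]
    return sum(states.count(s) >= 2 for s in set(states)) >= 2
```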

We randomly extracted 3,000 variable positions from the alignments of nuclear regions and used them as input for Structure []. We ran 150,000 MCMC iterations after a burn-in of 100,000, using an admixture model and assuming independent allele frequencies among populations (λ = 0.4469; K = 1–5; 3 runs). We also ran independent analyses with the same parameters on three further datasets:

- 16 variable positions from the ITS alignment (λ = 0.4605; K = 1–4; 3 runs);
- 522 variable positions from the chloroplast alignment (λ = 0.3081; K = 1–5; 3 runs);
- 5,735 variable positions from the nuclear alignments including the Banks & Solander specimen (λ = 0.3483; K = 1–5; 3 runs).
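Structure expects integer-coded alleles. The sketch below samples variable columns with a fixed random seed and writes one row per specimen, as for haploid or consensus data. Several details are assumptions: the A/C/G/T → 1–4 coding, the -9 missing-data code (a common convention, set through Structure's MISSING parameter), and the function names.

```python
import random

CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def sample_variable_columns(alignment, n, seed=0):
    """Randomly pick up to n columns that carry more than one nucleotide."""
    length = len(next(iter(alignment.values())))
    variable = [i for i in range(length)
                if len({s[i].upper() for s in alignment.values()} & set(CODES)) > 1]
    rng = random.Random(seed)
    return sorted(rng.sample(variable, min(n, len(variable))))


def to_structure_rows(alignment, columns, missing=-9):
    """One label-plus-alleles row per specimen; non-ACGT states become missing."""
    rows = []
    for name, seq in alignment.items():
        alleles = [CODES.get(seq[i].upper(), missing) for i in columns]
        rows.append(name + " " + " ".join(map(str, alleles)))
    return rows
```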

Divergence time estimation and population size

[66] Höhna S., Heath T.A., Boussau B., Landis M.J., Ronquist F., Huelsenbeck J.P. Probabilistic graphical model representation in phylogenetics.

[67] Höhna S., Landis M.J., Heath T.A., Boussau B., Lartillot N., Moore B.R., Huelsenbeck J.P., Ronquist F. RevBayes: Bayesian phylogenetic inference using graphical models and an interactive model-specification language.

[89] Magallón S., Gómez-Acevedo S., Sánchez-Reyes L.L., Hernández-Hernández T. A metacalibrated time-tree documents the early rise of flowering plant phylogenetic diversity.

We implemented divergence time estimation in RevBayes [], a graphical modeling framework that enables highly flexible model specification. Because previous divergence time estimates for Convolvulaceae were lacking, we constructed a supermatrix of three chloroplast genes (matK, rbcL, atpB), the chloroplast trnL-trnF intergenic spacer, and the nuclear ribosomal ITS region. The supermatrix incorporates a balanced sample of taxa from across both Convolvulaceae and its sister family Solanaceae (passport data in Data S1). This matrix covers a phylogenetic scale broad enough to allow temporal calibrations.

We used a single normally distributed calibration (mean = 67.34 million years, standard deviation = 9.980 million years) for the divergence between Convolvulaceae and Solanaceae. This calibration age is derived from a previous study that simultaneously implemented 132 fossil calibrations across angiosperms []. The calibration probably underestimates the true age of the divergence between the two families, because many of the 132 fossils used are likely to be significantly younger than the true age of the nodes they were used to calibrate. In turn, the ages inferred in this study are likely to be biased toward younger values. Despite this limitation, we believe the approach is appropriate for the purposes of our study, namely to infer whether the sweet potato originated in pre-human times.
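The snippet below makes the calibration concrete. It uses Python's standard `statistics.NormalDist` to compute the central 95% interval of the normal prior and its probability mass below zero. It only summarizes the stated prior and is not RevBayes code.

```python
from statistics import NormalDist

# Normal calibration on the Convolvulaceae-Solanaceae divergence (million years)
calibration = NormalDist(mu=67.34, sigma=9.980)

lower = calibration.inv_cdf(0.025)
upper = calibration.inv_cdf(0.975)
print(f"95% prior interval: {lower:.1f} to {upper:.1f} Ma")  # 47.8 to 86.9 Ma
print(f"Prior mass below 0 Ma: {calibration.cdf(0):.1e}")
```

The interval shows how much freedom the single calibration leaves to the data: roughly 47.8 to 86.9 million years. The negligible mass below zero means truncating the normal at zero would make no practical difference.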

[90] Särkinen T., Bohs L., Olmstead R.G., Knapp S. A phylogenetic framework for evolutionary study of the nightshades (Solanaceae): a dated 1000-tip tree.

[91] Wilf P., Carvalho M.R., Gandolfo M.A., Cúneo N.R. Eocene lantern fruits from Gondwanan Patagonia and the early origins of Solanaceae.

Recent work demonstrating apparent conflict within the fossil record of Solanaceae, which contains the closest fossil relatives of Ipomoea, further highlights the utility of this pragmatic calibration approach []. Although the approach was useful for the purposes of this study, extreme caution should be exercised if the dates inferred here are used as secondary calibrations in future studies addressing different questions.

We used this matrix and age calibration to infer a time-calibrated phylogeny for Convolvulaceae and Solanaceae. We implemented a GTR+I+G model of DNA substitution and inferred branch-specific substitution rates using an uncorrelated lognormal relaxed clock with a standard deviation of 0.2972 (corresponding to 0.5 orders of magnitude). We partitioned the supermatrix so that separate parameters for nucleotide substitution and branch-specific substitution rates were inferred for the chloroplast and ITS data. A constant-rate birth-death branching process was used as the time prior.
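One common reading of "0.5 orders of magnitude", taken here as an assumption, is that the central 95% of branch rates spans about half an order of magnitude. Under that reading, 2 × 1.96 × 0.2972 / ln 10 ≈ 0.506. The sketch draws independent (uncorrelated) branch rates from a lognormal whose mean is fixed via μ = ln(mean) − σ²/2, and checks that spread. It illustrates the clock model and is not RevBayes code; the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

SIGMA = 0.2972  # log-scale standard deviation of the relaxed clock


def branch_rates(n_branches, mean_rate, sigma=SIGMA, seed=0):
    """Independent lognormal rates, one per branch, with expectation mean_rate."""
    mu = math.log(mean_rate) - sigma ** 2 / 2  # ensures E[rate] == mean_rate
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


def spread_in_orders_of_magnitude(sigma=SIGMA, coverage=0.95):
    """log10 ratio between the upper and lower quantiles of the central interval."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return 2 * z * sigma / math.log(10)


print(round(spread_in_orders_of_magnitude(), 3))  # 0.506
```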

We then used a matrix of 21 nuclear genes with high coverage (99%), sampling taxa from throughout Ipomoea, to infer divergence times within the genus, including the crown ages of Ipomoea series Batatas and the Tuboides clade. We implemented a GTR+G+I model and inferred branch-specific substitution rates using an uncorrelated lognormal relaxed clock with a standard deviation of 0.2972. A single set of parameters for nucleotide substitution and branch-specific substitution rates was estimated for the entire 21-gene matrix, and a constant-rate birth-death branching process was used as the time prior. The age of the root node of this tree was determined by the sampled ages of the equivalent node in the time-calibrated phylogeny of Convolvulaceae and Solanaceae.
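Gene selection by coverage could be sketched as follows. Coverage is assumed here to mean the fraction of sampled taxa with any non-missing data for a gene, which is one plausible reading of the text. The missing-data characters and the function names are also assumptions.

```python
def gene_coverage(alignment, taxa, missing="-?N"):
    """Fraction of taxa with at least one informative character in this gene."""
    present = sum(1 for t in taxa
                  if t in alignment and set(alignment[t].upper()) - set(missing))
    return present / len(taxa)


def select_genes(alignments, taxa, min_coverage=0.99):
    """Names of genes whose taxon coverage meets the threshold."""
    return sorted(g for g, aln in alignments.items()
                  if gene_coverage(aln, taxa) >= min_coverage)
```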

Based on the inferred crown ages of Ipomoea series Batatas and the Tuboides group, we inferred three further time-calibrated phylogenies. Two were for series Batatas: one based on plastome data and one based on a matrix of the 21 nuclear genes with 100% coverage. The third was for the Tuboides group, based on the same 21 nuclear genes. In each of the three trees, we implemented a GTR+G+I model and inferred branch-specific rates of DNA substitution with an uncorrelated lognormal relaxed clock with a standard deviation of 0.2972. Neither the plastome dataset nor the nuclear datasets were partitioned. We therefore estimated a single set of parameters for nucleotide substitution and branch-specific substitution rates for each of the three time-calibrated phylogenies.

[38] Rannala B., Yang Z. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci.

We also conducted a multispecies coalescent analysis on all sequenced plastomes of Ipomoea batatas and I. trifida. This analysis had two aims. The first was to estimate effective population sizes for the species and ancestral lineages within this clade [] (I. batatas lineage 1, I. batatas lineage 2, I. trifida). The second was to infer when potential population bottlenecks associated with the origin of this crop are likely to have occurred. Of particular interest was whether a bottleneck is associated with the population in which chloroplast capture may have occurred. In the chloroplast phylogeny inferred in this study, this population corresponds to the ancestral lineage of I. trifida and I. batatas lineage 2.

In this analysis, we fixed the species and gene tree topologies to those inferred in the phylogenetic analyses of this study; specifically, I. batatas lineage 2 was designated as the sister taxon of I. trifida. We implemented a GTR+G+I model of sequence evolution and assumed that the overall rate of sequence evolution was constant across the branches of the gene tree. Effective population sizes on the species tree were assigned an exponential prior distribution with a rate parameter of 0.1. The species tree (three taxa) was assumed to evolve under constant rates of speciation and extinction. The age of the root node of the species tree was determined by the sampled ages of the equivalent node in the time-calibrated phylogeny of Ipomoea series Batatas inferred from the plastome dataset.
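The exponential prior on effective population size has mean 1/0.1 = 10, in whatever units the analysis uses; those units are not stated here. The sketch below samples from that prior. It also shows why a bottleneck leaves a detectable signal: it computes the probability that two lineages have not yet coalesced after time t, exp(-t/Ne). This formula assumes a haploid, uniparentally inherited locus with Ne gene copies and t in generations. The names and units are illustrative assumptions, not the RevBayes model specification.

```python
import math
import random


def sample_population_sizes(n, rate=0.1, seed=1):
    """Draws from the exponential prior on effective population size (mean 1/rate)."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)]


def prob_no_coalescence(t, ne):
    """P(two lineages remain distinct after t generations), haploid locus, size ne."""
    return math.exp(-t / ne)


# A smaller population (a bottleneck) makes lineages coalesce faster.
print(prob_no_coalescence(10, 2) < prob_no_coalescence(10, 20))  # True
```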