Ethics statement

Research was approved by the Stanford University Administrative Panel on Laboratory Animal Care (protocol 26920) and conducted under the appropriate Costa Rican permits (RT-044-2015-OT-CONAGEBIO, RT-042-2015-OT-CONAGEBIO, 121-2012-SINAC, RT-019-2013-OT-CONAGEBIO, 226-2012-SINAC).

In order to ascertain broader patterns of spillover and Bartonella transmission between species, sequences were downloaded from Genbank on 30 November 2016 using the search term “Bartonella gltA” and again on 21 May 2018 using the search terms “Bartonella” AND (“gltA” OR “GltA” OR “GLTA” OR “glta” OR “citrate synthase”) and limiting the search to sequences uploaded in the previous 900 days in order to update our dataset with the most recently published sequences. A separate search was conducted using the search term “Bartonella 16s” on 1 February 2017. Insect microbiome studies that detected Bartonella were also used [9–15]. Bartonella from Costa Rican bats in a mosaic agricultural landscape, including previously published [16] and 67 new sequences (isolated as in Judson et al. [16]) are also incorporated in this study (Genbank accession numbers MH234314 –MH234380). Metadata were downloaded from Genbank and/ or confirmed by examining the cited publication and are summarized in S1 and S2 Files. When data in Genbank were not associated with a publication, geography was inferred by the host range and/or title information in Genbank. The host of questing ticks was undetermined and therefore denoted as “unknown.” In some cases, genomes of Bartonella strains were published independently from their hosts; in this case we searched other literature to find the source of the strain. Sequences that were not in fact Bartonella gltA were removed manually and sequences were aligned using the Geneious alignment algorithm and refined using MUSCLE in Geneious (version 8.1.9 [17]). Sequences that were significantly redundant (or multiple sequences of the same species of Bartonella) were excluded to reduce the size of the resultant phylogenies. This was especially the case for the published genomes of named Bartonella species; many of these sequences lacked data on when and where they were isolated and were therefore excluded from our analysis. We also excluded some fragments that were too short or had substantial missing data within the alignments, as well as fragments which misaligned significantly at the ends, causing us to doubt the quality of these end base calls. Sequences that contained one or two base pair deletions not found in other sequences were also eliminated as we doubted the quality of the sequences. Sequences with deletions in multiples of three base pairs were retained as these likely represent actual deletions of amino acids. Alignments were manually inspected and corrected. Two alignments were produced, one of 540 bp and one of 277 bp. The first contained 677 sequences in total and the second included 1,060 unique sequences.

In order to test for patterns in host specificity and biogeography we also constructed Bayesian phylogenies using BEAST 2 [18] for the 540 bp fragment and the 277 bp fragment. Alignments were split into three partitions based on the base pair’s position in the codon and run in PartitionFinder to determine the best nucleotide substitution models using AICc [19]. These parameters were then used to configure the parameters for the BEAST 2 run. For both the 540 bp and 277 bp run, PartitionFinder determined that all three positions should be run under the same mode, a GTR+I+G model. As empirical and maximum likelihood estimated base frequencies usually have little impact on the phylogeny, we used observed base frequencies for both sets of nucleotides [19].

We tested two different models for the phylogenetic hypothesis based on the 540 bp fragment. Both analyses were run with a gamma site model with empirical base frequencies, an estimated proportion of invariant sites and all nucleotide transition/transversion frequencies except the CT transition rate estimated. The gamma shape prior was set to an exponential distribution with a mean of 1; the proportion of invariant sites was set to a uniform distribution between 0 and 1; all nucleotide substitution rates were set to a gamma distribution with an alpha of 2 and a beta of 0.5 or 0.25 for transitions and transversions respectively. In all cases Bartonella was constrained to be monophyletic with Brucella melitensis as an outgroup. The first model tested was a strict clock model with a constant population size coalescent model with vague priors as has been used for previous phylogenetic analyses of Bartonella [20,21] with the population size prior set to a 1/X distribution. The second was a birth death model run with a log-normal distributed relaxed molecular clock. The birth rate prior was set to a uniform distribution between 0 and 10,000; the relaxed clock mean prior was set to a uniform distribution between negative infinity and infinity; the relaxed clock standard deviation prior was set to an exponential distribution with a mean of 1; the death rate prior was set to a uniform distribution between 0 and 1. Additionally a clade of Artibeus lituratus and Artibeus watsoni-associated Bartonella was estimated to have evolved at the divergence of the two bat species (KJ816682, MH234319, MH234329, MH234330) and the prior distribution was estimated with a log-normal distribution with a mean of 8.5 mya (SD = 2.73) based on previous estimates [22–27] collated in TimeTree [28]. We used this clade as a calibration point as it was strongly supported in all of our initial analyses, regardless of model and was nested within other Central American bat-associated strains and therefore unlikely to have been impacted by human influence.

For the 277 bp tree we ran our simulations with a GTR distribution, an estimated proportion of invariant sites and a gamma distribution of rates. We tested two models, a strict clock, constant population size coalescent model as described in the first model for the 540 bp alignment and a birth death model with a relaxed log normal clock as described in the second model for the 540 bp alignment. In both models we constrained Brucella melitensis to be an outgroup but no other calibrations were included. All gltA model were run for 2.5 x 107 generations and sampled every 50,000 generations.

All gltA models converged with all parameters showing an effective sample size (ESS) over 100 (with the exception of the inferred relative death rate in the relaxed clock model of the 540bp alignment) and most showing an ESS over 200. The two models for the 540bp alignment were compared using AICM of the likelihood [29] with 1,000 bootstraps implemented through Tracer as model comparison using path sampling was not practical. For the 540bp alignment the best model was the second–a relaxed log normal clock calibrated with host divergence dates (dAICM = 356.48). For the 277bp alignment a strict clock was favored over a relaxed clock (dAICM = 441.86). Maximum clade credibility (MCC) trees were produced using TreeAnnotator, mean heights and a burn in of 10%.

In order to understand the evolutionary origin of Bartonella we constructed a phylogeny using sequences from the 16s rRNA gene. All 450 sequences were aligned and trimmed to the same length (259 bp) in Geneious. We constructed a phylogenetic hypothesis in BEAST 2 using a strict clock and a birth death model with vague priors as described in the birth death models for the gltA genes with Rhizobium leguminosarum as an outgroup. The model was run for 107 generations; most ESS were above 300, though the birth rate and death rate ESS were roughly 100. As we were not concerned with speciation dynamics but rather broad topology, we considered this hypothesis to be sufficiently sampled.

The 277 bp gltA MCC tree was used in an analysis of host specificity and geographic conservation between related Bartonella species. Many nodes did not have good support so we conducted all analyses using only nodes with a Bayesian posterior probability of the likelihood of 0.7 or above. Using the fitDiscrete function in geiger [30], four models of discrete character evolution were fit—one using a lambda transformation, one using a white noise transformation, one using an early burst transformation and one using no transformation to model the evolution of host order (with strains isolated from ectoparasites assigned to the ectoparasite’s host) and broad geographic region of isolation both by continent (all except Antarctica) and by Old World versus New World. Fit of the models was assessed using AICc weights and log-likelihoods.

Host switches and sharing of clades between geographic regions was assessed by manually examining the MCC phylogenetic hypothesis based on a 277bp fragment of gltA, by examining the location of Genbank records with identical genotypes and by searching the literature for the distribution of named Bartonella species. A host switch or geographic shift was inferred so as to capture the minimum number of shifts with Bayesian posterior probabilities of at least 0.7. We also assessed shifts at posterior probabilities of the likelihood of at least 0.9 and 0.95 to ensure our results were robust regardless of our cut-off.

In inferring the influence of humans on Bartonella, we categorized genotypes or monophyletic clades as being found in one or more of the following categories: rats, other rodents (excluding rats), humans, domestic carnivores, wild carnivores, domestic artiodactyls, wild artiodactyls, shrews, and bats. We also noted other wild animals that were rarer in our dataset (e.g. pikas, wild hares and wild primates). Genotypes found in rats, domestic animals and humans were categorized as “human-associated” and the rest as “non-human-associated”. Human-associated Bartonella in this study does not necessarily mean the genotype has been found in a human but rather in a human or a commensal animal or an ectoparasite on a human or commensal animal. Sometimes the metadata contained within Genbank and the publications were insufficient to allow us to distinguish the exact species from which the Bartonella was isolated. All genotypes isolated from Rattus sp., their ectoparasites or an organism denoted “rat” were counted as human-associated rats. Although many species of rodents are commensal with humans (e.g. [31]), all other rodents were counted as not being human-associated to avoid the need to categorize over 100 rodent species, as well as incorporate rodents for which the species was unknown (N = 50). This division therefore renders our analyses of human-associated strains conservative.

Additionally, we determined all instances of Bartonella transferring between host orders represented in our dated, 540bp phylogenetic hypothesis to determine the ages of such transfers and test whether transfers to humans were more recent than other transfers. For these analyses we only used clades with at least 0.7 posterior probability support, which meant that a few (3) clades were assessed as older than they may actually be. In each of these cases the transfer involved a genotype found in a human, meaning our hypothesis testing is conservative. Difference between zoonotic transfers and other transfers was assessed using one-tailed t-tests assuming unequal variance on the inferred node age and lower bound of the 95% highest probability density to account for uncertainty in dating of the nodes.

All alignments, metadata and R code are available in the supporting information.