Closely related pathogens may differ dramatically in host range, but the molecular, genetic, and evolutionary basis for these differences remains unclear. In many Gram-negative bacteria, including the phytopathogen Pseudomonas syringae, type III effectors (TTEs) are essential for pathogenicity, instrumental in structuring host range, and exhibit wide diversity between strains. To capture the dynamic nature of virulence gene repertoires across P. syringae, we screened 11 diverse strains for novel TTE families and coupled this nearly saturating screen with the sequencing and assembly of 14 phylogenetically diverse isolates from a broad collection of diseased host plants. TTE repertoires vary dramatically in size and content across all P. syringae clades; surprisingly few TTEs are conserved and present in all strains. Those that are conserved likely provide basal requirements for pathogenicity. We demonstrate that functional divergence within one conserved locus, hopM1, leads to dramatic differences in pathogenicity, and we show that phylogenetics-informed mutagenesis can be used to identify functionally critical residues of TTEs. The dynamism of the TTE repertoire is mirrored by diversity in pathways affecting the synthesis of secreted phytotoxins, highlighting the likely role of both types of virulence factors in determining host range. We used these 14 draft genome sequences, plus five previously reported genome sequences, to identify the core genome of P. syringae, and we compared this core to that of two closely related non-pathogenic pseudomonad species. These data revealed the recent acquisition of a 1 Mb megaplasmid by a sub-clade of cucumber pathogens. This megaplasmid encodes a type IV secretion system and a diverse set of proteins of unknown function, and it dramatically increases both the genomic content of these strains and the pan-genome of the species.

Breakthroughs in genomics have unleashed a new suite of tools for studying the genetic bases of phenotypic differences across diverse bacterial isolates. Here, we analyze 19 genomes of P. syringae, a pathogen of many crop species, to reveal the genetic changes underlying differences in virulence across host plants ranging from rice to maple trees. Surprisingly, a pair of strains diverged dramatically via the acquisition of a 1 Mb megaplasmid, which constitutes roughly 14% of the genome. Novel plasmids and horizontal genetic exchange have contributed extensively to species-wide diversification. Type III effector proteins are essential for pathogenicity, exhibit wide diversity between strains and are present in distinct higher-level patterns across the species. Furthermore, we use sequence comparisons within an evolutionary context to identify functional changes in multiple virulence genes. Overall, our data provide a unique overview of evolutionary pressures within P. syringae and an important resource for the phytopathogen research community.

We provide a phylogenetically comprehensive genomic view of P. syringae with a focus on TTE repertoire evolution. We analyzed data from draft or complete genome sequences of 19 diverse isolates, including 14 new draft genome sequences. We couple these genome sequences with a functional screen to identify new TTE families from diverse strains. The TTE content of these strains, as well as the presence of other known pathogenesis-related genes, is highly variable. We show that cost-efficient genome sequencing placed within a phylogenetic context provides a thorough and unique viewpoint into P. syringae evolution and sheds light on previously unrecognized evolutionary patterns and structural diversity for this important plant pathogen.

To date, genomic studies in P. syringae suggest that virulence mechanisms within this species are evolutionarily dynamic and have experienced strong selective pressures [3] , [36] . Complete genome sequences exist for three phylogenetically diverse P. syringae isolates representing MLST groups I, II, and III (Pto DC3000, P. syringae pv. syringae B728a (Psy), and P. syringae pv. phaseolicola 1448a (Pph), respectively; [26] – [28] ). Recently, additional draft genome sequences were generated either by Roche/454 or Illumina sequencing technologies (for Pto T1, group I; Pta ATCC11528, pathovar tabaci; Psv NCPP3335, pathovar savastanoi; and multiple strains from pathovars aesculi and glycinea, all group III; [30] , [37] – [40] ), or by a hybrid genome assembly pipeline utilizing both Illumina and Roche sequencing technologies (Por 1_6, pathovar oryzae, group IV; [41] ). The genomes of these P. syringae strains differ dramatically in gene and plasmid content and in the presence/absence of many virulence-related genes. Given that these strains represent only a fraction of the known diversity within Pseudomonas isolates, much of the phylogenetic, ecological, and host diversity for this plant pathogen remains unexplored.

The TTSS is not the sole determinant of virulence and host range for P. syringae; coordination of host physiological responses and metabolic pathways is also necessary for pathogen growth within host tissue [32] . Phytotoxins, which can be coordinately regulated with the TTSS but are secreted independently of the TTEs [33] , can disrupt host metabolism or act as mimics of plant hormones. Hence, they may replace or complement virulence functions of TTEs [34] . Indeed, manipulation of stomatal function by coronatine, a structural mimic of the plant hormone jasmonic acid, is essential for invasion of A. thaliana leaves by P. syringae pv. tomato (Pto) DC3000. However, coronatine also possesses independent virulence functions during the colonization of roots [35] . Therefore, pathogenesis of P. syringae on any given plant host species results from both the absence of avirulence factors (an operational definition of TTEs that activate a host immune receptor) and the presence of multiple virulence factors acting coordinately to promote disease and to suppress host immune responses [4] , [7] , [13] .

Multiple screens, primarily within the three P. syringae strains with completely sequenced ‘gold standard’ genomes [3] , [21] – [30] , have suggested that the number of TTEs per genome ranges from ∼20 to 33, with the total number of validated TTE protein families ∼50. However, efforts to catalogue the TTE repertoires of various strains often fall short of capturing a complete picture. For instance, false negatives arise from lack of saturation in functional TTE screens or because sequence divergence confounds hybridization-based methods. False positives also occur with hybridization methods when the gene sequences are present but contain frame-shifts or disruptions, or are only partial matches to the known TTE genes (i.e. chimeras [31] ). Most of these limitations are obviated by whole genome sequences, especially when combined with orthogonal functional methods to validate candidates as TTEs.

Isolates of P. syringae are subdivided into approximately 50 pathovars based upon host range and comparison with type strains [16] . These are further subdivided into races based upon differential ability among strains within a pathovar to grow and cause disease across host genotypes [17] . Recent multilocus sequence typing (MLST) segregated P. syringae pathovars into at least 5 distinct phylogenetic clades [6] , [16] , [18] , which largely mirror 9 genomospecies based on DNA hybridization [19] , [20] . While the selection pressures determining host range may be similar throughout the species, there has simply not been deep enough phenotypic sampling or sequencing of genomes across the species to uncover trends indicating evolutionary differentiation among the clades [4] .

A high degree of divergence among commonly investigated isolates makes it nearly impossible to pinpoint all the changes that lead to host differentiation or specialization at the present time. As a result, key questions, such as what determines the overall plasticity of host range, remain unanswered. Deep sampling of diverse genomes within a phylogenetic framework can reveal general evolutionary trends indicative of changes in lifestyle and allow for the identification of genetic changes that differentiate between strains that have recently undergone host range shifts [3] , [15] .

The host ranges of many P. syringae isolates or pathovars have not been thoroughly characterized. Research has largely focused on identifying the molecular basis of pathogenesis across three divergent strains with finished genome sequences and on investigating virulence mechanisms for a smattering of strains on a limited number of hosts [7] . These studies have shown that a type III secretion system (TTSS), which acts like a molecular syringe to translocate a suite of type III effector (TTE) proteins into plant cells, is a key virulence determinant [7] – [9] . Once inside the plant cell, TTEs promote pathogenesis by disrupting and suppressing host defense responses at multiple levels [10] – [13] . TTEs can also be recognized by plant disease resistance proteins, and recognition of a single effector is sufficient to trigger a successful host immune response. However, the virulence functions of many TTEs are redundant, making these phenotypes potentially robust to host-mediated selection against single TTE genes [14] . Thus, host range is structured by the totality of a strain's TTE repertoire.

Pseudomonas syringae is a Gram-negative bacterial phytopathogen responsible for worldwide disease on many crop species [1] . Despite a collectively broad pathogenic range for the species, individual isolates of P. syringae display pathogenic potential on a limited set of plant species and either elicit immune responses or simply fail to thrive on alternative species [2] – [4] . In addition to disease outbreaks, strains can be isolated as epiphytes from non-diseased plants as well as from multiple phases of the water cycle [5] , [6] . How species with varied lifestyles like P. syringae maintain the genomic flexibility required to survive across this broad range of ecologies is not known. It remains particularly unclear how evolutionary forces shape the pan-genome of this species, especially virulence-related genes.

One allele that was not recognized by Pto, AvrPto Pgy R4 , differs in only 2 residues from the recognized allele AvrPto Lac 107 ( Figure S16A ). We used phylogeny-directed mutagenesis to create AvrPto Pgy R4 I85M and AvrPto Pgy R4 S95G, in each case recreating the residue conserved among the recognized AvrPto orthologs. We then tested whether Pto recognized these alleles. AvrPto Pgy R4 I85M was not recognized, whereas AvrPto Pgy R4 S95G triggered Pto-dependent cell death ( Figure 7C ). In a second assay for HR, AvrPto Pgy R4 was inactive, whereas AvrPto Pgy R4 S95G induced ion leakage to levels indistinguishable from AvrPto Pto DC3000 ( Figure 7E ). Glycine 95 is common to all the active AvrPto alleles and lies within the GINP loop, a region required for the AvrPto-Pto physical interaction and hence for avirulence function ( [58] , [63] – [65] ; Figure S16C ). Glycine 95 was recently shown to be required for recognition of AvrPto by Pto [66] . Both Serine 94 and Isoleucine 96 are required for avirulence [64] , although Isoleucine 96 has been shown to tolerate mutation to valine (but not alanine) [64] . Accordingly, although both AvrPto Lac 107 and AvrPto Pgy R4 S95G contain valine at position 96, both trigger avirulence. A more distantly related, non-recognized allele, AvrPto Pmo , also contains a non-consensus residue (arginine) at position 95. However, the AvrPto Pmo R95G mutation did not restore recognition ( Figure 7D ). Thus, other AvrPto Pmo -specific polymorphisms contribute to its loss of recognition. The non-recognized ortholog AvrPto Pgy R4 retains its virulence function on tomato leaves that lack Pto function ( Figure 7F ), suggesting that the two amino acid differences that distinguish it from recognized AvrPto alleles are dispensable for virulence. A similar separation of AvrPto avirulence and virulence functions has been reported previously for missense mutations of the canonical AvrPto allele from Pto JL1065 [64] .
The virulence effect of AvrPto Pgy R4 was consistently greater than that of AvrPto Pto DC3000 ( Figure 7F ). This relative difference could be due either to residual avirulence of the Pto DC3000 ortholog that depends on glycine 95, or to uncharacterized residues that are polymorphic between AvrPto Pgy R4/Lac107 and AvrPto Pto DC3000 .
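The logic of phylogeny-directed mutagenesis can be illustrated with a short sketch: given an alignment of recognized alleles and a non-recognized query, list the query residues that deviate from the residue shared by the recognized alleles, named in the usual XnY form (for example S95G). This is a minimal Python sketch under stated assumptions, not the procedure used here; the function names, the `min_agreement` parameter and the toy sequences are hypothetical, and residues are numbered along the ungapped query.

```python
from collections import Counter


def consensus(aligned):
    """Return the majority residue at each column of equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def candidate_reversions(query, recognized, min_agreement=1.0):
    """List substitutions (e.g. 'S95G') that would restore the query to the consensus
    of the recognized alleles at columns where at least `min_agreement` of them agree.

    `query` must be aligned to `recognized`; gaps are written as '-'.
    Positions are 1-based along the ungapped query sequence.
    """
    if len(query) != len(recognized[0]):
        raise ValueError("query must be aligned to the recognized alleles")
    cons = consensus(recognized)
    mutations = []
    position = 0
    for i, residue in enumerate(query):
        if residue == "-":
            continue  # gap in the query: no residue to mutate
        position += 1
        target = cons[i]
        column = [s[i] for s in recognized]
        agreement = column.count(target) / len(column)
        if target != "-" and target != residue and agreement >= min_agreement:
            mutations.append(f"{residue}{position}{target}")
    return mutations


# Hypothetical toy alignment (not real AvrPto sequences).
recognized = ["MSGIV", "MSGIV", "MSGVV"]
print(candidate_reversions("MSRVV", recognized))                     # ['R3G']
print(candidate_reversions("MSRVV", recognized, min_agreement=0.5))  # ['R3G', 'V4I']
```

Requiring full agreement restricts candidates to columns conserved in every recognized allele; relaxing `min_agreement` yields more candidates to test, at the cost of more experiments.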

( A ) Bayesian phylogeny for the AvrPto superfamily. Orthologs in red are recognized by Pto, orthologs in black are unrecognized, and orthologs in gray are untested. ( B ) Agrobacterium/N. benthamiana transient assay. Indicated AvrPto orthologs were co-expressed with Pto, and symptoms were assessed at 4 days post inoculation. The Pto DC3000 AvrPto G2A mutant is a mislocalized negative control. ( C ) Mutation of S95G restores activity to the unrecognized ortholog AvrPto Pgy R4 . Mutation of M85I does not result in recognition of AvrPto Pgy R4 . ( D ) Mutation of R95G does not restore recognition of the AvrPto Pmo ortholog. ( E ) Ion leakage assay of Agrobacterium/N. benthamiana transient inoculations. Error bars are one standard deviation. ( F ) Expression of either AvrPto Pto DC3000 or AvrPto Pgy R4 is sufficient to increase the virulence of Pto T1 on tomato plants incapable of recognizing AvrPto (tomato cultivar 76R prf3). Bars indicate growth at zero and four days after dip inoculation with 10⁵ CFU/mL bacteria. Error bars are 2× standard error.

To show the utility of deep phylogenetic sequencing, we asked whether a diverse collection of orthologs could be used to predict functional information about a TTE protein. AvrPto is a widely distributed and well-characterized TTE that interacts with a tomato host cellular target, the protein kinase Pto, in a well-defined manner that results in disease resistance mediated by Pto and the NB-LRR immune receptor Prf [58] , [59] . AvrPto also confers added virulence to P. syringae strains that lack it when assayed on pto or prf genotypes of tomato [60] . We assayed 10 AvrPto orthologs ( Figure S16A , Figure 7A ) for their ability to trigger Pto-dependent HR using a standard assay in N. benthamiana [61] . As expected, AvrPto from Pto DC3000 and closely related orthologs triggered Pto-dependent HR, whereas more distant orthologs did not ( Figure 7B ). A negative control, AvrPto DC3000 with a G2A mutation, previously reported to be mislocalized [62] , failed to elicit HR in the presence of Pto. Expression of AvrPto orthologs and Pto was verified by Western blotting ( Figure S16B ).

We tested the virulence function of both diverged group I hopM1 variants, from Pto DC3000 and from Pmp, using a previously published assay [55] . Briefly, a strain carrying a deletion of the Conserved Effector Locus (CEL) that eliminates hopM1 and avrE1 from Pto DC3000 displays attenuated disease symptoms and reduced growth on Arabidopsis. avrE1 and hopM1 are likely redundant for this virulence function [14] , [56] , [57] . We found that hopM1 Pmp , expressed from a constitutive promoter, did not complement the virulence defect of Pto DC3000 ΔCEL ( Figure 6C ), even though this effector is translocated ( Dataset S9 ). Therefore, allelic variants of hopM1 display functional divergence for virulence on Arabidopsis.

( A ) A genomic region including hopM1 was aligned for all group I P. syringae strains with draft genome sequences and pairwise nucleotide diversity values (π) were calculated in 25 bp sliding windows. ( B ) Phylogenies were constructed using Bayesian methods for both avrE1 and hopM1 for all publicly available alleles. Posterior probabilities are shown if support for nodes is <0.99. Color-coding of strain names represents phylogenetic group designation as described in Figure 1 . ( C ) schM/hopM1 from Pmp is unable to complement the virulence defect of Pto DC3000 ΔCEL in dip assays on Arabidopsis. Bars indicate mean growth at zero and four days after inoculation. Error bars are 2× standard error. Different letters indicate statistically significant differences (ANOVA, Tukey's HSD).

Our sequence diversity analysis showed that hopM1 has experienced unusual evolutionary dynamics ( Figure 5 ), especially within the group I strains. We aligned the largest contiguous genomic region (bordered by scaffold breaks) including hopM1 for all group I strains that contain a full-length hopM1 allele, and computed π values for the nucleotide sequence of this region. A small fraction of this genomic region, which includes hopM1 as well as fragments of the TTSS helper protein hrpW and the TTE chaperone shcE, displays inflated π values relative to the bordering regions ( Figure 6A ). Therefore, the observed inflation of nucleotide diversity for hopM1 ( Figure 5 ) is localized. The phylogenies of both TTSS-linked TTEs avrE1 ( Figure 6B ) and hopAA1 (data not shown) match those created from MLST loci. However, the hopM1 phylogeny shows that a recombination event involving the hopM1 locus splits group I strains into two divergent groups (Pmp/Pan or Pto DC3000/Pla 106/Pto T1) ( Figure 6B , right, in green). This result underscores how localized homologous recombination of existing sequences can drive diversification and adaptation of P. syringae TTE repertoires [54] .

Pairwise nucleotide diversity values (π) were calculated between TTE genes shared across multiple strains within each P. syringae group, color-coded by phylogenetic group as in Figure 1 , and compared to π values for housekeeping gene fragments from the same strains. The solid line indicates a 1∶1 ratio of π values between housekeeping genes and shared TTEs. Only effector families with high π values within phylogenetic groups are labeled.

We compared values for π (pairwise nucleotide diversity) for shared TTE subfamilies within the three clades of P. syringae for which there are genome sequences from multiple strains ( Figure 5 ). As a baseline, we compared this value to the π values for the same groups of strains calculated from the concatenated MLST loci used to construct the overall phylogeny in Hwang et al. [16] . This metric requires accurate placement of TTEs into subfamilies ( [48] ; Materials and Methods ). Thus, there is an upper limit to π values for the shared TTEs because extremely divergent alleles will be placed into different subfamilies. A majority of the π values for the TTE subfamilies match, or are slightly higher than, values for the housekeeping genes, consistent with vertical inheritance or low levels of diversifying selection. However, there are numerous instances where diversity within a TTE subfamily far exceeds the diversity values for the housekeeping genes within the same comparison. Future work will show whether horizontal transfer or mutation has enabled functional diversification of these protein families.
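The comparison against the housekeeping baseline amounts to asking how far each TTE subfamily's π lies above the 1:1 line. The sketch below is hypothetical: it assumes π values have already been computed per subfamily and for the concatenated MLST loci of the same strains, and it uses an arbitrary two-fold cutoff (the `fold` parameter) to flag subfamilies; neither the cutoff nor the example values come from this study.

```python
def compare_to_baseline(pi_by_family, pi_housekeeping, fold=2.0):
    """Split TTE subfamilies into those near the housekeeping baseline and those
    whose pi exceeds it by at least `fold` (an arbitrary, illustrative cutoff).

    pi_by_family: {subfamily name: pi among its alleles in one phylogenetic group}
    pi_housekeeping: pi of concatenated MLST loci for the same strains
    Returns two lists of (subfamily, pi ratio), sorted by subfamily name.
    """
    if pi_housekeeping <= 0:
        raise ValueError("baseline pi must be positive")
    elevated, near_baseline = [], []
    for family in sorted(pi_by_family):
        ratio = round(pi_by_family[family] / pi_housekeeping, 2)
        (elevated if ratio >= fold else near_baseline).append((family, ratio))
    return elevated, near_baseline


# Hypothetical values for one group (not data from this study).
elevated, near = compare_to_baseline({"famA": 0.010, "famB": 0.060, "famC": 0.012}, 0.010)
print(elevated)  # [('famB', 6.0)]
print(near)      # [('famA', 1.0), ('famC', 1.2)]
```

Because extremely divergent alleles are assigned to separate subfamilies, ratios computed this way are bounded from above, as noted in the text.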

We analyzed the relationships of virulence gene suites across strains by hierarchical clustering of strains with respect to the distributions of individual virulence genes and pathways (TTEs, phytotoxin pathways, plant hormone mimics) ( Figure S13 ). Although we hoped to uncover novel associations between virulence genes, small sample sizes for numerous virulence genes provide little resolution to identify correlations that are independent of phylogeny (e.g. hopN, hopS, hopY) or of known proximity on the chromosome (e.g. hopO and hopT). However, clustering of strains by their virulence gene repertoires does highlight some interesting trends. Despite phylogenetic assignment to group I, both Pan and Pmp diverge from Pto T1/Pto DC3000/Pla 106 in their virulence gene repertoires. Such patterns reflect the classification of these strains within different genomospecies [20] . Indeed, Pan clusters more closely with group III strains, likely due to the presence of scaffolds and TTEs related to those found on the large virulence plasmid of Pph 1448a, as well as the pathway for the production of phaseolotoxin. This could signal underlying similarity in the virulence strategies of P. syringae pathogens of beans and kiwi. Likewise, two group III strains (Pta and Pla 107) cluster with group II strains based on virulence gene profiles, suggesting that there is a fundamental difference in virulence gene repertoire for these two strains compared to their group III relatives. Lastly, hierarchical clustering clearly demonstrates divergence in virulence gene suites between Ppi R6 and other group II strains.
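Clustering strains by their virulence gene repertoires can be illustrated with presence/absence profiles. The sketch below is an assumption-laden stand-in, not the method behind Figure S13: it treats each strain as a set of virulence genes or pathways, measures dissimilarity with the Jaccard distance, and merges clusters by average linkage (UPGMA-style), implemented in plain Python so it runs without extra libraries. Strain and gene names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|; two empty repertoires are treated as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(profiles):
    """Agglomerative clustering of {strain: set of genes}.

    Returns merges as (cluster_1, cluster_2, distance), where clusters are tuples of
    strain names and distance is the mean pairwise Jaccard distance between them.
    """
    def cluster_distance(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard_distance(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        merges.append((c1, c2, round(cluster_distance(c1, c2), 3)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [tuple(sorted(c1 + c2))]
    return merges


# Hypothetical repertoires: S1 and S2 share everything, S3 carries a toxin pathway.
profiles = {
    "S1": {"geneA", "geneB", "geneC"},
    "S2": {"geneA", "geneB", "geneC"},
    "S3": {"geneA", "toxinX"},
}
for merge in average_linkage(profiles):
    print(merge)
```

With presence/absence data, strains from different clades can cluster together when they share mobile virulence genes, which is the kind of pattern noted above for Pan and the group III strains.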

We investigated the presence of pathways encoding the best understood P. syringae phytotoxins (coronatine, tabtoxin, syringolin, syringopeptin, syringomycin, phaseolotoxin), a gene (avrD) whose enzymatic product, syringolide, can cause a hypersensitive response on specific soybean genotypes [52] , and genes involved in production or modification of the plant hormones ethylene and auxin ( Figure 3 ). It should be noted, however, that allelic diversification within these pathways can lead to the production of slightly different toxins [53] . In most cases, pathways coding for toxins found together in Psy B728a (syringomycin, syringopeptin, syringolin) are found in the genomes of group II strains, with the exception of Ppi R6. In only one other case did a strain contain genes known to be involved in the production of multiple toxins (coronatine and tabtoxin in Por). Moreover, although all strains in all groups appear capable of producing the plant hormone auxin, only the group II strains and Pph 1448a lack a gene to modify auxin once it is made (iaaL). avrD is widespread throughout the phylogeny, although the functional significance of different avrD alleles remains unresolved [52] . Only Pgy R4 and Ppi R6 appear to be capable of ethylene production by known pathways. The relative wealth of phytotoxins and the reduced TTE suites of group II strains, compared to the other phylogenetic groups, suggest that the genetic basis of pathogenicity within this clade has diverged from the rest of the P. syringae species.

We investigated diversity for 35 TTE families present within a majority of strains (>12) by calculating measurements of pairwise amino acid diversity among all known alleles ( Table S3 ). The most diverse TTE families are hopW, hopZ, avrB, hopAO, hopT, hopAB and hopF ( Table S3 ). The diversity values for hopW, hopT and hopAO are somewhat misleading, however, as these families contain alleles of vastly different lengths. We built phylogenies from protein sequences of the remaining diverse TTEs (hopAB, hopF and avrB) ( Figure S14 ) and note that similar analyses exist for hopZ [36] . The resulting phylogenies differ extensively from the phylogenies inferred from MLST sequences; hence we infer that these widely distributed TTE families are often lost, but can be regained by horizontal transfer of divergent alleles ( Figure S14 ). This could imply that these TTE families play important roles in virulence across a broad range of host species, and are thus often re-incorporated into a strain's TTE suite. However, the dynamism of these families also suggests that they may be evolutionarily costly on certain hosts (again likely through host immune recognition) and are therefore lost at higher rates than other TTE families. In support of this model, we note that host disease resistance genes exist that recognize specific members of each of these four TTE families, and that specific members of each of these TTE families can confer virulence on particular hosts [2] , [50] , [51] .
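One common way to quantify how far a TTE phylogeny departs from the MLST phylogeny is to count the clades found in one tree but not the other (a Robinson-Foulds-style distance for rooted trees). The text does not state that this metric was used; the sketch below is illustrative only, assumes both rooted trees contain the same strains, and encodes trees as nested Python tuples with hypothetical leaf names.

```python
def clades(tree):
    """Return every non-root clade of a rooted tree as a frozenset of leaf names.

    Trees are nested tuples of strings, e.g. (("A", "B"), ("C", "D")).
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))  # the root clade is shared by any two trees on the same leaves
    return found


def rooted_rf_distance(tree1, tree2):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree1) ^ clades(tree2))


species_tree = (("A", "B"), ("C", "D"))   # hypothetical MLST topology
effector_tree = (("A", "C"), ("B", "D"))  # hypothetical incongruent TTE topology
print(rooted_rf_distance(species_tree, species_tree))   # 0
print(rooted_rf_distance(species_tree, effector_tree))  # 4
```

A large distance is consistent with, but does not by itself prove, horizontal transfer; gene loss, incomplete lineage sorting and poorly supported nodes can also produce incongruent topologies.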

TTE truncations and transposon disruptions are common across the P. syringae phylogeny ( Figure 4 ). However, in only two cases (a truncation of hopAA1 and a transposon disruption of hopAG1) are these events shared by a majority of strains within a clade. Similarly, in only three other cases did events occur that were shared between multiple strains. Indeed, of the 46 total TTE gene truncations or transposon disruptions we identified, 41 appear at the tips of our phylogeny. Given the rarity of “older” truncations and disruptions, and given that many of these altered TTEs are found undisturbed in closely related genomes, we infer that TTE loss is recent in most cases. This is consistent with ongoing dynamism in host range determination, whether across plant species or within a species, driven by host immune recognition. Additionally, IS element disruptions are distinctly more frequent among the group I strains. The other clades appear to display higher rates of disruption by truncation (mostly via frameshift mutations) than by IS elements, with the exception of Pla 107, which possesses a relatively high number of TTEs with IS element disruptions. This trend potentially reflects differences in the activity levels of clade-specific transposases.

Parsimony was used to determine at which phylogenetic nodes TTE disruptions occurred according to the phylogeny in Figure 1 . The gene names of TTEs that are truncated are displayed in red, while those that are disrupted by insertion sequence elements are in blue. TTE disruption events that could not be phylogenetically placed, and presumably occurred in only one strain, are listed to the left of the phylogeny. Question marks next to a TTE name indicate conflicting information concerning disruption, or disruptions that could not be verified. We include a disruption of hopF2 from Pla 106 because, even though the TTE sequence is complete, there is a transposon disruption in the shcF chaperone. At the far right, TTE conservation was determined for each genome within P. syringae groups where multiple strains were sequenced (groups I, II, III). The graph displays the percentage of each strain's TTE repertoire shared with other sequenced strains within an MLST group. The X-axis of this graph displays the number of potential strains within each MLST group that share particular TTEs with the genome of interest (max of 5 for group I and 6 for groups II and III). The percentage of singleton TTEs (found only within the strain of interest) is at the far left side of the graph, while the percentage of a strain's TTE repertoire conserved throughout the group is at the far right. The Y-axis represents the percentage of each strain's TTE repertoire shared with other strains within the same MLST group and is scaled differently on each graph; however, the total area of each graph represents 100% of that strain's effector repertoire.
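Placing a shared disruption on the phylogeny can be sketched under one simplifying assumption: that truncations and IS insertions, once they arise, are not reversed. Under that assumption, the most parsimonious placement assigns one event to each largest clade whose strains all carry the disruption, and a disruption seen in a single strain maps to that tip. This is a hypothetical Python illustration, not the exact parsimony procedure used for Figure 4; trees are nested tuples and strain names are placeholders.

```python
def leaf_set(node):
    """All leaf names under a node of a tree written as nested tuples of strings."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def place_disruption(tree, disrupted):
    """Return the clades (frozensets of strains) on which an irreversible disruption
    must have arisen to explain its presence in exactly the strains in `disrupted`."""
    disrupted = frozenset(disrupted)
    events = []

    def walk(node):
        leaves = leaf_set(node)
        if leaves <= disrupted:
            events.append(leaves)  # every strain below here carries it: one event
        elif not isinstance(node, str):
            for child in node:
                walk(child)

    walk(tree)
    return events


tree = ((("S1", "S2"), "S3"), ("S4", "S5"))  # hypothetical phylogeny
for event in place_disruption(tree, {"S1", "S2", "S5"}):
    print(sorted(event))
```

If reversals were allowed (for example, under Fitch parsimony), a disruption shared by most but not all members of a clade could instead be placed at the clade's base with a later restoration; the irreversibility assumption rules this out.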

We investigated the genome dynamics of TTE genes within the three most deeply sampled clades in order to characterize the evolution of TTE repertoires ( Figure 4 ). For each strain, the majority of the TTE ORFs are present within other closely related strains from that clade. Moreover, most strains share almost all of their TTEs with at least one additional strain from within the same clade. This result is particularly striking for Pta, Pla 107, Pmp, Pla 106, Ptt, Pja, Cit7, Pac, and Psy B728a, which have only a small percentage of novel TTEs in relation to the rest of their clade ( Figure 4 , right). In contrast, a handful of isolates (Pgy R4, Pph 1448a, Pae, Ppi R6, Pan, Pto DC3000) have gained numerous TTEs that are not present within any other related strains.
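The per-strain conservation profiles on the right of Figure 4 can be sketched by asking, for each TTE in a strain, how many other strains of the same MLST group also carry it. A minimal Python sketch, assuming each repertoire is represented as a set of TTE names (the function name and example repertoires are hypothetical); with six strains in group I, k runs from 0 (singletons) to 5, matching the axis range given in the Figure 4 legend.

```python
from collections import Counter


def sharing_profile(group, strain):
    """Percentage of `strain`'s TTE repertoire shared with exactly k other strains.

    group: {strain name: set of TTE names} for one MLST group.
    Returns {k: percent} for k = 0 .. number of other strains; values sum to 100.
    """
    repertoire = group[strain]
    if not repertoire:
        raise ValueError(f"{strain} has an empty repertoire")
    others = [name for name in group if name != strain]
    shared_with = Counter(sum(tte in group[o] for o in others) for tte in repertoire)
    return {k: 100.0 * shared_with[k] / len(repertoire) for k in range(len(others) + 1)}


# Hypothetical three-strain group.
group = {
    "S1": {"tteA", "tteB", "tteC", "tteD"},
    "S2": {"tteA", "tteB"},
    "S3": {"tteA"},
}
print(sharing_profile(group, "S1"))  # {0: 50.0, 1: 25.0, 2: 25.0}
```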

We characterized the TTE content of each of the sequenced strains by similarity searches to all known P. syringae TTEs ( Figure 3 ; left). Our query list was generated by combining all previously identified P. syringae TTEs with the eight new TTE families we identified ( Materials and Methods ; Dataset S9 ). A limitation of our study is that TTEs and phytotoxin pathways may have been carried on plasmids or in other regions lost during sub-passaging of these strains. Overall, the total number of potential TTEs (defined as full-length ORFs, confirmed for HrpL induction, with at least one family member translocated) varies dramatically between strains, from a minimum of 9 (Pja) to a maximum of 39 (Pto DC3000) ( Figure 3 ; right). Furthermore, the group II strains possess fewer known TTEs on average than the other 4 groups. A total of five TTE families (hopAA, avrE, hopM, hopI, hopAH) are present in some form (as full-length or truncated ORFs, or disrupted by IS elements) within each of the sequenced strains. These likely represent the core TTEs of pathogenic P. syringae strains. These TTEs are all found in syntenic regions of each genome, and three (hopM, avrE and hopAA) are closely linked to TTSS structural genes, as noted previously [49] . A second class of TTE families (hopX, hopAE, hopAF, hopR, hopAS, hopAB, hopQ, hopD, hopT, hopO, hopW, hopF, hopV, hopAZ, avrPto) is predominantly absent from group II strains. Members of this class are located in a wide variety of genomic locations and can be very diverse in sequence, suggestive of extensive horizontal transfer (see below). Sequence differences among members of these families suggest that this class of TTE families may be under different evolutionary pressures relative to the core TTEs ( Table S3 , Figure S14 ).
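The distinction drawn here between families present "in some form" and potentially functional TTEs can be made explicit in a short sketch. It assumes, hypothetically, that each strain's calls are stored as {gene: status} with status "full", "truncated" or "is_disrupted", and that a family name can be obtained by stripping the trailing allele number from a gene name (hopM1 becomes hopM); that naming rule is a simplification, the function names are illustrative, and the example calls are invented.

```python
import re


def family_of(gene):
    """Strip a trailing allele number, e.g. 'hopM1' -> 'hopM' (simplifying assumption)."""
    return re.sub(r"\d+$", "", gene)


def core_families(calls):
    """Families present in any form (full, truncated or disrupted) in every strain."""
    per_strain = [{family_of(gene) for gene in genes} for genes in calls.values()]
    return set.intersection(*per_strain) if per_strain else set()


def potentially_functional(calls, strain):
    """Count full-length calls only; a simplified stand-in for the per-genome totals,
    which additionally require HrpL induction and translocation evidence."""
    return sum(status == "full" for status in calls[strain].values())


calls = {  # invented example calls for two hypothetical strains
    "S1": {"hopM1": "full", "avrE1": "full", "hopX1": "truncated"},
    "S2": {"hopM1": "is_disrupted", "avrE1": "full"},
}
print(sorted(core_families(calls)))         # ['avrE', 'hopM']
print(potentially_functional(calls, "S1"))  # 2
print(potentially_functional(calls, "S2"))  # 1
```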

We list instances where homologues of previously known TTE families were identified in the screened genomes ( Dataset S9 ). In this file we also list instances where genes were identified as HrpL-regulated within the screen, but which were not translocated according to our tests. Since these genes are confirmed to be HrpL-regulated, and are therefore linked to the major pathogenic regulon in P. syringae, they could contribute to virulence in a translocation-independent way [29] .

We are confident that there are now few undiscovered TTE families left to be found that are shared by a majority of these strains. First, we maximized phylogenetic diversity and screened four of the five major phylogenetic groups, using divergent strains within each of these groups. Second, our functional screen is close to saturation as measured by the recovery of known hrpL-regulated TTSS loci from the respective genomes at frequencies similar to previously published reports ( Table S2 ; [23] ).

TTE, toxin, and plant hormone biosynthesis genes are listed across the top, and P. syringae genomes are color-coded by phylogenetic group as in Figure 1 . At the left, a blue box indicates presence of full-length ORFs or complete pathways within each genome. Green boxes indicate that genes or pathways are present by similarity searches but that the presence of full-length genes could not be verified by PCR, or that the pathways are potentially incomplete. Yellow boxes indicate that genes are either significantly truncated or disrupted by insertion sequence elements. White boxes indicate absence of genes or pathways from the strains based on homology searches. At the far right, the total number of potentially functional TTE proteins is shown for each genome and displayed according to the color-coded strain and group symbols shown in Figure 1 .

The genomes from a subsample of the total sequenced isolates (Pgy R4, Pmo, Pla 107, Pac, Ptt, Ppi R6, Pla 106, Pan, Psy B728a) were functionally screened for new TTE genes using a previously established method [23] based on the observation that many important virulence genes (and all known TTEs) are regulated by the alternative sigma factor HrpL. Two additional strains that were not sequenced (P. syringae pv. atrofaciens DSM50255, and pv. maculicola M4) were also screened, and novel TTEs identified from these strains were included in all similarity searches ( Dataset S9 ). We report the full results from all screened putative TTEs, as well as the type of locus identified in the screen (ORF only, or ORF including the putative hrp-box), in Dataset S9 . From this screen, we identified, and functionally validated by translocation assays, members of eight new TTE families ( Figure 3 , Table 2 ). This increased the number of validated TTE families in P. syringae to 58 (not including avrD; families defined according to unified nomenclature rules [48] ).
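The screen itself is functional, but loci in Dataset S9 are also classed by whether they include a putative hrp box, the HrpL-dependent promoter motif. A candidate hrp box can be located with a simple pattern search, sketched below under the assumption of a simplified consensus (GGAACC, a 15 or 16 nt spacer, then CCAC) with no mismatches allowed. The pattern, function names and test sequence are assumptions for illustration; published hrp-box predictions often use weight matrices and tolerate mismatches, so an exact-match search like this one will miss divergent promoters.

```python
import re

# Assumed, simplified hrp-box consensus: GGAACC, a 15-16 nt spacer, then CCAC.
HRP_BOX = re.compile(r"GGAACC[ACGT]{15,16}CCAC")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_hrp_boxes(seq):
    """Return (0-based forward-strand start, strand, matched motif) for exact hits."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group()) for m in HRP_BOX.finditer(seq)]
    rc = reverse_complement(seq)
    for m in HRP_BOX.finditer(rc):
        # A match spanning rc[s:e] covers seq[len(seq) - e : len(seq) - s].
        hits.append((len(seq) - m.end(), "-", m.group()))
    return sorted(hits)


promoter = "TT" + "GGAACC" + "A" * 16 + "CCAC" + "TT"  # synthetic test sequence
print(find_hrp_boxes(promoter))
```

A motif hit marks only a candidate; as described above, TTE status here rests on HrpL-dependent induction and translocation assays.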

The Pla 107 genome assembled into scaffolds representative of a typical chromosome, plus a ∼1 Mb scaffold. This scaffold had approximately the same GC content and depth of Illumina read coverage as known chromosomal genes (Figure S11), but little sequence homology to the other P. syringae genomes. PCR and Sanger sequencing confirmed that this large scaffold is circular (Figure S12A,B); Pla 107 therefore contains a ∼1 Mb megaplasmid. PCR-based screening shows that this megaplasmid is present in a closely related strain (Pla N7512) but absent or significantly modified in two other closely related strains (Pla YM7902 and Pla YM8003; Figure S12C). Draft genome sequences of closely related outgroups (Pmo, Pta) also lack the megaplasmid. Both strains that contain this extra-chromosomal element grow more slowly in planta and on plates (Figure S12D,E). Because these four Pla strains possess nearly identical sequences at their MLST loci, the megaplasmid was most likely acquired recently. According to the NCBI annotation, most of its genes encode hypothetical proteins (776 of 1080 genes, ∼72%), and 35 more encode conserved but uncharacterized proteins. The megaplasmid nevertheless carries several recognizable elements:
- "housekeeping" genes highly similar to those of other Pseudomonas species;
- a potential type IV secretion system distantly related to the Legionella Dot/Icm system;
- 38 additional tRNA loci, bringing the total in this strain from 47 to 85.

Although many structural genes of a type IV secretion system appear to be present, tBLASTn searches using sequences of known type IV effector proteins did not produce likely hits [46], [47]. We also searched both the Conserved Domain Database (CDD) and KEGG for potential biochemical pathways on the megaplasmid but found no complete pathways (Figure S12F). The "housekeeping" genes do not appear to be essential, as P. syringae homologues are often present on the main chromosome.
The recent acquisition of this megaplasmid could signal a dramatic ecological shift among these closely related strains.
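Two simple statistics underlie the comparison of the ∼1 Mb scaffold with chromosomal sequence: GC content, and read depth relative to the chromosome. Below is a minimal sketch. It assumes per-base depths have already been computed by a read aligner or depth tool (not shown), and the function names are hypothetical.

```python
from statistics import median


def gc_content(seq: str) -> float:
    """G+C fraction among unambiguous bases; N and other codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def relative_depth(scaffold_depths: list[float], chromosome_depths: list[float]) -> float:
    """Median per-base depth of a scaffold divided by the chromosomal median.
    A value near 1.0 is consistent with a replicon at chromosomal copy number."""
    return median(scaffold_depths) / median(chromosome_depths)


# Toy values only, not measurements from Pla 107.
print(gc_content("GGCCAATT"))                      # 0.5
print(relative_depth([50, 52, 48], [49, 51, 50]))  # 1.0
```

A ratio near 1, as reported for this megaplasmid, sets it apart from the multi-copy plasmids that coverage-based screens tend to detect.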

Plasmids are often rich in repetitive regions, and current assembly methods for short-read technologies assemble poorly across such regions. We therefore investigated the presence of plasmids within these strains using a multi-phase approach based on three lines of evidence:
- the presence of plasmid structural genes within the draft genomes;
- the similarity of loci within these suspected plasmids to known plasmids in the NCBI database;
- an approximation of plasmid coverage from Illumina read depth.

We find that 13 of the 15 draft genomes likely contain plasmids (Table S1, Figure S11), highlighting the importance of extra-chromosomal elements in the evolution of P. syringae [45].
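The text does not specify how the three lines of evidence described above are combined. One simple possibility, sketched here purely as an assumption, is a majority vote. The class, field names, and cutoffs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScaffoldEvidence:
    name: str
    has_plasmid_genes: bool   # e.g. replication or partitioning genes annotated
    hits_known_plasmid: bool  # best database match is a known plasmid
    depth_ratio: float        # scaffold read depth / chromosomal median depth


def likely_plasmid(ev: ScaffoldEvidence, min_votes: int = 2,
                   depth_cutoff: float = 1.5) -> bool:
    """Call a scaffold plasmid-derived when at least `min_votes` of the three
    lines of evidence agree. Both thresholds are illustrative, not published."""
    votes = [ev.has_plasmid_genes,
             ev.hits_known_plasmid,
             ev.depth_ratio >= depth_cutoff]
    return sum(votes) >= min_votes


# A single-copy replicon (ratio ~1) can still be called from gene content alone.
print(likely_plasmid(ScaffoldEvidence("scaf_12", True, True, 1.0)))   # True
print(likely_plasmid(ScaffoldEvidence("scaf_40", False, False, 3.2)))  # False
```

Requiring agreement between independent signals keeps any single noisy signal from producing a call by itself. For example, elevated coverage over a collapsed repeat would not be enough.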

We analyzed the core and pan genomes for the three major clades of P. syringae (groups I, II, and III according to [16]) (Figure 2D; Datasets S5, S6, S7). Even though the groups possess similar levels of nucleotide sequence divergence, group I strains have ∼500 more genes in their core genome than groups II and III (Figures 2D, S10). Fewer isolates were sequenced from group I, so we performed a bootstrapping analysis of the other two clades. This analysis showed that the larger core genome of group I is robust to differences in the number of genome sequences sampled (data not shown).
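Core and pan genomes reduce to set intersection and union over ortholog groups. The sample-size control described above can be approximated by repeatedly drawing subsets of genomes. This sketch assumes ortholog groups have already been assigned (the clustering step is not shown). The function names are hypothetical, and drawing without replacement is a stand-in for the bootstrapping analysis, whose exact procedure is not given here.

```python
import random


def core_and_pan(genomes: dict[str, set[str]]) -> tuple[int, int]:
    """Core = ortholog groups in every genome; pan = groups in any genome."""
    groups = list(genomes.values())
    return len(set.intersection(*groups)), len(set.union(*groups))


def subsampled_core_sizes(genomes: dict[str, set[str]], k: int,
                          n_draws: int = 100, seed: int = 0) -> list[int]:
    """Core-genome size for n_draws random subsets of k genomes (drawn without
    replacement), so a clade with many genomes can be compared with one that
    has only k sequenced members."""
    rng = random.Random(seed)
    names = sorted(genomes)
    sizes = []
    for _ in range(n_draws):
        picked = rng.sample(names, k)
        sizes.append(len(set.intersection(*(genomes[n] for n in picked))))
    return sizes


# Toy ortholog-group memberships, not real data.
toy = {"A": {"g1", "g2", "g3"}, "B": {"g1", "g2", "g4"}, "C": {"g1", "g5"}}
print(core_and_pan(toy))  # (1, 5)
```

Suppose the distribution from `subsampled_core_sizes(group_II, k=len(group_I))` still falls well below the group I core, where `group_II` and `group_I` are hypothetical dictionaries. That result would indicate the larger group I core is not simply an effect of sampling fewer genomes.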

Collectively, P. syringae isolates share ∼50% of their ORFs with other pseudomonads. (A) The P. syringae core genome contains 3397 genes. (B) The P. syringae pan genome contains 12,749 ORFs. (C) P. syringae, P. fluorescens, and P. putida share 2501 ORFs. P. syringae has a smaller core genome (3397) than either P. fluorescens (4422) or P. putida (4034), and P. fluorescens and P. putida share more genes with each other than either does with P. syringae. (D) Phylogenetic distribution of shared and clade- or strain-specific genes. Numbers on the earliest branch for each group indicate the size of the core (black) and pan (red) genomes for groups with multiple sequenced genomes (I, II, III). They also give the number of clade-specific ORFs (blue), which are conserved within each group but absent from other groups. Internal branches display the number of ORFs gained, and shared by all genomes, after each branch bifurcation (see Methods). Numbers at the far right give the ORFs within each genome that are absent from other strains of the relevant P. syringae group (black) and from the rest of the species (blue). Group I strains (including Pto DC3000) have the largest core and the smallest pan genome. Pja and Pla 107 have the smallest and largest numbers of unique ORFs (88 and 1507, respectively).

Phylogenetic analysis of the 19 strains included in this study, based on the nucleotide sequences of seven conserved loci. Bayesian posterior probabilities are displayed only at nodes where these values are <0.95. For these unresolved nodes, an independent phylogenetic approach using another 324 genes confirmed that this tree captures their evolutionary history (Methods; Figure S2). Each phylogenetic group as defined in [16] was assigned its own color to the left of the phylogeny, and strains were assigned symbols; this color and marker scheme continues throughout the figures. In all cases but one (Cit7, isolated from the leaf surface of a healthy orange tree [95]), strains were isolated from the diseased host plants listed at right.

We created a Bayesian phylogeny for the sequenced strains (Figure 1). It is based on fragments of the MLST loci used in [16], extended as far into these gene sequences as alignment allowed. We also built a maximum likelihood phylogeny from 324 concatenated protein sequences drawn from proteins present in all strains (Figure S2). Orthology was established first, and amino acid alignments were produced with a hidden Markov model. We also built individual phylogenies for each of these 324 proteins (Dataset S8). Our phylogenies are largely congruent with prior work. However, the exact placement of Por and Pma, and the relationships among Pmo, Pta, Pae, and Pla 107 (see Table 1 for strain key), are sensitive to the phylogenetic method used (data not shown). In some cases, such as the placement of Pae, the MLST tree, the tree from the 324 concatenated sequences, and the individual protein trees disagree. In those cases, the second most probable protein tree invariably supported the MLST topology (Figure S2, Dataset S8).
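Maximum likelihood inference on the 324 proteins is preceded by concatenating their alignments into one supermatrix. The inference itself uses dedicated phylogenetics software and is not shown. Below is a minimal sketch of the concatenation step, with a hypothetical function name. The 324 proteins were chosen from those present in all strains, so the gap-padding branch would not be triggered for that data set, but it keeps the function general.

```python
def concatenate_alignments(alignments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-gene alignments (taxon -> aligned sequence) into a supermatrix.
    A taxon missing from one gene is padded with gaps so that all rows stay
    in register; every row ends up with the summed alignment length."""
    taxa = sorted(set().union(*(aln.keys() for aln in alignments)))
    rows: dict[str, list[str]] = {t: [] for t in taxa}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one alignment must be equal length")
        width = widths.pop()
        for t in taxa:
            rows[t].append(aln.get(t, "-" * width))
    return {t: "".join(parts) for t, parts in rows.items()}


# Two toy protein alignments; not real P. syringae sequences.
genes = [{"Pto": "MKV-", "Psy": "MKIL"}, {"Pto": "AA", "Pph": "AG"}]
for taxon, row in concatenate_alignments(genes).items():
    print(taxon, row)
```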

We employed a hybrid approach [41] using reads from both the Illumina and 454 platforms to generate draft genome sequences for 14 phylogenetically diverse strains of P. syringae (Table 1). These draft genomes are each contained on 32 to 222 scaffolds, with an N50 value of 81,010 bp. That is, half of the total sequenced genome is found in scaffolds of 81,010 bp or greater, where the total is the summed length of all contigs and scaffolds within a given strain. Each genome assembly varies slightly. Even so, the size distribution of contigs and scaffolds (Figure S1) is comparable to what we previously described for our hybrid assembly of re-sequenced Pto DC3000 relative to the published sequence of the same strain [26], [41].
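The N50 definition above can be computed directly from a list of scaffold lengths. Sort the lengths from longest to shortest, then accumulate them until at least half of the total is reached. Below is a minimal sketch with a hypothetical function name.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L together hold at least
    half of the total assembly length (the definition used in the text)."""
    if not lengths:
        raise ValueError("no contigs or scaffolds given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids rounding total / 2
            return length
    raise AssertionError("unreachable for non-negative lengths")


print(n50([100, 80, 50, 30, 20]))  # 80
```

For these toy lengths, the 100 and 80 bp scaffolds together hold 180 of 280 bp, which passes the halfway mark, so the N50 is 80.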

Discussion

Bacterial genomes are dynamic. Large-scale changes occur rapidly and differentiate even closely related isolates within the same species. P. syringae, an important pathogen of many plant species, is a diverse assemblage of strains isolated from different host plants as well as from the environment. To reveal the evolutionary history of pathogenesis within this species, we catalogued the virulence gene repertoires for 19 isolates using genome sequencing coupled with a nearly saturating screen for novel TTE families for a subset of the strains. These phylogenetically diverse genome sequences provide a comprehensive comparative tool to investigate pathogenicity and virulence across plant hosts and a means to gain insight into host range and adaptation of this important phytopathogen.

Genome Structure and Diversity

Although individual isolates of P. syringae contain upwards of 6000 genes, only 3397 are shared among all 19 sequenced strains (Figure 2A). Estimates of core genomes typically decrease with further sampling of diverse isolates. Even so, we do not expect the P. syringae core to decrease significantly, given the slope of the data in Figure 2 and our sampling of much of the known phylogenetic diversity of pathovars. For comparison, we used the same methods to identify species-specific core genomes from multiple sequenced genomes of both P. fluorescens (a plant-associated microbe) and P. putida (a soil bacterium). We also identified the subsets of genes shared between different combinations of these species (Figures 2C, S7, S8). Two gene sets are enriched for proteins involved in protein localization and transport: the unique portion of the P. syringae core, and the core genes shared between P. syringae and P. fluorescens (compare Figures S7, S8, S9). This enrichment highlights a potential role for such processes in adaptation to plant hosts. Surprisingly, the core genomes of both P. putida and P. fluorescens are larger than that of P. syringae (Figure 2C). This could reflect differences in evolutionary pressures on the pathogenic strains. Alternatively, it could reflect the smaller number of sequenced genomes for P. putida and P. fluorescens, whose core genomes could shrink substantially with further sampling. P. syringae group I strains share ∼500 more genes than either of the other two well-sampled groups (Figure 2D), and the majority of these clade-specific genes encode proteins of unknown function (Figure S10). The 19 P. syringae strains define a larger core genome than do 20 strains of Escherichia coli (3397 vs. 1976) but a substantially smaller pan-genome (12,749 vs. 17,838) [67]. These numbers are surprising given the larger genomes of pseudomonads in general (an average of ∼6000 genes compared with ∼4700 for E. coli).
Therefore, even though pseudomonads are ubiquitous across many environments, it is possible that E. coli fills more diverse ecological niches, requiring fewer shared genes but correspondingly more unique pathways. However, we have only sequenced isolates from crops, and the core genome may shrink when strains from more diverse environments are sampled. Plasmids carry many pathogenicity genes, can be transferred horizontally between strains, and are known determinants of virulence evolution within P. syringae [45]. However, plasmids are often filled with repetitive regions that make assembly from short-read sequencing data difficult. We attempted to identify plasmid regions using a combination of conserved plasmid genes and sequencing coverage levels. We ultimately found it difficult to confirm the presence of plasmids from assembly information alone. The best strategy for sequencing these difficult regions may still be to isolate individual plasmids and sequence them separately. Not surprisingly, 13 of the 15 draft genomes show evidence of plasmids of the same approximate size and genomic composition as previously identified or sequenced P. syringae plasmids (Table S1). These putative plasmids contain TTE loci and therefore likely contribute to virulence, as noted previously [45]. Moreover, the backbone and many virulence genes of the large plasmid of Pph 1448a are present within Pgy R4, Pmo, and Pan. This could indicate that the plasmid acts as a virulence factor in more host species than previously recognized. Indeed, the virulence gene suite of Pan is more similar to those of these group III strains than to those of more closely related strains (Figure S13). We also found a recently acquired 1 Mb megaplasmid in the cucumber isolate Pla 107 and in a closely related strain also isolated from diseased cucumber.
This megaplasmid is absent from two other cucumber isolates (Figure S12) and appears to be present at the same copy number as the chromosome (Figure S11). This finding was both unexpected and unprecedented, as previously identified megaplasmids are typically conserved among all isolates of a species [68]. Megaplasmids can facilitate dramatic ecological shifts within bacteria [68], but we have not been able to predict phenotypic changes from the pathways present on pMPPla107. Additionally, this megaplasmid encodes a putative type IV secretion system (TFSS) most closely related to the Dot/Icm system of Legionella pneumophila. It is unknown whether this TFSS is used strictly for self-transmission by conjugation or whether it actively secretes effector proteins. Although we found no evidence for known type IV effectors on the megaplasmid, this TFSS could represent an entirely new contributor to virulence within P. syringae.
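The core and pan genome comparison between P. syringae and E. coli given above can be made explicit from the numbers already stated. The per-genome denominators are the approximate averages (∼6000 and ∼4700 genes), so the first pair of ratios is only indicative:

```latex
\[
\frac{\text{core}}{\text{genes per genome}}:\qquad
\frac{3397}{6000} \approx 0.57 \ (\textit{P. syringae}),
\qquad
\frac{1976}{4700} \approx 0.42 \ (\textit{E. coli})
\]
\[
\frac{\text{pan}}{\text{core}}:\qquad
\frac{12{,}749}{3397} \approx 3.75 \ (\textit{P. syringae}),
\qquad
\frac{17{,}838}{1976} \approx 9.03 \ (\textit{E. coli})
\]
```

A typical P. syringae genome thus devotes a larger share of its genes to the core, whereas the E. coli pan genome is roughly nine times its core. That pattern fits the interpretation of fewer shared genes but more strain-specific pathways in E. coli.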

Identification and Distribution of Pathogenicity Factors

The primary determinants of virulence in P. syringae are TTEs and phytotoxins. By combining a high-throughput promoter trap screen with draft genome sequences for a subset of strains, we identified eight new TTE families (Table 2, Figure 3). In sum, there are now 58 TTE families (not including avrD) across these 19 strains [48]. Because we identified only eight new TTE families by screening these phylogenetically diverse strains, we believe that we have nearly saturated the discovery of TTE families within P. syringae. Furthermore, additional candidate loci identified as HrpL-regulated in our functional screen were not translocated (Dataset S9). Each of these loci possesses a functional hrp-box, which links its expression to a known pathogenicity regulon and therefore implicates these genes as candidate virulence factors. Moreover, some of these loci have not been identified in previously sequenced genomes or by previous screens. We are confident that we have an exhaustive list of potential effectors for most of the sequenced strains. However, there is still likely to be substantial undiscovered diversity in the HrpL regulon across P. syringae. Comparisons of evolutionary rates for TTE families shared throughout the P. syringae phylogeny could reveal specific TTEs important for virulence on a particular host. Yet the virulence activity of any TTE can drive strong selection against its presence in a pathovar if that activity leads to recognition by a plant immune receptor. To capture this dynamic, we analyzed two classes of TTEs present within the 19 genomes. First, TTE families with wide distribution and very little divergence likely perform conserved virulence functions in a range of plants. They may also remain evolutionarily 'unrecognized' across a wide range of plant hosts.
Surprisingly, only five core TTE genes are present in all pathogenic strains (Figure 3), and at least four of these have known virulence functions in A. thaliana [14], [55], [69], [70]. Given their positional orthology across genomes, these few TTEs potentially provide basal virulence functions. Second, some TTE genes are found at different genomic locations in many strains and encode highly divergent proteins (Table S3). These genes could be under intense host selection driving diversification. Such TTEs may therefore have great potential either to limit growth or to help a pathovar expand across hosts. These widely distributed yet diverse TTE families could represent a class of virulence genes specialized to target rapidly evolving plant genes or pathways. They could therefore be most responsible for large-scale differences and limitations in host range. Interestingly, two of these, avrPto and hopAB, are known to suppress host defense by interacting with the kinase domains of host pattern recognition receptors (PRRs), which are common but highly diverged [71]–[74]. The rapid evolution within these TTE genes suggests that these TTEs are also widely recognized by the host immune system, leading to rapid loss, replacement, gain and, potentially, diversification. The most divergent TTE families also appear to undergo high levels of horizontal gene transfer, since their evolutionary histories do not mirror the phylogeny of the housekeeping genes (Figure S14). Broadly, group II strains (including the completely sequenced isolate Psy B728a) contain fewer TTEs on average than the other clades (Figure 3). We hypothesize that group II strains rely more heavily on virulence factors other than the TTSS, as almost all members of this group share two of three known phytotoxin pathways. Indeed, the group II strain with the most TTEs (Ppi R6) is the only member of this group lacking genes for these pathways.
Furthermore, all strains contain genes for production of the plant hormone auxin, which can be an important virulence factor. However, only group II strains and the bean pathogen Pph 1448a lack a gene for auxin modification. Taken together, these observations suggest that strains in this clade have shifted their mechanisms of pathogenesis. An ancestor of group II appears to have combined TTE loss with the acquisition of phytotoxins. In support of this hypothesis, a recent report showed that syringolin A modifies stomatal function in a manner that is phenotypically similar to, but mechanistically independent of, coronatine [75]. The smaller TTE repertoire of group II strains is not a sampling artifact, for three reasons.
- First, the gold-standard genome of strain Psy B728a has been searched for TTEs by both experimental and bioinformatic methods [25]. Only 16 TTEs are found within this strain, significantly fewer than in most strains from the other phylogenetic clades.
- Second, we thoroughly sampled Psy B728a and other strains from this group (Pac, Ppi R6, Ptt, Pat) in our screen for novel TTEs (Table S2).
- Third, strains within group IIC have lost the canonical type III secretion system and most associated TTEs but maintain phytotoxin pathways [18].

It is unlikely that the progenitor of this group lacked the canonical TTSS. Structural genes of the TTSS (data not shown) and linked TTEs (avrE1, hopAA1, hopM1) appear to have been vertically inherited within this group. This includes strain Cit7, which diverges earlier than the strains with an atypical TTSS (Figure 6B). There is no obvious pattern in the distribution of the remaining phytotoxin pathways in these sequenced strains. Excluding the group II clade and Por 1_6, the remaining strains do not harbor multiple pathways for the production of known phytotoxins (Figure 3). As previously reported, only Pph 1448a and Pan share the genes involved in phaseolotoxin production.
These pathovars also share many TTE loci commonly found in group III (Figure S13). Pan appears to have recently acquired these TTE genes, many of which may be present on a plasmid that resembles the large virulence plasmid of Pph 1448a. It is therefore possible that phaseolotoxin and these shared TTE families target complementary host defense functions. Virulence genes that modify plant hormonal pathways are also evolutionarily dynamic (Figure 3). Both Pgy R4 and Ppi R6 have recently acquired a gene involved in ethylene production. Ethylene production is thought to suppress host responses to biotrophic pathogens [76], and P. syringae strains are biotrophic, at least early in their life cycle on plants. The importance of coronatine, a structural mimic of the plant hormone jasmonic acid, as a virulence factor during Pto DC3000 infection of Arabidopsis has been noted [34]. Our data suggest that coronatine biosynthesis is a recent import into the Pto DC3000 genome and its HrpL regulon. Pto DC3000 mutants deficient in coronatine production are impaired during invasion via stomata but can cause disease if delivered directly into host tissue [34]. Thus, absence of the coronatine pathway may partially explain why Pla 106 and the tomato pathogen Pto T1 are not virulent in Arabidopsis.
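The text reports that the most divergent TTE families have evolutionary histories that do not mirror the housekeeping phylogeny (Figure S14), without naming a measure here. One standard way to quantify such incongruence, swapped in for illustration, is the Robinson-Foulds distance. The sketch below computes it for rooted trees written as nested tuples of strain names; the function names and tree encoding are assumptions. A large distance is consistent with horizontal transfer but does not prove it. Poorly resolved gene trees and incomplete lineage sorting can also produce conflict.

```python
def clusters(tree) -> tuple[frozenset, set]:
    """Return (leaves under this node, every non-trivial cluster in the subtree).
    A tree is either a leaf name (str) or a tuple of subtrees."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves: frozenset = frozenset()
    found: set = set()
    for child in tree:
        child_leaves, child_found = clusters(child)
        leaves |= child_leaves
        found |= child_found
    if len(leaves) > 1:
        found.add(leaves)
    return leaves, found


def rooted_rf_distance(tree_a, tree_b) -> int:
    """Number of clusters present in exactly one of two rooted trees."""
    leaves_a, clusters_a = clusters(tree_a)
    leaves_b, clusters_b = clusters(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must share the same leaf set")
    return len(clusters_a ^ clusters_b)


# Toy topologies: a "species" tree and a conflicting "gene" tree.
species_tree = (("A", "B"), ("C", "D"))
gene_tree = (("A", "C"), ("B", "D"))
print(rooted_rf_distance(species_tree, gene_tree))  # 4
```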

Host Range Evolution

Host range is notoriously difficult to define because strains could be pathogenic on unrelated, unknown, and invariably untested host plant species. Additionally, there may be quantitative differences in pathogen growth on a given host species, even among strains grouped as non-pathogens. Furthermore, basic evolutionary questions, such as the plasticity of host range and the number of steps involved in adaptation to a new host, remain unanswered [4]. However, the HrpL regulon is important in structuring host range and promoting virulence [4], [8], [29]. Understanding the evolution of TTE repertoires can therefore reveal the potential for host range evolution across the P. syringae phylogeny. Across three well-sampled clades of the phylogeny, most of each strain's TTE suite is shared with at least two other strains within its group (Figure 4). Only in a few cases do singleton TTEs make up a significant part of a strain's total TTE repertoire. Few strains have a high proportion of singleton TTEs. Where they occur, however, singletons may indicate recent shifts in the host ranges of these strains mediated by TTE gain. They are therefore important targets of future research into virulence mechanisms. Other isolates form well-defined groups according to conservation of TTEs (Figures 4, S13). These strains potentially use similar virulence strategies during infection, which may restrict likely host shifts to the particular host plants of the group. For instance, Pla 106 was isolated from diseased cucumbers but shares much of its TTE suite with two tomato pathogens. Thus, host shifts might be more likely between tomato and cucumber because these strains carry similar suites of TTEs.
Furthermore, as noted above, host range shifts or pathogenicity across bean and kiwi plants may be correlated. The link would be the recent acquisition by Pan of a plasmid likely containing many virulence genes found in Pph 1448a, together with the pathway for phaseolotoxin production. In striking contrast to the conservation patterns of functional TTEs is the diversity of TTE inactivation by truncations and transposon insertions (Figure 4). Any given TTE can trigger a specific immune response during infection. Inactivating TTEs (i.e., removing the trigger) may therefore play an important role in broadening and maintaining host range [25], [77]. Inactivated and truncated TTEs are found more frequently at the tips of the phylogeny than at internal nodes. Two non-mutually exclusive possibilities can explain this trend: most TTE disruptions occurred recently, or inactivated TTEs are rapidly purged from the genome [78]. TTE truncation and pseudogenization may truly occur at a higher rate at the tips of the phylogeny. If so, the absence of recognized TTEs may be more important for recent changes in host range than the presence of functional TTEs. Recent studies of Xanthomonas suggested that isolates that convergently evolved to infect the same hosts have acquired very similar sets of TTEs [79]. To test this idea in P. syringae, we compared the TTE repertoires of Pla 106 and Pla 107, two distantly related strains designated as the same pathovar (Figure S15). These two strains share five TTE families that could act as general cucumber virulence factors. Three of these (hopE1, hopA1, hopBD1) appear to be fairly recent acquisitions in both strains, in that they are present in only a limited number of other strains within the clade. Furthermore, hopAG1 has been convergently lost in each of these strains, suggesting that HopAG1 might be recognized by cucumber.
We tested for recognition of the Psy B728a hopAG1 allele in Pla 107 during growth in planta but found no effect (Figure S15). Any generalization about the role of hopAG1 in limiting host range on cucumber would require tests of multiple alleles on multiple cucumber cultivars. Nevertheless, these results suggest that hopAG1 does not play a broad role in limiting host range for pv. lachrymans. Its loss may simply reflect coincident pseudogenization of an unnecessary protein.
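Two repertoire comparisons appear in this section and the previous one: core TTEs present in all pathogenic strains, and singleton TTEs found in a single strain. Both amount to counting how many strains carry each family in a presence/absence table. Below is a minimal sketch, assuming the table lists only potentially functional TTEs per strain, with truncated or disrupted copies removed beforehand. The function names and the three labels are a simplification, not the exact categories of Figure 4.

```python
from collections import Counter


def classify_families(presence: dict[str, set[str]]) -> dict[str, str]:
    """Label each TTE family by how many strains carry it:
    'core' = every strain, 'singleton' = exactly one, 'accessory' = otherwise."""
    n_strains = len(presence)
    counts = Counter(fam for fams in presence.values() for fam in fams)
    labels = {}
    for fam, n in counts.items():
        if n == n_strains:
            labels[fam] = "core"
        elif n == 1:
            labels[fam] = "singleton"
        else:
            labels[fam] = "accessory"
    return labels


def singleton_fraction(presence: dict[str, set[str]], strain: str) -> float:
    """Share of one strain's repertoire made up of singleton families."""
    labels = classify_families(presence)
    fams = presence[strain]
    if not fams:
        return 0.0
    return sum(labels[f] == "singleton" for f in fams) / len(fams)


# Hypothetical family names and repertoires, for illustration only.
toy = {
    "S1": {"fam1", "fam2", "fam3", "fam4"},
    "S2": {"fam1", "fam2", "fam4"},
    "S3": {"fam1", "fam2", "fam5"},
}
print(sorted(classify_families(toy).items()))
print(singleton_fraction(toy, "S1"))  # 0.25
```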

Type III Effector Function and Evolution

Host range could also be modified by diversification of shared TTEs [36], [80]. Our draft genome data enable the identification of evolutionary signatures of diversification across shared alleles. For most TTE subfamilies, pairwise diversity is only slightly higher than that of the housekeeping loci. A handful of shared alleles, however, display elevated levels of divergence, suggesting dramatic changes in the specificity or function of these TTE families. In the hopM1 subfamily, allelic diversification may contribute to host range. This is consistent with the positional conservation of hopM1 across strains, its linkage to the TTSS-encoding pathogenicity island, and its defined virulence function in Pto DC3000. A fragment of HopM1 interacts with the A. thaliana protein AtMIN7. AtMIN7 is an ARF-GEF protein likely to be involved in vesicle transport and potentially in secretion of anti-microbial products [55]. Strikingly, hopM1 has undergone a clean gene conversion event (Figure 6). In contrast, divergent alleles of other shared TTEs have been horizontally transferred to different places within the genome. The hopM1 allele from Pto DC3000 complements the virulence deficiencies of the Pto DC3000 ΔCEL mutant, but the divergent hopM1 allele from Pmp does not (Figure 6C). Therefore, sequence divergence of hopM1 within the group I strains has led to functional divergence during Arabidopsis infection. These diverse alleles could target different host proteins. They could also have host-dependent specificities of interaction (for example, with diverged AtMIN7 orthologs), or have functionally co-evolved with other virulence-related genes in these strains. Interestingly, avrE1 from the Pmp/Pan clade appears to have been vertically inherited from the ancestor of the group I strains. This suggests that it could complement the Pto DC3000 ΔCEL virulence deficiencies (Figure 6B).
In this case, the functions of the Pmp alleles of hopM1 and avrE1 are not likely to be redundant. By contrast, either allele from Pto DC3000 can complement Pto DC3000 ΔCEL [14], [55], [56]. Given the high levels of divergence across the hopM1 subfamily, it is difficult to pinpoint the amino acid changes that cause the virulence defects. As this example illustrates, evolutionary divergence among shared TTEs could structure changes in host range and pathogenicity. Our unbiased measurements of diversity are a first step towards identifying TTE families with interesting evolutionary signatures. Our deep phylogenetic sequencing generated a large collection of orthologs that constitute a natural allelic series. As a test case, we used AvrPto to ask whether its natural diversity could uncover important functional information. AvrPto has been extensively studied, and its physical interaction with Pto has been characterized by both mutagenesis and co-crystallization [58], [60], [64]. We found that the orthologs of AvrPto most closely related to AvrPto Pto DC3000 were able to trigger a Pto-dependent HR. The most informative orthologs were AvrPto Pla 107 and AvrPto Pgy R4: AvrPto Pla 107 triggered a Pto-dependent HR, while AvrPto Pgy R4 did not. Both alleles are divergent from the Pto DC3000, Psy B728a and Pla 106 group. Relative to each other, however, they are polymorphic at only two residues (positions 85 and 95). Both residues are conserved in all AvrPto proteins that cause a Pto-dependent HR. Both also lie in or near the previously characterized AvrPto-Pto interaction surface. By individually mutating these two residues back to the consensus residue, we demonstrated that G95 is required for recognition, while M85 is not. G95 lies in the GINP loop critical for the AvrPto-Pto interaction. Thus, a natural allelic series allowed us to pinpoint a residue within the binding surface of a TTE that is required for recognition by a host protein.
This approach should be generalizable to uncharacterized TTEs, provided that an assayable host response can be identified. The AvrPto phylogeny is consistent with the hypothesis that residue 95 of an ancestor of the Pgy R4 ortholog evolved to avoid recognition by a Pto/Prf-like system. Our virulence assay on tomato indicates that AvrPto can mutate away from avirulence while retaining virulence, consistent with previous studies. The Pgy R4 ortholog is a striking evolutionary confirmation of the avirulence-compromised but virulence-competent AvrPto mutants generated previously [64]. These data suggest that, at least in the case of AvrPto/Pto, P. syringae may be capable of quickly evolving at the level of a single amino acid to evade host R-gene recognition.
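The AvrPto analysis uses an allelic series to narrow the search for recognition-critical residues. The candidates are positions conserved among recognized alleles but altered in an unrecognized one. Below is a minimal sketch of that filtering step, assuming the orthologs are already aligned. The function name and toy sequences are hypothetical. Positions refer to alignment columns, which may differ from residue numbering in any single protein. Candidates still have to be tested experimentally, as was done here by mutagenesis and HR assays.

```python
def candidate_residues(recognized: list[str], unrecognized: str) -> list[tuple[int, str, str]]:
    """Alignment columns (1-based) where all recognized alleles share one residue
    but the unrecognized allele differs. Returns (column, consensus, variant).
    Columns with gaps in the consensus or the variant are skipped."""
    if any(len(seq) != len(unrecognized) for seq in recognized):
        raise ValueError("all sequences must come from the same alignment")
    hits = []
    for i, variant in enumerate(unrecognized):
        column = {seq[i] for seq in recognized}
        if len(column) != 1:
            continue  # not conserved among recognized alleles
        consensus = column.pop()
        if variant != consensus and "-" not in (variant, consensus):
            hits.append((i + 1, consensus, variant))
    return hits


# Toy aligned sequences (not AvrPto): recognized alleles vs. one that escapes.
recognized = ["MAGKT", "MAGRT", "MSGKT"]
print(candidate_residues(recognized, "MAEKT"))  # [(3, 'G', 'E')]
```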