Salmonella enterica subspecies enterica is traditionally subdivided into serovars by serological and nutritional characteristics. We used Multilocus Sequence Typing (MLST) to assign 4,257 isolates from 554 serovars to 1,092 sequence types (STs). The majority of the isolates and many STs were grouped into 138 genetically closely related clusters called eBurstGroups (eBGs). Many eBGs correspond to a serovar, for example most Typhimurium are in eBG1 and most Enteritidis are in eBG4, but many eBGs contained more than one serovar. Furthermore, most serovars were polyphyletic and are distributed across multiple unrelated eBGs. Thus, serovar designations confounded genetically unrelated isolates and failed to recognize natural evolutionary groupings. An inability of serotyping to correctly group isolates was most apparent for Paratyphi B and its variant Java. Most Paratyphi B were included within a sub-cluster of STs belonging to eBG5, which also encompasses a separate sub-cluster of Java STs. However, diphasic Java variants were also found in two other eBGs and monophasic Java variants were in four other eBGs or STs, one of which is in subspecies salamae and a second of which includes isolates assigned to Enteritidis, Dublin and monophasic Paratyphi B. Similarly, Choleraesuis was found in eBG6 and is closely related to Paratyphi C, which is in eBG20. However, Choleraesuis var. Decatur consists of isolates from seven other, unrelated eBGs or STs. The serological assignment of these Decatur isolates to Choleraesuis likely reflects lateral gene transfer of flagellar genes between unrelated bacteria plus purifying selection. By confounding multiple evolutionary groups, serotyping can be misleading about the disease potential of S. enterica. Unlike serotyping, MLST recognizes evolutionary groupings and we recommend that Salmonella classification by serotyping should be replaced by MLST or its equivalents.

Microbiologists have used serological and nutritional characteristics to subdivide pathogenic bacteria for nearly 100 years. These subdivisions in Salmonella enterica are called serovars, some of which are thought to be associated with particular diseases and epidemiology. We used MultiLocus Sequence-based Typing (MLST) to identify clusters of S. enterica isolates that are related by evolutionary descent. Some clusters correspond to serovars on a one to one basis. But many clusters include multiple serovars, which is of public health significance, and most serovars span multiple, unrelated clusters. Despite its broad usage, serological typing of S. enterica has resulted in confusing systematics, with a few exceptions. We recommend that serotyping for strain discrimination of S. enterica be replaced by a DNA-based method, such as MLST. Serotyping and other non-sequence based typing methods are routinely used for detecting outbreaks and to support public health responses. Moving away from these methods will require a major shift in thinking by public health microbiology laboratories as well as national and international agencies. However, a transition to the routine use of MLST, supplemented where appropriate by even more discriminatory sequence-based typing methods based on entire genomes, will provide a clearer picture of long-term transmission routes of Salmonella, facilitate data transfer and support global control measures.

Funding: MA and JLH were supported by the Science Foundation of Ireland (05/FE1/B882), www.sfi.ie . Initially work by MA and VS was supported by the Max-Planck Gesellschaft ( www.mpg.de ). JW, SN and GD were supported by the Wellcome Trust of Great Britain ( www.welcome.ac.uk ). AE was supported by the BMBF (grant 01 LW 06001), www.bmbf.de and MIWFT (313-21200200) www.wissenschaft.nrw.de . Work by F-XW and SB was supported by the Institut Pasteur ( www.pasteur.fr ) and a grant from the Institut de Veille Sanitaire (Saint-Maurice, France). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Here we describe the population structure of subspecies enterica on the basis of MLST, examine the extent of congruence between serotyping and MLST clusters, and conclude that serotyping of S. enterica should be replaced by MLST.

We investigated isolates from diverse hosts, both diseased and healthy, as well as from the environment. We screened isolates from all continents and deliberately included representatives of rare serovars as well as unusual monophasic and diphasic variants from reference collections. All of these data were submitted to a publicly accessible MLST database ( http://mlst.ucc.ie/mlst/dbs/Senterica ). In April 2011, that database included 4,257 isolates ( Table S1 ) from 554 serovars of S. enterica subspecies enterica that had been assigned to 1,092 STs. The database also contained 436 isolates from the other S. enterica subspecies as well as Salmonella bongori, whose properties will be described elsewhere, as will analyses of associations with host or geography.

Instead of MLEE, a sequence-based alternative, MultiLocus Sequence Typing (MLST), has gained broad acceptance for many microbial species [32] . MLST is based on similar principles to MLEE, but has greater discrimination and is more objective because it is based on sequences of multiple housekeeping gene fragments rather than electrophoretic migration of proteins. Of equal importance, MLST schemes are community efforts because the data are publicly available online ( http://pubmlst.org/databases.shtml ) and data can be entered from decentralized sources. Isolates that possess identical alleles for all gene fragments are assigned to a common Sequence Type (ST), and STs that share all but one or two alleles are grouped into ST-based clonal complexes [33] on the basis of eBurst [34] . An MLST scheme involving seven housekeeping gene fragments was developed for the analysis of serovar Typhi [9] , and subsequently tested with 110 isolates from 25 serovars of S. enterica subspecies enterica [35] , most of which were from Selander's SARB collection of reference strains for MLEE [30] . Subsequent analyses have used this scheme to survey serovars Newport [12] , [36] and Typhimurium [37] – [39] , as well as smaller numbers of isolates of various serovars from wild animals in Australia [40] and the mesenteric lymph nodes of cattle in Canada [41] . The same scheme has also been used to survey the genetic properties of antibiotic-resistant isolates among a global sample of various serovars [42] . These initial results suggested that MLST often correlates with serovar, with some exceptions. If this inference were correct, it would be advisable to replace serotyping by MLST for routine epidemiological purposes. We therefore embarked on a major, decentralized effort to test this hypothesis.

We recommend another approach, namely using neutral markers to identify genetically related clusters of S. enterica. Serovar designations that reflect such groupings could be preserved, and possibly be detected by informative SNPs in those neutral markers, whereas other serovars need to be revised or possibly eliminated. Twenty years ago, a valiant attempt was made to identify natural groupings within S. enterica on the basis of MultiLocus Enzyme Electrophoresis (MLEE) [29] – [31] . MLEE data identified multiple monophyletic lineages that corresponded to individual serovars. Problematically, most serovars that were examined included exceptional isolates that were unrelated to the main lineage, and some serovars were composed of multiple, genetically unrelated lineages rather than one predominant lineage. MLEE was never generally accepted by microbiologists and these observations have not influenced the general use of serovar designations.

Serovar designations are widely used for epidemiological purposes due to the belief that they are discriminatory, and because serovars represent a globally understandable form of communication. However, as noted by McQuiston et al. [13] , [14] , serotyping has multiple disadvantages, including low throughput, high expense, and a requirement for considerable expertise as well as numerous antibodies made by immunizing rabbits. As a result, various molecular methods have been proposed as potential alternatives to serotyping for subdividing Salmonella (and other microbes) [15] , [16] , ranging from PFGE (Pulsed-Field Gel Electrophoresis) [17] , [18] through to MLVA (MultiLocus Variable number of tandem repeats Analysis) [19] , [20] . These methods are possibly useful for recognizing a common source of microorganisms from a single outbreak [21] , but they are inappropriate for reliable assignments of isolates to one of the 2,500 S. enterica serovars. Still other attempts have been made to develop DNA-sequence based equivalents of serotyping [22] – [26] , including the detection of particular single nucleotide polymorphisms (SNPs) within flagellar antigens [13] , [14] . This approach shares with serotyping the assumption that serovars reflect genetic relatedness or disease specificity, which need not be generally true [12] . For example, genes encoding antigenic epitopes can be imported by horizontal genetic exchange and homologous recombination from unrelated lineages. As a result, genetically related serovars such as Heidelberg and Typhimurium possess very different fliC alleles whereas genetically distinct serovars can possess nearly identical alleles [27] . Thus, replacing serological determination by serotype-based molecular assays would maintain a system that does not necessarily reflect genetic relatedness. Furthermore, some serovar designations will need revision because they distinguish between minor antigenic variants of organisms that are genetically very similar, e.g. Dublin and Rostock [28] or Paratyphi A and Sendai [29] .

The use of serotyping within Salmonella as a typing method is so widely accepted that governmental agencies have formulated guidelines intended to reduce human salmonellosis by targeting Typhimurium, Enteritidis and three other common serovars in domesticated animals (European Union EC Regulation 2160/2003 of 12/12/2003). Such regulations implicitly assume that serovars are associated with a particular disease potential [3] , [4] , an assumption that is also suggested by some of their names, e.g. Abortusequi, Abortusovis and Choleraesuis. These designations reflect a medical microbiological tradition of assigning distinctive taxonomic designations to microorganisms that are associated with particular diseases or hosts. However, this tradition is not necessarily warranted from an evolutionary perspective, as illustrated by the following examples. For some taxa, species designations have been used to designate genetically monomorphic clones of a broader species with a different pathogenic potential, e.g. the clone of Yersinia pseudotuberculosis that is called Y. pestis [5] , the host-specific ecotypes of the Mycobacterium tuberculosis complex that are designated M. bovis, M. microti, M. pinnipedii and M. caprae [6] , or the isolates of Escherichia coli that have been assigned to multiple species of the genus Shigella [7] . In other cases, taxonomic designations have grouped members of paraphyletic groups of microorganisms because they cause similar diseases, such as the anthrax toxin-producing variants of Bacillus cereus that are designated Bacillus anthracis [8] . That all isolates of an individual serovar of S. enterica share a common phylogenetic ancestry should therefore be considered to represent a working hypothesis that requires confirmation. Similarly, a supposed host and/or disease specificity needs to be confirmed by genetically informative methods with isolates from diverse geographical regions. 
These working hypotheses have been confirmed for serovar Typhi, which corresponds to a genetically monomorphic, recently evolved clone that causes typhoid fever in humans [9] – [11] . In contrast, multiple, discrete lineages have been identified within serovar Newport [12] . Close genetic relatedness and a monolithically uniform association with host/disease specificity remain to be demonstrated for most other serovars, especially because only a few of them have yet been investigated in detail.

For over 70 years, epidemiological investigations of Salmonella that infect humans and animals have depended on serotyping, the binning of isolates into serovars [1] , [2] . Salmonella serotyping depends on specific agglutination reactions with adsorbed antisera that are specific for epitopes (‘factors’) within either lipopolysaccharide (O antigen; encoded by rfb genes) or one of the two, alternate flagellar antigens (phases 1 and 2 of H antigen, encoded by fliC and fljB). Various combinations of 46 O antigens and 85 H antigens have resulted in ∼1,500 serovars within S. enterica subspecies enterica and ∼1,000 in the other subspecies of S. enterica plus S. bongori ( Fig. 1 ) [2] .

Results

Many Salmonella STs cluster together in discrete groups, which we refer to as eBGs (eBurstGroups). We chose the designation eBG rather than “Clonal Complex” or “ST Complex” because Clonal Complex implies clonality [43], whereas homologous recombination between unrelated lineages is frequent in S. enterica [12], [44], [45], and ST Complex does not specify a grouping algorithm. Following the recommendations by Feil et al. [46], [47], we designated as an eBG all groups of two or more STs that were connected by pair-wise identity at six of the seven gene fragments, i.e. they shared six of the seven alleles that defined the ST. As the MLST database has grown, multiple singleton STs containing multiple isolates have formed eBG clusters via the incremental identification of novel, related STs. We therefore also designated ungrouped singleton STs as eBGs when they contained 10 or more isolates. Finally, a few existing eBGs were expanded to include singleton STs that shared five identical alleles (double locus variants; DLVs) as well as a common serovar. Based on these criteria, 3,550 of the 4,257 isolates were assigned to a total of 138 eBGs, which ranged in size from 580 isolates in multiple STs to two isolates in two STs (Table S2).
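
The first two grouping rules described above can be illustrated with a minimal sketch. It is not the eBurst software or the curated procedure used for the database; the function names, the ST labels and the allele numbers are hypothetical, and the third rule (expanding an eBG by double locus variants that share a serovar) is deliberately left out because it involved case-by-case judgment.

```python
from collections import defaultdict
from itertools import combinations


def shared_alleles(profile_a, profile_b):
    """Count the loci at which two allele profiles carry the same allele."""
    return sum(x == y for x, y in zip(profile_a, profile_b))


def group_ebgs(profiles, isolate_counts, min_shared=6, singleton_min=10):
    """Group STs into eBG-like clusters (illustrative only).

    profiles: dict mapping ST label -> tuple of seven allele numbers.
    isolate_counts: dict mapping ST label -> number of isolates.
    STs are linked when they share at least `min_shared` alleles, and
    links are chained (single linkage) with a small union-find.
    """
    parent = {st: st for st in profiles}

    def find(st):
        while parent[st] != st:
            parent[st] = parent[parent[st]]  # path compression
            st = parent[st]
        return st

    for a, b in combinations(profiles, 2):
        if shared_alleles(profiles[a], profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for st in profiles:
        groups[find(st)].add(st)

    ebgs = []
    for members in groups.values():
        only_member = next(iter(members))
        # Rule 1: two or more linked STs. Rule 2: a singleton ST with
        # at least `singleton_min` isolates.
        if len(members) >= 2 or isolate_counts.get(only_member, 0) >= singleton_min:
            ebgs.append(frozenset(members))
    return ebgs


# Hypothetical toy data, not taken from the MLST database.
profiles = {
    "ST_a": (1, 1, 1, 1, 1, 1, 1),
    "ST_b": (1, 1, 1, 1, 1, 1, 2),  # shares six alleles with ST_a
    "ST_c": (1, 1, 1, 1, 1, 3, 2),  # shares six alleles with ST_b
    "ST_d": (5, 5, 5, 5, 5, 5, 5),  # singleton with many isolates
    "ST_e": (9, 9, 9, 9, 9, 9, 9),  # singleton with few isolates
}
isolate_counts = {"ST_a": 40, "ST_b": 2, "ST_c": 1, "ST_d": 12, "ST_e": 3}
ebgs = group_ebgs(profiles, isolate_counts)
```

In this toy example ST_c joins the group through ST_b even though it shares only five alleles with ST_a, which is the chaining behaviour expected from pair-wise linkage; ST_e remains ungrouped.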

Variable association between eBG and serovar

Some eBGs exhibit a unique one-to-one relationship with serovar, for example eBG13 (Typhi), eBG11 (Paratyphi A) and eBG26 (Heidelberg) (Table S1). Of the 48 eBGs containing at least 15 isolates, 22 contain a single serovar, or its monophasic variants. In contrast, 26 other eBGs contain multiple serovars (or isolates whose serovar is unknown), as indicated by white sectors in Fig. 2. Similarly, of the 42 serovars from which we sampled at least 15 isolates, 17 were associated with a single eBG but the remaining 25 serovars were associated with multiple eBGs and/or STs. Particularly dramatic examples of serovars that encompass multiple, distinct eBGs are Newport [12], Paratyphi B (see below) and Oranienburg (Fig. 2, Table S2) but multiple MLST clusters per serovar are common throughout the entire dataset, even in serovars from which only two isolates were tested (Fig. S2). Discrepancies between serotyping and assignments to eBGs by MLST might reflect mistakes in serotyping or MLST sequencing, or both. Due to the decentralized sources of data, such mistakes almost certainly exist within the database. However, the MLST database is actively curated. Each nucleotide within a new MLST allele must be supported by at least two independent sequence traces before that allele is accepted by the curator, which has led to the rejection of multiple submissions of new alleles. All STs containing novel combinations of known alleles are examined visually for internally consistent genetic relationships to other STs and serovars. In multiple cases, this curation has resulted in rejecting such STs and subsequent resequencing of the gene fragments revealed technical errors. However, the most common discrepancy which we have encountered has been inaccurate serotyping, which has plagued several percent of database entries from all the laboratories involved in this project, as well as in ring trials for testing laboratory accuracy [52]. In numerous cases where the serovar and the ST of new entries were discordant with other isolates, re-serotyping revealed that the original culture had been contaminated, or had been inaccurately serotyped. However, despite active curation and rechecking serotypes and STs, multiple discrepancies remain between genetic relationships of STs and serovar, which are described below in greater detail for four test cases of increasing complexity.
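
The kind of cross-tabulation behind these counts can be sketched as follows. This is an illustration under simplifying assumptions, not the analysis pipeline used for the paper: the function names are hypothetical, the records are a tiny hand-picked subset of eBG and serovar pairings mentioned in the text rather than the real dataset, and the sketch neither applies the 15-isolate threshold nor merges monophasic variants with their parent serovar.

```python
from collections import defaultdict


def cross_tabulate(records):
    """Build eBG -> serovars and serovar -> eBGs maps from (eBG, serovar) pairs."""
    serovars_by_ebg = defaultdict(set)
    ebgs_by_serovar = defaultdict(set)
    for ebg, serovar in records:
        serovars_by_ebg[ebg].add(serovar)
        ebgs_by_serovar[serovar].add(ebg)
    return serovars_by_ebg, ebgs_by_serovar


def one_to_one(records):
    """Return (eBG, serovar) pairs where each maps only to the other."""
    serovars_by_ebg, ebgs_by_serovar = cross_tabulate(records)
    pairs = []
    for ebg, serovars in serovars_by_ebg.items():
        if len(serovars) == 1:
            serovar = next(iter(serovars))
            if len(ebgs_by_serovar[serovar]) == 1:
                pairs.append((ebg, serovar))
    return sorted(pairs)


# One record per isolate; an illustrative subset only.
records = [
    ("eBG13", "Typhi"),
    ("eBG1", "Typhimurium"),
    ("eBG1", "Hato"),
    ("eBG138", "Typhimurium"),
    ("eBG53", "Dublin"),
    ("eBG53", "Rostock"),
]
concordant = one_to_one(records)
```

With these toy records only the eBG13 and Typhi pairing is one-to-one: eBG138 contains a single serovar, but that serovar also occurs in eBG1, so the pairing is not reciprocal.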

Serovar Typhimurium

eBG1 contained 482 isolates of serovar Typhimurium, which has the antigenic formula [1],4,[5],12:i:1,2 (Table S2). [The colons divide the epitopes within the lipopolysaccharide (LPS) O antigen (4,12) from those in the phase 1 flagellar antigen (i) and the phase 2 flagellar antigen (1,2). Numbers in square brackets designate epitopes that are variably present within a serovar, in some cases due to lysogenic conversion by bacteriophages.] eBG1 also contained so-called monophasic variants of Typhimurium, 88 isolates that do not express the phase 2 antigen and four isolates that do not express the phase 1 antigen, as well as rough and non-motile variants (Fig. 4, Table S2). The presence of these serological variants within eBG1 indicates that they are genetically related to Typhimurium, and therefore these monophasic, rough and non-motile variants potentially represent mutations or recombination events affecting expression of LPS or the flagellar antigens encoded by fliC (phase 1) and fljB (phase 2). Prior work has indicated that monophasic variants represent multiple, independent genetic events [53], [54], and our results support this interpretation. ST19, the central ST in eBG1, contains two distinct forms of monophasic variants, and both monophasic as well as diphasic variants are also found in ST34. eBG1 also includes one isolate each of the serovars Hato and Farsta, whose antigenic formulas differ from Typhimurium at the phase 1 and 2 antigens, respectively (Table S3).
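
The antigenic formula notation explained above can be made concrete with a small parser. This is a sketch that handles only the forms quoted in this article (comma-separated epitopes, square brackets for variably present epitopes, and "-" for a phase that is not expressed); the full serotyping scheme uses additional notation that is not covered here, and the function name and returned keys are hypothetical.

```python
def parse_antigenic_formula(formula):
    """Split an antigenic formula such as '[1],4,[5],12:i:1,2'.

    Returns a dict with keys 'O', 'H1' and 'H2'. Each value is a list of
    (epitope, is_variable) tuples, where is_variable is True for epitopes
    written in square brackets. A phase written as '-' yields an empty list.
    """
    fields = formula.split(":")
    if len(fields) != 3:
        raise ValueError("expected three colon-separated fields: O:H1:H2")

    def parse_field(field):
        epitopes = []
        for token in field.split(","):
            token = token.strip()
            if token in ("", "-"):
                continue  # antigen not expressed
            is_variable = token.startswith("[") and token.endswith("]")
            epitopes.append((token.strip("[]"), is_variable))
        return epitopes

    o_antigen, phase1, phase2 = (parse_field(f) for f in fields)
    return {"O": o_antigen, "H1": phase1, "H2": phase2}


typhimurium = parse_antigenic_formula("[1],4,[5],12:i:1,2")
enteritidis = parse_antigenic_formula("[1],9,12:g,m:-")
```

For the Typhimurium formula the O field yields epitopes 1 and 5 flagged as variable and 4 and 12 as constant, while the Enteritidis formula yields an empty phase 2 list.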


Figure 4. MSTree of Typhimurium plus its serological variants. Each circle represents one ST, subdivided into one sector per isolate, flanked by the ST number in small print. The primary links between STs within the MSTree are indicated by straight lines and additional cross-links at the same level of identity are indicated by lines that are terminated by bars. eBG designations are indicated by rounded white boxes. White sectors indicate a lack of serological information. Serological formulas are summarized in Table S3. Other details are as in Fig. 2. https://doi.org/10.1371/journal.ppat.1002776.g004

Not all Typhimurium isolates are grouped in eBG1 (Table S1, S3) and exceptional isolates were found in eBG138 and ST513. eBG138 shares only three identical alleles with eBG1 although it contains seven Typhimurium isolates plus nine monophasic Typhimurium isolates. Similarly, ST513 contains five Typhimurium isolates plus one Kunduchi isolate, whose phase 1 antigen differs from that of Typhimurium. ST513 also shares only three alleles with eBG1. Thus, serotyping has conflated Typhimurium with isolates from genetically distant eBGs while failing to group related Typhimurium with its monophasic variants. Serotyping has also conflated genetically unrelated isolates of serovars Kunduchi, Farsta and Hato. Isolates of these serovars are found in six additional STs, each of which is unrelated to the others or to the STs containing Typhimurium (Fig. 4, Table S3).

Serovars Enteritidis and Dublin

Two hundred and forty two serovar Enteritidis isolates ([1],9,12:g,m:-) were present in eBG4, as well as two non-motile variants (Table S2, Fig. 5). eBG4 also includes several serovars that differ from Enteritidis by their phase 1 (serovars Rosenberg, Moscow, Blegdam and Antarctica) or O antigens (Nitra) (Table S4). In addition, eBG4 includes a discrete sub-lineage consisting of multiple isolates of the serovars Gallinarum and Gallinarum var. Pullorum (henceforth referred to as Pullorum). In fact, Gallinarum and Pullorum are non-motile serological variants of Enteritidis that cause distinctive forms of lethal disease in poultry (fowl typhoid and pullorum disease, respectively), but can otherwise be difficult to distinguish because they differ in nutritional capabilities (biotypes) rather than serologically [55]. According to MLST, four STs containing Gallinarum were closely related to ST11, the most common ST in eBG4. Two STs containing Pullorum isolates branched from the basal Gallinarum ST, ST470 (Fig. 5). Similar results have previously been obtained with MLEE [56] and a genomic comparison of one strain each of Enteritidis and Gallinarum also indicated a close relationship [57]. Two Enteritidis isolates were assigned to ST77 and ST6, and a unique, diphasic Enteritidis isolate is in ST746, which are all unrelated to eBG4. Thus, like Typhimurium, most Enteritidis isolates are in one primary eBG but rare isolates are present in multiple unrelated eBGs and STs.


Figure 5. MSTree of Enteritidis, Dublin, Paratyphi B and their serological variants. Serological formulas are summarized in Tables S4 and S5. Other details are as in Fig. 4. Additional information on Paratyphi B and Java isolates can be found in Tables 2 and S6. https://doi.org/10.1371/journal.ppat.1002776.g005

Serovar Dublin ([1],9,12,[Vi]:g,p:-) contains the flagellar p epitope rather than the m epitope in serovar Enteritidis. The majority (115) of Dublin isolates were grouped in eBG53, which shares only three alleles with eBG4, the main Enteritidis cluster, supporting this serological distinction. The remaining Dublin and Enteritidis isolates were found in eBG93 (Enteritidis: 5 isolates, Dublin: 1) and ST74 of eBG32 (Enteritidis: 1, Dublin: 1, Enteritidis/Dublin 1). eBG93 is intermediate between eBG4 and eBG53, sharing four alleles with each. ST74 shares none with either and other STs of eBG32 contained monophasic isolates of serovars Paratyphi B and Paratyphi B var. Java (henceforth Java) (Fig. 5), which only share the O12 antigen with Enteritidis or Dublin. It has previously been reported that strain RKS1550 (also designated SARB14; MLEE ET Du2) has the phase 1 antigenic formula g,m,p, which is a combination of the phase 1 antigens found in Enteritidis (g,m) and Dublin (g,p) [28]. Its FliC sequence encodes Ala220 and Thr315, which are typical of Enteritidis, as well as Ala318, which is typical of Dublin [28]. SARB14 was one of the three strains assigned to ST74. We confirmed by sequencing the presence of these three amino acids in its FliC sequence, and also found that the two other ST74 isolates possessed the same three substitutions. One of those two isolates had been serotyped as Dublin and the other as Enteritidis. However, we have now found that some such strains can be variably serotyped as Enteritidis, Dublin or both because different laboratories use different strains to generate and absorb serological typing sera. In agreement with observations from MLEE [28], the primary Dublin eBG, eBG53, also includes six isolates of serovar Rostock. It also includes one isolate each of serovars Naestved and Kiel. Serovars Rostock and Naestved contain additional epitopes in the phase 1 antigen while serovar Kiel contains a distinct epitope in the O antigen. Rostock, Naestved and Kiel have not yet been found outside eBG53.