Francisella tularensis is classified as a Category A bioterrorism agent by the U.S. government because of its high virulence and the ease with which it can be spread as an aerosol. It is a facultative intracellular pathogen and the causative agent of tularemia. Ciprofloxacin (Cipro) is a broad-spectrum antibiotic effective against many Gram-positive and Gram-negative bacteria. Increasing Cipro resistance in pathogenic microbes is a serious concern when considering options for the medical treatment of bacterial infections. Identifying genes and loci associated with Ciprofloxacin resistance will help advance understanding of resistance mechanisms and may, in the future, provide better treatment options for patients. It may also inform the development of assays that can rapidly identify Cipro-resistant isolates of this pathogen. In this study, we selected a large number of F. tularensis live vaccine strain (LVS) isolates that survived in progressively higher Ciprofloxacin concentrations, screened the isolates using a whole genome F. tularensis LVS tiling microarray and Illumina sequencing, and identified both known and novel mutations associated with resistance. Genes containing mutations encode DNA gyrase subunit A, a hypothetical protein, an asparagine synthase, a sugar transaminase/perosamine synthetase, and others. Structural modeling of these proteins provides insights into their potential functions and how they might contribute to Cipro resistance mechanisms.

Funding: The work was funded by the Department of Homeland Security. The funder provided support in the form of salaries for authors [CJJ, KSM, JBT, AZ, SNG, LV, FB, SM, VF, HK, PJJ], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.

Studies have shown that naturally occurring F. tularensis strains are susceptible to streptomycin, gentamicin, doxycycline, chloramphenicol and quinolones, and have heterogeneous susceptibility to erythromycin [ 9 – 11 ]. While F. tularensis can acquire Cipro resistance under selective pressure, the mechanisms of Cipro resistance in F. tularensis are not well understood. We selected for survival of F. tularensis LVS isolates in the presence of increasing Cipro concentrations, then compared whole genome sequences of resistant and related sensitive isolates to identify mutations likely to be found in F. tularensis subjected to Ciprofloxacin selective pressure. We performed whole genome analysis using a combination of two methods to assess the relative strengths of each platform for mutation detection. First, we developed a comparative genome hybridization (CGH) tiling microarray for F. tularensis LVS, with successive probes overlapping by at least 85% of their length, and performed two-color hybridizations of each resistant isolate together with the parent LVS strain. We then used Illumina next generation shotgun sequencing to generate large numbers of short sequence reads for each isolate, with more than 200X coverage over the entire genome. Here we describe the mutations found by these experiments and report the results of protein structure analyses to elucidate the underlying resistance mechanisms.

Unintended selection has produced a wide range of antibiotic resistant, clinically important pathogens. Most antibiotic resistance mechanisms fall into one of three classes: (1) resistance based on changes in the structure of the proteins targeted by the antibiotic, such as mutations in genes encoding topoisomerase components that alter the sites where Cipro ordinarily binds; (2) resistance based on acquisition or increased expression of proteins that act directly on the antibiotic molecule, e.g., β-lactamase enzymes that break down penicillins; and (3) resistance based on acquisition or upregulation of energy-dependent efflux pumps that actively remove antibiotics from the bacterial cell, such as the Bmr and Blt multi-drug transporter proteins of Bacillus subtilis (1). Efflux pumps are very common in Gram-negative bacteria, are often poorly characterized, and can confer co-resistance to several antibiotics. All three types of resistance mechanism may be chromosomally encoded or acquired on extra-chromosomal elements.

While unintended selection of naturally occurring antibiotic resistant mutants through antibiotic overuse is a long-standing public health issue, a more recent concern is the possibility that hostile individuals or organizations could deliberately engineer resistant strains. Such strains could be created either by targeted introduction of resistance elements or by selection of spontaneous mutants. Methods of inserting genetic material have been developed for a number of microbes, including B. anthracis, Y. pestis, F. tularensis and B. pseudomallei [ 6 – 8 ]. It is therefore feasible that, by targeting genes that function in microbes closely related to specific threat agents, threat agent isolates resistant to therapeutically important antibiotic concentrations could be developed. Knowing which genes or combinations of genes are modified in Cipro resistant isolates would enable the development of rapid assays that detect these changes. Such assays could be used to determine antibiotic resistance profiles so that exposed individuals can be treated appropriately as quickly as possible. Moreover, this information would be valuable for forensic analysis to determine possible association with a suspected biowarfare or bioterrorism activity.

Ciprofloxacin (Cipro) is a broad-spectrum bactericidal fluoroquinolone antibiotic effective against many Gram-positive and Gram-negative bacteria. Its known mode of action is to bind to DNA topoisomerases involved in bacterial DNA replication, resulting in multiple double-stranded breaks in the bacterial chromosome. Studies of naturally occurring mutations that confer Ciprofloxacin resistance in several Gram-positive and Gram-negative pathogens show that amino acid substitutions within the quinolone resistance-determining regions (QRDRs) of the gyrA and parC genes (and, in some cases, gyrB and parE) play crucial roles in resistance to this and other quinolone compounds. Cipro resistance in B. anthracis is associated with single nucleotide polymorphisms (SNPs) in gyrA and parC ([ 1 ] and our own unpublished results) but may also result from changes in either the structure or the expression of multi-drug efflux pumps that actively remove antibiotics from microbial cells [ 2 ]. A single mutation within either a topoisomerase or an efflux pump gene, or its regulatory region, may be sufficient to make B. anthracis resistant to low Cipro concentrations. However, a combination of mutations is apparently required to confer resistance to higher antibiotic concentrations (19). Recent studies of B. anthracis ([ 3 ] and our own unpublished results) also identified mutations associated with Cipro resistance in TetR-type transcriptional regulator genes. Point mutations in gyrA and marA associated with multi-drug and Cipro resistance have been observed in Yersinia pestis [ 4 , 5 ], though they likely represent a minor fraction of the mutations that confer antibiotic resistance in this species.

Methods

Selection of Cipro resistant mutants A parental avirulent F. tularensis subsp. holartica LVS strain was provided by the CDC. F. tularensis LVS culture was streaked onto a Mueller Hinton broth (MHB) agar plate (enriched with Proteose Peptone, NaCl 2 , Bovine serum, D-(+) Glucose, Ferric Pyrophosphate, and Iso-Vitalex). The wild-type Ciprofloxacin minimum inhibitory concentration (MIC) value was determined for F. tularensis LVS by picking a single colony to inoculate 2 mL enriched MHB and incubating overnight at 37°C, 180 rpm. A subculture containing 2 mL enriched MHB was inoculated with 200 μL of the overnight culture and incubated at 37°C, 180 rpm to an optical density at 600 nm of 0.8. A Cipro E-test (BioMerieux) was applied to an enriched MHB agar plate swabbed for full coverage with the F. tularensis LVS subculture, and the E-test plate was incubated overnight at 37°C in an atmosphere containing 5% CO 2 . An approximate Cipro MIC was determined to be 0.023 μg/mL for the wild-type F. tularensis LVS. Cultures were prepared for first-round selections by inoculating each well of a 24 well bioblock containing 2 mL of enriched MHB with the same single F. tularensis LVS colony. The bioblock was covered with an airpore tape seal and incubated at 37°C, 180 rpm overnight. Fresh subcultures were prepared by adding 20 μL of each overnight culture to 2 mL enriched MHB. The subcultures were incubated at 37°C, 180 rpm for approximately 4–6 hours to an optical density at 600 nm of 0.8. Cell suspensions were concentrated by centrifugation at 4,000 g for 2 min. The supernatant was discarded and each cell pellet was suspended in the remaining 200 μL of enriched MHB. Each of the 24 suspensions was plated on enriched MHB agar plates containing 0.075 μg/mL Cipro (approximately three times the wild-type MIC value). These 24 first-round selection plates were incubated at 37°C, up to 72 hours in a CO 2 enriched atmosphere. 
One Cipro resistant colony was picked from each plate into 2 mL enriched MHB containing 0.05 μg/mL Cipro (75% of the resistant concentrations) and incubated at 37°C, 180 rpm overnight. Subcultures were prepared by adding 20 μL of the passage culture that grew in the presence of Cipro to 2 mL enriched MHB without Cipro and incubating at 37°C, 180 rpm to an optical density at 600 nm of 0.8. These subcultures were used for MIC value determinations (as indicated above) and to prepare frozen stocks by adding 700 μL of the subculture to 300 μL sterile 80% glycerol followed by storage at -80°C. Second- and third-round selections using first-round resistant isolates were carried out by increasing Cipro concentrations to approximately three-fold the parent generation MIC values at each step. Approximately 10 second-round resistant isolates were collected following selection for resistance to a higher Cipro concentration for each of 24 first-round mutants (approximately 240 total), and up to five third-round resistant isolates were collected following exposure of each second round isolate to even higher Cipro concentrations producing approximately 1,000 Cipro resistant F. tularensis LVS isolates. Resistant colonies were verified to be F. tularensis LVS by colony morphology and F. tularensis-specific PCR with a forward primer of: GGCTATATGATGGCATTTTTATTAG; and a reverse primer of: GATATATACCCATTATCGAACCATCC. Glycerol stock dilutions were used directly as templates for the PCR analyses.

Whole genome tiling array design for F. tularensis LVS Tiling arrays were designed for F. tularensis LVS using the NimbleGen 388K array platform, which supports probes of multiple lengths on the same array. We developed computational tools to design probes that tile across entire bacterial genomes while satisfying length, overlap, and melting temperature (Tm) constraints. By designing several test arrays and hybridizing F. tularensis DNA to them, we determined that a length range of 32–40 nucleotides (nt) provided optimal sensitivity and specificity: reference genomic DNA did not consistently bind to probes shorter than 32 nt, while probes longer than 40 nt did not discriminate well between perfect match targets and targets containing SNPs (data not shown). Within this range, individual probe lengths were selected to minimize the overall variation in melting temperature. A Tm range of 74 ± 3°C was selected, based on the GC content of the F. tularensis LVS genome and a median probe length of 36 nt. Melting temperatures were calculated using UNAFold [12], which uses nearest-neighbor thermodynamic predictions. Probes were tiled with an overlap of 85% (every 5–6 nt) across the sequences of the Francisella tularensis subsp. holarctica LVS chromosome (GenBank gi number 89143280) and plasmids pOM1 from F. tularensis LVS (gi number 10954617), pFPHI01 from F. philomiragia subsp. philomiragia strain ATCC 25017 (gi number 167626220), and pFNL10 from F. tularensis subsp. novicida strain F6168 (gi number 32455353). The array carried a total of 363,359 unique tiled probe sequences, and every seventeenth probe was replicated. As negative controls for assessing the distribution of background signals, we included 3,494 probes with randomly generated sequences matching the length and GC% distributions of the tiled probes.
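
The length-selection rule can be illustrated with a short sketch. It is not the authors' design tool: as an assumption, it replaces the UNAFold nearest-neighbor calculation with the basic GC-content approximation Tm = 64.9 + 41(nGC − 16.4)/N, so its absolute Tm values will not reproduce the 74 ± 3°C window used for the array, and the function names are hypothetical. What it does show is how choosing a per-probe length within 32–40 nt and stepping by 15% of the probe length yields an 85% overlap with a 5–6 nt step.

```python
from typing import List, Tuple


def approx_tm(seq: str) -> float:
    """Basic GC-content Tm estimate (a stand-in for a nearest-neighbor model)."""
    gc = sum(1 for base in seq.upper() if base in "GC")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def tile_probes(genome: str, target_tm: float, min_len: int = 32,
                max_len: int = 40, overlap: float = 0.85) -> List[Tuple[int, str, float]]:
    """Tile a sequence with probes whose estimated Tm is closest to target_tm.

    At each start position every length from min_len to max_len is tried and the
    one whose Tm is nearest the target is kept (the shortest wins ties). The next
    probe starts (1 - overlap) * probe length nucleotides later, i.e. a 5-6 nt
    step for 32-40 nt probes at 85% overlap.
    """
    probes: List[Tuple[int, str, float]] = []
    start = 0
    while start + min_len <= len(genome):
        best = None
        for length in range(min_len, max_len + 1):
            if start + length > len(genome):
                break
            seq = genome[start:start + length]
            tm = approx_tm(seq)
            if best is None or abs(tm - target_tm) < abs(best[2] - target_tm):
                best = (start, seq, tm)
        probes.append(best)
        start += max(1, round((1.0 - overlap) * len(best[1])))
    return probes


# Toy example on a synthetic 200-nt sequence (not F. tularensis DNA).
toy = "ATTGCAATAT" * 20
for start, seq, tm in tile_probes(toy, target_tm=60.0)[:3]:
    print(start, len(seq), round(tm, 1))
```

A nearest-neighbor Tm function could be dropped in place of `approx_tm` without changing the tiling loop.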

Microarray hybridization of mutant and reference DNAs Genomic DNA from the wild type and Cipro resistant isolates was isolated using a Promega Wizard™ genomic DNA purification kit. DNA labeling and hybridization were performed as described in [13], with the following modifications. The reference LVS DNA was labeled with Cy3-labeled random 9-mers, and DNA from the Cipro resistant isolates was labeled with Cy5-labeled random 9-mers. Two μg of Cy3-labeled reference DNA and Cy5-labeled DNA from a Cipro resistant isolate were hybridized to the same array for 17 hours at 42°C. After hybridization, arrays were washed and scanned using an Axon 4000B scanner (Molecular Devices, Sunnyvale, CA) at 5 μm resolution, with excitation wavelengths of 532 nm and 635 nm used to detect Cy3 and Cy5 hybridization, respectively. Array images were saved as TIFF files. NimbleScan software 2.4 (Roche Diagnostics) was used to compute probe fluorescence intensities from the TIFF images and export them as pair reports (text files containing the signal intensities from the array), which were used for the statistical analysis of the microarray data.

Statistical analysis of sequence changes from tiling microarrays An algorithm called TAPS (Tiling Array Polymorphism Sensor) was developed to analyze data from the two-color hybridizations. The TAPS algorithm is based on a thermodynamic model that predicts the effect of mutations on probe-target hybridization affinities, and it estimates the likelihood of a mutation at every reference genome position, given the intensities of all probes overlapping that position. The algorithm superficially resembles the “SNPscanner” algorithm of Gresham et al [14], but requires fewer training parameters (70 vs. 4608) and is less susceptible to over-fitting. It can also analyze two-color data sets and is not restricted to Affymetrix array designs. The TAPS algorithm models the effect of a SNP on the intensity of an overlapping probe as a function of several variables: the reference channel probe intensity, the position of the SNP in the probe sequence, the base substitution relative to the reference genome, and the two perfect-match bases on either side of the SNP locus. We assume that probe intensity decreases as the free energy of hybridization increases (becomes less negative), and that the free energy ΔG is a sum of contributions from aligned pairs of nearest-neighbor (NN) nucleotides. A SNP in the target sequence increases the free energy by replacing two perfect-match NN pairs with pairs that each contain a single mismatch. For example, a mutation that changes the sequence AGC to ATC replaces the perfect match pairs AG/TC and GC/CG with the mismatch pairs AT/TC and TC/CG. Because our tiling array has probes only for the reference genome sequence, it provides no information about the specific base substitution in the target genome. However, we can predict the average effect of the three possible substitutions at the central base of a particular base triplet.
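
The nearest-neighbor bookkeeping in the AGC → ATC example can be reproduced with a small helper. This sketch only enumerates which stacked pairs change; it contains no mismatch free-energy table, so averaging ΔΔG over the three substitutions would additionally require published nearest-neighbor mismatch parameters. Pairs are written as target/probe, with the probe strand shown as the complement aligned base-by-base under the target, matching the notation in the example; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def nn_pairs(target: str, probe: str) -> List[str]:
    """Aligned nearest-neighbor pairs, written 'target dinucleotide/probe dinucleotide'."""
    return [f"{target[i:i + 2]}/{probe[i:i + 2]}" for i in range(len(target) - 1)]


def snp_pair_change(ref_triplet: str, mut_base: str) -> Tuple[List[str], List[str]]:
    """Return (perfect-match pairs, mismatch pairs) for a SNP at the central base.

    The probe is complementary to the reference triplet, so changing the central
    target base turns both stacked pairs that contain it into mismatch pairs.
    """
    probe = ref_triplet.translate(COMPLEMENT)  # probe strand aligned under the target
    mut_triplet = ref_triplet[0] + mut_base + ref_triplet[2]
    return nn_pairs(ref_triplet, probe), nn_pairs(mut_triplet, probe)


def all_substitutions(ref_triplet: str) -> Dict[str, List[str]]:
    """Mismatch pairs for each of the three possible central-base substitutions."""
    return {b: snp_pair_change(ref_triplet, b)[1]
            for b in "ACGT" if b != ref_triplet[1]}


perfect, mismatched = snp_pair_change("AGC", "T")
print(perfect, mismatched)  # ['AG/TC', 'GC/CG'] ['AT/TC', 'TC/CG']
```
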
To estimate these average mutation effects for the different base triplets, we performed experiments in which labeled DNA from the reference LVS strain was hybridized to an array, together with a differentially labeled DNA from a different F. tularensis strain of known sequence (subspecies tularensis strain Schu S4 or subspecies novicida strain U112), and thus, with known sequence variations relative to the LVS strain. S1 Fig shows the distributions of log intensity ratios for probes overlapping known sequence variations between the LVS and Schu S4 strains, for an array hybridized to these two strains. The distributions are shown as a box plot, with probes grouped by the reference triplet centered at the SNP locus. As expected, SNPs affecting a triplet with a central G or C base have a stronger effect on average than those replacing an A or a T. The TAPS model also includes a multiplicative position effect, in which SNPs aligning near the middle of a probe cause larger intensity drops than SNPs aligned near the ends, especially the 3’ region closest to the array surface. We expected to see this positional effect based on our earlier work with virulence gene arrays [13]. S2 Fig shows a typical profile of intensity change vs. SNP position, for the same Schu S4 vs. LVS array used in S1 Fig. Each column in this plot represents the distribution of log intensity ratios between the Cy3 (LVS) and Cy5 (Schu S4) channels, for probes overlapping a Schu S4 variation at a given position in the probe; the central bar represents the range from the 25th to the 75th percentiles. We see that, on average, the intensity drop is almost two-fold when a SNP affects the nucleotides binding near the middle of the probe, but is reduced to zero at either end. Even in the absence of SNP effects, probe intensities will differ between the two channels due to dye effects, scanner bias and noise. 
To correct for these effects, each pair of intensities $(y_{\mathrm{ref}}, y_{\mathrm{mut}})$ was transformed into the log ratio (M) and log geometric mean (A):

$$M = \log y_{\mathrm{mut}} - \log y_{\mathrm{ref}}, \qquad A = \frac{1}{2}\left(\log y_{\mathrm{mut}} + \log y_{\mathrm{ref}}\right).$$

A semi-parametric regression model was fitted using the M vs. A data for all probes:

$$M = \mu(A) + \varepsilon(A),$$

in which the error term $\varepsilon(A)$ has mean 0 and variance $\sigma^2(A)$, and $\mu(A)$ and $\sigma^2(A)$ are smooth mean and variance functions. The functions $\mu(A)$ and $\sigma^2(A)$ were fit to the M and A values for all probes on each array, using regression on cubic splines to fit $\mu(A)$, and a smoothing spline on binned squared residuals to fit $\sigma^2(A)$. Since SNPs only affect a small fraction of probes on the array, the fitted $\mu(A)$ closely approximates the mean function for perfect match probes (those not overlapping variations between the reference and target strains). To model the effect of a free energy change $\Delta\Delta G = \Delta G_{\mathrm{mut}} - \Delta G_{\mathrm{ref}}$ on the log intensity ratio, we assume that the probe DNA oligomers within an array feature can be in one of three states: unbound, bound to target DNA from the mutant strain, or bound to target DNA from the reference. At thermodynamic equilibrium at temperature T, the fraction of oligomers bound to mutant DNA is given by the Boltzmann equation:

$$\theta_{\mathrm{mut}} = \frac{e^{-\Delta G_{\mathrm{mut}}/RT}}{1 + e^{-\Delta G_{\mathrm{mut}}/RT} + e^{-\Delta G_{\mathrm{ref}}/RT}},$$

where R is the gas constant; a similar equation holds for the fraction of oligomers bound to reference DNA, $\theta_{\mathrm{ref}}$. It follows that

$$\ln\frac{\theta_{\mathrm{mut}}}{\theta_{\mathrm{ref}}} = -\frac{\Delta G_{\mathrm{mut}} - \Delta G_{\mathrm{ref}}}{RT} = -\frac{\Delta\Delta G}{RT}.$$

Since the probe intensity for each dye at concentrations well above background and below saturation scales with the fraction of oligomers bound to target labeled with that dye, we expect the SNP effect on the log intensity ratio to be proportional to $\Delta\Delta G$. Therefore, for probes overlapping SNPs, our semi-parametric regression model is modified to include a term for the SNP effect:

$$M = \mu(A) + w\,\Delta\Delta G + \varepsilon(A),$$

where w is a proportionality constant (typically < 0) and the noise term $\varepsilon(A)$ is assumed to be Gaussian with mean 0 and the same variance $\sigma^2(A)$ as was estimated for perfect match probes.
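
The M–A normalization can be sketched with NumPy. Two substitutions are assumptions made for brevity: $\mu(A)$ is fit with a global cubic polynomial (`numpy.polyfit`) instead of a cubic regression spline, and $\sigma^2(A)$ is obtained by linear interpolation (`numpy.interp`) between per-bin mean squared residuals instead of a smoothing spline. Base-2 logarithms are also assumed, since the text does not state the base; the function names are hypothetical.

```python
import numpy as np


def ma_transform(y_ref, y_mut):
    """Log ratio M and log geometric mean A for paired channel intensities."""
    log_ref = np.log2(np.asarray(y_ref, dtype=float))
    log_mut = np.log2(np.asarray(y_mut, dtype=float))
    return log_mut - log_ref, 0.5 * (log_mut + log_ref)


def fit_mean_variance(m, a, degree=3, n_bins=20):
    """Estimate mu(A) and sigma^2(A) from all probes on one array."""
    m = np.asarray(m, dtype=float)
    a = np.asarray(a, dtype=float)
    mu = np.polyval(np.polyfit(a, m, degree), a)  # smooth mean trend (dye/scanner bias)
    resid_sq = (m - mu) ** 2
    edges = np.quantile(a, np.linspace(0.0, 1.0, n_bins + 1))  # equal-count bins in A
    centers, variances = [], []
    for k in range(n_bins):
        upper = a <= edges[k + 1] if k == n_bins - 1 else a < edges[k + 1]
        in_bin = (a >= edges[k]) & upper
        if in_bin.sum() >= 2:
            centers.append(a[in_bin].mean())
            variances.append(resid_sq[in_bin].mean())
    sigma2 = np.interp(a, centers, variances)  # piecewise-linear variance function
    return mu, sigma2
```

For perfect-match probes, the standardized residual $(M - \mu(A))/\sqrt{\sigma^2(A)}$ should then be roughly standard normal, which is the noise model the likelihood ratio statistic relies on.
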
The free energy effect $w\,\Delta\Delta G$ is modeled as a product of triplet and position effects:

$$w\,\Delta\Delta G = \beta_\tau\, h(x),$$

where τ indexes the triplet and x is the position of the SNP within the probe, as a fraction of the probe length. The position effect h(x) is approximated by a polynomial function of degree 5:

$$h(x) = \sum_{j=0}^{5} \alpha_j x^j.$$

The triplet effects are assumed to be equivalent for reverse complements, so there are 32 $\beta_\tau$ parameters and six $\alpha_j$ parameters to be fit. Note that the proportionality constant w has been absorbed into the triplet effects. The model parameters were fit to data from the experiments described above, in which arrays were hybridized to DNA from F. tularensis strains of known genome sequence, and thus with SNPs at known positions relative to the reference LVS genome. To make the parameters identifiable, we scaled the coefficients $\alpha_j$ so that h(0.5) = 1. To apply the model to data from target strains of unknown sequence, we computed a log likelihood ratio test statistic for every position z in the reference genome. Let P(z) be the set of probes overlapping position z, and let $M_i$ and $A_i$ be the log intensity ratio and average for probe i. The semi-parametric regression model given above leads to the following expression for the log likelihood:

$$\log L(z) = -\sum_{i \in P(z)} \left[ \frac{\left(M_i - \mu(A_i) - w\,\Delta\Delta G_i\right)^2}{2\sigma^2(A_i)} + \frac{1}{2}\log\left(2\pi\sigma^2(A_i)\right) \right].$$

Under the null hypothesis that there is no SNP at position z, $\Delta\Delta G_i = 0$ for all probes in P(z), and the log likelihood is given by:

$$\log L_{\mathrm{null}}(z) = -\sum_{i \in P(z)} \left[ \frac{\left(M_i - \mu(A_i)\right)^2}{2\sigma^2(A_i)} + \frac{1}{2}\log\left(2\pi\sigma^2(A_i)\right) \right].$$

Under the alternative hypothesis that there is a SNP at position z, the SNP effect $w\,\Delta\Delta G_i = \beta_\tau h(x_i)$ was computed for each probe using the fitted model parameters, where $x_i$ is the relative position of z within probe i, leading to a different log likelihood value $\log L_{\mathrm{alt}}(z)$. The log likelihood ratio test statistic is simply:

$$\log LR(z) = \log L_{\mathrm{alt}}(z) - \log L_{\mathrm{null}}(z).$$

To identify candidate SNP loci, we computed log LR(z) for every position z in the reference genome and compared it to a threshold value. We selected the threshold by analyzing data from the test arrays hybridized to DNA from F. tularensis strains with SNPs at known positions relative to the LVS strain, and choosing the value that gave the best tradeoff between false positive and false negative error rates; this threshold was 20 for the F. tularensis arrays. SNPs were typically characterized by a contiguous series of positions with log LR scores above the threshold, and the most likely SNP location within the series was identified as the position with the maximum score. As an example, Fig 1 plots the test statistic values for a short region of the DNA gyrase A gene in one of the Cipro resistant F. tularensis LVS isolates; the log likelihood ratio has an obvious peak in this region.
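
The test statistic for one genome position can be computed as below. This is a minimal sketch under stated assumptions: the inputs are already-normalized residuals $M_i - \mu(A_i)$ and variances $\sigma^2(A_i)$ for the probes in P(z), and natural logarithms are used. The example position profile h(x) = 4x(1 − x) is hypothetical, chosen only because it satisfies h(0.5) = 1 and vanishes at the probe ends; it is not the fitted profile shown in S2 Fig. The helper that merges each triplet with its reverse complement shows where the count of 32 $\beta_\tau$ parameters comes from: no odd-length triplet is its own reverse complement, so the 64 triplets pair up exactly.

```python
import itertools

import numpy as np

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_triplet(triplet: str) -> str:
    """Represent a triplet and its reverse complement by the same key."""
    return min(triplet, triplet.translate(COMPLEMENT)[::-1])


def position_effect(x, alpha):
    """h(x) = sum_j alpha_j * x**j, coefficients ordered from the constant term up."""
    return np.polynomial.polynomial.polyval(np.asarray(x, dtype=float), alpha)


def log_lr(resid, sigma2, x, beta_tau, alpha):
    """log LR(z) = log L_alt(z) - log L_null(z) for the probes overlapping position z.

    resid    : M_i - mu(A_i) for each probe in P(z)
    sigma2   : sigma^2(A_i) for the same probes
    x        : position of z within each probe, as a fraction of probe length
    beta_tau : fitted effect of the reference triplet centred at z
    alpha    : coefficients of h(x), scaled so that h(0.5) = 1
    """
    resid = np.asarray(resid, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    snp_effect = beta_tau * position_effect(x, alpha)  # w * ddG_i under the alternative
    norm = 0.5 * np.log(2.0 * np.pi * sigma2)
    ll_null = -np.sum(resid ** 2 / (2.0 * sigma2) + norm)
    ll_alt = -np.sum((resid - snp_effect) ** 2 / (2.0 * sigma2) + norm)
    return ll_alt - ll_null


n_classes = len({canonical_triplet("".join(t)) for t in itertools.product("ACGT", repeat=3)})
print(n_classes)  # 32

alpha_demo = [0.0, 4.0, -4.0, 0.0, 0.0, 0.0]  # hypothetical h(x) = 4x(1 - x)
print(log_lr([-0.9, -0.8, -0.2], [0.04, 0.04, 0.04], [0.5, 0.4, 0.1], -1.0, alpha_demo))
```

Because the normalization terms are identical under both hypotheses, they cancel, and the statistic reduces to a comparison of squared residuals with and without the predicted SNP effect.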

Fig 1. The test statistic values for a short region of the DNA gyrase A gene in one of the Cipro resistant F. tularensis isolates. The log likelihood ratio has a clear peak in this region. Candidate SNP positions were identified by looking for regions of the genome where the log likelihood ratio exceeds a fixed threshold. https://doi.org/10.1371/journal.pone.0163458.g001
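
The peak-calling rule described for Fig 1 (contiguous runs of positions above the threshold, each reported at its maximum) can be written as a single pass over the per-position scores. A minimal sketch; the function name is hypothetical, and the default threshold of 20 is the value given in the text for the F. tularensis arrays.

```python
from typing import List, Optional, Sequence, Tuple


def call_snp_peaks(scores: Sequence[float], threshold: float = 20.0) -> List[Tuple[int, float]]:
    """Return (position, score) at the maximum of each contiguous run of scores > threshold."""
    peaks: List[Tuple[int, float]] = []
    best_pos: Optional[int] = None
    best_score = float("-inf")
    for pos, score in enumerate(scores):
        if score > threshold:
            if best_pos is None or score > best_score:  # ties keep the earliest position
                best_pos, best_score = pos, score
        elif best_pos is not None:  # a run has just ended
            peaks.append((best_pos, best_score))
            best_pos, best_score = None, float("-inf")
    if best_pos is not None:  # a run that reaches the end of the region
        peaks.append((best_pos, best_score))
    return peaks


print(call_snp_peaks([0, 5, 25, 30, 22, 3, 21, 0]))  # [(3, 30), (6, 21)]
```
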

Illumina sequence data generation and quality control Illumina paired end libraries were prepared from 1 μg of genomic DNA from each of eleven third-round Cipro resistant isolates for single-end sequencing on the Genome Analyzer IIx. Briefly, the gDNA was fragmented, end-repaired, A-tailed, ligated to adaptors, size-selected, and enriched with 13 cycles of PCR. Each library was assigned one lane of a flow cell for cluster amplification and sequencing on the Genome Analyzer IIx, and 36 cycles of single-end sequence data were generated. One lane of 51-cycle paired end sequence data was generated for F. tularensis LVS Cipro resistant isolate 1:1:5. The resulting sequencing reads were filtered using the default parameters of the Illumina QC pipeline (Bustard + Gerald). As an additional quality control step, all reads were analyzed using the PIQA pipeline [15], which examines genomic reads produced by Illumina machines and provides tile-by-tile and cycle-by-cycle graphical representations of cluster density, quality scores, and nucleotide frequencies. This allows easy identification of defective tiles, errors in sample or library preparation, and abnormalities in the frequencies of sequenced genomic reads. All reads were judged to be of sufficient quality for subsequent analysis. The amount of sequence data generated for each sample is given in Table 1.

Table 1. Illumina sequence data summary. https://doi.org/10.1371/journal.pone.0163458.t001
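
Sequence yield of the kind summarized in Table 1 translates into mean fold-coverage through the standard relation C = N·L/G (number of reads × read length / genome length). A minimal sketch that assumes all reads map; the genome length used here is an approximation back-calculated from the uniqueness statistics in the mapping section (1,784,242 bases = 94.11% of the genome, i.e. about 1.9 Mb), not an exact value, and the function names are hypothetical.

```python
import math

# Approximate LVS genome length implied by 1,784,242 bases = 94.11% of the genome.
APPROX_GENOME_LENGTH = round(1_784_242 / 0.9411)


def mean_coverage(n_reads: int, read_length: int, genome_length: int) -> float:
    """Average fold-coverage, assuming every read maps to the genome."""
    return n_reads * read_length / genome_length


def reads_for_coverage(target: float, read_length: int, genome_length: int) -> int:
    """Smallest number of mapped reads that reaches the target mean coverage."""
    return math.ceil(target * genome_length / read_length)


needed = reads_for_coverage(200, 36, APPROX_GENOME_LENGTH)
print(APPROX_GENOME_LENGTH, needed, round(mean_coverage(needed, 36, APPROX_GENOME_LENGTH), 1))
```

With 36-nt reads, roughly ten million mapped reads are therefore needed for 200X mean coverage of a genome of this size.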

Mapping and identifying candidate mutations The sequence reads from each sample were mapped, allowing up to one mismatch, to the reference F. tularensis LVS genome (RefSeq accession NC_007880). To avoid the uncertainty associated with identifying mutations in repetitive parts of the reference genome, a uniqueness score was determined for each position in the reference sequence from the subsequences covering that nucleotide. Specifically, the copy number in the reference genome of each subsequence of length 36 (the read length used in sequencing) was first calculated; the uniqueness score of each position was then defined as the total number of subsequences covering that position, with each subsequence counted according to its copy number. Under this metric, a score of 36 appears only if every subsequence covering a given nucleotide is unique in the reference; higher scores indicate that one or more of those subsequences occur in several copies. In total, 94.11% (1,784,242 bases) of the F. tularensis LVS reference genome (NC_007880) has a uniqueness score of 36, and mutations at these positions can be detected without the ambiguity caused by repeated regions. A given position is predicted to contain a mutation if (1) the number of reads confirming the mutation on each strand exceeds the minimum count threshold, which ensures that only positions with the minimum required coverage are considered, and (2) the proportion of reads confirming the mutation, out of all reads covering the position, exceeds a ratio threshold, which ensures that only mutations with the minimum required support are identified. As a compromise between mutation detection sensitivity and false discovery rate, the minimum count threshold was set at 10% of the median per-nucleotide coverage for each sample, and the ratio threshold was set at 30% of the total coverage at each nucleotide.
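
The uniqueness score can be computed in linear time by counting k-mers once and then spreading each k-mer's copy number over the k positions it covers with a difference array. This is a sketch, not the authors' in-house code. The `both_strands` option, which also counts reverse-complement copies, is an assumption the text does not specify, and a full 1.9 Mb genome would call for a more memory-efficient k-mer counter. Note that in a linear sequence the first and last k − 1 positions are covered by fewer than k windows, so they score below k even when every covering k-mer is unique.

```python
from collections import Counter
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def uniqueness_scores(genome: str, k: int = 36, both_strands: bool = False) -> List[int]:
    """Per-position sum of the copy numbers of all k-mers covering that position."""
    genome = genome.upper()

    def key(kmer: str) -> str:
        if not both_strands:
            return kmer
        return min(kmer, kmer.translate(COMPLEMENT)[::-1])  # strand-independent key

    kmers = [key(genome[i:i + k]) for i in range(len(genome) - k + 1)]
    copies = Counter(kmers)
    diff = [0] * (len(genome) + 1)  # +copies at window start, -copies just past its end
    for start, kmer in enumerate(kmers):
        diff[start] += copies[kmer]
        diff[start + k] -= copies[kmer]
    scores, running = [], 0
    for delta in diff[:-1]:
        running += delta
        scores.append(running)
    return scores


def fraction_fully_unique(scores: List[int], k: int = 36) -> float:
    """Fraction of positions whose score equals k (all covering k-mers unique)."""
    return sum(1 for s in scores if s == k) / len(scores)


print(uniqueness_scores("ACGTAC", k=3))  # [1, 2, 3, 3, 2, 1]
print(uniqueness_scores("AAAA", k=3))    # [2, 4, 4, 2]
```
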
In the present analysis, mutations confirmed on both strands (where the number of reads supporting the mutation exceeds the minimum count threshold on each strand separately) are distinguished from mutations for which this condition was met on only one strand. For insertions, the mapping process associates both perfect matches (PM) and insertions with the same location on the reference genome. Different ratio criteria are therefore used to detect different types of mutations at a given genome position: one for a substitution of base B for the reference base, one for a deletion, and one for an insertion of base B on the plus strand. In the numerators of these criteria, SubB+/-, Del+/-, and InsB+/- stand for the numbers of reads confirming a substitution, deletion, or insertion, respectively, that map to the genome strand indicated by the superscript. For substitutions and insertions, SubB- and InsB- are the numbers of reads mapped to the minus strand in which the base complementary to B is substituted or inserted. In the denominators, PM, SubACTG, and InsACTG are the numbers of reads confirming a perfect match, a substitution of any base, or an insertion of any base, respectively, at the genome position of interest. Although paired end data were generated, the reads were decoupled and a single-end read assembly (using in-house algorithms) was performed on each sequence data set. These contigs are shorter than contigs obtained with paired end data but in general have fewer errors. Each mutation identified in each sample was confirmed to be present on the contigs assembled for that sample. Mutations (including insertions, deletions, and substitutions) that pass both thresholds and appear on both strands are less likely to be artifacts of sequencing read generation or mapping.
Mutations that appear on only one strand and cannot be verified on the opposite strand (uncommon given sufficient coverage), such as insertions other than homopolymer extensions (‘G’ after ‘G’, ‘C’ after ‘C’, ‘A’ after ‘A’, or ‘T’ after ‘T’), are likely either sequencing or mapping artifacts (false positives) or lie at genome positions without enough coverage to be verified on both strands.
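
The two-threshold rule and the both-strand/one-strand distinction can be sketched for a single candidate variant. This illustration makes a simplifying assumption: the ratio uses the total read depth at the position, as in the general description of the ratio threshold, rather than the separate per-mutation-type denominators used for substitutions, deletions, and insertions. The function and argument names are hypothetical.

```python
from typing import Optional


def classify_call(support_plus: int, support_minus: int, depth: int,
                  median_coverage: float, count_frac: float = 0.10,
                  ratio_threshold: float = 0.30) -> Optional[str]:
    """Return 'both strands', 'one strand', or None for a candidate mutation.

    support_plus / support_minus : reads supporting the mutation on each strand
    depth                        : all reads covering the position
    median_coverage              : median per-nucleotide coverage of the sample
    """
    min_count = count_frac * median_coverage  # 10% of the median coverage
    if depth == 0 or (support_plus + support_minus) / depth <= ratio_threshold:
        return None  # too small a share of the covering reads supports the change
    plus_ok = support_plus > min_count
    minus_ok = support_minus > min_count
    if plus_ok and minus_ok:
        return "both strands"
    if plus_ok or minus_ok:
        return "one strand"  # supported on one strand only: weaker evidence
    return None


print(classify_call(30, 25, 150, median_coverage=200))  # both strands
print(classify_call(30, 5, 100, median_coverage=200))   # one strand
```
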

PCR and Sanger sequencing confirmation of Cipro-resistant mutants To confirm mutations identified by tiling microarray and Illumina sequencing, PCR oligonucleotide primers were designed using Primer3™ [16] to amplify F. tularensis LVS genome-specific sequences surrounding the locus where the mutations were identified. In addition to round 3 Cipro-resistant isolates, PCR and sequencing reactions were also performed on round 1 and 2 isolates to identify the selection step in which each mutation occurred. PCR was performed using Promega PCR reagents. Sanger sequencing was performed using ABI3730 DNA analyzers at the DOE Joint Genome Institute in Walnut Creek, CA or at Elim Biopharmaceuticals, Inc (Hayward, CA).

Analysis of the impact of the mutations on protein structure and function The automated homology modeling system AS2TS [17] was used together with other computational tools (http://proteinmodel.org) to construct and analyze structural models for all F. tularensis LVS proteins listed in Table 2 and S4 Table. The resulting structural models were analyzed to assess whether the observed mutations could cause conformational changes, and to estimate the level of possible sequence variability in the identified structurally conserved regions. Structure alignments were calculated using the program LGA (Local Global Alignment) [18], and structural similarities detected between LVS proteins and related structures from the Protein Data Bank (PDB) were evaluated with the StralSV sequence/structure variability evaluation system [19]. StralSV identifies all structurally similar protein structure fragments in the PDB for any given structural motif, evaluates the calculated structure-based alignments between the query motif and the fragments, and quantifies the observed sequence variability at each residue position. Its output enables rapid identification of invariant residues (often those essential to protein function) and unusual variants, and supports predictions about natural or engineered mutations that are not yet observed in current sequence databases. Results from the StralSV analysis allowed us to characterize each observed mutation by its location on the protein (e.g., buried, exposed, or within an active site), and to identify other proteins (sometimes from more distant organisms) in which a similar structural motif with the given substitution was observed and characterized.
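
StralSV's own scoring is not described in this section, so as a generic stand-in for quantifying sequence variability at each residue position, the sketch below tabulates the residues and their Shannon entropy (in bits) for each column of a set of structure-aligned fragments. Invariant positions have entropy 0. The fragments and function name are hypothetical, and the fragments are assumed to be already aligned to equal length with '-' marking gaps.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def column_variability(aligned: Sequence[str]) -> List[Tuple[Dict[str, int], float]]:
    """Per-column residue counts and Shannon entropy in bits (gaps '-' ignored)."""
    length = len(aligned[0])
    if any(len(frag) != length for frag in aligned):
        raise ValueError("aligned fragments must all have the same length")
    columns = []
    for col in range(length):
        counts = Counter(frag[col] for frag in aligned if frag[col] != "-")
        total = sum(counts.values())
        entropy = 0.0
        for n in counts.values():
            p = n / total
            entropy -= p * math.log2(p)
        columns.append((dict(counts), entropy))
    return columns


for counts, h in column_variability(["ACD", "ACE", "AGD", "AG-"]):
    print(counts, round(h, 3))
```

A column that is invariant across many structurally similar fragments is a candidate functionally essential residue; a mutation observed at such a position would stand out in this kind of table.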

Table 2. Genes in Cipro resistant F. tularensis LVS isolates containing mutations identified by both Illumina sequencing and the SNP microarray. Reference genome positions are given relative to F. tularensis NC_007880. The “Amino acid change” column lists the amino acid variants observed at the corresponding positions in homologous proteins. https://doi.org/10.1371/journal.pone.0163458.t002