Genome Sequencing and SNP Identification

Table 1. Single-Nucleotide Polymorphisms (SNPs) and Indels in 3I-Type Mycobacterium leprae Genomes Found in the United States.

Using Illumina technology, we obtained deep coverage of the genome sequences from M. leprae reference strains NHDP-55, NHDP-98, and the wild-armadillo–derived strain I-30 (Table 1, and Table S4 in the Supplementary Appendix). The resultant sequence reads were mapped onto the genome sequence of the TN reference strain (from India, SNP type 1A)29 and compared with the other sequenced M. leprae reference strains: Br4923 (from Brazil, SNP type 4P), NHDP-63 (from the United States, SNP type 3I), and Thai53 (from Thailand, SNP type 1A).22 This analysis confirmed the exceptionally high level of sequence conservation (99.995% identity), even among M. leprae strains of widely different geographic origins, and identified all four U.S.-derived genomes as SNP type 3I.

On detailed comparison of these seven genome sequences, 52 markers were found only in the SNP type 3I strains. These 3I strains differed among themselves at 21 positions (Table 1, and Table S4 in the Supplementary Appendix). One 11-bp indel (indel_17915) was particularly important, since the 3I strains have only one copy of the sequence (TTGGTGGTGTA, in pseudogene ML0014), whereas all other M. leprae strains have two copies.
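The copy-number difference at indel_17915 can be illustrated with a minimal sketch that counts back-to-back copies of the 11-bp unit in a sequence window. This assumes the two copies in non-3I strains sit in tandem (the text does not say so explicitly); the flanking bases and the helper name `max_tandem_copies` are hypothetical and are not taken from the ML0014 sequence.

```python
import re

MOTIF = "TTGGTGGTGTA"  # 11-bp unit reported for indel_17915 (pseudogene ML0014)


def max_tandem_copies(sequence: str, motif: str = MOTIF) -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical flanking bases, for illustration only (not real ML0014 sequence).
LEFT, RIGHT = "ACGTAC", "GATCCA"
seq_3i = LEFT + MOTIF + RIGHT         # one copy, as described for 3I strains
seq_other = LEFT + MOTIF * 2 + RIGHT  # two copies, assumed here to be in tandem

print(max_tandem_copies(seq_3i), max_tandem_copies(seq_other))
```

In practice such a check would be run on reads or assemblies spanning the locus; the sketch only shows the counting logic.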

Genotyping of M. leprae Strains

SNP Analysis and Classification

We classified M. leprae obtained from 33 wild armadillos, 50 biopsy specimens from U.S. patients, and 4 foreign reference strains, using the algorithm shown in Figure 2. Armadillos were sampled in the five states known to harbor the sylvan infection (Figure 1).14 Among the 50 U.S. patients examined (see Table S1 in the Supplementary Appendix), 39 reported a residence history in areas of the United States or Mexico where endemic exposure to armadillo-borne M. leprae was possible, and 29 of these 39 had no history of foreign residence.

Figure 3. Minimum-Spanning Phylogenetic Tree of Mycobacterium leprae Genotypes Based on Analysis of Single-Nucleotide Polymorphisms (SNPs) and Variable-Number Tandem Repeats (VNTRs). Minimum-spanning-tree analysis was performed with the use of combined VNTR and SNP data from human and armadillo M. leprae strains. Each circle represents a genotype (human unless marked as armadillo) based on the combined data, with the circle size directly proportional to the number of strains with the corresponding genotype. Numbers along the links between circles indicate the number of loci that differ between the genotypes on either side of the link. Three fully sequenced reference M. leprae strains (TN, Thai53, and Br4923)22,29 are labeled, as are two other reference strains (LWM26 and 43926) of foreign origin. Samples from patients with a history of foreign residence are indicated with an asterisk (with three asterisks indicating three patients). The 114 polymorphisms investigated include 84 SNPs described previously22 and 30 identified during our study; 10 VNTRs were also analyzed. The large circle illustrates the predominance of the 3I-2-v1 M. leprae genotype in our study, with 25 patients and 28 armadillos having this identical genotype.

SNP analysis revealed seven types of M. leprae strains in the United States, including four found in patients with no history of foreign residence (Figure 3). Some exotic strains may have become endemic over time among patients with no history of foreign residence or may now occur in the United States as a result of unreported exposure (e.g., SNP type 1A, which is more commonly associated with the Philippines, in Patient H-02).22 Among patients with possible exposure by means of foreign residence only, the SNP type was typical for strains previously reported from the foreign location. SNP type 3I, generally associated with European–American populations, was the most abundant type in our samples: it was found in all 33 armadillos and in 26 of the 29 patients with no history of foreign residence.

To improve the resolution of our data for M. leprae 3I strains, we also surveyed for 30 of the 52 newly discovered markers. In addition to the 11-bp indel, 24 of the 30 markers were restricted to strains of SNP type 3I, irrespective of the source of the strain. However, in four 3I strains identified in patients, five SNPs contained ancestral bases and may represent intermediate sequences arising during the evolutionary divergence of 3I strains from their common ancestor. The strains with ancestral bases were classified as subtype 3I-1 to differentiate them from the more divergent strains, classified as subtype 3I-2, that were found in all armadillos and most indigenous U.S. patients (Figure 3, and Table S5 in the Supplementary Appendix).

To gain more insight into the distribution of 3I-1 strains in the Americas, we examined biopsy specimens from 64 Venezuelan patients infected with M. leprae strains of SNP type 3I. Of the 64 specimens, 48 (75%) belonged to subtype 3I-1 and 16 (25%) belonged to subtype 3I-2. In addition, we found 3I-1 subtypes in patients from Brazil, Puerto Rico, and the Dominican Republic. Thus, the prevalence of 3I-1 and 3I-2 strains in North America is significantly different (P<0.001) from their prevalence in the Caribbean and South America (see Table S6 in the Supplementary Appendix).

VNTR and Minimum-Spanning-Tree Analyses

Owing to the remarkable conservation of the M. leprae genome, SNP analysis alone has limited power to discriminate among strains. Accordingly, we used 10 polymorphic VNTR markers to enhance discrimination of strain subtypes (see Table S4 in the Supplementary Appendix).26 Minimum-spanning-tree analysis of the combined SNP and VNTR profiles was performed to examine relationships among strains (Figure 3). The resulting SNP–VNTR genotypes (see Table S1 in the Supplementary Appendix) confirm a high degree of homogeneity between M. leprae from armadillos and most indigenous U.S. cases of leprosy (Figure 3). Among wild armadillos, 28 of the 33 strains showed complete genetic identity with respect to the SNP–VNTR genotype (3I-2-v1); the remaining 5 strains comprised two genotypes, 3I-2-v14 (2 specimens) and 3I-2-v13 (3 specimens), which varied at one VNTR locus (see Table S5 in the Supplementary Appendix). Similarly, 25 of the 39 patients (64%) with a history of residence in areas in which exposure to M. leprae from armadillos was possible (including 22 of the 29 patients with no history of foreign residence) also carried the 3I-2-v1 strain. The 3I-2-v1 genotype appears to be unique and highly distinctive. This combination of VNTR alleles was not found within a database of VNTR genotypes identified around the world,28 and allele frequencies in that database suggest a probability of random reassortment of the VNTR v1 genotype of only 1 in 10,000.28
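The minimum-spanning-tree step can be sketched as follows: genotypes are compared by the number of loci at which they differ (the link labels in Figure 3), and Kruskal's algorithm keeps the lowest-cost links that connect all genotypes. This is an illustrative sketch, not the software used in the study. The allele values, the "hypothetical-" genotypes, and the treatment of the SNP subtype as a single locus are all assumptions made for the example; only the property that v13 and v14 each differ from v1 at one VNTR locus mirrors the text.

```python
from itertools import combinations


def locus_differences(a, b):
    """Count loci at which two genotype profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Kruskal's algorithm over all genotype pairs.

    Returns a list of (n_differing_loci, genotype_1, genotype_2) links.
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    edges = sorted(
        (locus_differences(profiles[g1], profiles[g2]), g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
    )
    tree = []
    for dist, g1, g2 in edges:
        root1, root2 = find(g1), find(g2)
        if root1 != root2:  # keep the link only if it joins two components
            parent[root1] = root2
            tree.append((dist, g1, g2))
    return tree


# Made-up profiles: (SNP subtype, then four VNTR copy numbers). Placeholder values.
profiles = {
    "3I-2-v1": ("3I-2", 10, 4, 7, 12),
    "3I-2-v13": ("3I-2", 10, 4, 8, 12),
    "3I-2-v14": ("3I-2", 11, 4, 7, 12),
    "hypothetical-3I-1": ("3I-1", 9, 5, 7, 12),
    "hypothetical-1A": ("1A", 9, 5, 6, 14),
}

tree = minimum_spanning_tree(profiles)
for dist, g1, g2 in tree:
    print(f"{g1} -- {dist} -- {g2}")
```

When several links tie on distance, the tree is not unique; the total number of differing loci along the tree is the same whichever tied link is kept.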

In our study, 3I-2-v1 was the only genotype found in more than two patients. The combination of SNP and VNTR genotyping is highly discriminatory and confirms a significant association between the M. leprae strain infecting armadillos and many U.S. patients. The 3I-2-v1 strain was significantly associated with a history of residence in areas where M. leprae–infected armadillos have been found (P<0.001). People with leprosy who live in areas with infected armadillos and who have no history of foreign residence are significantly more likely to be infected with the 3I-2-v1 strain than with any other strain (odds ratio, 16.5; 95% confidence interval [CI], 4.2 to 64.7; P<0.001) (see Table S6 in the Supplementary Appendix).

Information on contact with armadillos was available for 15 patients: 8 recalled having contact, including 1 who reported frequently hunting, cooking, and eating armadillos, and 7 recalled no contact. Nine of the 15 patients were infected with the 3I-2-v1 strain. These data confirm that some of our patients interacted with armadillos and suggest an increased likelihood of infection with the 3I-2-v1 strain as a result, although the association was not statistically significant in this small group (odds ratio vs. having no contact, 4.0; 95% CI, 0.5 to 35.8; P=0.314).
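The arithmetic behind an odds ratio of this kind can be sketched with a 2×2 table. The text reports only the totals (8 patients with contact, 7 without, 9 of 15 with 3I-2-v1), not the per-cell counts. The split used below (6 of 8 with contact and 3 of 7 without contact carrying 3I-2-v1) is an assumption, chosen because it is the only whole-number split of those totals that gives an odds ratio of 4.0. The Woolf (log-normal) interval and the two-sided Fisher exact test are likewise assumed methods; the statistical procedures actually used are not stated in this section.

```python
from math import comb, exp, log, sqrt


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-normal) 95% CI for the table [[a, b], [c, d]]."""
    estimate = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return estimate, exp(log(estimate) - z * se), exp(log(estimate) + z * se)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value.

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / total

    observed = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    return sum(
        p for p in map(prob, range(k_min, k_max + 1))
        if p <= observed * (1 + 1e-9)  # small tolerance for float ties
    )


# ASSUMED cell counts (see the note above); rows = contact / no contact,
# columns = infected with 3I-2-v1 / infected with another strain.
a, b, c, d = 6, 2, 3, 4

estimate, ci_low, ci_high = odds_ratio_woolf(a, b, c, d)
p_value = fisher_exact_two_sided(a, b, c, d)
print(f"OR={estimate:.1f}, 95% CI {ci_low:.2f} to {ci_high:.1f}, P={p_value:.3f}")
```

Under these assumptions the sketch lands close to the reported estimate, interval, and P value. The wide interval, which spans 1, reflects how little information 15 patients provide.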