Metagenomic survey of rifamycin biosynthetic diversity

The complexity of soil microbiomes limits the utility of shotgun sequencing as a tool for identifying biosynthetic gene clusters in soil metagenomes. Instead, PCR-based methods that use degenerate primers to target conserved natural product biosynthetic genes have been developed to study the biosynthetic gene cluster diversity present in an environmental sample, much in the same way that bacterial phylogenetic diversity is routinely evaluated through the analysis of PCR-amplified 16S genes12,13. To assess the diversity of rifamycin-like gene clusters present in soil microbiomes, we used degenerate primers targeting the 3-amino-5-hydroxy benzoic acid (AHBA) synthase gene, which encodes the final step in AHBA biosynthesis (Fig. 1). AHBA is the universal precursor for the ansamycin family of natural products, including the rifamycins14. The phylogenetic divergence of AHBA synthase genes correlates closely with the structural divergence of the metabolites encoded by the biosynthetic gene clusters from which an AHBA synthase gene arises, making it an information-rich target for identifying rifamycin-like gene clusters in metagenomes using PCR-based methods (Fig. 1)15,16,17,18,19.

Fig. 1 The rifamycin biosynthetic gene cluster and the role of AHBA synthase. a The rifamycin gene cluster from Amycolatopsis mediterranei. b The reaction catalyzed by AHBA synthase and the structure of rifamycin SV (the product of the rifamycin gene cluster). The rifamycin SV structure is colored according to the genes responsible for producing its PK core (red), AHBA-derived substructure (green), and tailoring functionalities (black). The phylogenetic divergence of AHBA synthase genes from previously characterized gene clusters correlates with the different structural classes of ansamycins.

To identify metagenomes containing rifamycin-like biosynthetic gene clusters, environmental DNA (eDNA) isolated from a collection of approximately 1500 geographically and ecologically diverse soils was used as the template in PCR reactions with degenerate primers designed to amplify AHBA synthase genes (Fig. 2a; Supplementary Table 1). Amplicon sequences generated from each soil were then compared to a reference collection of AHBA synthase genes from characterized ansamycin biosynthetic gene clusters. A soil was considered a potential source of a rifamycin-like gene cluster if it contained an AHBA synthase sequence that was more closely related to a gene from a known rifamycin-like gene cluster than from any other ansamycin family gene cluster (Fig. 2a). Based on this analysis, rifamycin-like biosynthetic gene clusters were present in approximately half of the soils we examined. AHBA synthase amplicons within the rifamycin-like sequence-space form a number of well-defined clades (Fig. 2a), which we predicted might be associated with groups of biosynthetic gene clusters encoding structurally distinct congeners. To access the potentially new rifamycin-like gene clusters from soil metagenomes, we constructed saturating cosmid-based metagenomic libraries from seven soils. This subset of soils yielded a multitude of distinct AHBA synthase sequences, which were predicted to span all of the major rifamycin-like clades that we identified.
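The assignment rule described above (a soil is flagged when an amplicon sits closer to a rifamycin-like reference than to any other ansamycin reference) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: it swaps phylogenetic placement for plain percent identity over pre-aligned, equal-length sequences, and the sequences, reference names, and function names are all hypothetical.

```python
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Fraction of matching, non-gap positions in two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def closest_reference(amplicon: str,
                      references: Dict[str, Tuple[str, str]]) -> Tuple[str, str]:
    """Return (reference name, ansamycin class) of the most similar reference."""
    best = max(references,
               key=lambda name: percent_identity(amplicon, references[name][0]))
    return best, references[best][1]


def is_rifamycin_like(amplicon: str,
                      references: Dict[str, Tuple[str, str]]) -> bool:
    """True when the nearest reference comes from a rifamycin-like cluster."""
    return closest_reference(amplicon, references)[1] == "rifamycin-like"


# Toy data (hypothetical): name -> (aligned sequence, ansamycin class)
REFERENCES = {
    "ref_rif": ("ATGGCTCGTA", "rifamycin-like"),
    "ref_other": ("ATGCCAGGTT", "other ansamycin"),
}
AMPLICON = "ATGGCTCGTT"

if __name__ == "__main__":
    print(closest_reference(AMPLICON, REFERENCES))
    print(is_rifamycin_like(AMPLICON, REFERENCES))
```

In practice a tree-based placement, as in Fig. 2a, is more robust than raw identity because it accounts for unequal divergence among clades.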

Fig. 2 Sequence-based screen for rifamycin congener gene clusters. a Screening overview. DNA isolated from ~1500 soils was screened for the presence of AHBA synthase genes by PCR using degenerate primers. Sequence tags generated in this screen were used to construct a phylogenetic tree, onto which AHBA synthase reference sequences from known rifamycin congener gene clusters were mapped (marked with asterisks). Large, distinct clades in the phylogenetic tree are shown in different colors. Metagenomic DNA cosmid libraries were generated from soils that contained AHBA sequence tags that spanned all AHBA clades predicted to be associated with rifamycin congener gene clusters. To facilitate the recovery of individual clones containing gene clusters of interest, each metagenomic library was expanded to contain >20,000,000 unique eDNA cosmids and formatted as smaller subpools of between 20,000 and 60,000 unique cosmid clones per subpool. Primary clones (those containing an AHBA synthase gene) were recovered from AHBA positive subpools using a PCR dilution method and degenerate AHBA synthase primers. The same approach, but with degenerate primers targeting PKS ketosynthase (KS) domains and the rif15A/15B tailoring genes, was used to recover regions of the pathways that flank those found on the primary clone. AHBA sequence tags corresponding with primary clones that were targeted for recovery are indicated with arrows on the phylogenetic tree. b Summary of rifamycin congener gene clusters recovered from the soil metagenomes. Portions of the gene clusters found on primary clones are shown on a gray background.

Rifamycin-like gene clusters from metagenomic libraries

In sequenced biosynthetic gene clusters that encode rifamycin family members, a variable region containing tailoring genes, responsible for generating most of the structural diversity seen in rifamycin congeners, resides directly downstream of the AHBA biosynthesis operon (Fig. 1). To guide the isolation of eDNA cosmids containing tailoring genes, the seven newly constructed and two pre-existing soil eDNA libraries were screened with the same AHBA synthase degenerate primers that we used to screen crude eDNA extracts (Fig. 2a). We initially recovered 35 unique cosmids (i.e., primary clones) from sublibrary pools that yielded rifamycin-like AHBA sequences (Fig. 2b). Sequencing of these cosmids revealed that the collections of predicted tailoring genes largely varied in concert with the phylogenetic divergence of the AHBA synthase genes.

Representative gene clusters associated with each major AHBA synthase clade were recovered in their entirety on sets of overlapping cosmid clones (Fig. 2a). Each collection of overlapping cosmids was sequenced, assembled into a single continuous stretch of DNA and annotated in silico to reveal an eDNA-derived rifamycin-like gene cluster (Fig. 2b). eDNA-derived gene clusters were predicted to encode a number of enzymes that have not previously been associated with rifamycin congener biosynthesis (e.g., N-acyltransferases, CoA-transferases, propionyl-CoA carboxylases, methylmalonyl-CoA mutases, and lanthionine synthetase-like enzymes) (Fig. 2b and Supplementary Figure 1A). A number of other tailoring genes found in these clusters are phylogenetically distinct from those found in known rifamycin-like gene clusters, suggesting they may differentially functionalize the rifamycin backbone. These genes are predicted to encode glycosyltransferases, methyltransferases, cytochrome P450s, oxidoreductases, and sugar biosynthesis enzymes (Supplementary Figures 2 and 3).

In most cases, the polyketide synthase (PKS) portion of each gene cluster is predicted to be functionally identical. However, a number of gene clusters with the most complex sets of tailoring genes were predicted to encode a change in the substrate specificity of the acyltransferase (AT) domain in the eighth PKS module (AT8*, Fig. 2b). These AT8* domains are predicted to use ethylmalonyl-CoA (Emal) as a substrate instead of methylmalonyl-CoA (Mmal)20,21, which would introduce a two-carbon branch into the rifamycin polyketide (PK) core (Supplementary Figure 1B). The combination of a potential change in the PK core structure and a complex collection of tailoring genes led us to prioritize this family of gene clusters for investigation. We hypothesized that these gene clusters would encode the most complex rifamycin congeners to have evolved to date and that this increased complexity might have evolved in response to common rifamycin resistance mechanisms. Based on AHBA synthase phylogeny, 13% of the rifamycin-like AHBA synthase sequences we amplified from soil environments are predicted to arise from this family of gene clusters (Fig. 2a, orange colored clades). While our screening suggests this is a common class of gene clusters in the environment, a search of all publicly available sequenced bacterial genomes only revealed one gene cluster that contains a similarly complex tailoring gene region and an AT8* domain. This previously uncharacterized gene cluster from Amycolatopsis vancoresmycina (Ava) is identical in gene content and organization to the RifCon 10 gene cluster that we recovered from a soil eDNA library (Fig. 3a).

Fig. 3 Analysis of the kng gene cluster and the activity of Kangs A, V1, and V2. a Comparison of the rifamycin (rif) and Kang (kng) gene clusters. Lines connecting the two clusters indicate genes that are predicted to be functionally equivalent. For simplicity, only genes lacking a counterpart in the rif cluster are labeled in the kng cluster. Colored boxes surrounding these genes correspond to the substructures they are predicted to encode (shown in panel C). b Structures of Kangs A, V1 and V2. c Summary of the proposed biosynthesis of Kang V2. The structure of Kang V2 is colored as follows: red, PK core; blue, Emal modification; green, AHBA-derived substructure; black, tailoring modifications. Colored bubbles highlighting the key structural features of Kang V2 correspond with the genes in (A) that are predicted to encode these features. The PKS module 8 dehydratase (dh) domain, which is predicted to be inactive, is shown in lower case letters to differentiate it from the remaining, active domains. d In vivo activity profiles of the Kangs against RifR Sau. The structure of Rif is shown along with the three most commonly mutated RNAP residues in RifR Mtb clinical isolates3,29. The dashed line and arcs indicate an H-bond and nonpolar contacts, respectively. e In vitro transcription assay with radiolabeled nucleotides showing the activity of Rif and the Kangs at the concentrations indicated against Msm wild-type and RifR βS447L RNAP. F, full-length transcript; A, abortive transcript.

Highly functionalized congeners from an AT8* gene cluster

As an initial exploration of the tailoring gene-rich family of gene clusters that contain an AT8* domain, we looked for rifamycin congeners in ethyl acetate extracts from cultures of Ava. While Ava has never been reported to produce rifamycin-like metabolites, we identified three major HPLC peaks with rifamycin-like UV spectra in culture broth extracts (Supplementary Figure 4). The structure of each metabolite was elucidated using a combination of high-resolution mass spectrometry (HRMS), 1D and 2D NMR, and UV data. 13C-NMR, HRESIMS [calcd m/z for C50H64NO19 (M + H+) 982.4073, found m/z 982.4025], and UV data for compound 1 were consistent with the structure of Kang A, a rifamycin congener originally characterized from Amycolatopsis mediterranei var. kanglensis and encoded by an uncharacterized gene cluster (Supplementary Figures 5–10, Supplementary Table 2)22,23. The most dramatic differences between Kang A and other rifamycins are that it contains a β-O-3,4−O,O’-methylene digitoxose deoxysugar (hereafter, K-sugar) substituent at C-27, an oxidized ethyl substituent in place of a methyl substituent at C-20, and a gem-dimethylsuccinic acid (K-acid) appended to the oxidized ethyl branch in the PK core (Fig. 3b).

The predicted molecular formula for the second metabolite, Kang V1 (2) [HRESIMS calcd m/z for C50H65NO19Na (M + Na+) 1006.4048, found m/z 1006.4006], suggested it was a reduced analogue of Kang A. A comparison of 1D and 2D NMR data from (1) and (2) allowed us to assign this difference to the reduction of the C-11 ketone to an alcohol [13C δ 77.1, 1H δ 5.49 (1 H, s)] (Supplementary Figures 11–16, Supplementary Table 2). To the best of our knowledge, this C-11 reduction has only been seen in one previously described rifamycin natural product congener, chaxamycin D24. In the case of the third metabolite, Kang V2 (3), HRMS data suggested it differed from Kang A (1) by the addition of a CH2 moiety [HRESIMS calcd m/z for C51H66NO19 (M + H+) 996.4229, found m/z 996.4197]. The UV spectrum of Kang V2 (3) supported the presence of a naphthohydroquinone moiety (λmax 302 nm) instead of the naphthoquinone (λmax 276 nm) seen in (1) and (2) (Supplementary Figure 4). The naphthohydroquinone substructure was also supported by an HMBC correlation from H-3 to a carbon at δ 150.5 ppm (C-4; Supplementary Figures 17–22, Supplementary Table 2). In Kang A and V1, this carbon is significantly more deshielded (δ 184.9 and 188.8 ppm, respectively). The presence of the carbonyl at C-8 in Kang V2 was supported by an HMBC correlation from the C-14 methyl to C-8 (δ 191.7). The formation of a fourth ring on the naphthohydroquinone substructure through the addition of a highly deshielded methylene [13C δ 98.4, 1H δ 6.19 (1 H, d), δ 5.48 (1 H, d)] was defined by HMBC correlations from the new methylene protons to C-4 and C-11 of the naphthohydroquinone (Supplementary Figure 17). To the best of our knowledge, the fourth ring formed by the addition of the methylenedioxy bridge in Kang V2 (3) is not found in any reported rifamycin congeners.
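The agreement between calculated and observed m/z values quoted for the three Kangs can be checked with a short Python sketch. It assumes standard monoisotopic atomic masses (rounded) and, as an observation rather than a statement of the authors' procedure, that the quoted calculated values correspond to a plain sum of atomic masses with no electron-mass correction; the helper names are illustrative.

```python
# Monoisotopic atomic masses in u (standard values, rounded).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "Na": 22.9897693,
}


def formula_mass(composition: dict) -> float:
    """Sum of monoisotopic atomic masses (no electron-mass correction)."""
    return sum(MONOISOTOPIC[element] * count
               for element, count in composition.items())


def ppm_error(calculated: float, found: float) -> float:
    """Mass error in parts per million, relative to the calculated value."""
    return (found - calculated) / calculated * 1e6


# Ion compositions and observed m/z values as quoted in the text.
IONS = {
    "Kang A [M+H]+": ({"C": 50, "H": 64, "N": 1, "O": 19}, 982.4025),
    "Kang V1 [M+Na]+": ({"C": 50, "H": 65, "N": 1, "O": 19, "Na": 1}, 1006.4006),
    "Kang V2 [M+H]+": ({"C": 51, "H": 66, "N": 1, "O": 19}, 996.4197),
}

if __name__ == "__main__":
    for name, (composition, found) in IONS.items():
        calc = formula_mass(composition)
        print(f"{name}: calcd {calc:.4f}, found {found:.4f}, "
              f"error {ppm_error(calc, found):+.1f} ppm")
```

All three observed values fall within about 5 ppm of the calculated ones, and the difference between the Kang V2 and Kang A ions equals one CH2 unit, matching the assignment given above.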

Many of the new structural features found on the Kangs can be rationalized based on differences in gene content between the Kang (kng) gene cluster and other rifamycin family gene clusters (Fig. 3a, c, Supplementary Figures 23 and 24, Supplementary Table 3). In addition to the AT8*-containing kngD domain, the kng cluster contains a collection of deoxysugar biosynthesis genes (kng22, kng23, and kng27) and a glycosyltransferase gene (kng26) that we predict are involved in generating the K-sugar modification25,26. The kng gene cluster also contains a set of genes (kng30, kng34A/B, and kng35) that we predict are involved in producing the K-acid; however, the genes responsible for installing the gem-dimethyl functionality on the succinic acid are not bioinformatically obvious. An O-methyltransferase, encoded by kng24, and an additional cytochrome P450, encoded by kng28, may participate in generating the methylenedioxy bridge found on the K-sugar as well as the Kang V2 ring system27,28.

Kangs are active against RifR RNAPs via a distinct mechanism

Kangs A, V1 and V2 are active as antibiotics against Gram-positive bacteria, including Staphylococcus aureus (Sau), Staphylococcus epidermidis, Listeria monocytogenes, and Mtb (Supplementary Table 4). Kangs V1 and V2 both show improved activity against Mtb (H37Rv; IC90 3.12 and 1.56 µM, respectively) compared to Kang A (12.5 µM). We were particularly interested in whether the complex structural features seen in the Kangs might impart improved activity against mutations in RNAP that confer RifR. Substitutions at just three RNAP amino acid positions, Mtb RNAP β subunit D441, H451, and S456 (corresponding to Msm/E. coli [Eco] RNAP β subunit D432/D516, H442/H526, and S447/S531) account for the vast majority of mutations observed in RifR Mtb clinical isolates3,29. The antibacterial activity of the Kangs against RifR RNAP mutants was assessed in vivo using a collection of Sau strains carrying RNAP point mutations and in vitro using purified wild-type and RifR (S447L) Msm RNAPs30,31. The use of these models allowed us to explore the activity of the Kangs against mutations that correspond to the most commonly mutated sites in RifR Mtb, without necessitating the use of restrictive BSL3 assay conditions.

The Kangs are active against RifR Sau strains carrying RNAP mutations at sites corresponding to those commonly mutated in RifR Mtb clinical isolates (Fig. 3d). Kang V1 showed an ~80-fold lower MIC (0.069 µg mL−1) than Rif (5.6 µg mL−1) against a Sau RNAP βD471Y mutant strain. Kang V2 exhibited similarly potent activity (MIC 0.069 µg mL−1) against a Sau strain carrying an RNAP βS486L mutation, which corresponds to the most commonly observed RifR mutation in Mtb clinical isolates (Mtb RNAP βS456L), appearing in ~40–80% of sequenced isolates from geographically diverse regions of the world32,33,34,35,36,37,38,39,40. As with Mtb, the Sau RNAP βS486L mutation effectively abrogates antibacterial activity of Rif (MIC > 50 µg mL−1). Remarkably, Kang V2 showed more potent activity against the Sau RNAP βS486L mutant than against the wild-type strain, suggesting it might have evolved in a niche where this variant is the dominant form of RNAP. Based on the results of our MIC assay, we predicted that treatment of wild-type cells with Kangs V1 and V2 could effectively suppress the development of two common RifR phenotypes. Indeed, in Sau we were not able to identify any Kang V2 resistant mutants that carried the βS486L mutation (Supplementary Figure 25) nor could we identify any βD471Y mutants that arose when cultures were treated with Kang V1. Each of these mutations occurred at a frequency of approximately 10% among RifR Sau colonies. Consistent with the results of our MIC assay, an H481Y mutation, which confers a high level of resistance to all of the compounds, was the predominant mutation that arose following exposure of Sau to either Rif or the Kangs. While mutations at H481 were the most common variants we sequenced in RifR Sau strains (~70%), the βS456L mutation (Sau βS486L) predominates in RifR Mtb clinical isolates29.
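The fold differences quoted above follow directly from the MIC values. A minimal Python sketch of that arithmetic is given here; the function name is illustrative, and because the Rif MIC against the βS486L strain is reported only as a lower bound (> 50 µg mL−1), the corresponding fold difference is likewise only a lower bound.

```python
def fold_difference(reference_mic: float, test_mic: float) -> float:
    """How many times lower the test MIC is than the reference MIC."""
    return reference_mic / test_mic


# MIC values in ug/mL, as quoted in the text.
RIF_D471Y = 5.6
KANG_V1_D471Y = 0.069
RIF_S486L_LOWER_BOUND = 50.0   # reported as "> 50"
KANG_V2_S486L = 0.069

if __name__ == "__main__":
    print(f"Kang V1 vs Rif (D471Y): "
          f"~{fold_difference(RIF_D471Y, KANG_V1_D471Y):.0f}-fold")
    print(f"Kang V2 vs Rif (S486L): "
          f">{fold_difference(RIF_S486L_LOWER_BOUND, KANG_V2_S486L):.0f}-fold")
```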

To determine whether the activity of the Kangs against the Sau RNAP βS486L mutant could be generalized to mycobacterial RNAP carrying the equivalent mutation, we tested the in vitro activity of the Kangs against purified Msm RNAP using a run-off transcription assay (Fig. 3e). The Msm RNAP exhibits 91% sequence identity with Mtb RNAP at the amino acid level and shows complete conservation of residues in the Rif binding pocket31. We found that the Kangs were all potent in vitro inhibitors of wild-type Msm RNAP, with comparable activity to Rif. While Rif was inactive against an Msm RNAP βS447L mutant (corresponding to Mtb/Sau RNAP β S456L/S486L), all three Kangs displayed potent activity against this mutant. In agreement with the results of our Sau MIC assays, Kang V2 showed the highest potency against the RifR Msm RNAP (Fig. 3e).
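Because the same RNAP β subunit positions carry different numbers in Msm, Mtb, Eco, and Sau, a small lookup table helps when moving between the assays. The Python sketch below records only the correspondences stated explicitly in the text; Sau equivalents for D432 and H442 are deliberately left out rather than inferred, and the function names are hypothetical.

```python
from typing import Optional

# Equivalent RNAP beta-subunit positions, keyed by Msm numbering.
# Only correspondences given explicitly in the text are included.
EQUIVALENTS = {
    "D432": {"Mtb": "D441", "Eco": "D516"},
    "H442": {"Mtb": "H451", "Eco": "H526"},
    "S447": {"Mtb": "S456", "Eco": "S531", "Sau": "S486"},
}


def translate(msm_residue: str, organism: str) -> Optional[str]:
    """Return the equivalent residue in another organism, or None if not listed."""
    return EQUIVALENTS.get(msm_residue, {}).get(organism)


def translate_mutation(msm_mutation: str, organism: str) -> Optional[str]:
    """Translate a point mutation, e.g. 'S447L' (Msm) -> 'S456L' (Mtb)."""
    residue, substitution = msm_mutation[:-1], msm_mutation[-1]
    target = translate(residue, organism)
    return None if target is None else target + substitution


if __name__ == "__main__":
    for organism in ("Mtb", "Eco", "Sau"):
        print(organism, translate_mutation("S447L", organism))
```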

Kangs exhibit distinct mechanistic properties

Detailed analysis of the transcription assays suggested that the mechanism by which the Kangs inhibit RNAP differs from that of Rif. The effects of Rif on RNAP transcription activity at each stage of the transcription cycle have been probed extensively. Rif has little to no effect on promoter binding or open complex formation41,42, but causes an increase in the apparent Km for the initiating substrate NTPs binding in the enzyme i and i + 1 sites41,43, thus affecting dinucleotide synthesis at lower NTP concentrations. Importantly, Rif does not affect RNAP catalysis itself (phosphodiester bond formation)41,44. The predominant effect of Rif is steric occlusion of the translocating nascent transcript after the formation of the first phosphodiester bond, resulting in the inhibition of the production of full-length transcript (F, Fig. 3e) but over-production of abortive dinucleotide transcripts (A, Fig. 3e)41,44,45. In contrast to Rif, the Kangs inhibited production of both the full-length and the abortive transcripts, suggesting that the Kangs inhibit a step of transcription preceding that blocked by Rif: either substrate (DNA or initiating nucleotide) binding or phosphodiester bond catalysis itself.

Kang A and Rif share core interactions with RNAP

While Kang V1 and V2 showed the highest levels of activity against bacteria carrying specific clinically important RifR mutations, all three Kangs exhibit improved activity compared to Rif against RNAP variants carrying common mutations found in RifR Mtb clinical isolates. We speculated that the activity of the Kangs against RifR mutants and their potentially novel mechanism of inhibition could be related to the presence of the unique K-sugar and K-acid, which all three Kangs share. To explore this hypothesis, we examined a crystal structure of a mycobacterial RNAP complexed with Kang A, the parent compound in the Kang family, and compared it to a structure complexed with Rif. A more detailed examination of the interaction between each Kang congener and the specific RNAP mutant against which it is most potent will be the focus of future studies.

Kang A and Rif were soaked into crystals of an Msm RNAP transcription initiation complex (TIC)31. Both structures were phased by molecular replacement using the Msm TIC as a model and refined to 3.05 Å resolution (Fig. 4a–c, Supplementary Table 5). The structures of both antibiotics, including the K-sugar and K-acid moieties unique to the Kangs, as well as the RNAP β subunit interaction determinants for the antibiotics, were well-resolved (Supplementary Figure 26A and B). The tip of the σ-finger (a structural element of the σ subunit) also approaches each antibiotic and appears to make molecular contacts. However, because the σ-finger electron density is very weak (reflected in high atomic B-factors) and amino acid substitutions in σ that confer RifR have never been reported, the role of these interactions with Rif and Kang A remains to be established. We note that previous studies deleting the σ-finger suggested a role for this motif in binding to the Rif variant rifabutin but not to another variant, rifapentine46, indicating that the significance of σ-finger/antibiotic interactions is dependent on the specific Rif variant.

Fig. 4 Structural basis for Kang A inhibition of RifR RNAP. a Overall view of the Msm TIC bound to Kang A. The Rif scaffold of Kang A is colored orange, K-sugar yellow, K-acid violet. wt, wild-type. The boxed region is magnified in (B) (showing Rif/wt-RNAP), (C) (Kang A/wt-RNAP), and (D) (Kang A/S447L-RNAP), but the view is rotated 90° as shown. b View into the Rif binding pocket (wt-RNAP) from inside the Msm RNAP active site cleft. Carbon atoms of the Rif scaffold are colored orange, carbon atoms of the 1-methyl-piperazine moiety are colored green. The RNAP is shown as a backbone worm with a transparent molecular surface (β, light cyan; β′, light pink). Density for the RNAP active site Mg2+ was very weak so it was not modeled, but its position is denoted by a dashed yellow circle. RNAP β subunit side chains that make direct contacts with Rif are shown, labeled and colored cyan. Polar interactions are indicated by dashed lines (gray, H-bonds; red, cation-π interactions). Strong electron density near the positively charged 1-methyl-piperazine group (Supplementary Figure S26) was interpreted as a SO4− ion. Boxed residue labels denote residues that have been identified as conferring RifR when substituted45, with red boxes denoting three residues (Msm RNAP β D432, H442, and S447, corresponding to Mtb RNAP β D441, H451, and S456) that comprise the majority of RifR substitutions identified from clinical isolates from tuberculosis patients. c View into the Kang A binding pocket (wt-RNAP). Kang A is colored as in (A). The RNAP is shown as in (A) except RNAP side chains that interact with K-sugar but not Rif (R164, T424) are colored yellow, and R604, which makes a salt bridge with K-acid is colored violet. Residues that confer RifR when substituted are denoted by colored boxes as in (B). d View into the Kang A binding pocket (S447L-RNAP). Kang A and RNAP are shown as in (C). The RNAP β subunit segment from 447–450 is distorted, and the loop from 451 to 465 is disordered, resulting in the loss of Kang A/RNAP contacts with β residues L427, L447, L449, G450, and R456.

The Rif/Msm RNAP interactions were similar to those described in previous structures (Fig. 4b)45,46,47,48. The Rif/Msm RNAP structure also reveals a set of cation-π interactions that have not been noted previously. The conjugated double-bond system comprising C16–C19 of the PK backbone of Rif is approached from the RNAP side by the guanidino group of R445 in a geometry indicative of a cation-π interaction49. The opposite face of the conjugated double-bond system is approached by the guanidino group of R604. We call this arrangement a cation-π sandwich (Supplementary Figure 26C). As expected, the Rif scaffold of Kang A binds in nearly the same pocket and pose as Rif, and the interactions between the RNAP β subunit residues and the Rif scaffold of Kang A are nearly identical to those made by Rif (Fig. 4c), including the cation-π sandwich (Supplementary Figure 26C and D).

Structural basis of Kang inhibition of RifR RNAP

In addition to nearly identical interactions between the RNAP β subunit and the PK backbone of either Rif or Kang A, the chemical moieties unique to Kang A establish new interactions (Fig. 4b, c and Fig. 5). The K-sugar interacts with two β subunit residues that do not contact Rif, R164 and T424. These residues correspond to Mtb/Eco RNAP β R173/R143 and T433/S508, respectively. To our knowledge, neither of these residues has ever been identified as conferring RifR when substituted45.

Fig. 5 Kang A contacts with wild-type and RifR RNAP. Schematic summary of the Kang A/RNAP β subunit contacts. Residues that make only nonpolar contacts are shown as labels with arcs denoting the contacts. The side chains (or main chain atoms for F430) of residues that make polar contacts are shown in stick format (H-bonds, gray dashed lines; cation-π interactions, red dashed lines). The color-coding of residues/residue labels is as follows: residues that contact the Rif scaffold in the Rif/RNAP structure, cyan; residues that also make nonpolar contacts with K-sugar, yellow arc; residues that contact K-sugar but do not contact the Rif scaffold, yellow. R604 (colored violet) makes a cation-π interaction with the Rif scaffold but also makes a salt bridge with K-acid. Residues that confer RifR when substituted are denoted by colored boxes as in (A). Residues that lose contacts with Kang A in the RifR S447L RNAP mutant are denoted by red background shading.

The K-acid also establishes an interaction with RNAP that does not occur with Rif: a salt bridge (4.4 Å) with the guanidino group of β R604 (Fig. 4b, c). We believe this interaction stabilizes Kang A binding in two ways. First, it forms a favorable salt bridge between the negatively charged K-acid and the positively charged R604. Second, the salt bridge rigidifies the side chain of R604, which may stabilize the cation-π interaction with the Kang A PK backbone (Supplementary Figure 26D).

We propose that the additional interactions with RNAP contributed by the unique Kang moieties (K-sugar and K-acid) stabilize the binding of the Kangs sufficiently to overcome the loss of interactions caused by the S447L substitution, leading to an IC90 for the Kangs against this RifR RNAP that is at least two orders of magnitude lower than that of Rif (Fig. 3e). To test this hypothesis, we determined the structure of the RifR S447L RNAP in complex with Kang A and compared it to the structures of the wild-type enzyme bound to Rif and to Kang A. The structure was obtained as described for the wild-type enzyme and was refined to 3.45 Å (Fig. 4d, Supplementary Table 5).

In the wild-type RNAP, S447(OG) forms an H-bond with Rif/Kang A(O2) (Fig. 4b, c), and this favorable interaction is lost with the S447L substitution (Fig. 4d and Fig. 5). In addition, substitution of the Ser by the bulkier, branched Leu residue has complex effects on the Rif binding pocket48. The path of the polypeptide backbone is altered at L447 to accommodate the bulky substitution (Supplementary Figure 27); as a consequence, the β-subunit loop from residues 451–465 becomes disordered and nearby parts of the antibiotic binding pocket rearrange, resulting in the loss of nonpolar contacts between Kang A and L427, G450, L449, and R456. These structural changes do not affect other RNAP/antibiotic contacts including, importantly, contacts with K-sugar and K-acid (Fig. 4d and Fig. 5).

Binding of Rif to the wild-type RNAP results in a buried surface area of 2880 Å2, while the binding of Kang A buries 3330 Å2. The additional chemical moieties of Kang A (K-sugar and K-acid) contribute about 450 Å2 of extra interaction area over Rif, and about 75% of that is contributed by the K-sugar. The binding of Kang A to the S447L RNAP results in a reduced buried surface area of 2940 Å2, a loss of 390 Å2 compared with Kang A/wild-type RNAP. Thus, the loss of 390 Å2 of buried surface area with Kang A due to the S447L substitution is more than compensated for by the 450 Å2 of buried surface area gained from the K-sugar and K-acid interactions, supporting our hypothesis.
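The buried-surface-area bookkeeping in this paragraph can be laid out explicitly. The Python sketch below uses only the three areas quoted in the text; the variable names are illustrative, and the K-sugar share is computed from the approximate 75% figure, so it is an estimate.

```python
# Buried surface areas in square angstroms, as quoted in the text.
BSA = {
    "Rif/wt": 2880,
    "KangA/wt": 3330,
    "KangA/S447L": 2940,
}

# Extra area contributed by the K-sugar and K-acid relative to Rif.
gain_from_kang_moieties = BSA["KangA/wt"] - BSA["Rif/wt"]

# Area lost by Kang A when S447 is replaced by Leu.
loss_from_s447l = BSA["KangA/wt"] - BSA["KangA/S447L"]

# Net difference between Kang A on the mutant and Rif on the wild type.
net_vs_rif_wt = BSA["KangA/S447L"] - BSA["Rif/wt"]

# Approximate K-sugar share of the extra area (about 75% per the text).
k_sugar_share = 0.75 * gain_from_kang_moieties

if __name__ == "__main__":
    print(f"gain from K-sugar + K-acid: {gain_from_kang_moieties} A^2")
    print(f"loss from S447L:            {loss_from_s447l} A^2")
    print(f"net vs Rif/wt:              {net_vs_rif_wt:+d} A^2")
    print(f"approx. K-sugar share:      {k_sugar_share:.0f} A^2")
```

On these numbers, Kang A bound to the S447L enzyme still buries slightly more surface than Rif bound to the wild-type enzyme.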

Structural basis for the Kang mechanism of action

Rif inhibits RNAP function by blocking RNA translocation and extension after formation of the first or second phosphodiester bond41,44,45, resulting in inhibition of full-length transcript production along with an increase of abortive products (Fig. 3e). By contrast, the Kangs inhibit the production of both full-length and abortive products (Fig. 3e), indicating that the Kangs inhibit transcription at a step earlier than Rif.

We probed promoter DNA binding and loading of the template strand DNA into the RNAP active site, steps of transcription initiation preceding Rif inhibition, using DNase I footprinting (Supplementary Figure 28A) and RNAP active site directed Fe2+-mediated hydroxyl-radical cleavage (Supplementary Figure 28B). The results show that neither Rif nor Kang A significantly affects these steps, as observed previously for Rif41.

We next investigated substrate binding and phosphodiester bond formation. We modeled the positions of the first two nucleotide substrates occupying the i and i + 1 sites (the 5′- and 3′-initiating nucleotides, respectively) in an initiating complex by superimposing the structure of a T. thermophilus RNAP de novo initiation complex (4Q4Z)50 onto the Msm RNAP/Rif and Kang A structures (Fig. 6a, b). Rif did not clash sterically with the DNA or the NTP substrates, consistent with findings that Rif has only very small effects on the Km for initiating substrate41. The Rif piperazine moiety approaches the γ-phosphate of the modeled i site nucleotide (iNTP), and because the Rif piperazine N4 is positively charged and is poised within 3.6 Å of the closest oxygen in the modeled iNTP γ-phosphate, this interaction would not disfavor iNTP binding.

Fig. 6 Structural basis for Kang A inhibition of iNTP binding. a View of the RNAP active site from the T. thermophilus de novo initiation complex (4Q4Z)50 with bound Rif superimposed. Shown is the t-strand DNA from +1 to −5 (dark gray), the initiating NTP substrates (i site NTP, ATP; i + 1 NTP, CMPCPP; blue carbon atoms) and two Mg2+-ions (yellow spheres; Mg2+I is the Mg2+-ion chelated in the RNAP active site, Mg2+II is bound to the i + 1 NTP). Rif is color-coded as in Fig. 4b. Rif and the NTPs are also shown with transparent van der Waals surfaces. The blue “+” denotes the positive charge of the Rif piperazine moiety, while the red “−” denotes the negative charge of the iNTP γ-phosphate. b Same as (A) but showing Kang A (colored as in Fig. 4c). The negative charge of K-acid is brought in close proximity to the negative charge of the iNTP γ-phosphate. c Sequence of AP3-GU promoter template used in in vitro abortive initiation assays monitoring the effect of Kang A or Rif on RNA dinucleotide synthesis with GTP, GDP, or GMP as the 5′-initiating nucleotide. The initial transcribed sequence of the Mtb AP3 promoter (top) was engineered to allow only RNA dinucleotide synthesis (5′-GU-3′) in the presence of GTP, GDP, or GMP as the 5′-initiating nucleotide and UTP. The mutated bases are denoted in bold italic (AP3-GU, bottom). d Kang A or Rif inhibition of in vitro abortive initiation of RNA dinucleotide synthesis using the AP3-GU promoter template; (top) 1 mM GTP + 50 μM α-32P-UTP; (middle) 2 mM GDP + 50 μM α-32P-UTP; (bottom) 4 mM GMP + 50 μM α-32P-UTP. e Plotted is the RNA dinucleotide synthesis with Kang A relative to the same condition with Rif, normalized by the results with no antibiotic. Kang A has a strong inhibitory effect with GTP as the 5′-initiating nucleotide (blue bars), a weaker effect with GDP (red bars), and no inhibitory effect with GMP (green bars). The error bars denote standard error of four measurements.

In the modeled de novo initiation complex with Kang A, the pose of Kang A positioned the negatively charged carboxylic group of the K-acid very close (2.5 Å between the closest oxygen of each group) to the negatively charged iNTP γ-phosphate (Fig. 6b), suggesting that Kang A may increase the Km of the iNTP by Coulombic repulsion. To test this hypothesis, we took advantage of RNAP's ability to efficiently initiate de novo with an NDP (β-phosphate 6.5 Å from K-acid) or an NMP (α-phosphate 8.0 Å from K-acid) as the 5′-initiating substrate (Km iNTP ~ Km iNDP ~ 1 mM; Km iNMP ~ 5 mM)41,51. To monitor only RNA dinucleotide synthesis, we used a mutant duplex Mtb AP3 promoter template (AP3-GU; Fig. 6c) in which the initial transcribed sequence was engineered to ensure only RNA dinucleotide synthesis, either pppGpU, ppGpU, or pGpU, in the presence of α-32P-UTP (0.3 μM) and either 1 mM GTP, 2 mM GDP, or 4 mM GMP (Fig. 6d). As expected, Rif has an inhibitory effect on dinucleotide synthesis (Supplementary Figure 28C)41,43. However, relative to Rif, Kang A has a strong inhibitory effect on RNA dinucleotide synthesis when GTP serves as the 5′-initiating nucleotide, a weaker inhibitory effect with GDP, and no inhibitory effect with GMP (Fig. 6e). These results strongly support the hypothesis that Kang A interferes with binding of the iNTP substrate via Coulombic repulsion between the K-acid and the iNTP γ-phosphate (Fig. 6).
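The distance dependence behind this interpretation can be made concrete with a deliberately crude Python sketch. It assumes two point charges of fixed magnitude in a uniform dielectric, so that the repulsion energy scales as 1/r (Coulomb's law); it ignores the different net charges of GTP, GDP, and GMP as well as solvent and ion screening. It is an illustration of the trend, not a calculation from the study, and the names are hypothetical.

```python
# Modeled distances (angstroms) between the K-acid carboxylate and the
# terminal phosphate of each 5'-initiating nucleotide, as quoted in the text.
DISTANCES = {
    "GTP (gamma-phosphate)": 2.5,
    "GDP (beta-phosphate)": 6.5,
    "GMP (alpha-phosphate)": 8.0,
}


def relative_repulsion(distance: float, reference: float = 2.5) -> float:
    """Coulomb energy at `distance` relative to that at `reference`.

    For fixed point charges in a uniform dielectric, E is proportional
    to 1/r, so the ratio reduces to reference / distance.
    """
    return reference / distance


if __name__ == "__main__":
    for nucleotide, r in DISTANCES.items():
        print(f"{nucleotide}: r = {r} A, "
              f"relative repulsion = {relative_repulsion(r):.2f}")
```

Even under these simplifications, the estimated repulsion falls to well under half its GTP value for GDP and GMP, in the same rank order as the inhibition seen in Fig. 6e.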

Note that this mechanism for Kang A inhibition of initial phosphodiester bond formation does not preclude inhibition of RNA chain elongation by steric occlusion, the mechanism of action for Rif41,44,45. Maximal inhibition of pppGpU synthesis (at 1 μM antibiotic) by Kang A is about 75% (Fig. 6d), while inhibition of full-length transcripts in the run-off assay at the same Kang A concentration is essentially 100% (Fig. 3e). This indicates that Kang A inhibits RNA chain synthesis via two mechanisms: inhibition of initial phosphodiester bond formation by interfering with binding of the iNTP substrate, and blockage of RNA chain elongation subsequent to formation of the first phosphodiester bond, the latter mechanism being shared with Rif.