Mechanism of promoter recognition To start transcription, RNA polymerase II is recruited by the general transcription factor IID (TFIID) to the DNA promoter. Patel et al. used a combination of experimental approaches to elucidate the full molecular architecture of human TFIID and its complete conformational landscape during promoter recognition. They propose how TFIID is loaded onto the promoter, a process that involves defined steps, including promoter recognition and transcription initiation, and leads to regulated gene expression. Science, this issue p. eaau8872

Structured Abstract INTRODUCTION In eukaryotes, transcription initiation starts with the assembly of the transcription preinitiation complex (PIC) onto promoter DNA. The PIC comprises the general transcription factors and RNA polymerase II (Pol II). The general transcription factor IID (TFIID) is responsible for initially recognizing the core promoter. Human TFIID is a trilobed (lobes A, B, and C) complex composed of TATA box–binding protein (TBP) and 13 evolutionarily conserved TBP-associated factors (TAF1 to TAF13), with six TAFs present in two copies. Together, TBP and the TAF subunits of TFIID directly interact with promoter DNA with the assistance of TFIIA, forming a platform for the assembly of the rest of the PIC. RATIONALE A key challenge in understanding the molecular basis behind TFIID’s recognition of promoter DNA is the lack of a complete structural depiction of the complex. We used cryo–electron microscopy (cryo-EM) to describe the various biochemical and/or conformational states of the complex, thus providing information on both the structure and dynamics of TFIID and its interaction with promoter DNA. RESULTS We report the cryo-EM structure of TFIID with a resolution of 4.3 Å for lobe C, 4.5 Å for lobe B, and 9.8 Å for lobe A. Together with chemical cross-linking mass spectrometry and structure prediction, we generated a complete structural model of the evolutionarily conserved core of TFIID. TFIID is built on a dimeric scaffold of TAFs, containing at its center a TAF6 dimer in lobe C that connects to lobes A and B. Lobes A and B are both organized around TAF4, -5, -6, -9, -10, and -12 but include additional subunits that result in distinct functions (see the figure). Lobe A, which contains TAF11 and TAF13 interacting with TBP, keeps TBP inhibited unless TFIID is promoter bound, at which point it loads TBP onto DNA. Lobe B contains TAF8, which extends to hold lobes B and C together rigidly.
Lobe B positions TAF4 in place to stabilize upstream DNA binding and recruits TFIIA. Lobe C, in addition to TAF6 and TAF8, contains TAF1, -2, and -7, which bind the downstream core promoter sequences. Using computational sorting of cryo-EM images, we characterized the conformational landscape of apo-TFIID and TFIID in the presence of TFIIA and promoter DNA. Two major states for apo-TFIID (termed the canonical and extended states) were observed, and three additional states (termed the scanning, rearranged, and engaged states) were observed in the presence of TFIIA and core promoter DNA. Lobe A, which migrates 100 Å from its position near lobe C in the canonical state to near lobe B in the extended state, carries TBP in a repressed state that is only released in the context of promoter binding. Identification of distinct TFIID states allowed us to generate a mechanistic model for TFIID promoter binding (see the figure). We propose that TFIID first binds the downstream core promoter elements through TAF1 and TAF2. This binding and the flexible attachment of lobe A help position the upstream DNA in proximity to TBP. TBP then scans for a TATA box or its sequence variants. Engagement of upstream core promoter sequences by TBP is facilitated by TFIIA interacting with TAF4 and TAF12 within lobe B. When TBP finally binds the promoter, it releases from lobe A, opening the binding site for TFIIB, which can then recruit Pol II. The structure of TFIID also allowed us to deduce the position of various regulatory domains of TFIID involved in contacts with transcriptional activators and active chromatin marks that are responsible for recruiting and modulating TFIID function. CONCLUSION Our studies lead to a mechanistic model of how TFIID prevents TBP from nonspecifically engaging with DNA outside of gene promoters, thus preventing aberrant PIC assembly and erroneous transcription initiation.
Our model also suggests how TFIID loads TBP onto TATA-less promoters and how activators and chromatin marks may direct TFIID recruitment and PIC assembly. Structure of human TFIID. The structure of apo-TFIID is shown in the canonical state and that of promoter-bound TFIID is depicted in the engaged state. Lobes A and B in TFIID share a similar architecture that contains histone-fold domains organized in a manner that resembles a histone octamer. The TAF6 subunit of TFIID dimerizes the core set of TAFs. The TAF8 subunit rigidly tethers together lobes B and C. Five states of TFIID were observed in the process of promoter binding, leading to a mechanistic model of TBP loading onto the promoter DNA.

Abstract The general transcription factor IID (TFIID) is a critical component of the eukaryotic transcription preinitiation complex (PIC) and is responsible for recognizing the core promoter DNA and initiating PIC assembly. We used cryo–electron microscopy, chemical cross-linking mass spectrometry, and biochemical reconstitution to determine the complete molecular architecture of TFIID and define the conformational landscape of TFIID in the process of TATA box–binding protein (TBP) loading onto promoter DNA. Our structural analysis revealed five structural states of TFIID in the presence of TFIIA and promoter DNA, showing that the initial binding of TFIID to the downstream promoter positions the upstream DNA and facilitates scanning of TBP for a TATA box and the subsequent engagement of the promoter. Our findings provide a mechanistic model for the specific loading of TBP by TFIID onto the promoter.

The regulation of transcription initiation is arguably the primary method by which the expression of genes is controlled. The transcription preinitiation complex (PIC) is responsible for the loading of RNA polymerase II (Pol II) onto DNA (1, 2). The assembly of the PIC begins with the recognition of the core promoter by transcription factor IID (TFIID), aided by TFIIA (3). The TATA box–binding protein (TBP), a component of TFIID, recruits TFIIB, which then loads the Pol II–TFIIF complex (4). Lastly, the addition of TFIIE and TFIIH facilitates the opening of the transcription bubble (5). Whereas the stepwise assembly of a TBP-based PIC has been well characterized structurally (6), the process by which TFIID loads TBP onto the promoter is not well understood.

TFIID is a ~1.3-MDa complex that contains, in addition to TBP, 13 TBP-associated factors (TAFs), with six of them (TAF4, -5, -6, -9, -10, -12) present in two copies (7–9) (fig. S1). At low resolution, human TFIID is composed of three lobes (lobes A, B, and C), with a fairly rigid connection between lobes B and C and with lobe A more flexibly attached to this “BC core” (10). In previous work we showed that in a promoter-bound complex (IIDAS, which we will refer to here as IIDA-SCP) containing TFIID, TFIIA, and the super core promoter (SCP) (11), the promoter elements downstream of the transcription start site (TSS) are recognized by TAF1 and TAF2 in lobe C, whereas TBP binds the TATA box upstream of the TSS with the aid of TFIIA and lobe B (12).

Here we present cryo–electron microscopy (cryo-EM) structures of human TFIID, alone and in various stages of promoter binding. Together with chemical cross-linking–mass spectrometry (CX-MS) data and biochemical reconstitution, we were able to determine the complete structure of TFIID and the functional conformational landscape of the complex. Our studies lead to a mechanistic model of TBP loading onto the promoter by TFIID and TFIIA and provide insights into how TFIID may engage chromatin, respond to transcriptional activators, and serve as a scaffold for PIC assembly.

Overall structure of TFIID The flexible nature of TFIID has long hampered a high-resolution structural description of the intact complex (10). In previous work, we showed how the distribution of positions of the flexibly attached lobe A shifts upon binding of promoter DNA and TFIIA (10). Lobe A in apo-TFIID exists in a bimodal but continuous distribution of states, with roughly equal occupancy of two distinct, major states referred to as the canonical and extended states. Whereas in the canonical state lobe A is near lobe C, in the extended state lobe A is near lobe B (Fig. 1A). The displacement of lobe A between these two states is ~100 Å. By sorting a large cryo-EM dataset of free TFIID into two predominant states, refining them independently, and then combining the refined regions, we were able to extend the resolution of the BC core to 4.5 Å (range of 4.2 to 6.5 Å) and to generate a three-dimensional (3D) reconstruction of lobe A at 9.8 Å (range of 8.5 to 15 Å) (Fig. 1B and figs. S2 and S3). We then used a combination of cryo-EM, CX-MS, and structure prediction to generate a complete model of the complex. Fig. 1 Cryo-EM structure of TFIID. (A) Cryo-EM reconstructions of TFIID, with the BC core in blue and lobe A in yellow (canonical state) and green (extended state). (B) Transparent cryo-EM map of TFIID in the canonical state with fitted cryo-EM maps from focused refinements of the BC core and lobe A in solid blue and yellow, respectively. (C to E) TFIID structural model in front (C), top (D), and side views (E). See also Movie 1. Compared with that of the IIDA-SCP structure (12), the density corresponding to the TAF1-TAF7 subcomplex within lobe C in apo-TFIID is poorly defined, indicating that this module is flexible in the unbound TFIID, but stabilized upon binding to promoter DNA (figs. S4 and S5).
For the rest of lobe C, it was possible to dock into the density the model of the TAF6 HEAT repeat dimer, a segment from the C-terminal region of TAF8, and the TAF2 aminopeptidase-like domain (APD) from the previous IIDA-SCP structure (12), with adjustments and extensions made to fit the observed density (Fig. 1, C to E, and fig. S5). Within lobe B, we were able to fit a homology model of the WD40 domain of TAF5, the crystal structures of the TAF5 NTD2 domain and the histone-fold domain (HFD) heterodimers of TAF6-TAF9, TAF4-TAF12, and TAF8-TAF10, as well as to extend the models where additional densities were present in the cryo-EM map (Fig. 1, C and D, and fig. S5). The resulting atomic model for lobe B is consistent with our CX-MS data (fig. S6) and in agreement with previous biochemical studies (8, 13). To further validate our model, we heterologously coexpressed exclusively those segments of TAFs that we could directly model into the lobe B cryo-EM density, which comprised only 35% of the residues present in the full-length versions of the subunits (fig. S7). Three successive pulldowns using different affinity tags placed on TAF5, TAF4, and TAF8, followed by size exclusion chromatography, resulted in a pure, soluble complex containing stoichiometric amounts of all seven TAF fragments, supporting the formation of a stable complex from the components predicted by our structural model. All of the TAFs in lobe B, except for TAF8, have been proposed to exist in two copies within TFIID (8, 14), suggesting that a similar architecture could exist within the flexible lobe A. We used a computational strategy based on automated docking of different combinations of TFIID subunits into the lobe A cryo-EM density to generate a complete model of lobe A (fig. S5). The core of the structure is equivalent to lobe B, except for the replacement of TAF8 with TAF3 as the histone-fold partner of TAF10. Additionally, lobe A includes the TAF11-TAF13 HFD pair and TBP (Fig. 1, C and E). 
Our placement of TAF11-TAF13 adjacent to the TBP subunit is supported by the presence of chemical cross-links between TAF11 and TBP (fig. S6), as well as in vivo and in vitro studies showing that the HFDs of TAF11-TAF13 constitute the bridge between TBP and the rest of TFIID (15). Altogether, our structure defines the full architecture of human TFIID, revealing the complete evolutionarily conserved regions of all TAFs and TBP (fig. S1 and Movie 1). Movie 1. Structural models for the canonical and engaged states of human TFIID. The models are shown within the cryo-EM maps of the two states aligned on the BC core (different regions of the maps were refined to different resolutions due to their different degrees of flexibility). Lobe A is shown only in the canonical state.

TFIID assembly around a dimeric subcomplex of TAFs Our structure of human TFIID shows that the complex assembles around a dimeric yet asymmetric arrangement of TAFs (fig. S6). Two copies of interacting TAF6 HEAT repeat domains are found at the center of the BC core, where they form a dimer with a 3 1 screw axis symmetry that bridges lobes B and C (Fig. 2A). The N-terminal HFDs of each copy of TAF6 are then separated between lobes A and B, and thus, TAF6, through the flexible connection between its HFD and HEAT repeat domain, tethers the entire complex together. This TAF6 connection is maintained throughout the various conformational states of TFIID (Fig. 2A). The HFD of TAF6 forms a heterodimer with the HFD of TAF9, which interacts with the WD40 and NTD2 regions of TAF5. The TAF6-TAF9 HFD pair then forms a tetramer with the TAF4-TAF12 HFD pair, and together these five subunits (TAF5, -6, -9, -4, -12) define the TAF subcomplex that is present in two copies within TFIID (Fig. 2B and figs. S7 and S8), one each in lobes A and B. The existence of a dimeric TAF-containing subcomplex has been previously proposed on the basis of in vivo knockdown and in vitro biochemical studies (8, 16). However, the structure within the native TFIID complex does not exhibit the symmetry previously proposed for a reconstituted subcomplex containing the same subunits, likely due to the presence of additional symmetry-breaking TAFs in the fully formed, native complex (8) (fig. S7). Fig. 2 Structural organization of human TFIID. (A) Domain organization of TAF6, with sequence conservation colored according to ConSurf (69) scores (top). Model of TFIID with the TAF6 dimer highlighted (bottom). The dimer of TAF6 HEAT repeats is centrally located within the complex. Dashed lines are shown connecting the TAF6 HEAT domains with their corresponding HFDs in lobes A and B. (B) Model of TFIID (center) and close-up views of lobe B (left) and lobe A (right). 
(C) Domain organization of TAF8, with sequence conservation colored according to ConSurf (69) scores (top). NLS, nuclear localization sequence. Model of the BC core of TFIID with TAF8 highlighted (bottom). (D) The 6iD (TAF6 interacting domain) of TAF8 bridges the WD40 domain of TAF5 in lobe B and the HEAT repeat of TAF6 in lobe C. (E) The 2iD (TAF2 interacting domain) of TAF8 bridges the HEAT repeat of TAF6 and the APD of TAF2 within lobe C. See also Movie 1. The two sets of TAFs (-4, -5, -6, -9, -12) shared between lobes A and B act as a base for the assembly of the rest of each lobe. In lobe B, a hexamer of HFDs is formed by the TAF8-TAF10, TAF6-TAF9, and TAF4-TAF12 HFD pairs. In lobe A, the TAF3-TAF10 and TAF11-TAF13 HFD pairs form an octamer-like structure with the TAF6-TAF9 and TAF4-TAF12 HFD pairs (Fig. 2B and Movie 1). Though the presence of higher-order histone-fold assemblies had been predicted to exist within TFIID, such a structure had not been visualized until now (fig. S8). It has been proposed that these nucleosome core-like structures may be involved in interaction with DNA and promoter binding (16–20). However, the surfaces of lobes A and B lack the large positively charged patches observed in the nucleosomal histone octamer (fig. S8). The TAF6-TAF9 HFD pair that was proposed to interact with the downstream DNA (20, 21) is actually located far from the DNA in the IIDA-SCP complex (fig. S8). We instead propose that HFDs serve as a structural scaffold within TFIID. The difference in the flexibility of lobes A and B is likely due to the presence of TAF8 in lobe B, which stabilizes its connection with lobe C (Fig. 2C). In our model, the highly conserved middle region of TAF8 (residues 130 to 235) snakes through the BC core, interacting extensively with TAF2 and TAF6. Extending from its N-terminal HFD, the TAF6 interacting domain (6iD) of TAF8 forms a bridge between the WD40 of TAF5 in lobe B and the first of the HEAT repeats of TAF6 (Fig. 2D).
The long helix of the TAF2-interacting domain (2iD) of TAF8 then bridges the second TAF6 HEAT repeat and TAF2, and then TAF8 folds onto the surface of the TAF2 APD, effectively anchoring TAF2 to the rest of the complex. This network of interactions among TAF8, TAF6, and TAF2 (Fig. 2E) is consistent with previous biochemical studies (8, 13).
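The subunit bookkeeping described in this section can be checked with a short sketch. The lobe assignments below are transcribed from the text; listing a TAF under every lobe in which one of its domains is placed (for example, TAF6 and TAF8 in lobe C) is a simplifying assumption of this illustration, and the function names are hypothetical.

```python
# Lobe membership by TAF number, transcribed from the text.
# Assumption: a TAF is listed under each lobe that holds one of its domains,
# so TAF6 (HEAT repeats) and TAF8 (6iD/2iD region) also appear in lobe C.
# TBP is not a TAF and is left out of these sets.
LOBES = {
    "A": {3, 4, 5, 6, 9, 10, 11, 12, 13},
    "B": {4, 5, 6, 8, 9, 10, 12},
    "C": {1, 2, 6, 7, 8},
}


def two_copy_tafs(lobes):
    """TAFs found in both lobe A and lobe B (the duplicated set)."""
    return sorted(lobes["A"] & lobes["B"])


def all_tafs(lobes):
    """Every distinct TAF placed in at least one lobe."""
    return sorted(set().union(*lobes.values()))


def lobe_specific(lobes, name):
    """TAFs that appear in the named lobe and in no other lobe."""
    others = set().union(*(m for k, m in lobes.items() if k != name))
    return sorted(lobes[name] - others)


if __name__ == "__main__":
    print("two-copy TAFs:", two_copy_tafs(LOBES))
    print("all TAFs:", all_tafs(LOBES))
    for lobe in LOBES:
        print(f"specific to lobe {lobe}:", lobe_specific(LOBES, lobe))
```

Under these assumptions, the intersection of lobes A and B recovers the six TAFs reported in two copies, and the union across lobes covers TAF1 to TAF13.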

Role of lobe B in the stabilization of upstream DNA binding Our structural studies indicate that the function of lobe B is to stabilize the upstream DNA and bind TFIIA. Both of these functions involve the highly conserved C terminus of TAF4 (Fig. 3A). The HFD of TAF4, comprising helices α1 and α2, is followed by a large loop and a helix (α3) that interacts with the WD40 of TAF5 (Fig. 3B). Docking of the lobe B structure into the IIDA-SCP map reveals that the highly conserved loop between α3 and a fourth helix in TAF4 (α4) contacts the promoter DNA just downstream of the TATA box (Fig. 3, C and D, and fig. S4). This loop has previously been shown to bind DNA in vitro (20), and in TAF4−/− human fibroblast cells, stable expression of a TAF4 mutant lacking this loop results in the down-regulation of a subset of genes (22). From there, α4 continues toward the TBP-TFIIA density and is likely involved in TFIIA recruitment and the stabilization of the TFIIA-TBP-DNA module, in agreement with previous data (23) (Fig. 3D). The docking of lobe B into the IIDA-SCP map also revealed that the four-helix bundle of TFIIA likely contacts the first helix-turn-helix motif of the TAF12 HFD (Fig. 3D). Thus, we propose that TAF4 and TAF12 within lobe B act to promote the binding of TBP to the upstream DNA by directly contacting both the DNA and the TFIIA-TBP module. In this arrangement, the BC core of TFIID appears to act as a molecular ruler, placing TBP at a defined distance from the downstream promoter elements. This role suggests that maintaining a fairly rigid connection between lobes B and C is important for correctly positioning TBP with respect to the TSS; in human core promoters, the TATA box and the TSS are separated by ~30 base pairs (bp) (24, 25) (fig. S8). Fig. 3 Upstream promoter binding stabilized by lobe B. (A) Domain organization and sequence conservation of TAF4 according to ConSurf (69) scores. The first level shows the domain organization of TAF4.
The second level zooms in on the C terminus and shows the secondary structure [solid outline corresponds to observed secondary structure and dashed outline to the predicted secondary structure based on PSIPRED (70) results (α4 is not visible in the apo-TFIID structure but becomes ordered upon interaction with the DNA)]. The third level shows the amino acid sequence of the loop between helices 3 and 4, which contains several conserved, positively charged residues that could be contacting the DNA. Single-letter abbreviations for the amino acid residues are as follows: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; and Y, Tyr. (B) Structure of lobe B. (C) Model of TFIID docked into the IIDA-SCP reconstruction. (D) Close-up view of part of (C), highlighting the loop between helices 3 and 4 as it contacts the DNA (circled in red), helix 4 continuing on toward TFIIA and TBP (circled in green), and the interaction between TFIIA and TAF12 (circled in blue). Our structure suggests a potential overlap between the contacts that TAF4 makes with the upstream promoter DNA in the IIDA-SCP complex and those established by the TFIIF winged-helix domain within the PIC (6, 26) (fig. S9). Additionally, the downstream promoter binding regions of TAF1 and TAF2 were also found to clash with Pol II in the closed PIC complex, and the path of the downstream promoter in the closed PIC is bent compared with the more linear path observed in the IIDA-SCP complex (12) (fig. S9). Thus, significant structural rearrangements in TFIID must occur during PIC assembly and transcription initiation, opening the question of whether TFIID can remain promoter bound throughout the transcription initiation cycle.

Proposed mechanism of TBP loading by TFIID and consequent PIC recruitment Superposition of the five conformational states of TFIID—canonical, extended, scanning, rearranged, and engaged—illustrates the range of motion TBP experiences with respect to the BC core during the steps leading to full promoter engagement (Fig. 5A and Movie 2). The distance that TBP travels between these states is approximately 130, 40, 30, and 50 Å, respectively, and follows a curved path that directs TBP toward the upstream DNA. Taken together, these structures suggest a stepwise mechanism of TBP loading onto the promoter and the consequent recruitment of the rest of the PIC. In the first step, TAF1-TAF7 and TAF2 in lobe C bind to downstream DNA. This initial DNA binding facilitates the positioning of the TATA box where it can be reached by TBP as it travels with the mobile lobe A, thus helping the upstream DNA outcompete the inhibitory TAND1 from the cleft of TBP. In the second step, TFIIA displaces TAND2 from TBP and likely stabilizes the upstream DNA through its interaction with lobe B. In this way, the rearranged state constrains the position of lobe A and facilitates TBP binding to the upstream DNA. In the third step, TBP fully engages the promoter DNA, bending it and simultaneously causing a steric clash between the DNA and TAF11 that results in the release of TBP from the rest of lobe A (Fig. 5B and Movie 2). Fig. 5 Mechanism of TBP loading by TFIID. (A) Cryo-EM reconstructions of the canonical, extended, rearranged, and engaged states of TFIID superimposed onto the BC core to show the range of motion of lobe A and TBP. The TAF1-TAF7 module is positioned according to the engaged state reconstruction, and the DNA models for both the engaged and rearranged states are shown. (B) Cartoon schematic for the process of TBP loading onto promoter DNA by TFIID, with subsequent PIC recruitment, assembly, and progression to the elongation complex. See also Movie 2. 
In the fourth step, TFIIB recognizes the fully engaged TBP-DNA complex and recruits with it Pol II-TFIIF. At this stage, the binding of the TFIIF winged-helix domain in Rap30 and Pol II would displace the TAF4 contact with upstream DNA and the interactions of lobe C with downstream core promoter sequences, respectively. This process could potentially result in the TAFs falling off of the PIC, unless the interaction between TFIIA and TAF4 was sufficient to keep TFIID bound or new contacts were to form between TFIID and the PIC at this stage of the assembly. Although a number of interactions have been reported between TFIID and other general transcription factors in vitro (34–37), it has been shown that upon the addition of Pol II-TFIIB-TFIIF, TFIID remains associated with the promoter only in the presence of activators (38, 39). In this potential scenario, TFIID may not remain as part of the growing PIC but could instead bind another TBP to enable formation of a new active complex once the previous complex clears the promoter (Fig. 5B). Additional experiments will be required to test this model and determine the precise role of TFIID in PIC assembly after TBP loading. Approximately 80% of eukaryotic promoters lack a canonical TATA box, yet loading of TBP is essential to initiate transcription for all protein-coding genes (40). The mechanism of TBP loading by TFIID provides a way to promote TBP loading in the absence of a canonical TATA box and expands the potential for regulation through variation in the core promoter sequence. To structurally explore this concept, we assembled a promoter-bound complex by using a mutant SCP (mSCP) that lacked a consensus TATA sequence (ACTGCCGT replacing TATAAAAG). The resulting IIDA-mSCP complex was purified via a DNA-pulldown assay, yielding a sample that still bound the promoter DNA but appeared trapped in the rearranged state with TBP constrained onto the promoter (fig. S11).
We did not observe any complexes in the engaged state, consistent with previous DNase footprinting experiments that showed that TFIID is only able to weakly protect the TATA box by using purified components (10). However, both in vitro transcription assays containing nuclear extracts and in vivo reporter assays showed transcription from mSCP templates (11, 41). Those results would indicate that other factors not present in the DNase footprinting experiments, but present in the nuclear extract, must be aiding TBP in the absence of a consensus TATA. Factors such as transcriptional activators, chromatin marks, or other coactivator complexes could play an essential role in allowing transcription from TATA-less promoters by facilitating the transition from the rearranged to the engaged states and thus the full engagement of TBP onto DNA.
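The TATA substitution used to build the mSCP can be illustrated with a minimal sketch. The two 8-bp sequences are taken from the text; the flanking bases are placeholders and not the real SCP sequence, and the pattern TATAWAWR (W = A or T, R = A or G) is a simplified TATA consensus assumed here only for illustration.

```python
import re

# Assumption: TATAWAWR as a simplified TATA box consensus (W = A/T, R = A/G).
TATA_CONSENSUS = re.compile(r"TATA[AT]A[AT][AG]")

SCP_TATA = "TATAAAAG"   # TATA element of the SCP, from the text
MSCP_TATA = "ACTGCCGT"  # replacement used in the mSCP, from the text


def mutate_tata(promoter, old=SCP_TATA, new=MSCP_TATA):
    """Swap the single TATA element for the mutant sequence."""
    if promoter.count(old) != 1:
        raise ValueError("expected exactly one TATA element in the promoter")
    return promoter.replace(old, new)


def has_consensus_tata(seq):
    """True if the sequence contains a match to the assumed consensus."""
    return TATA_CONSENSUS.search(seq) is not None


# Hypothetical flanking bases, NOT the real SCP sequence.
toy_scp = "GGGCGCC" + SCP_TATA + "GGGGGTGGGC"
toy_mscp = mutate_tata(toy_scp)

if __name__ == "__main__":
    print(toy_scp, has_consensus_tata(toy_scp))
    print(toy_mscp, has_consensus_tata(toy_mscp))
```

The substitution preserves promoter length, so the spacing between the mutated upstream element and the downstream elements is unchanged; only the consensus match is lost.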

TFIID as a coactivator and chromatin reader In vivo TFIID recruitment to the core promoter is aided by gene-specific activators and chromatin marks. Promoters are enriched in certain posttranslational modifications of histones and in histone variants that distinguish them from the rest of the genome (42). Trimethylation of lysine 4 on histone H3 (H3K4me3) and acetylation of H3 and H4 are especially enriched on the +1 nucleosome (the first nucleosome downstream of the TSS), located ~50 bp downstream of the TSS (43–46). TFIID recognizes H3K4me3 through the plant homeodomain (PHD) of TAF3 and the diacetylated H4 via the TAF1 double bromodomain (DBD) (47–49). A model of the downstream promoter extended with a +1 nucleosome shows how these domains, which our studies indicate are flexibly tethered to the core of TFIID, would be oriented toward the +1 nucleosome in the canonical state of TFIID, suggesting a mechanism of TFIID recruitment by the modified +1 nucleosomes of activated genes (Fig. 6A). Fig. 6 Model of TFIID recruitment. (A) Model of TFIID bound to the promoter including a +1 nucleosome. The model is compatible with the binding of flexible histone tails of H3 and H4 to the PHD [PDB ID 2K17 (47)] of TAF3 and the bromodomain of BRD2 [PDB ID 2DVR (49)], a homolog of the DBD of TAF1, respectively. Dashed lines indicate the connections between domains contained in the models of TFIID or the nucleosome, with the flexible domains that bridge the two. Domain architecture maps of TAF1 and TAF3 showing the distance between the structured domains modeled within TFIID and the domains that contact chromatin. A cartoon model of TFIID binding to the +1 nucleosome is shown to the right. (B) Model of TFIID bound to the core promoter with bound activators at the upstream proximal promoter region. Activators are contacting the N terminus of TAF4 that contains activator interacting regions, like the glutamine-rich and TAFH domains. 
Domain maps of the highlighted TAFs illustrate the distance between the domains that were part of the TFIID model (solid) and those domains that were not observed (transparent). Distances between the conserved C terminus and the domains that contact activators (TAFH and glutamine-rich) are shown below the domain map. A cartoon model of TFIID binding to activators is shown on the right. Transcriptional activators determine cellular fate by directing the transcription of genes controlling development, differentiation, stimulus response, growth, and maintenance of homeostatic balance (50). Though many activators have been shown to interact with different TAFs, the strongest evidence has been shown for binding of activators through the conserved glutamine-rich and TAFH domains of TAF4 within its long and flexible N terminus (51–53). A model generated by extending the upstream DNA in the TFIID rearranged state shows how both copies of TAF4 are positioned toward the upstream proximal promoter [which is known to remain cleared of nucleosomes and act as a binding site for transcriptional activators (45)] so that they can interact with an activator via their flexible N-terminal domains. This model suggests that transcriptional activators may play a dual role in TFIID recruitment to the promoter, as well as in promoting TBP engagement by stabilizing the rearranged state of TFIID (Fig. 6B).

Implications for the structure and function of the SAGA transcription complex The insights into the structure and mechanism of TFIID also shed light onto the possible function of the large transcription factor SAGA, as the two complexes share a number of similar components (54) (fig. S12A). SAGA contains four main modules of different function: a TBP-loading TAF-containing module, a histone acetyltransferase module, a histone deubiquitinase module, and an activator binding TRRAP module (55). In humans, the SAGA TAF module contains TAF9, -10, and -12, which are shared with TFIID, as well as the SAGA-specific TAF5L and TAF6L, which are paralogs to TAF5 and TAF6 in TFIID. In addition, SAGA also contains TADA1, which substitutes for TAF4 in forming a histone fold pair with TAF12; SUPT7L, which can form a histone fold pair with TAF10; and SUPT3H, which contains two HFDs homologous to those in TAF11 and -13. Therefore, SAGA contains homologous proteins for all TAFs that make up the dimeric core of TFIID, but whether these exist in two copies within SAGA has not been determined. Using a model of lobe A, we aligned the common SAGA components with those in TFIID and were able to show that within the structurally modeled regions of TFIID, the homologous SAGA subunits are highly conserved (fig. S12B). We were also able to dock the TFIID-derived lobe A model containing only the SAGA homologous regions into the cryo-EM map of the Pichia pastoris SAGA complex (56), revealing its potential location within the complex (fig. S12C). The TADA1 subunit of SAGA has a HFD similar to TAF4 but does not appear to retain the conserved C-terminal region that in TFIID interacts with DNA and TFIIA. The SUPT7L subunit of SAGA that could act as a replacement for TAF8 or TAF3 lacks strong sequence similarity to either of them outside of the HFD.
The yeast ortholog of SUPT3H, Spt3, binds TBP but with much lower affinity than TAF11-TAF13, as indicated by the fact that TBP does not immunopurify with either human or yeast SAGA, although SAGA can still bind TBP (54, 57). The presence of SUPT3H in SAGA suggests that a lobe A–like module may exist within the complex, but whether such a module is involved in delivering TBP to promoters in vivo remains unclear. Existing models suggest that the activator-binding components within SAGA bring it to the promoter to load TBP (58, 59).

Outlook Our studies provide a full structural description of human TFIID and its conformational landscape and how these relate to core promoter engagement. The model we propose for TBP loading is likely conserved in eukaryotes as those regions that play critical roles in the process of TBP loading are all highly conserved (TAF1 and TAF2 downstream binding regions; TAF1 TAND, and TAF4 C-terminal regions). Notably, although the regions responsible for contacting the downstream promoter motifs in human TAF1 and TAF2 appear to be conserved in yeast, downstream promoter elements have not been identified in yeast despite a wealth of genomic data. Thus, it is likely that sequence-specific recognition plays a lesser role in downstream promoter binding in yeast TFIID and that other factors, such as activators and chromatin marks, may play a more substantial role in positioning TFIID. Our structures shed light on how TBP is regulated within TFIID to prevent it from nonspecifically binding DNA and starting aberrant transcription events, while simultaneously providing an explanation for how TFIID is able to load TBP onto both TATA and TATA-less promoters. Our structures also suggest how activators and chromatin marks may be directing TFIID recruitment and PIC assembly. Further studies will be needed to dissect the effects that these regulatory factors have on the mechanism of TBP loading and the details of TFIID dynamic rearrangements during PIC assembly.

Methods and materials summary TFIID was immunopurified from HeLa cells as described previously (10). For CX-MS, 100 nM of TFIID was incubated with 150 nM TFIIA and 5 mM BS3 at room temperature for 2 hours and then quenched by the addition of 2.1 μM ammonium bicarbonate. The cross-linked proteins were precipitated with trichloroacetic acid and treated as described (60). Mass spectrometry and identification of BS3 cross-linked peptides were performed as described previously (60). For the cryo-EM sample preparation of apo-TFIID, TFIID was cross-linked on ice for 5 min using 0.01% glutaraldehyde, and then 4 μl were applied to a C-flat CF 2/2 holey carbon grid (Protochips) to which a thin continuous carbon film coated with polyethylenimine had been applied to improve orientation distribution. For cryo-EM sample preparation of the mixed IIDA-SCP sample, TFIIA and SCP DNA were added at ~1.2× molar excess to TFIID and incubated for 3 min on ice followed by 2 min at 37°C and finally cross-linked on ice using 0.05% glutaraldehyde for 5 min before grid preparation. Cryo-EM sample preparation of the IIDA-mSCP complex was done as described in (12), except that the promoter DNA contained a mutated TATA box, with the sequence TATAAAAG in the original SCP being replaced by ACTGCCGT. The grids for apo-TFIID and IIDA-mSCP were loaded into a Titan Low-base electron microscope (FEI) and those for mixed IIDA-SCP were loaded into a Titan Krios electron microscope (FEI); both microscopes were operated at an acceleration voltage of 300 keV and equipped with a K2 direct electron detector (Gatan). Collected movies were motion corrected using MotionCor2 (61), CTF fits were determined using Gctf (62), and particles were picked using Gautomatch (version 0.53, from K. Zhang, MRC-LMB, Cambridge). Data processing was performed using Relion (63, 64), model building was carried out with O (65) and Coot (66), and model refinement was done using Phenix (67). 
Depictions of molecular models were generated using PyMOL (The PyMOL Molecular Graphics System, version 1.8, Schrödinger) and the UCSF Chimera (68) package from the Computer Graphics Laboratory, University of California, San Francisco (supported by National Institutes of Health P41 RR-01081).
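
The TATA box substitution used for the IIDA-mSCP sample can be illustrated with a short sketch. The Python snippet below is not the authors' procedure; it only shows the 8-bp replacement of TATAAAAG by ACTGCCGT stated in the methods. The flanking sequence in `EXAMPLE_PROMOTER` is an invented placeholder and not the real SCP sequence, and the function name `mutate_tata_box` is hypothetical.

```python
# Sketch of the TATA box substitution described in the methods summary.
# EXAMPLE_PROMOTER uses invented flanks; it is NOT the real SCP sequence.
WILD_TYPE_TATA = "TATAAAAG"  # TATA box in the original SCP (from the text)
MUTATED_TATA = "ACTGCCGT"    # replacement used in mSCP (from the text)

EXAMPLE_PROMOTER = "GGCC" + WILD_TYPE_TATA + "GGCCAGTC"  # hypothetical flanks


def mutate_tata_box(promoter: str) -> str:
    """Replace the single TATA box occurrence with the mutated sequence."""
    count = promoter.count(WILD_TYPE_TATA)
    if count != 1:
        # Refuse to guess when the motif is absent or ambiguous.
        raise ValueError(f"expected exactly one TATA box, found {count}")
    return promoter.replace(WILD_TYPE_TATA, MUTATED_TATA)


if __name__ == "__main__":
    mutant = mutate_tata_box(EXAMPLE_PROMOTER)
    print(EXAMPLE_PROMOTER)
    print(mutant)
```

Because both sequences are 8 bp long, the substitution leaves the promoter length and the spacing of all other positions unchanged.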

Supplementary Materials www.sciencemag.org/content/362/6421/eaau8872/suppl/DC1 Materials and Methods Figs. S1 to S12 Tables S1 to S3 References (71–95)

http://www.sciencemag.org/about/science-licenses-journal-article-reuse This is an article distributed under the terms of the Science Journals Default License.

Acknowledgments: We thank S. Zheng for TAF4 monoclonal antibody; D. King for providing TAF4 antibody antigen peptide; A. Iavarone for performing in-gel mass spectrometry data collection and analysis; S. Gradia and Berkeley Macrolab facility for 438 series plasmids; P. Grob, S. Howes, R. Zhang, and L.-A. Carlson for electron microscopy support; A. Chintangal and T. Houweling for computing support; C. Lopez and C. Yoshioka at the OHSU Cryo-EM Facility for help with collecting Krios data; and D. Herbst for discussion. We acknowledge the use of the LAWRENCIUM computing cluster at Lawrence Berkeley National Laboratory and the resources of the National Energy Research Scientific Computing Center, a Department of Energy Office of Science user facility supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-05CH11231. Funding: This work was funded through NIGMS grants R01-GM63072 to E.N., R01-GM053451 to S.H., P50-GM076547 to J.R., and NCI grant R21-CA175849 to J.R. A.B.P. and R.K.L. were supported by an NIGMS Molecular Biophysics Training Grant (GM008295). B.J.G. was supported by fellowships from the Swiss National Science Foundation (projects P300PA_160983 and P300PA_174355). E.N. is a Howard Hughes Medical Institute investigator. Author contributions: J.F. purified TFIID. R.K.L. and Y.L. reconstituted lobe B. A.B.P. prepared, collected, and processed the apo-TFIID sample. R.K.L. reprocessed the purified IIDA-SCP sample and prepared, collected, and processed the mixed IIDA-SCP and IIDA-mSCP samples. A.B.P. and B.J.G. built and refined the atomic coordinate model. S.G., J.L., J.R., and S.H. performed cross-linking mass spectrometry analysis of the IIDA sample. A.B.P., R.K.L., and E.N. analyzed data and wrote the paper. Competing interests: The authors declare no competing interests. 
Data and materials availability: The cryo-EM maps and refined coordinate models reported here have been deposited in the Electron Microscopy Data Bank with accession codes EMD-9298 (BC core), EMD-9299 (lobe B), EMD-9300 (lobe C), EMD-9302 (lobe A canonical), EMD-9301 (lobe A extended), EMD-9305 (apo-TFIID canonical), and EMD-9306 (IIDA-SCP) and in the Protein Data Bank with accession codes PDB-6MZC (BC core), PDB-6MZD (lobe A), PDB-6MZL (apo-TFIID canonical), and PDB-6MZM (IIDA-SCP engaged).
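
For readers who want to look up the deposited entries, the accession codes above can be collected into a small lookup table. The Python sketch below is only a convenience; the accession codes and labels are copied from the statement above, whereas the URL patterns in `emdb_url` and `pdb_url` are assumptions about the public archive websites and are not given in this article. Both function names are hypothetical.

```python
# Accession codes and labels copied from the data availability statement.
EMDB_MAPS = {
    "EMD-9298": "BC core",
    "EMD-9299": "lobe B",
    "EMD-9300": "lobe C",
    "EMD-9302": "lobe A canonical",
    "EMD-9301": "lobe A extended",
    "EMD-9305": "apo-TFIID canonical",
    "EMD-9306": "IIDA-SCP",
}
PDB_MODELS = {
    "6MZC": "BC core",
    "6MZD": "lobe A",
    "6MZL": "apo-TFIID canonical",
    "6MZM": "IIDA-SCP engaged",
}


def emdb_url(accession: str) -> str:
    """Build an entry URL (pattern is an assumption, not from the article)."""
    return f"https://www.ebi.ac.uk/emdb/{accession}"


def pdb_url(pdb_id: str) -> str:
    """Build an entry URL (pattern is an assumption, not from the article)."""
    return f"https://www.rcsb.org/structure/{pdb_id}"


if __name__ == "__main__":
    for accession, label in EMDB_MAPS.items():
        print(f"{label}: {emdb_url(accession)}")
    for pdb_id, label in PDB_MODELS.items():
        print(f"{label}: {pdb_url(pdb_id)}")
```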