All RCSB PDB activities are supported by robust infrastructure that ensures 24/7/365 support of millions of PDB data depositors and users worldwide. A full description of RCSB PDB services has been published, 6 along with various analyses of the impact of structural biologists, the PDB archive, and the RCSB PDB. 12 , 22 , 23

A separate website, PDB101.rcsb.org (“101,” as in an entry‐level course), hosts educational materials that encourage learning about proteins and nucleic acids in 3D. A main focus is the Molecule of the Month series, 21 currently in its 20th year of “telling molecular stories” about structure and function. Other materials include molecular origami paper models, posters, animations, and curricular materials. A “Guide to Understanding PDB Data” is built around more PDB‐specific information: PDB data, visualizing structures, reading coordinate files, and potential challenges (including biological assembly versus asymmetric unit).

Different visualization options are available. Protein Feature View offers graphical summaries of full‐length UniProt 16 protein sequences and how they correspond to PDB entries, together with annotations from external databases (such as Pfam), 17 homology model information, 18 , 19 predicted regions of protein disorder, and hydrophobic regions. Rapid 3D visualization of structures large and small is possible using the NGL viewer, 20 which includes specialized options for viewing ligand interactions and electron density maps. Many RCSB PDB features available on RCSB.org are also provided as Web Services supporting programmatic access by increasing numbers of users.
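The programmatic access described above can be sketched in a few lines of Python. This is a minimal sketch that assumes today's RCSB PDB Data API entry endpoint (`https://data.rcsb.org/rest/v1/core/entry/{id}`), which may differ from the Web Services available when this article was written. The helper names `entry_url` and `fetch_entry` are hypothetical.

```python
import json
import urllib.request

# Assumption: current RCSB PDB Data API "core entry" REST endpoint.
DATA_API_ENTRY = "https://data.rcsb.org/rest/v1/core/entry/{pdb_id}"


def entry_url(pdb_id: str) -> str:
    """Build the Data API URL for one classic 4-character PDB ID (hypothetical helper)."""
    pdb_id = pdb_id.strip()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"expected a 4-character PDB ID, got {pdb_id!r}")
    return DATA_API_ENTRY.format(pdb_id=pdb_id.upper())


def fetch_entry(pdb_id: str, timeout: float = 30.0) -> dict:
    """Download entry-level metadata and parse it as JSON; needs network access."""
    with urllib.request.urlopen(entry_url(pdb_id), timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_entry("3og7")` would request metadata for the BRAF/vemurafenib structure cited later in this article. The layout of the returned JSON is defined by the Data API, not by this sketch.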

Since RCSB PDB users extend well beyond experts in structural biology, 11 , 12 our website features are designed to help users find a variety of structures related to a particular topic using search tools (e.g., by sequence, sequence similarity, or small‐molecule name). The website also offers alternatives to searching, such as the Browse by Annotation tool, which organizes PDB structures into hierarchical trees based upon several different classifications. These include the Anatomical Therapeutic Chemical (ATC) drug classification system developed by the World Health Organization Collaborating Centre for Drug Statistics Methodology ( http://www.whocc.no/atc_ddd_index/ ); protein residue modifications in the PDB archive, classified using the protein modification ontology from the Proteomics Standards Initiative 13 ; and Biological Process, Cellular Component, and Molecular Function, based upon descriptions from the Gene Ontology (GO) Consortium 14 mapped to corresponding PDB structures by the SIFTS initiative. 15
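Hierarchical browsing of this kind can be illustrated with the ATC system, whose codes encode their own position in the tree. The sketch below assumes the standard seven‐character ATC layout (one letter, two digits, two letters, two digits, defining levels 1 to 5). The helpers `atc_path` and `build_tree` are hypothetical and are not RCSB's implementation.

```python
import re

# Standard ATC layout: letter, 2 digits, letter, letter, 2 digits (e.g., "N02BE01").
ATC_PATTERN = re.compile(r"^[A-Z]\d{2}[A-Z]{2}\d{2}$")
# Prefix lengths that define ATC levels 1 through 5.
LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_path(code: str) -> list[str]:
    """Return the root-to-leaf chain of ATC prefixes for a full 5th-level code."""
    code = code.strip().upper()
    if not ATC_PATTERN.match(code):
        raise ValueError(f"not a 5th-level ATC code: {code!r}")
    return [code[:n] for n in LEVEL_LENGTHS]


def build_tree(codes):
    """Nest codes into a dict-of-dicts tree keyed by successive ATC prefixes."""
    tree = {}
    for code in codes:
        node = tree
        for prefix in atc_path(code):
            node = node.setdefault(prefix, {})
    return tree
```

Because every shared prefix becomes a single internal node, a browser built on such a tree can expand one level at a time, from anatomical main group down to individual chemical substance.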

RCSB PDB serves millions of users worldwide, primarily through the web portal at RCSB.org . The website, as described in Nucleic Acids Research, 6 provides tools and services to access and explore PDB content. Each week, all PDB structural data are integrated with ∼40 external data resources to provide rich, up‐to‐date structural views of fundamental biology, biomedicine, and energy sciences. Data can be searched and explored through individual Structure Summary Pages, or as groups of structures displayed in tabular reports.

US PDB operations are the responsibility of the RCSB PDB ( RCSB.org ), with financial support from the National Science Foundation, the National Institute of General Medical Sciences, the National Cancer Institute, the National Institute of Allergy and Infectious Diseases, and the US Department of Energy. RCSB PDB team members are hosted by Rutgers, the State University of New Jersey, the San Diego Supercomputer Center at the University of California San Diego, and the University of California San Francisco. The RCSB PDB also serves as the global Archive Keeper, responsible for ensuring disaster recovery of PDB data and coordinating weekly archival updates among wwPDB partners in Europe and Asia.

The PDB archive is jointly managed by the Worldwide PDB partnership (wwPDB; wwPDB.org ), 4 consisting of the Research Collaboratory for Structural Bioinformatics (RCSB) PDB, 5 , 6 PDB Japan (PDBj), 7 PDB in Europe (PDBe), 8 and BioMagResBank. 9 The wwPDB operates under an international agreement ( wwpdb.org/about/agreement ). Managed by the wwPDB partners in adherence to the FAIR principles of findability, accessibility, interoperability, and reusability, 10 this single global archive of macromolecular data is disseminated to the scientific community without charge or restrictions on usage.

The Protein Data Bank (PDB) archive was established in 1971 with seven protein structures as the first open‐access digital‐data resource in the biological sciences. 1 , 2 Current PDB archival holdings encompass >155,000 atomic‐level structures of proteins, DNA, and RNA, experimentally determined by macromolecular X‐ray crystallography (MX: ∼90%), nuclear magnetic resonance spectroscopy (NMR: ∼9%), and three‐dimensional electron microscopy (3DEM: ∼1%). Nearly three quarters (∼73%) of PDB structures also include one or more ligands (e.g., enzyme cofactors and inhibitors, US Food and Drug Administration (FDA)‐approved drugs, and metals), and ∼10% of PDB structures include one or more carbohydrate components. Virtually all of these public‐domain structures were determined with the support of research funding from governments or private philanthropies, and the PDB archive is now widely regarded as an international public good. The replacement value of current PDB archival holdings is conservatively estimated at >15 billion US dollars. 3
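The headline figures in this paragraph support a quick back‐of‐the‐envelope calculation. This minimal Python sketch treats the rounded shares and the ">155,000" and ">15 billion" lower bounds as exact inputs, so its outputs are rough estimates rather than archive statistics.

```python
# Inputs taken from the text; ">" figures are lower bounds, used here as point values.
TOTAL_STRUCTURES = 155_000
REPLACEMENT_VALUE_USD = 15e9
METHOD_SHARE = {"MX": 0.90, "NMR": 0.09, "3DEM": 0.01}
LIGAND_SHARE = 0.73  # share of entries with one or more ligands


def approximate_counts(total, shares):
    """Turn fractional shares into rounded structure counts."""
    return {key: round(total * share) for key, share in shares.items()}


method_counts = approximate_counts(TOTAL_STRUCTURES, METHOD_SHARE)
ligand_entries = round(TOTAL_STRUCTURES * LIGAND_SHARE)
cost_per_structure = REPLACEMENT_VALUE_USD / TOTAL_STRUCTURES

print(method_counts)
print(f"entries with ligands: ~{ligand_entries:,}")
print(f"replacement value per structure: ~${cost_per_structure:,.0f}")
```

The roughly $97,000 per structure is only the average implied by these two rounded figures, not a per‐structure cost reported by the cited analysis.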

Figure 2 Growth in the complexity of PDB archival holdings, 2000–2018. (a) Cumulative number of unique ligands maintained in the Chemical Component Dictionary each year. In 2018, 2,498 new entries were added. (b) Average molecular weight (solid purple line) and average number of polymer chains (solid orange line) of structures released each year. (c) Growth in available EM structure data, shown by annual accumulation of number of chains and molecular weight.

The global structural biology community has also been depositing increasingly complex structures into the PDB archive. Figures 2a and 2b show the size and diversity of structures deposited to the PDB archive over time. Growth in the number of distinct small‐molecule ligands represented in the PDB Chemical Component Dictionary (CCD) 25 is illustrated in Figure 2a (2,498 new ligands were added in 2018, corresponding to year‐on‐year growth of 7.7%). Entries in the PDB CCD include amino acids; nucleosides and nucleotides; carbohydrates; metals and other ions; crystallization and buffer solutes; enzyme cofactors, substrates, and products; prosthetic groups (e.g., heme); oligopeptides; small organic molecules; and pharmacologic agents. In parallel with the growth of the CCD, the average size of each PDB entry, as gauged by the mean aggregate molecular weight of the biological assembly, is also growing (Figure 2b). Not surprisingly, 3DEM has contributed disproportionately to the growth in the number of larger PDB structures since early 2014 (Figure 2c).
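The 7.7% growth figure also pins down the approximate size of the CCD. Assuming growth is defined as ligands added in 2018 divided by the dictionary size at the end of 2017 (the text does not spell out the definition), the implied sizes follow directly. Because 7.7% is itself rounded, the sketch also brackets the result.

```python
def implied_previous_total(added, growth_rate):
    """growth = added / previous_total, so previous_total = added / growth."""
    return added / growth_rate


ADDED_2018 = 2_498

# 7.7% is rounded to one decimal place, so the true rate lies in [7.65%, 7.75%).
mid = implied_previous_total(ADDED_2018, 0.077)
low = implied_previous_total(ADDED_2018, 0.0775)   # higher rate -> smaller base
high = implied_previous_total(ADDED_2018, 0.0765)  # lower rate -> larger base

print(f"CCD size at end of 2017: ~{mid:,.0f} (between {low:,.0f} and {high:,.0f})")
print(f"CCD size at end of 2018: ~{mid + ADDED_2018:,.0f}")
```

Under that assumption the dictionary held roughly 32,000 entries before the 2018 additions and roughly 35,000 afterwards.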

Figure 1 illustrates the growth in the PDB archive since 2000. Atomic coordinates for >11,200 new structures, together with experimental data/metadata (∼7.6% year‐on‐year growth), were made available in 2018. Most of these new structures were determined using MX (∼88.8%), with the remainder determined by 3DEM (∼7.6%) and NMR (∼3.5%). The number of 3DEM structures populating the archive has been growing rapidly since structural biologists ushered in the “resolution revolution” 24 (Figure 1b). Since 2016, annual 3DEM structure depositions have exceeded NMR structure depositions. Global data deposition statistics, maintained from 2000 onwards, are updated weekly ( http://www.wwpdb.org/stats/deposition ).
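Converting the 2018 percentages into counts makes the 3DEM versus NMR comparison concrete. The sketch treats ">11,200" as exactly 11,200 and uses the rounded shares quoted above, so the numbers are approximate.

```python
NEW_2018 = 11_200  # lower bound from the text, used as a point value
SHARE_2018 = {"MX": 0.888, "3DEM": 0.076, "NMR": 0.035}

counts_2018 = {method: round(NEW_2018 * share) for method, share in SHARE_2018.items()}
# The rounded shares sum to 99.9%, so a few structures are left unassigned.
unassigned = NEW_2018 - sum(counts_2018.values())

print(counts_2018)
print("3DEM exceeds NMR:", counts_2018["3DEM"] > counts_2018["NMR"])
print("unassigned by rounding:", unassigned)
```

On these inputs, roughly 850 3DEM structures versus roughly 390 NMR structures were released in 2018.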

Structure‐guided drug discovery is a well‐established tool for large and small biopharmaceutical companies alike. 26 3D structural studies frequently aid in optimizing small‐molecule ligand affinity and selectivity for target proteins (e.g., vemurafenib, approved for treatment of the ∼50% of patients with late‐stage metastatic melanoma whose tumors carry the Val600→Glu mutation that activates the BRAF protein kinase; PDB structure 3og7 27 ). A recent RCSB PDB analysis 23 documented that US FDA approval of 88% of 210 new molecular entities (NMEs, or new drugs, approved from 2010 to 2016) was facilitated by open access to ∼6,000 PDB structures containing the protein targeted by the NME and/or the new drug itself. More than half of these structures were described in the scientific literature and publicly released >10 years before final drug approval. Moreover, these structures were cited in a significant fraction of the more than 2 million papers reporting publicly funded, precompetitive research on the drug targets. This research influenced drug company investment decisions, ultimately leading to US FDA approvals and patient access to new, life‐altering drugs. Finally, the impact of structural biologists and the PDB archive on US FDA new drug approvals was similar across all therapeutic areas.
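A rounded percentage corresponds to a small range of underlying counts. The helper below (a hypothetical name, written for illustration) lists every integer count out of a given total whose percentage rounds to the quoted value. Assuming "88%" was rounded to the nearest whole percent, it shows that the figure corresponds to 184 or 185 of the 210 NMEs.

```python
def counts_matching_percent(percent, total, decimals=0):
    """All integer counts k in 0..total whose share 100*k/total rounds to `percent`."""
    return [
        k for k in range(total + 1)
        if round(100 * k / total, decimals) == percent
    ]


print(counts_matching_percent(88, 210))
```

The same check can be applied to any rounded share in an analysis like this one when only the percentage and the denominator are reported.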

4 INTEGRAL MEMBRANE PROTEIN STRUCTURES IN THE PDB

Contemporary successes enjoyed by structural biologists studying integral membrane proteins indicate that the PDB archive will become an increasingly important source of precompetitive information supporting ongoing and future drug discovery campaigns directed at these challenging targets. More than 50% of the targets of current US FDA‐approved drugs are integral membrane proteins.28 The vast majority of these drug targets fall within four well‐studied protein families (G‐protein‐coupled receptors [GPCRs]: ∼30%; voltage‐gated ion channels [VGICs]: 8%; ligand‐gated ion channels [LGICs]: 7%; and transporters: 7%; examples of each are displayed in Figure 3). The following sections briefly review current PDB holdings and highlight opportunities for structure‐guided drug discovery for each of these major classes of target proteins.

Figure 3 Ribbon drawings of exemplar structures from each of the four classes of membrane‐bound proteins of pharmacologic interest, viewed parallel to the lipid bilayer (shaded grey rectangle). (a) GPCR (PDB 5vai) 36 : GLP1‐R (glucagon‐like peptide‐1 receptor, active conformation in green) bound to GLP1 (red) and heterotrimeric G‐protein (blue, yellow, and magenta). (b) VGIC (PDB 6j8i) 106 : Nav1.7 (green), beta1 and beta2 (blue), bound to the inhibitor tetrodotoxin (yellow). Voltage‐sensing helices are shown in red. (c) LGIC (PDB 5uow) 72 : NMDA receptor (blue, green, and red) bound to the channel blocker MK‐801 (magenta). An antibody Fab (grey) was used in the structure determination. (d) Transporter (PDB 6o2p) 85 : CFTR (cystic fibrosis transmembrane conductance regulator; blue) bound to ivacaftor (yellow), which interacts with a long transmembrane helix involved in gating (red).

4.1 G‐protein‐coupled receptors

PDB archival holdings of GPCRs at the time of writing are summarized in Table 1. The landmark structure of bovine rhodopsin (PDB 1f88) gave the first view of the class in 2000,29 and GPCR structure depositions to the PDB were initially restricted to the Rhodopsin subfamily, many of them crystallized using lipidic mesophases.30 Progress in this arena was accelerated by advances in protein engineering of the beta‐adrenergic receptor: chimeras with entire proteins or smaller protein domains inserted into extramembranous loops (e.g., T4 phage lysozyme31) facilitate crystal lattice formation without perturbing the structure of the 7‐transmembrane helix domain.32 Currently, more than 300 GPCR structures from four of the five GPCR subfamilies have been determined and deposited in the archive, namely A‐rhodopsin, B1‐secretin, C‐glutamate, and F‐frizzled/taste 2 (but not B2‐adhesion).33 The vast majority of these structures were determined using MX (∼91%), with a small number coming from NMR (∼2%) and a growing number coming from 3DEM (∼7%). At present, the PDB archive contains structures for more than 60 unique GPCRs (representing examples or orthologs of ∼15% of the entire complement of more than 800 GPCRs encoded by the human genome). GPCR structures have been elucidated in both active and inactive conformational states, some including bound small‐molecule ligands or drugs, bound peptide/protein ligands, bound heterotrimeric G proteins, and in some cases stabilizing Fab fragments and/or camelid nanobodies.34 Structure‐guided drug discovery for GPCRs (particularly Class A members) using MX is currently being pursued within many of the large biopharmaceutical companies, targeting both receptors represented within the PDB and novel receptors exclusive to one or more companies.

Table 1. G‐protein‐coupled receptors in the PDB archive

| | All | Class A (rhodopsin) | Class B1 (secretin) | Class B2 (adhesion) | Class C (glutamate) | Class F (frizzled/taste 2) |
|---|---|---|---|---|---|---|
| Structures | 339 | 295 | 23 | 0 | 8 | 13 |
| MX (resolution) | 311 | 278 (7.7–1.7 Å) | 15 (3.3–1.9 Å) | 0 | 6 (3.1–2.2 Å) | 12 (3.9–2.4 Å) |
| NMR^a | 6 | 6 | 0 | 0 | 0 | 0 |
| 3DEM (resolution) | 22 | 11 (4.5–3.0 Å) | 8 (4.1–3.0 Å) | 0 | 2 (4.0 Å) | 1 (3.8 Å) |
| Unique receptors | 62 | 52 | 6 | 0 | 2 | 2 |

The first 3DEM structure of a GPCR to become publicly available (PDB 5vai)36 revealed a glucagon‐like peptide 1 (GLP1) analog being recognized by the GLP1 receptor (GLP1‐R; active conformation), which was embedded in a detergent micelle and bound to a G‐protein heterotrimer (Figure 3a). Underscoring the power of cryo‐EM to enable structural studies of large, complex, and very challenging samples, this Class B1 (secretin) GPCR was visualized at the atomic level in the act of recognizing its peptide hormone ligand while engaging with a G‐protein heterotrimer. GLP1‐R37 is the target of six oligopeptide agonists (exenatide [PDB 3c59, 3c9t],38 liraglutide [4apd],39 lixisenatide, albiglutide, dulaglutide, and semaglutide [4zgm]40) approved by the US FDA for treatment of type II diabetes mellitus. These biologic agents, the newest of which was approved in 2017, mimic endogenous GLP1: they slow gastric emptying and increase secretion of insulin by the patient's own pancreas in response to elevated blood glucose levels. The principal advantage of this treatment strategy over older, cheaper small‐molecule insulin secretagogues is a lower risk of hypoglycemia. At present, there are no publicly available structures of any of the approved GLP1‐R agonists bound to full‐length GLP1‐R.
With open access to PDB structure 5vai, detailed knowledge of how 5vai and related structures were determined, and recent acquisitions of state‐of‐the‐art 3DEM instrumentation by biopharmaceutical companies, the stage is now set for structure‐guided discovery of the next generation of GLP1‐R agonists with improved pharmacologic properties (e.g., longer half‐lives that will permit less frequent dosing and improve the likelihood of compliance). It also appears highly likely that 3DEM will shortly reveal one or more structures of Class B2 (adhesion) GPCRs, some of which are drug discovery targets,41, 42 and all of which have thus far eluded 3D structure determination by any experimental method.
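The method shares quoted for GPCRs can be recomputed from the counts in Table 1. This sketch uses a hand transcription of the table and checks two things before computing percentages: that each class's method counts add up to its structure total, and that the individual classes add up to the "All" column.

```python
# Table 1 transcribed by hand: class -> (structures, MX, NMR, 3DEM).
TABLE_1 = {
    "All": (339, 311, 6, 22),
    "A": (295, 278, 6, 11),
    "B1": (23, 15, 0, 8),
    "B2": (0, 0, 0, 0),
    "C": (8, 6, 0, 2),
    "F": (13, 12, 0, 1),
}
METHODS = ("MX", "NMR", "3DEM")


def check_row_sums(table):
    """Each class: MX + NMR + 3DEM must equal its structure count."""
    return all(total == sum(by_method) for total, *by_method in table.values())


def check_column_sums(table):
    """Summing the individual classes must reproduce the 'All' row."""
    classes = [row for name, row in table.items() if name != "All"]
    return tuple(map(sum, zip(*classes))) == table["All"]


def method_shares(row):
    """Percentage of a class's structures determined by each method."""
    total, *by_method = row
    return {m: 100 * n / total for m, n in zip(METHODS, by_method)}


print(check_row_sums(TABLE_1), check_column_sums(TABLE_1))
print({m: round(p, 1) for m, p in method_shares(TABLE_1["All"]).items()})
```

On these counts MX accounts for about 91.7% of GPCR structures, NMR for 1.8%, and 3DEM for 6.5%, in line with the rounded values in the text.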

4.2 Voltage‐gated ion channels

VGICs open and close ion‐selective pores in response to small changes in membrane potential, playing central roles in nerve signal transmission. They form a large superfamily that includes voltage‐gated sodium (Nav), calcium (Cav), potassium (Kv), and other ion channels, encoded by at least 143 human genes,43, 44 making them the third largest family of signaling proteins after GPCRs and protein/lipid kinases. Rod MacKinnon's pioneering work on the homotetrameric potassium channels KcsA from S. lividans45 and KvAP from A. pernix46 revealed the mechanistic bases for ion selectivity and gating at the atomic level, but structural information for Nav and Cav channels, which lack the strict fourfold symmetry seen in KcsA, is relatively new to the archive. Voltage‐gated sodium channels give rise to the rapid action potentials that mediate nerve transmission, making them targets for natural and designed toxins, inhibitors, and drugs.47, 48 Many natural examples exist, including tetrodotoxin, a potent and exquisitely selective neurotoxin found in puffer fish (and other organisms), for which a lethal dose is less than a milligram. Many anticonvulsants, antiarrhythmics, and local anesthetics, such as lamotrigine, flecainide, and lidocaine, also act by blocking these channels.49 The PDB currently contains >750 VGIC structures (Table 2).

Table 2. Voltage‐gated ion channels available from the PDB archive

| | All | Voltage‐gated potassium channel activity | High voltage‐gated calcium channel activity | Voltage‐gated proton channel activity | NMDA glutamate receptor activity | Voltage‐gated anion channel activity | Voltage‐gated ion channel activity involved in regulation of postsynaptic membrane potential | Voltage‐gated ion channel activity involved in regulation of presynaptic membrane potential |
|---|---|---|---|---|---|---|---|---|
| Structures | 756 | 235 | 237 | 8 | 135 | 31 | 21 | 17 |
| MX (resolution) | 494 (1.2–4.8 Å) | 156 (1.2–4.8 Å) | 132 (1.4–4.4 Å) | 7 (1.4–3.5 Å) | 94 (1.3–4.0 Å) | 11 (1.6–4.1 Å) | 16 (1.4–2.6 Å) | 8 (1.9–3.0 Å) |
| NMR | 64 | 30 | 25 | 1 | 2 | 2 | 5 | 7 |
| 3DEM (resolution) | 197 (2.9–35 Å) | 48 (2.9–10 Å) | 80 (3.0–35 Å) | 0 | 39 (4.5–16.5 Å) | 18 (3.2–6.6 Å) | 0 | 2 (3.0–3.8 Å) |
| Hybrid | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Unique channels | 105 | 41 | 30 | 1 | 6 | 7 | 4 | 4 |

The human genome encodes nine voltage‐gated sodium channels (Nav1.1 to Nav1.9), which support a range of cellular and biological functions. Nav1.1, Nav1.2, Nav1.3, and Nav1.6 are expressed primarily in the central nervous system; Nav1.4 is found in skeletal muscle; Nav1.5 is found in cardiac muscle; and Nav1.7, Nav1.8, and Nav1.9 are typically found in peripheral tissues. Much of the early structural work on Nav channels was performed using bacterial homologs, which have a simpler homotetrameric structure and proved relatively easy to express, purify, and crystallize. The first such MX structures were reported in 2011 for A. butzleri NavAb (PDB 3rvy, 3rvz, and 3rw0).50 In contrast, the mammalian channel is composed of one long alpha chain, which forms four membrane‐spanning domains similar in arrangement to the four identical bacterial subunits. In addition, the alpha subunit associates with one or more of five beta subunits (beta1, beta1B, beta2, beta3, and beta4).
3DEM structures have been determined for human Nav1.4/beta2 (PDB 6agf),51 Nav1.2/beta2 with a conotoxin (PDB 6j8e),52 and Nav1.7/beta1/beta2 with tetrodotoxin and saxitoxin (PDB 6j8i and others; Figure 3b).52 Drug discovery efforts have focused considerable resources on Nav1.7.53 This work began in earnest following the 2004 discovery that a Nav1.7 gain‐of‐function mutation causes persistent pain.54 In 2006, a loss‐of‐function mutation was identified in several Pakistani street performers, who felt no pain when walking on hot coals or performing similar feats.55 Selectivity remains an elusive challenge in this arena: pairwise sequence identities among the nine human Nav VGICs exceed 70%. To complicate matters further, multiple functional binding sites for both large and small molecules are present on the solvent‐accessible surfaces of these integral membrane proteins. Before 3DEM structures of Nav VGICs became available, much of the early drug design work was performed using homology models based on distantly related bacterial proteins. Today, medicinal chemists are sifting through the multiple sites of action of natural toxins and poisons with the aim of finding druggable sites with potential for specificity, and then targeting them with small molecules, peptides, or antibodies. Notwithstanding insights from these new structures, serious challenges remain for structure‐guided drug discovery. Nav VGICs are conformationally dynamic, existing in multiple functional states (e.g., closed/resting, open, and closed/inactivated), each of which will need to be structurally characterized. Single‐particle 3DEM, however, offers a critical advantage over MX in that multiple conformations of a macromolecular assembly can be accommodated via focused classification procedures56 to reveal multiple structural states present on the EM grid.57
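The selectivity problem above is framed in terms of pairwise sequence identity. Definitions of percent identity vary between tools, for example in how gaps and the denominator are handled. This sketch assumes the sequences are already aligned and counts identities over columns where neither sequence has a gap. The toy fragments are invented for illustration and are not Nav sequences.

```python
from itertools import combinations


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity over gap-free aligned columns ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


def min_pairwise_identity(alignment: dict) -> float:
    """Lowest identity over all sequence pairs in a multiple alignment."""
    return min(
        percent_identity(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    )


# Hypothetical aligned fragments, for illustration only.
toy_alignment = {
    "seq1": "MAEK-LVDRT",
    "seq2": "MAQKGLVDRT",
    "seq3": "MAEKGLIDRS",
}
print(round(min_pairwise_identity(toy_alignment), 1))
```

A high minimum pairwise identity across a family, as reported for the nine human Nav channels, means that most residues lining a candidate binding site are likely shared, which is what makes isoform‐selective ligands hard to design.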

4.3 Ligand‐gated ion channels

LGICs mediate transmission of signals across nerve synapses in response to binding of neurotransmitters. There are three major structural classes of these channels (Table 3): pentameric “Cys‐loop” receptors, ionotropic glutamate receptors (iGluRs), and P2X receptors.58 The pentameric Cys‐loop receptors include excitatory cation‐selective channels, such as the nicotinic acetylcholine receptors, and inhibitory anion‐selective channels (e.g., the GABAA receptor). In 2005, Nigel Unwin's ground‐breaking EM structure of the nicotinic acetylcholine receptor from the marbled electric ray (PDB 2bg9) revealed both the ligand‐binding subunits and the channel geometry at the atomic level.59 A large collection of toxins, poisons, and drugs act through these pentameric receptors, including the two well‐known poisons curare and strychnine60; anesthetics and alcohol61; benzodiazepine anxiolytics62; and the antiparasitic agent ivermectin.63

Table 3. Ligand‐gated ion channels in the PDB archive

| | All^a | Cyclic nucleotide‐gated ion channel activity | Extracellular ligand‐gated ion channel activity | Intracellular ligand‐gated ion channel activity | Ligand‐gated anion channel activity | Ligand‐gated cation channel activity | Ligand‐gated ion channel activity involved in regulation of presynaptic membrane potential |
|---|---|---|---|---|---|---|---|
| Structures | 968 | 38 | 647 | 241 | 88 | 865 | 159 |
| MX (resolution) | 685 (1.15–7.4 Å) | 26 (1.65–3.28 Å) | 506 (1.15–4.79 Å) | 122 (1.21–7.4 Å) | 59 (1.55–3.8 Å) | 612 (1.15–7.4 Å) | 115 (1.24–3.96 Å) |
| NMR | 57 | 5 | 29 | 18 | 9 | 47 | 2 |
| 3DEM (resolution) | 223 (2.94–50 Å) | 4 (3.4–3.51 Å) | 112 (2.95–50 Å) | 98 (2.94–8.5 Å) | 20 (3.04–6.6 Å) | 203 (2.94–50 Å) | 45 (3.8–16.5 Å) |
| Electron crystallography (resolution) | 3 (3.54–3.8 Å) | 3 (3.54–3.8 Å) | 0 | 3 (3.54–3.8 Å) | 0 | 3 (3.54–3.8 Å) | 0 |
| Unique genes | 84 | 8 | 45 | 24 | 11 | 69 | 5 |

iGluRs fall into four main classes based on their small‐molecule‐binding properties: AMPA receptors (GluA1‐4), kainate receptors (GluK1‐5), NMDA (N‐methyl‐D‐aspartate) receptors (GluN1, GluN2A‐D, and GluN3A‐B), and delta receptors (GluD1‐2).64, 65 These polypeptide chains can form both homo‐ and heterotetramers and associate with a variety of modulatory auxiliary subunits. They are modular in structure. An N‐terminal domain (homologous to bacterial periplasmic‐binding proteins) mediates dimerization between subunits of the same iGluR class. The C‐terminal portion contains the extracellular agonist‐binding domain, which consists of two polypeptide chain segments separated by the portion that forms the membrane‐spanning ion channel pore. Structures of extracellular fragments of iGluRs proved instrumental in characterizing some functionally important properties of these channels.66 Since the 2009 publication of the MX structure of the GluA2 AMPA receptor (PDB 3kg2),67 work in this area has moved rapidly. Today, multiple 3DEM structures of iGluRs and their complexes with ligands, toxins, and accessory proteins are also publicly available.68, 69 As seen for the VGICs, the iGluRs display multiple sites for binding of toxins and poisons, and many of these LGICs are currently the focus of structure‐guided drug discovery efforts (see the 2019 special issue of ACS Med. Chem. Lett. on Allosteric Modulation of iGluR).70 For example, memantine, an NMDA receptor channel blocker, has been approved for treatment of patients with moderate‐to‐severe Alzheimer's disease.71 A 3DEM structure of the triheteromeric GluN1/GluN2A/GluN2B NMDA receptor with a similar agent (MK‐801, PDB 5uow,72 Figure 3c) revealed the ligand‐binding site within a vestibule of the ion channel. Preclinical characterization of MK‐801 underscores both the promise and the challenges of targeting these receptors: neuroprotection was observed in animal models of stroke, traumatic brain injury, and Parkinsonism, accompanied by side effects of induced psychotic behavior and neuronal degeneration.
A subsequently determined 3.6 Å resolution MX structure of an N‐terminally truncated form of the receptor (PDB 5un1)73 enabled molecular dynamics simulations of MK‐801 and memantine binding, further advancing structure‐guided drug design efforts aimed at improving side‐effect profiles.
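The iGluR classification described in this section maps naturally onto a small lookup table. This sketch expands the subunit ranges given in the text (GluA1‐4, GluK1‐5, GluN1/GluN2A‐D/GluN3A‐B, GluD1‐2). It also encodes, as a deliberately simplified rule, the statement that N‐terminal domain dimerization occurs between subunits of the same class. Real assembly rules are more nuanced, and `same_class` is a hypothetical helper.

```python
# Subunits per iGluR class, expanded from the ranges quoted in the text.
IGLUR_SUBUNITS = {
    "AMPA": ["GluA1", "GluA2", "GluA3", "GluA4"],
    "kainate": ["GluK1", "GluK2", "GluK3", "GluK4", "GluK5"],
    "NMDA": ["GluN1", "GluN2A", "GluN2B", "GluN2C", "GluN2D", "GluN3A", "GluN3B"],
    "delta": ["GluD1", "GluD2"],
}
# Reverse index: subunit name -> class.
CLASS_OF = {s: cls for cls, members in IGLUR_SUBUNITS.items() for s in members}


def same_class(*subunits: str) -> bool:
    """Simplified rule: True if all named subunits belong to one iGluR class."""
    return len({CLASS_OF[s] for s in subunits}) == 1


total_subunits = sum(len(m) for m in IGLUR_SUBUNITS.values())
print(total_subunits)                           # 18
print(same_class("GluN1", "GluN2A", "GluN2B"))  # True: the 5uow assembly
print(same_class("GluA2", "GluK2"))             # False: different classes
```

A reverse index such as `CLASS_OF` keeps class lookups constant‐time, which is convenient when annotating many chains from PDB entries by subunit name.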