Chemical space

A multi-dimensional conceptual region defined by a set of descriptors. For example, 'drug-like' chemical space (defined by limiting the space to molecules with a molecular mass <500 Da, fewer than 30 C, H, N, O or S atoms and fewer than 4 rings) has been estimated to be as large as 10^63 molecules.
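
As a rough illustration of how a set of descriptors bounds a region of chemical space, the 'drug-like' limits quoted above can be written as a simple filter. This is only a sketch: the `Descriptors` record, the `is_drug_like` function and the three candidate entries are hypothetical and do not describe real compounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical descriptor record; each field is one axis of the space."""
    name: str
    mass_da: float   # molecular mass in Da
    atom_count: int  # number of C, H, N, O or S atoms
    ring_count: int  # number of rings


def is_drug_like(d: Descriptors) -> bool:
    # Thresholds taken from the definition above:
    # mass <500 Da, fewer than 30 atoms, fewer than 4 rings.
    return d.mass_da < 500 and d.atom_count < 30 and d.ring_count < 4


# Made-up descriptor values, for illustration only.
candidates = [
    Descriptors("hypothetical_small", 180.2, 21, 1),
    Descriptors("hypothetical_large", 1202.6, 196, 1),
    Descriptors("hypothetical_polycyclic", 410.5, 28, 5),
]

drug_like = [d.name for d in candidates if is_drug_like(d)]
```

Only the first entry satisfies all three limits; each of the other two falls outside the region on at least one descriptor axis.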

Continuous directed evolution

A directed-evolution method that resembles natural evolution, in which fitness is passed on to subsequent generations without manual intervention. As success in directed evolution depends on the number of rounds completed, removing manual steps can dramatically increase the speed of each round, the number of rounds that can be completed and hence the complexity of evolutionary changes that could be driven.

De novo computational design

The design of compounds based purely on the protein structure, through the computational docking of fragments into an active site and their computational growth using feasible in silico chemical steps to increase calculated binding affinity.

Design–make–test–analyse cycles

(DMTA cycles). The repetitive central process in lead optimization, involving a cycle of four steps: design (a hypothesis is constructed to improve the profile of the lead molecule); make (compounds exemplifying the design are synthesized); test (synthesized compounds of confirmed structure and purity are tested in one or more carefully constructed and controlled assays); and analyse (the experimental data are analysed and the results are used to amend a design hypothesis for the next cycle).

Directed evolution processes

Methods that mimic the processes of evolution but are directed towards a user-defined goal.

DNA-encoded libraries

Very large mixtures of molecules generated using a split-and-pool approach and used for ultra-high-throughput screening. Each synthesized molecule is covalently bound to a DNA fragment, which records the synthetic steps that have been taken to create the small molecule. An immobilized protein target is used to select binders from a pool of DNA-tagged molecules. The structure of the binders is deduced by sequencing the appended DNA tag.

DNA shuffling and recombination

A way to propagate beneficial mutations by recombining DNA segments from several gene sequences or gene pools from a directed evolution experiment.

DNA-templated synthesis

A process in which the DNA heteroduplex is used to bring two complementary DNA fragments bearing different reacting molecules into close proximity, increasing the reaction rate by several orders of magnitude. The DNA sequence therefore not only encodes the synthesis of a chemical library but can also be used to direct the order of chemical reactions.

Dynamic combinatorial libraries

Collections of molecules formed from reversible reactions of reagents under thermodynamic control. All species are interconverting at equilibrium. In the presence of a binding protein that binds one or more molecules, the equilibrium is shifted, and the system becomes enriched with the binding moieties.

Evolutionary algorithms

A subset of machine-learning algorithms inspired by biological evolution. Candidate solutions are individuals in a population, and a fitness function defines their quality and acts as the selection criterion. Successful features from individuals are mutated and/or recombined to form the next generation of individuals, for further selection based on the fitness function.
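
The loop described above (selection by a fitness function, followed by recombination and mutation) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the bit-string genome, the function name `evolve`, all parameter values and the toy fitness function (`sum`, which counts the 1s) are choices made for the example.

```python
import random


def evolve(fitness, genome_length=20, population_size=30,
           generations=40, mutation_rate=0.05, seed=0):
    """Toy evolutionary algorithm on bit-string genomes (illustrative only)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Selection: rank by fitness and keep the fitter half as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            # Recombination: one-point crossover between two distinct parents.
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        # Parents survive unchanged, so the best fitness never decreases.
        population = parents + children
    return max(population, key=fitness)


best = evolve(fitness=sum)
```

Keeping the parents in the next generation is one of several possible design choices; it guarantees that the best individual found so far is never lost.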

Fragment-based drug design

An approach by which small, weakly binding chemical fragments (typically with a molecular mass of 100–200 Da) that bind to a protein target are identified and optimized to higher-affinity leads, usually guided by structural information on the fragment–target interaction from techniques such as X-ray crystallography.

mRNA display

An in vitro ribosome translation system for peptides and proteins. mRNA display uses the antibiotic puromycin, which causes premature chain termination on the ribosome. The cDNA is transcribed into mRNA libraries, and the 3′-end of each mRNA is coupled via a spacer oligonucleotide to puromycin. The oligonucleotide spacers allow effective translation and termination. The attached puromycin can react with the growing peptide chain, forming a covalent link between the peptide and its encoding mRNA, making the genotype–phenotype link. Selection is made based on the affinity of the peptide or protein with its attached coding mRNA for an immobilized target.

Non-ribosomal biosynthetic pathways

Pathways that biosynthesize the cores of many natural products based on peptides and polyketides. These involve large modular enzyme complexes known as non-ribosomal peptide synthetases and polyketide synthases.

Phage display

An in vivo translation system that uses bacteriophage to maintain the link between translated peptides or proteins and the DNA that encodes them. cDNA for the protein or peptide of interest is inserted into the phage coat protein gene, and phage progeny in Escherichia coli 'display' the target protein on their surface, attached to the coat protein. Selection is achieved by affinity for an immobilized target. After elution of binders, affinity maturation is achieved by further rounds of amplification, which introduce further variability in the selected DNA sequences. The amino acid sequence of the optimized binder can be deduced by sequencing the coding DNA of the selected phage.

Pharmacophores

The steric and electronic features in a ligand that result in the optimal molecular interactions of the ligand with a specific biological target, typically modulating a biological response.

Retrotransposon

A genetic element that can amplify itself in a genome via a 'copy–paste' mechanism involving transcription into RNA and reverse transcription back into DNA, which can then be inserted at various positions in the genome. Retrotransposons are common components of eukaryotic genomes.

Ribosome display

An in vitro translation system for peptides and proteins. The initial cDNA library is fused to a spacer sequence lacking a stop codon. The cDNA is transcribed to mRNA. The mRNA is translated to protein on the ribosome, but the lack of a stop codon prevents release factors from binding and disassembling the translational complex. Therefore, the spacer sequence remains attached to the tRNA and bound to the ribosome, with the peptide chain protruding, allowing folding. The resulting complex of RNA, ribosome and protein can be selected by the affinity of the protruding protein for its ligand, and sequencing of the mRNA enables the identification of the protein sequence of the bound proteins.

Site saturation mutagenesis

A method by which one or more codons can be randomized to produce all possible amino acids at chosen positions within the DNA.
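
One common way to randomize a codon is to use a degenerate codon. The sketch below assumes the widely used NNK scheme (N = A, C, G or T; K = G or T), which is not named in the definition above, and expands it with the standard genetic code to show which amino acids it can encode. The names `CODON_TABLE`, `DEGENERATE` and `expand` are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Degenerate base symbols used in this example (subset of the IUPAC codes).
DEGENERATE = {"N": "ACGT", "K": "GT"}


def expand(degenerate_codon):
    """List every concrete codon matched by a degenerate codon such as 'NNK'."""
    choices = (DEGENERATE.get(symbol, symbol) for symbol in degenerate_codon)
    return ["".join(codon) for codon in product(*choices)]


nnk_codons = expand("NNK")
encoded = {CODON_TABLE[codon] for codon in nnk_codons}
amino_acids_only = encoded - {"*"}
```

Under these assumptions, the 32 NNK codons cover all 20 amino acids while including only one of the three stop codons (TAG), which is why such schemes are often preferred over fully random NNN codons.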

Split-and-mix solid phase synthesis

A method for the synthesis of large combinatorial compound libraries. A solid-phase-supported reagent is split equally, and each portion is reacted with a different reagent. After washing, the individual portions are recombined and mixed. Subsequent rounds of splitting, reaction and recombination generate a final library of x^n compounds, where x is the number of starting portions and n is the number of rounds.
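
The x^n growth can be checked with a short enumeration. This sketch assumes that the same x reagents are used in every round and represents each compound as a tuple of the reagents it has seen; the function name `split_and_mix` and the single-letter reagent labels are illustrative.

```python
def split_and_mix(reagents, rounds):
    """Enumerate the compounds made by repeated split, react and pool steps."""
    pool = [()]  # the unreacted solid support
    for _ in range(rounds):
        # Split: every compound in the pool is distributed across all portions.
        # React: each portion is extended with one reagent.
        # Mix: the portions are recombined into a single pool.
        pool = [compound + (reagent,)
                for reagent in reagents
                for compound in pool]
    return pool


library = split_and_mix(reagents="ABC", rounds=3)
library_size = len(library)  # 3 reagents, 3 rounds: 3**3 compounds
```

With x = 3 reagents and n = 3 rounds, the pool contains 3^3 = 27 distinct compounds, although only 9 individual reactions (3 per round) were carried out.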

Structure–activity relationships