Biologists have long sought to understand how a fertilized egg can form an organism composed of hundreds of specialized cell types, each expressing a defined set of genes. Cellular identity is now accepted to be the result of the expression of specific combinations of genes (Fig. 1). This expression pattern must be established and maintained—two distinct, but connected, processes. The pluripotency of the initial cell and the establishment of cell types depend to a large extent on the coordinated deployment of hundreds of transcription factors that bind to specific DNA sequences to activate or repress the transcription of cell lineage genes1. This establishment phase corresponds most closely to what is generally cited as the first definition of epigenetics by Conrad Waddington, namely the study of the mechanisms by which the genotype produces the phenotype in the context of development2. The maintenance phase often involves a plethora of non-DNA sequence specific chromatin cofactors that set up and maintain chromatin states through cell division and for extended periods of time—sometimes in the absence of the initial transcription factors3. This phase is more akin to a definition of epigenetics put forward by Nanney4, then elaborated on by Riggs and Holliday5,6,7 and further modified by Bird8 and others9 to mean the inheritance of alternative chromatin states in the absence of changes in the DNA sequence. DNA methylation was proposed early on as a carrier of epigenetic information with subsequent work revealing that chromatin proteins and noncoding RNAs are also important for this process10,11,12,13,14. For example, histone variants and histone modifications can influence local chromatin structure, either directly or indirectly. Such modifications can be heritable but reversible and are governed by a series of writers (that deposit them), readers (to interpret them) and erasers (to remove them). 
Finally, higher-order 3D chromosome folding is also thought to modulate gene expression and might contribute to inheritance15.

Fig. 1: Epigenetic mechanisms that maintain cell identities during development and throughout life. Starting from the zygotic genome, stage- and cell-type-specific transcription factors initiate regulatory cascades that induce cell differentiation. Epigenetic components (for example, Polycomb PRC1/2 and Trithorax group proteins) maintain the ‘off’ states of certain genes and the ‘on’ states of others, in a cell-type- and time-specific manner (the bottom panels show three genes, depicted schematically as chromatinized templates, in which transcription is triggered by specific transcription factors and silent or active states are maintained by PRC1/2 or Trithorax proteins, respectively). In doing so, they constitute barriers against accidental reprogramming that maintain developmental and physiological homeostasis. Altered epigenomes can lead to changes in programmed cell differentiation or, when accidental, to disease (bottom right). Germline reprogramming resets the majority (but not all) of the epigenome to achieve reproduction (top right).

Since 1942, when the word was first coined, epigenetics has been redefined multiple times16 (Table 1). In this Review, we use epigenetics to mean “the study of molecules and mechanisms that can perpetuate alternative gene activity states in the context of the same DNA sequence”. This operational definition has several implications. First, it encompasses transgenerational inheritance as well as mitotic inheritance and the persistence of gene activity or chromatin states through extended periods of time, even without cell division—for instance, in long-lived post-mitotic cells such as adult neurons. Second, the DNA sequence to be considered depends on the biological system. In mitotic inheritance, one should consider the genomic sequence of individual cells, whereas in transgenerational inheritance one should consider the DNA of the whole organism (including its microbiota, if this can contribute to inheritance). Finally, this definition explicitly extends the usage of ‘epigenetic’ to regulatory processes that involve molecules known to participate in epigenetic inheritance, even when not addressing the epigenetic memory function per se. We argue that this common practice should be accepted, as it conveys to non-specialists the broader field of epigenetic research. We also note that cases of inheritance that do not involve chromosomal components have been documented14,17,18 and it will be important to study how widespread they are and whether similar phenomena occur in humans.

Table 1: Summary of the history and definitions of epigenetics

Here, we review the interplay between regulatory plasticity and stable epigenetic heritability, including cell fate and reprogramming events that occur during development, in response to physiological stimuli, and in disease. We discuss how noncoding RNAs, DNA methylation, heterochromatin, Polycomb and Trithorax proteins and 3D genome architecture (Box 1) can regulate both inheritance and gene expression plasticity, and how new technologies allow these phenomena to be analysed in a spatiotemporal fashion, in small numbers of cells or even single cells, and at multiple scales from the nucleotide to the chromosome (Box 2). We discuss evidence for a hotly debated topic—epigenetic inheritance across generations—particularly focusing on mammalian examples because of the potential biomedical implications. We also consider two other important new research areas: the potential influence of the environment and the effects of epigenetic changes on genome integrity. In closing, we highlight how epigenetic research may benefit human health.

Box 1: Major carriers of epigenetic information

Heterochromatin components

Pericentric heterochromatin contains a large number of proteins, but its most distinctive feature is the presence of megabase-sized repetitive DNA domains coated in a specific histone H3K9 trimethylation mark, which is deposited by the enzymes SUV39 and SETDB1. This mark is bound by the chromodomain of SUV39H1, which stimulates catalytic activity of the enzyme152. Furthermore, the same mark is bound by the HP1 protein, which can bridge adjacent nucleosomes153. Therefore, heterochromatin components can both write and read the H3K9me3 mark and compact their target chromatin. Heterochromatin factors also collaborate with RNAi in plants, yeast and some animals to convey epigenetic inheritance.

Polycomb proteins

Early genetic studies classified Polycomb (PcG) and Trithorax into two antagonistic groups that maintain the memory of spatial patterns of expression of homeotic genes throughout development. These complexes also have key roles in the maintenance of developmentally or environmentally programmed expression states, such as X-chromosome inactivation or cold-induced vernalization in plants3. PcG proteins are found in two main classes of complex, PRC2 and PRC1, that are responsible for deposition of the H3K27me3 and H2AK119Ub marks via EZH2 and RING1A/1B, respectively3. PcG proteins can be recruited to specific regions of the genome by DNA-binding proteins or noncoding RNAs3. PRC2 complexes contain a writer, the histone methyltransferase enzyme EZH2 (or its less efficient paralogue EZH1), and a reader, the EED subunit. Similar to HP1, CBX subunits of PRC1 complexes contain a chromodomain that specifically recognizes H3K27me3. Finally, another PRC1 subunit, PHC1-3, can oligomerize and induce 3D clustering in nuclear foci in vivo3.

Noncoding RNAs

Noncoding RNAs (ncRNAs) belong to several classes, and neither their production nor their functions can be generalized. Many ncRNAs, such as microRNAs, regulate post-transcriptional processes, whereas others are involved in transcriptional regulation. Short noncoding RNAs, such as short interfering RNAs (siRNAs) and PIWI-interacting RNAs (piRNAs), are shorter than 30 nucleotides, whereas long noncoding RNAs (lncRNAs) vary in size (up to more than 100 kilobases). The best characterized of these is probably the X-inactive specific transcript (Xist)154. Many short ncRNAs act within or outside chromatin, and some, for example siRNAs and tRNA fragments, can diffuse extracellularly14, whereas many nuclear lncRNAs are chromatin-associated. Enhancer RNAs can activate genes155, but most short and long ncRNAs are repressive, act via chromatin (H3K9me3, Polycomb) or DNA methylation154,156, and can induce epigenetic memory by building self-enforcing loops with heterochromatin or the RNAi machinery. They are also involved in the regulation of higher-order chromatin architecture.

DNA methylation

The mechanisms that allow DNA methylation to be copied during DNA replication represent one of the best-understood epigenetic systems, and involve specific proteins that recognize hemimethylated CpG DNA and thereby redeposit DNA methylation on newly replicated DNA. DNA methylation is maintained by the DNA methyltransferase DNMT1 and its partner UHRF1 (also known as NP95), which specifically binds hemimethylated DNA and stimulates DNMT1 via its ubiquitin ligase activity (Fig. 2). Therefore, as recently reviewed in detail157, a single complex contains both the ‘writer’ and the ‘reader’ of the epigenetic methyl CpG mark, and both moieties are essential for the maintenance of DNA methylation.
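
The reader and writer logic of maintenance methylation described above can be illustrated with a toy simulation. This is only a minimal sketch under simplifying assumptions, not a model taken from the literature reviewed here: each CpG site is reduced to a pair of strand states, a single daughter lineage is followed, there is no de novo methylation or active demethylation, and the `efficiency` parameter is a hypothetical stand-in for the combined UHRF1 (reader) and DNMT1 (writer) step. All function names are illustrative.

```python
import random


def replicate(sites, efficiency, rng):
    """Return one daughter's CpG sites after a round of DNA replication.

    sites: list of (strand_a, strand_b) booleans, True = methylated.
    efficiency: probability that a hemimethylated site is restored to
        full methylation (hypothetical parameter, an assumption of this
        sketch standing in for the reader/writer step).
    """
    daughter = []
    for top, bottom in sites:
        # Semi-conservative replication: the daughter keeps one parental strand.
        inherited = top if rng.random() < 0.5 else bottom
        # The new strand starts unmethylated; the mark is copied only where
        # the inherited strand carries it (a hemimethylated CpG).
        copied = inherited and rng.random() < efficiency
        daughter.append((inherited, copied))
    return daughter


def methylated_fraction(sites):
    """Fraction of strands carrying the mark, across all sites."""
    return sum(top + bottom for top, bottom in sites) / (2 * len(sites))


def simulate(sites, efficiency, divisions, seed=0):
    """Follow one lineage and record the methylated fraction per division."""
    rng = random.Random(seed)
    history = [methylated_fraction(sites)]
    for _ in range(divisions):
        sites = replicate(sites, efficiency, rng)
        history.append(methylated_fraction(sites))
    return sites, history


if __name__ == "__main__":
    # Half of the sites start fully methylated, half unmethylated.
    start = [(True, True)] * 50 + [(False, False)] * 50
    for eff in (1.0, 0.95, 0.5):
        _, history = simulate(start, eff, divisions=10)
        print(eff, [round(x, 2) for x in history])
```

With `efficiency=1.0` the starting pattern is reproduced exactly at every division, whereas any lower value lets the mark decay over successive divisions. In this simplified picture, that is why both the reader and the writer activity are needed for stable inheritance.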