A signature event for organoids

Human cancer genomes harbor cryptic mutational signatures that represent the cumulative effects of DNA damage and defects in DNA repair processes. Knowledge of how specific signatures originate could have a major impact on cancer diagnosis and prevention. One approach to address this question is to reproduce the signatures in experimental systems by genetic engineering and then match the signatures to those found in naturally occurring cancers. Drost et al. used CRISPR-Cas9 to delete genes encoding certain DNA repair enzymes in human colon organoids. In a proof-of-concept study, they show that deficiency in base excision repair is responsible for a mutational signature previously identified in cancer genome sequencing projects.

Science, this issue p. 234

Abstract

Mutational processes underlie cancer initiation and progression. Signatures of these processes in cancer genomes may explain cancer etiology and could hold diagnostic and prognostic value. We developed a strategy that can be used to explore the origin of cancer-associated mutational signatures. We used CRISPR-Cas9 technology to delete key DNA repair genes in human colon organoids, followed by delayed subcloning and whole-genome sequencing. We found that mutation accumulation in organoids deficient in the mismatch repair gene MLH1 is driven by replication errors and accurately models the mutation profiles observed in mismatch repair–deficient colorectal cancers. Application of this strategy to the cancer predisposition gene NTHL1, which encodes a base excision repair protein, revealed a mutational footprint (signature 30) previously observed in a breast cancer cohort. We show that signature 30 can arise from germline NTHL1 mutations.

Cancer arises through the sequential accumulation of mutations in somatic cells. The overall mutational burden of somatic cells is determined by a balance between DNA damage and repair activity. Mutational processes leave specific signatures, as defined by systematic analysis of mutation characteristics across many independent cancer sequencing data sets (1). A recent study showed that a set of mutational signatures in genome-wide mutation collections of breast cancers predicts deficiency of BRCA1 and BRCA2, as well as sensitivity to PARP [poly(adenosine diphosphate–ribose) polymerase] inhibition (2). To date, the main approach to link specific mutational signatures to underlying molecular mechanisms has involved associating cancer mutations with defined exposures to carcinogens, such as tobacco smoke (3) and ionizing radiation (4), or with the absence of specific DNA repair proteins, such as components of the DNA mismatch repair (MMR) (1) and nucleotide excision repair (NER) pathways (5). However, multiple processes are simultaneously active in tumors with highly unstable genomes, which makes it difficult to causally link the presence of specific mutational signatures to DNA repair deficiency.

Organoid technology allows for the long-term in vitro expansion of epithelial tissues, starting from a single adult stem cell. Such organoids remain genetically stable over long periods of time (6). We have previously shown that clonal organoid cultures can be used for in-depth pattern analysis of mutations that accumulate throughout life in tissue-specific adult stem cells (7, 8). Moreover, organoids can be readily modified using CRISPR-Cas9 genome editing (9). Mammalian species differ greatly in their DNA repair capacity and will thus respond differently to mutagenic stress (10, 11), and mutation types and numbers will vary over a lifetime (7). We used CRISPR-Cas9 genome editing in human intestinal organoid cultures to systematically decipher the mutational consequences of DNA repair deficiency.

To establish a tool for dissecting mutational signatures in human organoid cultures, we introduced loss-of-function mutations in the MMR gene MutL homolog 1 (MLH1). Inactivating mutations in MMR genes, including MLH1, predispose people to colorectal cancer (12, 13). These tumors are characterized by an immense mutational load, such as base substitutions and small insertions and deletions (INDELs) at short repeat sequences in the genome, referred to as microsatellite instability (MSI) (13). We used organoids derived from human normal colon epithelium, because this allowed us to study the effect of disruption of single DNA repair genes in an otherwise normal genetic background. Moreover, human colonic organoids are as close as possible to the cell-of-origin of colorectal cancer (14). To inactivate MLH1 in normal human colonic organoids, we used CRISPR-Cas9 technology to insert a puromycin-resistance cassette into the second exon of the MLH1 gene (Fig. 1A and fig. S1A). After puromycin selection, we clonally expanded single organoids and subsequently genotyped these to confirm correct biallelic targeting (figs. S1, A and B, and S2A). Gene inactivation was verified by quantitative reverse transcription polymerase chain reaction (qRT-PCR), revealing a substantial reduction in MLH1 mRNA expression in MLH1 knockout (MLH1KO) organoids, most likely due to the degradation (through nonsense-mediated decay) of nonsense mRNAs transcribed from the mutant alleles (Fig. 1C). Western blot analysis confirmed the loss of MLH1 protein expression in the MLH1KO organoids (Fig. 1E).

Fig. 1 Generation of DNA repair gene knock-outs in human intestinal stem cell cultures. Targeting strategy for the generation of MLH1 (A) and NTHL1 (B) knockout organoids using CRISPR-Cas9 genome editing. sgRNA, single guide RNA. (C) qRT-PCR for MLH1 in normal and MLH1KO organoids. Expression was normalized to GAPDH. Mean and SD (error bars) of n = 3 independent experiments are indicated. (D) Same as in (C), but for NTHL1. (E) Western blot analysis of MLH1 expression in normal and MLH1KO organoids (representative from n = 3). Tubulin was used as a loading control. The asterisk indicates a background band. (F) Same as in (E), but for NTHL1. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was used as a loading control.

Next, we passaged the MLH1KO and parental normal human colon organoids for 2 months to allow cells to accumulate sufficient mutations required for downstream analyses (fig. S1C). We then used flow cytometry to establish subclonal cultures of single cells (fig. S1C) and expanded these until sufficient DNA could be obtained (7). Both clonal and subclonal cultures were subjected to whole-genome sequencing (WGS) analyses to identify the mutations that accumulated between the two clonal expansion steps and to correlate their appearance with time (Fig. 2A). As expected, MLH1KO organoids showed an increased base substitution load (27.7 ± 4.9 mutations per genome per day) compared with normal organoids (3.8 ± 1.2 mutations per genome per day), as well as a change in type of base substitutions (increase in C>T transitions). Similarly, the MLH1KO organoids displayed an increased number of INDELs (Fig. 2A), which were predominantly single–base-pair deletions (Fig. 2B) at mononucleotide repeats (Fig. 2C). This confirmed that deletion of MLH1 is sufficient to generate the mutator phenotype observed in MMR-deficient tumors. As expected, we did not observe an increase in structural variations.
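The per-day rates quoted above rest on a simple calculation: mutations present in a subclone but absent from its parental clone, divided by the time in culture between the two clonal steps. The following is a minimal Python sketch of that arithmetic, not the pipeline used in the study. It assumes variants are represented as (chromosome, position, ref, alt) tuples. The function names and the toy variants are hypothetical, and the variant filtering and callable-genome correction that a real analysis would need are omitted.

```python
from statistics import mean, stdev

def mutations_per_day(clone_variants, subclone_variants, days_in_culture):
    """Rate of mutations acquired between the two clonal expansion steps."""
    if days_in_culture <= 0:
        raise ValueError("days_in_culture must be positive")
    # Variants already present in the parental clone predate the culture
    # period, so only subclone-private variants are counted.
    acquired = set(subclone_variants) - set(clone_variants)
    return len(acquired) / days_in_culture

def summarize(rates):
    """Mean and sample SD across subclones (SD needs at least two values)."""
    return mean(rates), (stdev(rates) if len(rates) > 1 else 0.0)

# Toy example with made-up variants (not data from the study).
clone = {("chr1", 100, "C", "T")}
subclone = {("chr1", 100, "C", "T"), ("chr2", 50, "G", "A"), ("chr3", 7, "T", "C")}
rate = mutations_per_day(clone, subclone, days_in_culture=2)
```

Subtracting the parental clone's variants is what restricts the count to mutations that arose in vitro, which is why two clonal steps are needed.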

Fig. 2 Mutational burden in DNA repair–deficient human colonic stem cells. (A) Number of mutations accumulated in the absence of the indicated DNA repair proteins per day. Base substitutions subdivided by mutation type and INDELs are shown. (B) Size distribution of the observed INDELs per genotype. A negative value indicates deletions and a positive value indicates insertions. (C) Number of INDELs located in simple repeats per genotype. Indicated are the number of repetitive subunits surrounding an inserted or deleted subunit. A value of 0 indicates that the INDEL is not located within a simple repeat.
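The repeat context described for panel (C) can be illustrated with a short Python sketch. It assumes one plausible reading of the caption: count the tandem copies of the inserted or deleted unit that directly flank the INDEL, not counting the inserted or deleted unit itself. The function name and the flank-based interface are hypothetical and may differ from the counting convention used for the figure.

```python
def count_flanking_units(left_flank, right_flank, unit):
    """Count tandem copies of `unit` directly adjacent to an INDEL.

    left_flank / right_flank: reference sequence immediately 5' and 3' of
    the inserted or deleted unit (the unit itself is excluded).
    Returns 0 when the INDEL is not located within a simple repeat.
    """
    if not unit:
        raise ValueError("unit must be a non-empty sequence")
    size = len(unit)
    copies = 0

    # Walk leftwards from the INDEL, one repeat unit at a time.
    end = len(left_flank)
    while end >= size and left_flank[end - size:end] == unit:
        copies += 1
        end -= size

    # Walk rightwards from the INDEL, one repeat unit at a time.
    start = 0
    while right_flank[start:start + size] == unit:
        copies += 1
        start += size

    return copies
```

For a single deleted A inside a run of A bases, for example, the function returns the length of the remaining homopolymer tract on both sides.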

Somatic mutations are unevenly distributed throughout the genomes of most cancers (15, 16) and normal adult stem cells (7). In general, base substitutions are more frequent in heterochromatic and late-replicating regions of the genome. It has been reported that specific DNA repair activity, including MMR (17) and NER (18), underlies this variation in regional mutation rates, as tumors lacking essential components of either of these DNA repair pathways show a less biased genomic distribution. To directly test this, we performed a genome-wide analysis of replication timing (Repli-seq) (19) on human intestinal organoids. We defined early, intermediate, and late replicating genomic regions and determined the relationship between somatic mutational load and replication timing. As observed in most cancers (16), in vitro–accumulated mutations in normal organoids were more frequent in late-replicating DNA (Fig. 3A). This bias was no longer present in MLH1KO cells, in line with the notion that MMR might be more active in euchromatic early-replicating regions of the genome, thereby creating variation in regional mutation rates (17). However, when we compared the absolute mutational loads in the genomic regions with different replication timing, we found that all regions (not only euchromatic early-replicating regions) showed increased mutation numbers (Fig. 3A). Thus, additional mutagenic processes might be active in the MLH1KO cells that increase mutational load throughout the genome, thereby removing the bias in genomic distribution.

Fig. 3 Nonrandom genomic distribution of base substitutions in DNA repair–deficient organoids. (A) Shown for each genotype are enrichment and depletion of base substitutions in the genomic regions that are replicated at the indicated stages during S phase of the cell cycle. Asterisks indicate a significant enrichment or depletion (P < 0.05, one-sided binomial test). (B) Relative levels of each base substitution type in the leading and lagging DNA strands are shown for each genotype. Asterisks indicate a significant difference (P < 0.05, two-sided Poisson test).
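The one-sided binomial test named in panel (A) can be sketched in Python as follows. The sketch assumes that the expected fraction of mutations in a replication-timing bin equals the fraction of the surveyed genome falling in that bin. That null model, the function names, and the interface are assumptions made for illustration, not details taken from the study.

```python
import math

def binom_logpmf(k, n, p):
    """Log of the binomial probability mass, computed via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def one_sided_binomial(observed, total, expected_fraction):
    """Test enrichment or depletion of mutations in one genomic bin.

    observed: mutations in the bin; total: mutations genome-wide;
    expected_fraction: share of the surveyed genome in the bin (assumed null).
    Returns (direction, p_value), with the tail chosen by the direction.
    """
    if not 0 < expected_fraction < 1:
        raise ValueError("expected_fraction must lie strictly between 0 and 1")
    if not 0 <= observed <= total:
        raise ValueError("observed must lie between 0 and total")
    if observed >= total * expected_fraction:
        direction, ks = "enrichment", range(observed, total + 1)  # P(X >= observed)
    else:
        direction, ks = "depletion", range(0, observed + 1)       # P(X <= observed)
    p_value = sum(math.exp(binom_logpmf(k, total, expected_fraction)) for k in ks)
    return direction, min(1.0, p_value)
```

Working in log space keeps the calculation stable for the thousands of mutations typical of whole-genome data, where direct evaluation of binomial coefficients would overflow floating-point arithmetic.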

By removing part of the newly synthesized strand, MMR is able to repair errors (such as base-base mismatches and insertion and/or deletion loops at simple repeats) that are introduced by DNA replication polymerases (20). MLH1KO cells are characterized by increased levels of both base substitutions and INDELs at simple repeats (Fig. 2). An imbalance in mutation incorporation can arise between the leading and lagging strands during DNA replication, because different DNA polymerases are used with distinct fidelities and proofreading capacities. Additionally, the lagging strand is exposed longer as single-stranded DNA, which is chemically less stable than double-stranded DNA and may thus be more vulnerable to accumulating damage (21). To define leading and lagging strands and test for replication strand asymmetry, we determined the location of replication origins using the Repli-seq data. We observed a significant replication strand asymmetry in MLH1KO cells for C>A, C>G, C>T, and T>C substitutions (P < 0.05, two-sided Poisson test) that was not observed in normal cells (Fig. 3B). MSI colorectal tumors, typically resulting from damage to the MMR system, show similar replicative asymmetry (21). Without functional MMR, the DNA polymerase errors cannot be resolved. Therefore, the additional mutagenic process that is active throughout the entire genome of MLH1KO cells may represent stochastic mistakes by DNA replication polymerases, which were previously suggested to be important in mutation accumulation in human cancers (22).
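The strand asymmetry comparison can be illustrated with a small Python sketch. It assumes that the two-sided Poisson test compares the leading-strand and lagging-strand counts of one substitution type under a null of equal rates, and that equal amounts of sequence are assigned to each strand. Under those assumptions, conditioning on the total count reduces the comparison to an exact two-sided binomial test with p = 0.5. The function name is hypothetical, and the exact procedure used in the study may differ.

```python
import math

def strand_asymmetry_p_value(count_leading, count_lagging):
    """Two-sided exact test that two Poisson counts share the same rate.

    Conditional on the total, the leading-strand count is Binomial(n, 0.5)
    under the null, so the p-value is twice the smaller tail (capped at 1).
    """
    if count_leading < 0 or count_lagging < 0:
        raise ValueError("counts must be non-negative")
    n = count_leading + count_lagging
    if n == 0:
        return 1.0  # no mutations, no evidence of asymmetry
    smaller = min(count_leading, count_lagging)
    log_half_n = n * math.log(0.5)
    # P(X <= smaller) for X ~ Binomial(n, 0.5), summed in log space.
    tail = sum(
        math.exp(math.lgamma(n + 1) - math.lgamma(i + 1)
                 - math.lgamma(n - i + 1) + log_half_n)
        for i in range(smaller + 1)
    )
    return min(1.0, 2.0 * tail)
```

Because the binomial distribution with p = 0.5 is symmetric, doubling the smaller tail gives the same answer whichever strand carries more mutations.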

We next extracted mutational signatures (fig. S3) and compared them to those described in the Catalogue of Somatic Mutations in Cancer (COSMIC) database, using cosine similarity as a measure of closeness (1, 23). As we previously reported (7), normal organoids typically show a contribution of a mutational signature that resembles COSMIC signature 18 (cosine similarity = 0.870), which is characterized by C>A transversions. Recently, a similar signature was reported to be associated with 8-oxoguanine mutagenesis (24). In contrast, MLH1KO organoids were characterized by the predominant occurrence of a signature that resembles COSMIC signature 20 (cosine similarity = 0.792), suggesting that this signature reflects mistakes made by polymerases during normal DNA replication. As human intestinal stem cells divide approximately once every day (fig. S4), we estimate that ~25 replication errors occur per cell division, which are largely resolved by MMR in normal cells (Fig. 4A). Next, we determined the cosine similarity between the mutational profile of each sample and each COSMIC signature, which reflects how well the mutational profile of a sample can be explained by each signature individually. We included genome-wide mutation data obtained by analysis of individual cancer cells (expanded as clonal organoids before sequencing) from colon cancers derived from three different patients (materials and methods). One of these cancers is driven by MLH1 promoter hypermethylation and therefore lacks MMR activity, whereas the other two cancers are MMR proficient. Mutation accumulation in our MLH1KO organoids closely resembled that of MMR-deficient cancer cells and was markedly different from MMR-proficient cancer cells (Fig. 4B).
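Cosine similarity, used above as the measure of closeness, is the normalized dot product of two vectors of mutation counts or frequencies. A minimal Python sketch follows. The function name is hypothetical, and the inputs are assumed to be equal-length vectors listing the same mutation types in the same order.

```python
import math

def cosine_similarity(profile, signature):
    """Cosine of the angle between two mutation-type vectors.

    For non-negative inputs the result lies between 0 (no shared mutation
    types) and 1 (identical relative composition).
    """
    if len(profile) != len(signature):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(profile, signature))
    norm = (math.sqrt(sum(a * a for a in profile))
            * math.sqrt(sum(b * b for b in signature)))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / norm
```

The measure is insensitive to the total number of mutations, so a sample's raw counts can be compared directly with a signature expressed as probabilities.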

Fig. 4 Signatures of mutational processes in DNA repair–deficient organoids. (A) (Left) Mutational spectra of all base substitutions observed for each genotype. Different mutation types and the direct sequence context are indicated. (Right) Number of mutations per genome per day that can be explained by the indicated mutational signatures for each genotype. (B) Heat map showing the cosine similarity scores for each indicated sample and COSMIC signature. The samples have been clustered according to the similarity score with each signature. The signatures have been ordered according to their similarity, such that very similar signatures cluster together. Arrows indicate signatures that have been associated with deficiency in DNA MMR in pan-cancer analyses (1). MSI, microsatellite instability. (C) Mutations that have been introduced or identified in NTHL1. The blue diamond indicates the site where the selection marker was introduced by gene targeting in the organoids. The red diamond denotes the nonsense germline mutation that was identified in a patient with breast cancer (PD13297a). LOH, loss of heterozygosity.
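The spectra in panel (A) combine each mutation type with its direct sequence context. A common convention, assumed here and not stated in the caption, reports every substitution relative to the pyrimidine (C or T) of the mutated base pair together with one flanking base on each side, which gives 6 x 4 x 4 = 96 categories. The Python sketch below illustrates that convention; the function name and the label format are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def mutation_channel(five_prime, ref, alt, three_prime):
    """Label a base substitution with its trinucleotide context.

    Substitutions at a purine reference base are reverse-complemented so
    that every label refers to the pyrimidine of the base pair.
    """
    bases = (five_prime, ref, alt, three_prime)
    if any(b not in COMPLEMENT for b in bases):
        raise ValueError("bases must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt must differ")
    if ref in ("A", "G"):
        # Reverse complement: flanks swap sides and every base is complemented.
        five_prime, three_prime = COMPLEMENT[three_prime], COMPLEMENT[five_prime]
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{five_prime}[{ref}>{alt}]{three_prime}"
```

As a usage example, a G>A change in the context 5'-CGT-3' and a C>T change in the context 5'-ACG-3' describe the same event on opposite strands, and both are labeled `A[C>T]G`.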

As our organoid-targeting strategy allowed us to define a very clean mutational signature for MMR deficiency, we next set out to determine signatures caused by deficiency in base excision repair (BER), the role of which in generating cancer-causing mutations is less well established than that of MMR deficiency. To this end, we inactivated the BER gene NTHL1, which encodes a DNA glycosylase that is involved in the removal of oxidized pyrimidines through initiation of the BER pathway (25, 26) (Fig. 1B and figs. S1A and S2B). Germline homozygous mutations in NTHL1 were recently shown to cause adenomatous polyposis and colorectal cancer (27). Upon gene targeting, we confirmed the loss of NTHL1 expression by qRT-PCR analysis (Fig. 1D) and Western blotting (Fig. 1F). Because we expected a lower mutation frequency with BER deficiency as compared with MMR deficiency (27), we cultured the NTHL1KO clones for 3 months and subsequently generated subclonal cultures and used WGS to analyze the mutations that specifically accumulated between the two clonal expansion steps (fig. S1C). As expected, the base substitution accumulation rate in NTHL1KO cells was approximately one-fourth of that in MLH1KO cells but approximately two times higher than in normal cells (Fig. 2A). Loss of NTHL1 did not result in increased numbers of INDELs (Fig. 2A) or structural variation.

In contrast to MLH1KO cells, NTHL1KO cells retained a nonrandom distribution of mutations throughout the genome, as observed for normal cells (Fig. 3A). Signature 30, characterized by C>T transitions, was the main contributor to the mutation spectrum observed in NTHL1KO cells (Fig. 4A and fig. S3C). Of note, somatic mutation analysis in the NTHL1KO clones did not reveal any nonsynonymous or stop-gain mutations in other DNA repair genes (table S1), supporting the notion that the observed change in mutation accumulation can be solely attributed to NTHL1 deficiency. In agreement with the observations on the NTHL1KO clones (Fig. 2A), it was reported previously that colorectal cancers from individuals with biallelic NTHL1 germline mutations predominantly show C>T transitions in their exomes (27, 28). Signature 30 has previously been identified in one patient with breast cancer (PD13297a) analyzed by WGS (29). We examined tumor and germline sequences of this patient and identified a germline nonsense mutation in NTHL1 (Fig. 4C), compounded by loss of heterozygosity in the tumor. This observation further corroborates the link between NTHL1 deficiency and signature 30. Mutation accumulation in our NTHL1KO organoids closely resembled that in PD13297a (Fig. 4B).

We have shown that mutational signatures can be dissected by characterizing the genomic landscapes of genetically modified human organoid subclones. Engineered MLH1KO organoids validated our approach, as they accurately model the predominant mutation profiles observed in MMR-deficient colorectal cancers. We subsequently used our approach to demonstrate that a high contribution of signature 30 mutations within a tumor can be indicative of cancer-predisposing germline mutations in the base excision repair gene NTHL1. A similar strategy could be exploited to investigate the consequences of other BER gene deficiencies. Although signature 30 has been previously identified in one patient of a large breast cancer cohort, it may be indicative of predisposition to a broader range of cancer types. NTHL1 germline mutations appear to predispose people to multiple cancer types, including colorectal and breast cancer (27, 28). The strategy we have described can be used to study the mutational consequences of DNA repair knockouts or mutagen exposure, to systematically dissect mutational signatures and potentially unveil their molecular origins.

Supplementary Materials
www.sciencemag.org/content/358/6360/234/suppl/DC1
Materials and Methods
Figs. S1 to S4
Tables S1 and S2
References (30–35)