a, Schematics of the CMD1 and LHCSR3 transgene expression constructs used for complementation of the cmd1 strain. The paromomycin resistance marker (AphVIII) was used for selection of transgenic clones. The HSP70A–RBCS2 fusion promoter (HSRB) drives transgene expression. An HA epitope added to the C terminus of CMD1 allows detection of the fusion protein. b, Western blot analysis of CMD1–HA expression in wild-type cells, cmd1 cells and cmd1 cells complemented with wild-type CMD1–HA (WT-1 and -2) or mutant CMD1–HA (HD-1 and -2) as indicated above. Anti-HA antibody was used for detection. Detection with anti-α-tubulin provided a sample processing control. Wild-type and cmd1 strains without the CMD1–HA transgene served as negative controls. Representative results from two independent experiments. c, Western blot analysis of LHCSR3 protein in wild-type cells, cmd1 cells and cmd1 cells complemented with CMD1–HA or LHCSR3 as indicated above. Detection with anti-α-tubulin provided a sample processing control. Representative results from two independent experiments. For source data for b, c, see Supplementary Fig. 1. d, Erlenmeyer flasks containing cells as indicated growing photoautotrophically after 16 h of exposure to high light (750 μmol photons m⁻² s⁻¹). Representative photographs from three independent experiments. e, Determination of the effect of 5mC and 5gmC on transcription in C. reinhardtii using a luciferase reporter assay. Luciferase reporters driven by promoters (HSRB or LHCSR3) containing unmodified cytosine, 5mC or 5gmC, prepared by M.SssI treatment or further treated by CMD1, were transformed into C. reinhardtii. The cells were collected at different time points for measurement of luciferase activity. The mock sample was transformed with an empty vector. The luciferase activity was normalized to the corresponding chlorophyll fluorescence and then compared to the value of the mock control, which was set to 1. Data are mean ± s.e.m. of two independent biological replicates (shapes).
f, Schematic diagram of TET-BS sequencing analysis. In conventional bisulfite sequencing, C, 5fC and 5caC, but not 5mC or 5hmC, are converted into U by bisulfite treatment and are read as T in PCR and sequencing. However, 5gmC is read as C and is thus indistinguishable from 5mC or 5hmC. With TET treatment, both 5mC and 5hmC are oxidized into 5caC, which is then read as T in subsequent bisulfite sequencing. Therefore, only 5gmC (orange lollipop) in the starting DNA sample is read as C (blank lollipop, lower right) in TET-BS sequencing. g, Establishment of the TET-BS assay to distinguish 5gmC from all other forms. A lambda DNA fragment was used to test the feasibility of the assay. After methylation with M.SssI enzyme, all CpG sites are resistant to deamination and thus read as C in BS-seq. 5gmCs, which exist only in the CMD1-treated 5mC–λDNA, are detected as C because they are non-convertible in TET-BS treatment. Each circle represents a CpG site. Representative results from two independent experiments. h, BS-seq and TET-BS-seq analysis of the HSRB promoter used in the luciferase assay. Upon nuclear transformation of the cytosine-modified DNA, a substantial portion of 5gmC underwent conversion to C (reduced from 84.2% to 70.8%) and the high 5mC level remained. Notably, individual 5gmCs at neighbouring Cs on the same DNA template appear to behave differently. Although the mechanism of conversion is not clear, 5gmC might be lost slowly over time through DNA repair or an alternative demethylation process. Representative results from two independent experiments. i, ChIP analysis of the interaction of CMD1–HA with the 5′ genomic region of LHCSR3.1. The different regions of DNA fragments precipitated with anti-HA antibodies were amplified by qPCR. The region amplified by primer pair 3 (chromosome_8: 1947066–1947226) exhibited the strongest interaction with CMD1–HA. 
The enrichment relative to IgG was normalized to that of cmd1 cells, which was set to 1. Data are mean ± s.e.m. of two independent biological replicates (shapes).
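
The read-out logic described for panel f can be summarized in a short Python sketch. All function and variable names here are illustrative and do not come from the original work. The conversion rules encode only what the legend states, with the added assumption that TET treatment leaves unmodified C and 5gmC unchanged.

```python
# Cytosine forms mentioned in the legend for panel f.
FORMS = ["C", "5mC", "5hmC", "5fC", "5caC", "5gmC"]

# Forms deaminated to U by bisulfite and therefore read as T.
BS_CONVERTED = {"C", "5fC", "5caC"}

# Forms oxidized to 5caC by TET treatment, per the legend.
# Assumption: all other forms pass through TET treatment unchanged.
TET_OXIDIZED = {"5mC", "5hmC"}


def bs_read(form: str) -> str:
    """Base call after conventional bisulfite sequencing."""
    return "T" if form in BS_CONVERTED else "C"


def tet_treat(form: str) -> str:
    """Cytosine form after TET oxidation."""
    return "5caC" if form in TET_OXIDIZED else form


def tet_bs_read(form: str) -> str:
    """Base call after TET treatment followed by bisulfite sequencing."""
    return bs_read(tet_treat(form))


def classify(bs: str, tet_bs: str) -> str:
    """Infer the possible cytosine form(s) from the two read-outs."""
    if bs == "C" and tet_bs == "C":
        return "5gmC"
    if bs == "C" and tet_bs == "T":
        return "5mC or 5hmC"
    if bs == "T" and tet_bs == "T":
        return "C, 5fC or 5caC"
    return "inconsistent"  # T in BS-seq but C in TET-BS is not expected


if __name__ == "__main__":
    for form in FORMS:
        print(form, bs_read(form), tet_bs_read(form))
```

In practice the two read-outs come from separate aliquots of the same sample, so a comparison of this kind applies to the fraction of C calls per site rather than to a single molecule.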