Isolation of zebrafish germline cells and low-coverage WGBS

To obtain germline cells from zebrafish we used the transgenic line Tg(vasa:EGFP)25. The reporter gene for this line contains the promoter region of vasa, an RNA binding protein component of the germplasm and well-described germline marker26. As such, vasa:EGFP protein is expressed in oocytes and segregated with PGCs during embryogenesis. At 24 hpf, when PGC migration is finished, we found a compact cluster of cells between the yolk ball and yolk extension in the gonadal region (Fig. 1a–d). Given there are few germline cells per individual fish at this developmental stage, ten fish were pooled for each replicate, dissociated with trypsin and prepared for cell sorting. The EGFP +ve cells were isolated with fluorescence-activated cell sorting (FACS) and accounted for approximately 0.01% or less of all cells analysed (Fig. 1e). The gating strategy is exemplified in Supplementary Fig. 1. This percentage is similar to values previously reported for teleost species (0.02–0.04%)27. To determine the purity of the population isolated, sorted cells were visualised under an inverted fluorescent microscope. The proportion of EGFP +ve cells ranged from 93.8 to 100% and resembled PGCs in terms of size and shape (Supplementary Fig. 2).

Fig. 1 Isolation and quantitation of DNA methylation in the zebrafish germline. a–d Fluorescence microscopy of tg(vasa:EGFP) zebrafish embryos and larvae. 1.5 h post-fertilisation (hpf) (a), 24 hpf (b), 48 hpf (c–d). 1.8× view of EGFP +ve labelled cells is shown inset (dashed lines). Scale bars are 500 µm. Forward scatter height (FSC-H) e Flow cytometry plot of 10 zebrafish larvae at 48 hpf. The red dashed square indicates the EGFP +ve population gated for isolation. Blue dots indicate discrete data points (i.e., cellular events), whereas green, yellow and red colouring indicate increasing data density. f Percentage of methylation in CG context from 1 to 28 days post-fertilisation (dpf) in both vasa:EGFP +ve germline cells (green line) and control cells (black line). For each sample type and timepoint, n = 3 independent biological replicates were used, except for 28 dpf vasa:EGFP +ve, which has n = 5 independent biological replicates Full size image

In order to maximise the number of samples tested, we used a low coverage whole genome bisulfite sequencing (WGBS) pipeline to uncover genome-wide methylation levels28,29. To be sure we sampled sufficiently, the number of CG calls required to accurately predict global methylation was undertaken using empirical bootstrap sampling30 of previously published datasets15. As expected, we found that increasing the number of CG calls reduces the margin of error for global methylation (Supplementary Fig. 3A). However, beyond a certain threshold, we found increasing the number of CG calls had a minimal effect reducing the margin of error. An asymptotic model described by the equations y = 1.207/√x and y = 2.109/√x, for sperm and muscle respectively, was used to fit a curve to the data. At our minimum sequencing depth of 10,000 CG calls, bootstrap sampling predicts a margin of error (99% confidence interval) of approximately ±1.2–2.1 methylation percentage points (Supplementary Fig. 3B).

Zebrafish germline preserves global DNA methylation

In mice, epigenetic reprogramming of PGCs occurs in two sequential steps, the first during PGC expansion and migration to hindgut endoderm, the second upon entry of PGCs into the gonads4. In marsupials, epigenetic reprogramming occurs postnatally when PGCs have finished migration to the gonad6. To capture the full spectrum of these reprogramming windows we measured methylation from 24 hpf until gonadal transformation, 25–28 dpf. At 24 hpf, EGFP +ve cells were slightly more methylated than control cells, however, both showed high levels of CG methylation (84.06% and 81.41%, respectively). Thus, in stark contrast to mice which experience a massive loss in CG methylation, zebrafish PGCs have preserved global CG methylation upon arrival in the gonad.

Next, we measured methylation levels through gonadal primordium (2–11 dpf), ‘juvenile ovary’ (11–21 dpf) and early gonad transformation (25–28 dpf) stages. Genome-wide erasure of DNA methylation was not present at any of the time points assessed (Fig. 1f). Average levels of 5-methylcytosine (5-mC) in the CG context were 78.42% and 76.08%, respectively, for EGFP +ve and control cells. Detailed sequencing results are provided in Supplementary Data 1.

DNA methylation levels during gonad transformation

Mature germ cells in zebrafish possess sex-specific methylation programmes. In sperm, nearly 95% of CG dinucleotides are methylated, while oocytes are 75% methylated14,15. To explore the onset of this differentiation, we analysed germline methylation from gonad transformation until early gametogenesis. From 25 to 55 dpf, zebrafish gonads undergo an ovary-to-testis transformation in males or further ovarian maturation in females19,31. The vasa protein is expressed in male and female germline stem cells32, yet the intensity of vasa:EGFP expression is correlated with the number of oocytes and can be used to distinguish presumptive females from males33. Accordingly, prior to sexual differentiation at 21 dpf, embryos retained a ‘juvenile ovary’ with low levels of fluorescence detected. At later stages, presumptive male gonads retained low fluorescence whereas female gonads displayed intense fluorescence (Fig. 2a–f).

Fig. 2 Fluorescence microscopy of germline cells and their methylation during gonad transformation. a–f Phenotypic sex in zebrafish can be identified using vasa:EGFP expression: during the ‘juvenile ovary’ stage, expression of EGFP is low but consistent between individuals (a, b). Later, expression of EGFP vastly increases in presumptive females (c, d) relative to presumptive males (e, f). This enables sex phenotyping in early stages of sexual differentiation. Scale bars are 500 µm. g, h Methylation in the vasa:EGFP +ve germline cells of female (g) and male (h) fish from the gonad transformation stage until sexual maturity (35–70 dpf). Non-germline control cells were also tested (black lines and dots). For each sample type and timepoint, n = 3 independent biological replicates were used Full size image

We isolated germline cells from individual females and males at four time points during gonad transformation (35, 40, 45, 50 dpf). Despite the high expression of vasa:EGFP in mature oocytes, cell filtering prior to FACS restricted cell size to 40 µm. Thus, we were able to collect germinal stem cells (GSCs), oogonia, and stage IA and early IB oocytes34. For males, vasa:EGFP expression decreases as germ cells progress through gametogenesis33, meaning just GSCs and cells during early male gametogenesis were assessed. The male and female germline cells we tested were not globally differentially methylated in any of the stages evaluated. The average levels of methylation in CG context for EGFP +ve cells were 78.9% and 79.06%, respectively, for presumptive females and males. Furthermore, global differences in methylation were not found in sexually mature individuals (70 dpf) with average 5-mC levels of 75.64% for females and 77.51% for males. This suggests that hypermethylation of the male germline relative to the female germline, occurs during the final stages of spermatogenesis.

Our low-coverage analysis cannot quantify methylation at single-copy loci, yet, we were able to analyse some genomic subsets. On average 54.31% (n = 116, ±SD 1.01) of CG calls mapped to repeated regions (very similar to the overall repeat level of 52%, from Howe et al.35). Not surprisingly, we found repetitive regions had greater methylation (mean 86.94%, ±SD 1.88) for all the samples compared to non-repetitive sequences (mean 67.64%, ±SD 4.29) (p < 0.01, Wilcoxon signed-rank test). Importantly, there was no hypomethylation of germline samples in either repetitive or non-repetitive subsets relative to non-germline controls (Supplementary Data 1).

Amplification and demethylation of oocyte-specific rDNA

Our initial analysis of germline methylation was performed using non-overlapping sliding windows (Figs. 1, 2), however, when all mapped reads were analysed irrespective of their location, methylation levels were greatly reduced for 9 EGFP +ve samples at 28–50 dpf (Supplementary Data 1). One explanation for this was that a lowly methylated region (or regions) was over-represented in our dataset for either technical or biological reasons. When we measured the occurrence of mapped CG calls within sliding 1 Mb windows throughout the genome, we found that the tip of chromosome 4 (Chr4:77,000,001–78,000,000; GRCz11) possessed a surprisingly high density of CG calls (Fig. 3a). Closer inspection revealed that the over-represented reads mapped to both strands of a 17.3-Kb region (chr4: chr4:77,549,891–77,567,278) containing an 11.5-Kb repeat unit encoding 45S ribosomal DNA (chr4:77,555,720–77,567,278) (Fig. 3b). It has been recently reported that at least 2 clusters of rDNA exist within the zebrafish genome36. One of these clusters contains the canonical rDNA expressed in all somatic cells. The other cluster is a maternal-specific rDNA type, which we term fem-rDNA, which overlaps with our overrepresented reads on chromosome 4. The genome reference has assembled only one unit of likely several fem-rDNA copies, separated by intergenic spacers37.

Fig. 3 Amplification of oocyte-specific fem-rDNA in a previously identified sex-linked region. a Number of CG calls mapping to windows of 1 Mb in the whole zebrafish genome. A peak is observed at the right tip of chromosome 4 following 25-dpf and 28-dpf. b Reads map to both the complementary original top (CTOT, red) complementary original bottom (CTOB, blue) strand of the 45S fem-rDNA unit on chromosome 4. Components of the rDNA repeat are indicated (External transcribed spacer (ETS), Internal transcribed spacer (ITS), Intergenic spacer (IGS)). c The amplified region is located within the most significant sex-linked SNPs from non-domesticated zebrafish strains, Cooch Behar, Ekkwill, Nadia and WIK Full size image

The tip of chromosome 4 in zebrafish is notable for its close linkage to a previously identified locus associated with sex in natural strains38. Analysis of SNPs with the strongest statistical support for sex-linkage in two natural laboratory strains (WIK and EKW) and two recently-sourced wild isolates (Nadia and Cooch Behar) revealed that amplified 45S fem-rDNA is located within the major sex-determining region in chromosome 4 (Supplementary Data 2). The high and variable number of rDNA copies and its location in a poorly-assembled section of the genome makes it difficult to establish the true length of the fem-rDNA repeat. Nevertheless, sex-linked SNPs located at both ends of this gap suggest the complete fem-rDNA cluster is embedded within the sex-determining region (Fig. 3c).

Using our low-coverage BS-seq data, we measured fem-rDNA amplification and methylation levels in both germline and control samples from females, males and sexually indeterminate fish. In the non-germline control samples, we found that fem-rDNA reads, on average, comprised 0.032% of total reads sequenced (Supplementary Data 3). These levels were similar to the proportion of fem-rDNA reads in adult muscle tissue from other datasets (0.049%, Supplementary Data 4)15. As such, we considered this range to represent non-amplified, background levels of fem-rDNA. Fem-rDNA levels were also at the background level for 41 out of 55 germline samples but there was clear enrichment for fem-rDNA in 8 females and one presumptive male (>1% reads mapping to fem-rDNA), peaking at 40 dpf (Fig. 4a). For one female, 20.75% of reads mapped to fem-rDNA, representing amplification of at least 170-fold compared to non-germline controls. To validate this finding, we performed quantitative PCR using somatic-rDNA as a similar multi-copy genomic control and independently grown fish. We found fem-rDNA was amplified relative to non-germline cells in 9 out of 12 female germline samples (>5-fold) at 35 and 40 dpf (range 5.89–93.54-fold amplification relative to background), but found no amplification in a further three males (Supplementary Fig. 4).

Fig. 4 Amplification and methylation of oocyte-specific fem-rDNA during gonad transformation. a Percentage of reads mapping to fem-rDNA in the germline prior to sexual differentiation (green), and in presumptive males (blue) and females (red) during gonad transformation and after sexual maturation. b Relationship between the amplification and methylation of fem-rDNA for vasa:EGFP −ve control samples (grey dots), and vasa:EGFP +ve germline cells from sexually undifferentiated fish at 1–21 (green dots) and 25–28 dpf (magenta dots); presumptive female fish at 35–45 (pink) and 50–70 (red) dpf; presumptive male fish at 35–45 (light blue) and 50–70 (blue) dpf. Samples have been divided into 3 clusters based on rDNA level and methylation; (1) ‘background’ non-amplified and methylated (2) moderately amplified and lowly methylated, (3) highly amplified and unmethylated. These consist of n = 75, n = 15, and n = 14 samples, respectively Full size image

When we analysed fem-rDNA methylation, we found three clear groups. All non-germline controls and many male and female germline samples were highly methylated (mean 73.2%, ±SD 17.29%), and did not show fem-rDNA amplification (mean 0.04%, ±SD 0.02%, see group 1, Fig. 4b). In contrast, those with strong amplification of fem-rDNA (i.e., >1% of total reads) were fully demethylated (mean 1.72%, ±SD 1.29%), and except for one individual, were either phenotypically female or late in the sexually indifferent phase (see group 3, Fig. 4b). An intermediate group of germline-only samples showed modest amplification of fem-rDNA (0.032–0.703% reads mapping to fem-rDNA, mean 0.23%, ±SD 0.22%) and were lowly methylated (mean 10.08%, ±SD 10.1%). Together this shows that fem-rDNA amplification and demethylation is highly correlated with feminisation of the zebrafish gonad.