B. oleracea genome assembly and annotation

Complementing the sequencing of the smaller B. rapa genome11, a draft genome assembly of B. oleracea var. capitata line 02–12 was produced by interleaving Illumina, Roche 454 and Sanger sequence data. This assembly represents 85% of the estimated 630 Mb genome, and includes >98% of the gene space (Supplementary Methods, Supplementary Tables 1–3, 7 and 8 and Supplementary Fig. 3). The assembly was anchored to a new genetic map16 to produce nine pseudo-chromosomes that account for 72% of the assembly, and validated by comparison with a B. oleracea physical map17, a high-density B. napus genetic map18 and complete BAC sequences (Supplementary Figs 4–9 and Supplementary Tables 4 and 5). For comparative analyses, identical genome annotation pipelines were used for annotation of protein-coding genes and transposable elements (TEs) for B. oleracea and B. rapa.

A total of 45,758 protein-coding genes were predicted, with a mean transcript length of 1,761 bp, a mean coding length of 1,037 bp, and a mean of 4.55 exons per gene (Table 1, Supplementary Methods, Supplementary Table 6 and Supplementary Fig. 10), similar to A. thaliana19 and B. rapa11. Publicly available ESTs, together with RNA sequencing (RNA-seq) data generated in this study, support 94% of predicted gene models (Supplementary Tables 7 and 8), and 91.6% of predicted genes have a match in at least one public protein database (Supplementary Tables 9 and 10, and Supplementary Fig. 11). Of the 45,758 predicted genes, 13,032 produce alternative splicing (AS) variants with intron retention and exon skipping (Supplementary Table 11). Genome annotation also predicted 3,756 non-coding RNAs (miRNA, tRNA, rRNA and snRNA) (Supplementary Table 12).

Table 1 Summary of genome assembly and annotation of B. oleracea. Full size table

A combination of structure-based analyses and homology-based comparisons resulted in the identification of 13,382 TEs with clearly identified terminal boundaries, including 5,107 retrotransposons and 8,275 DNA transposons (Supplementary Methods, Supplementary Fig. 12 and Supplementary Table 13). These elements together with numerous truncated elements or TE remnants make up 38.80% of the assembled portion of the B. oleracea genome, whereas TEs account for only 21.47% of the B. rapa genome assembly. Copia (11.64%) and gypsy (7.84%) retroelements are the major constituents of the repetitive fraction, and are unevenly distributed across each chromosome, with retrotransposons predominantly found in pericentromeric or heterochromatic regions (Supplementary Fig. 13) in B. oleracea. Tentative physical positions of some of the centromeres were determined based on homologue and phylogenetic analysis of the centromere-specific 76 bp tandem repeats CentBo-1 and CentBo-2 and copia-type retrotransposon (CentCRBo) (Supplementary Table 14 and Supplementary Figs 14–17). The distribution of 45S and 5S rDNA sequences were also visualized by fluorescent in situ hybridization (Supplementary Figs 18 and 19), leading to a predicted karyotype ideogram for B. oleracea (Supplementary Fig. 20). An extra-centromeric locus with colocalized centromeric satellite repeat CentBo-1 and the centromeric retrotransposon CRBo-1 was observed on the long arm of chromosome 6 (Supplementary Figs 18–20). A comprehensive database for the genome information is accessible at http://www.ocri-genomics.org/bolbase/index.html.

Conserved syntenic blocks and genome rearrangement after WGT

The relatively complete triplicated regions in B. oleracea and B. rapa were constructed and they relate to the 24 ancestral crucifer blocks (A–X) in A. thaliana20. Further the triplicated blocks resulting from WGT in the two Brassica species were partitioned into three subgenomes: LF (Least-fractionated), MF1 (Medium-fractionated) and MF2 (Most-fractionated)11 (Fig. 1a, Supplementary Methods, Supplementary Tables 15 and 16, and Supplementary Figs 21–26). These syntenic blocks occupy the majority of the genome assemblies of A. thaliana (19,628 genes, 72.24% of 27,169 genes), B. oleracea (26,485 genes, 57.88%) and B. rapa (26,698 genes, 64.84%), and provide a foundation for comparative analyses of chromosomal rearrangement, gene loss and divergence of retained paralogues after WGT. Massive gene loss occurred in an asymmetrical and reciprocal fashion in the three subgenomes of each species and was largely completed before the B. oleracea–B. rapa divergence (Fig. 1c, Supplementary Tables 17–19 and Supplementary Figs 25–27). The timing of this evolutionary process was supported by the estimated timing of WGT ~

15.9 million years ago (MYA), and species divergence ~

4.6 MYA, based on synonymous substitution (Ks) rates of genes located in the blocks (Fig. 1b and Supplementary Table 20). Gene loss occurred mainly through small deletions that may be caused by illegitimate recombination21,22 (Supplementary Fig. 27), consistent with observations in other plant genomes.

Figure 1: Genomic structure and gene retention rates in syntenic regions of B. oleracea and B. rapa. (a) Segmental colinearity of the genomes of B. oleracea, B. rapa and A. thaliana. Syntenic blocks are defined and labelled from A to X (coloured) previously reported in A. thaliana20. (b) Time estimate of WGD and subsequent two Brassica species divergence. (c) Pattern of retention/loss of orthologous genes on each set of three subgenomic (LF, MF1 and MF2) blocks of B. oleracea and B. rapa corresponding to A. thaliana A to X blocks. The x axis denotes the physical position of each A. thaliana gene locus. The y axis denotes the proportion of orthologous genes retained in the B. oleracea and B. rapa subgenomic blocks around each A. thaliana gene, where 500 genes flanking each side of a certain gene locus were analysed, giving a total window size of 1,001 genes. Full size image

Abundant genome rearrangement following WGT and subsequent Brassica species divergence resulted in complex mosaics of triplicated ancestral genomic blocks in the A and C genomes (Fig. 1a and Supplementary Fig. 28). At least 19 major, and numerous fine-scale, chromosome rearrangements occurred, which differentiate the two Brassica species (Supplementary Fig. 29). This is in agreement with previous comparative studies based on chromosome painting12,23 and genetic mapping24,25. The extensive chromosome reshuffling in Brassica is in contrast to that observed in other taxa, such as the highly syntenic tomato–potato and pear–apple genomes, each with longer divergence times and less genome rearrangement26,27. This difference may be a consequence of mesopolyploidy in Brassica.

Greater TEs accumulation in B. oleracea than B. rapa

Both retro- (22.13%) and DNA (16.67%) TEs appear to be greater amplified in B. oleracea relative to B. rapa (9.43 and 12.04%) (Fig. 2a and Supplementary Table 13). We constructed 1,362 gap-free contig-contig syntenic regions by clustering orthologous B. rapa—B. oleracea genes using MCscan (Supplementary Figs 29 and 30). The B. oleracea TE length (34.03% of the 259.6M) is 3.4 times greater than that of the syntenic B. rapa regions (16.73% of the 155.0M) (Fig. 2c, Supplementary Tables 21 and 22, and Supplementary Fig. 31). Phylogenetic analysis revealed that B. oleracea has both more LTR retrotransposon (LTR-RT) families, and more members in most families than B. rapa (Fig. 2d and Supplementary Figs 12, 32 and 33). Furthermore, two new lineages of LTR-RTs, Brassica Copia Retrotransposon and Brassica Gypsy Retrotransposon, were defined in both Brassica species (Supplementary Fig. 33). Analysis of LTR insertion time revealed that ~

98% of B. oleracea intact LTR-RTs amplified continuously over the ~

4 million years (MY) since the B. oleracea–B. rapa split, whereas ~

68% of B. rapa intact LTR-RTs amplified rapidly within the last 1 MY, predominantly in the recent 0.2 MY (Fig. 2b and Supplementary Fig. 34). Hence, LTR-RTs expanded more in the intergenic space of euchromatic regions in B. oleracea than B. rapa. This agrees with previous observations based on comparison of BAC sequences between the A and C genomes28. As a consequence of continuous TE amplification over the last 4 MY, the genome size of B. oleracea is ~

30% larger than that of B. rapa although the two genomes share the same ploidy and are largely collinear.

Figure 2: TE comparison analyses in B. oleracea and B. rapa. (a) TE copy number and total length in each assembly and B. oleracea–B. rapa syntenic blocks. (b) The number of intact LTR (Copia-like and Gypsy-like) birthed at different times (million years ago, MYA) in the syntenic regions of B. oleracea and B. rapa. (c) The comparison of TE distribution and composition in B. oleracea–B. rapa syntenic blocks along B. oleracea chromosomes. We divided B. oleracea–B. rapa syntenic region into non-overlapping sliding 200 kb windows to compare TE contents. For each window, the ratio log 10 (B. oleracea/B. rapa) was calculated for total syntenic block length (blue line), LTR length (purple line), gene length (yellow point), exons length (red point) and intron length (green point). If B. oleracea > B. rapa in absolute length of TE composition in a compared window, the dot or line is above the line y=0. The corresponding B. rapa chromosome segments along B. oleracea C08 were indicated by coloured bars. All other B. oleracea chromosomes are showed in Supplementary Fig. 31. (d) Phylogeny of the Copia-like elements as an example of LTR-RTs of the syntenic regions in B. rapa and B. oleracea. The neighbor-joining (NJ) trees were generated based on the conserved RT domain nucleotide sequences using the Kimura two-parameter method68 in MEGA4 (ref. 69). Full size image

Species-specific genes and tandemly duplicated genes

While the genomes of B. oleracea and B. rapa are highly similar in terms of total gene clusters/sequences and the gene number in each cluster, there are also a large number of species-specific genes in the two species. A total of 66.5% (34,237 genes) of B. oleracea genes and 74.9% (34,324) of B. rapa genes were clustered into OrthoMCL groups (Supplementary Table 23 and Supplementary Fig. 35). We identified 9,832 B. oleracea-specific and 5,735 B. rapa-specific genes, of which 77% were supported by gene expression and/or a clear Arabidopsis homologue (Supplementary Table 24). Of them, >90% of these specific genes were validated for their absence in the counterpart genomes by reciprocal mapping of raw clean reads (Supplementary Tables 25 and 26). Most Brassica-specific genes are randomly distributed along the chromosomes (Supplementary Figs 36 and 37). More than 80% of the species-specific genes were surrounded by non-specific genes (Supplementary Fig. 38), suggesting that deletion of individual genes may be the major mechanism underlying gene loss and the difference in gene numbers between B. oleracea and B. rapa.

Tandem duplication produces clusters of duplicated genes and contributes to the expansion of gene families29. We identified 1,825, 2,111 and 1,554 gene clusters containing 4,365, 5,181 and 4,170 tandemly duplicated genes in B. oleracea, B. rapa and A. thaliana, respectively (Fig. 3a, Supplementary Tables 27 and 28 and Supplementary Fig. 39). The wide range of sequence divergence of tandem gene pairs in each species suggests that tandem gene duplication occurred continuously throughout the evolutionary history of these species, rather than in discrete bursts (Supplementary Figs 40 and 41). Their continuous and asymmetrical occurrence after species divergence resulted in 522, 697 and 815 species-specific tandem clusters in the three genomes. The frequency of tandem duplication is independent of the total gene content, suggesting that genome triplication has not inhibited its occurrence. Tandemly duplicated genes are preferentially enriched for gene ontology (GO) categories related to defence response and pathways related to secondary metabolism such as indole alkaloid biosynthesis and tropane, piperidine and pyridine alkaloid biosynthesis (Fig. 3b, Supplementary Tables 29–32 and Supplementary Fig. 42). Over 44.0 and 51.9% of the NBS-encoding resistance genes are tandemly duplicated in B. oleracea and B. rapa, respectively (Supplementary Table 33).

Figure 3: The duplicated genes derived from tandem duplication and whole-genome duplications in Brassica genomes. (a) A Venn diagram showing shared and specific tandem duplication events in A. thaliana, B. rapa and B. oleracea. (b,c) Distribution of tandem genes and WGT/WGD-derived paralogues in the KEGG pathway maps in B. oleracea (bol), B. rapa (bra) and A. thaliana (ath). For each KEGG pathway map, the proportion of the number of duplicated genes or paralogues to the total genes was calculated (x axis) and the number of maps whose tandem gene proportion fell in a range was shown on the y axis. (d) Oxidative phosphorylation pathway enriched by WGT-derived paralogous genes in the Brassica genomes. The gene copy number for each KO enzyme in B. oleracea, B. rapa and A. thaliana were shown (dash-connected) under the KO enzyme number. Full size image

Biased loss and retention of genes after WGT/WGD

Following polyploidization, reversion of gene numbers towards diploid levels through gene loss has been widely observed in plants30. However, in Brassica this only appears to be true for collinear genes in the conserved syntenic regions, with a loss of ~

60% of the predicted post-triplication gene set, nearly restoring the pre-triplication gene number. This is reflected in an overall retention rate of 1.2-fold of A. thaliana orthologous genes in corresponding syntenic regions (Fig. 1c and Supplementary Table 18). In contrast, in terms of genes that have no collinear gene in A. thaliana and either Brassica species (hereafter called non-collinear genes), gene retention rates is 2.5-fold the A. thaliana gene number in B. oleracea and 1.9-fold in B. rapa, both significantly higher than the expected rates (P value <2.2e–16;Supplementary Table 34). For these retained genes, the numbers of the genes that are common in the two Brassica species are 11,746 in B. oleracea and 10,411 in B. rapa. Most of these genes are supported by expression and/or the presence of an Arabidopsis homologue (Supplementary Table 35). More than 61% of these genes have homologues present as collinear genes and 16% also are homologous to other non-collinear genes, indicating gene movement from triplicated syntenic regions and being similar to observations in A. thaliana, where half of the genes are nonsyntenic within rosids31. This suggests that the breakdown of the triplicated syntenic relationship has not only prevented gene loss and a move towards pre-triplication gene numbers but has also maintained a higher gene density, and thus maintained WGT-derived genes for species evolution.

The presence of a large number of the retained paralogous genes in the syntenic regions led us to examine whether genes in some functional categories have preferentially been over-retained, as observed in other plants29. The results indicate that WGT-produced paralogous genes are over-retained in GO categories associated with regulation of metabolic and biosynthetic processes, RNA metabolism and transcription factors (Supplementary Table 36 and Supplementary Figs 43–45), and the two Brassica species exhibit similar patterns of gene category retention. From a study of KEGG pathways, we also found that WGT-produced Brassica paralogous genes contribute 40–60% of total genes for 90% of KEGG pathways (Fig. 3c and Supplementary Fig. 43), and are functionally enriched in primary or core metabolic processes such as oxidative phosphorylation, carbon fixation, photosynthesis, circadian rhythm32 and lipid metabolism (Supplementary Tables 36 and 37 and Supplementary Figs 43–45). Notably, the pathways associated with energy metabolism have been enhanced in both Brassica species. For instance, in the oxidative phosphorylation pathway, there are 161 genes in A. thaliana, but 241 in B. oleracea and 208 in B. rapa. The majority (143/241 and 142/208) of these Brassica genes are multiple paralogues residing in the triplicated syntenic regions, and more than half of these paralogues have been retained as three copies, significantly higher than observed for other genes in the triplication regions (Fig. 3d and Supplementary Fig. 43).

Phylogenetic analyses show that WGT led to an expansion of genes involved in auxin functioning (AUX, IAA, GH3, PIN, SAUR, TAA, TIR, TPL and YUCCA), morphology specification (TCP), and flowering time control (FLC, CO, VRN1, LFY, AP1 and GI) (Supplementary Table 38 and Supplementary Figs 46–61), and that most Arabidopsis genes in these families have two or three orthologs in Brassica species. These WGT-produced duplicated genes may provide important sources of evolutionary innovation33 and contribute to the extreme morphological diversity in Brassica species.

Divergence of duplicated genes in the Brassica genomes

The largest genetic foundation for plant genome evolution and new species formation is the differentiation of retained paralogous and orthologous genes. Around 38% (4,302/11,493) of all paralogous gene pairs in B. oleracea and ~

36% (4,089/11,448) in B. rapa have different predicted exon numbers (Supplementary Data 1, Supplementary Tables 39 and 40 and Supplementary Fig. 62). There are 6,571 orthologous gene pairs with different exon numbers, accounting for 27.6% of total gene pairs (23,823). Some paralogous or orthologous pairs have high Ks values and low sequence similarity (Supplementary Fig. 63), indicating sequence differentiation. Of these paralogous genes, some offer appreciable opportunity for non-reciprocal DNA exchanges (gene conversion). About 8% of the 4,296 homologous quartets in B. rapa and B. oleracea have been affected by gene conversion (Fig. 4a, Supplementary Table 41 and Supplementary Fig. 64) and about one-sixth (53) of converted genes were inferred to have experienced independent conversion events in both Brassica species, a parallelism sometimes observed in other plants11,34. Around 40–44% of conversion events involved paralogues in the less-fractionated subgenomes LF in both species, substantially higher than the other two subgenomes (Supplementary Table 41). This finding suggests that gene conversion is related to homologous gene density, which determines the likelihood of illegitimate recombination.

Figure 4: Divergence of Brassica paralogous and orthologous genes in B. oleracea and B. rapa. (a) Genome-wide gene conversion in B. oleracea. The conversion in B. rapa is showed in Supplementary Fig. 64. (b) The ratio of differentially expressed duplicated gene pairs derived from different duplications: alpha whole-genome duplication (α-WGD), Brassiceae-lineage WGT, tandem duplication (TD). Bol, B. oleracea; Bra, B. rapa. C: callus; R: root; St: stem; L: leaf; F: flower; Si: silique. The differentially expressed duplicated gene pairs were defined as fold change >2 and false discovery rate (FDR) <0.05 or gene pair where expression was detected for only one gene within gene pairs (FDR <0.05). (c) Box and whisker plots for differentiated expression for three subgenomes (LF, MF1 and MF2) in flower tissue of B. oleracea and B. rapa. For the other tissues, see Supplementary Fig. 67. (d) The duplicated gene pairs belonging to transcription factors (TFs) and its related GO terms contain a significantly lower ratio of differentially expressed duplicated gene pairs than the average at the genome-wide level in leaf (values given) and other tissues (values not presented) (Supplementary Table 45). (e) The GO terms (left) in which the duplicated gene pairs contain a significantly higher ratio of differentially expressed duplicated gene pairs than the average ratio at the genome-wide level in leaf and other tissues (Supplementary Table 46). Values from one tissue were presented and the other tissues were indicated with abbreviated letters to the right if expression in these tissues is significantly higher. (f) Expression variation caused by divergence (either different variants or differential expression of the same variants) of alternative splicing (AS) variants in WGT paralogous gene pairs with identical numbers of exons and in Bol–Bra orthologous gene pairs. IRES denotes types of intron retention and exon skipping. Full size image

Analysis of RNA-seq data generated from callus, root, leaf, stem, flower and silique of B. oleracea and B. rapa suggests that >40% of WGT paralogous gene pairs are differentially expressed in these species (Fig. 4b and Supplementary Fig. 65), suggesting potential subfunctionalization of these genes. In both species, a general trend of expression differentiation was alpha-WGD paralogous genes (~

46%) > WGT paralogous genes (~

42%) > tandemly duplicated genes (~

35%) (Fig. 4b, Supplementary Fig. 66 and Supplementary Tables 42 and 43). Different tissues harbour approximately the same number of differentially expressed duplicates, but this number was slightly higher in flower tissue. The expression level of genes in the LF subgenome was significantly higher than corresponding syntenic genes in the more fractionated subgenomes (MF1 and MF2) while no expression dominance relationship was observed between the subgenomes MF1 and MF2 (Fig. 4c, Supplementary Table 44 and Supplementary Fig. 67). Duplicated transcription factor gene pairs showed less differentiated expression (~

38%) than the expected ratio at the genome-wide level (Fig. 4d and Supplementary Table 45), while paralogues with GO categories related to membrane, catalytic activity and defence response exhibited a higher ratio of differentiated expression (Fig. 4e and Supplementary Table 46). Of B. oleracea–B. rapa orthologous gene pairs (23,823 in total), ~

42% were differentially expressed across all tissues (Supplementary Tables 42 and 43).

Furthermore, many paralogues generate different transcripts, resulting in expression differentiation. Analysis of AS variants of paralogous gene pairs that have identical numbers of exons demonstrated that these variants (either different variants or differential expression of the same variants) cause >20% and >44% of such paralogous genes to be differentially expressed in B. oleracea and B. rapa, respectively (Fig. 4f and Supplementary Table 47). For orthologous gene pairs of B. oleracea and B. rapa, 35.5% (8,467) of gene pairs showed differential expression due to AS variation. When only counting intron retention and exon skipping, 9.3% (2,215) of gene pairs differ. Divergence in AS variants of gene pairs presents an important layer of gene regulation, as reported35,36,37,38, and thus provides a genetic basis for species evolution and new species formation.

Unique GSLs metabolism pathways

GSLs and hydrolysis products have been of long-standing interest due to their role in plant defence and anticancer properties. Compared with B. rapa and B. napus, B. oleracea has the greatest GSL profile diversity, with wide qualitative and quantitative variation39,40. We identified 101 and 105 GSL biosynthesis genes in B. rapa and B. oleracea, respectively, and 22 GSL catabolism genes in each species (Fig. 5a, Supplementary Table 48 and Supplementary Data 2). In the GSL biosynthesis and catabolism pathways, tandem genes (41.4%, 40.7% and 33.9% in A. thaliana, B. oleracea and B. rapa, respectively) were present in a much higher proportion than the genome-wide average (Supplementary Table 32). The observed variation of GSL profiles is mainly attributed to the duplication of two genes, methylthioalkylmalate (MAM) synthase and 2-oxoglutarate-dependent dioxygenase (AOP).

Figure 5: Whole-genome-wide comparison of genes involved in glucosinolate metabolism pathways in B. oleracea and its relatives. (a) Aliphatic and indolic GSL biosynthesis and catabolism pathways in A. thaliana, B. oleracea and B. rapa. The copy number of GSL biosynthetic genes in A. thaliana, B. rapa and B. oleracea are listed in square brackets, respectively. Potential anticancer substances/precursors are highlighted in blue bold. Two important amino acid chain elongation and side-chain modification loci MAMs and AOP2 are highlighted in red bold, within the number in the green bracket representing the number of non-functional genes. (b,c) The neighbour-joining (NJ) trees of MAM and AOP genes were generated based on the aligned coding sequences and 100 bootstrap repeats. The silenced genes are indicated by red hollow circle, expressed functional genes are represented by red solid disc and green rectangle. In A. thaliana ecotype Columbia there are just MAM1 and MAM3. (d) Three B. oleracea AOP2 loci among which are one functional AOP2 and two mutated AOP2. 1MOI3M: 1-methoxyindol-3-ylmethyl GSL; 1OHI3M: 1-hydroxyindol-3-ylmethyl GSL; 3MSOP: 3-methylsulfinylpropyl GSL; 3MTP: 3-methylthiopropyl GSL; 3PREY: 2-Propenyl GSL; 4BTEY: 3-butenyl GSL; 4MOI3M: 4-methoxyindol-3-ylmethyl GSL; 4OHB, 4-hydroxybutyl GSL; 4OHI3M: 4-hydroxyindol-3-ylmethyl GSL; 4MSOB: 4-methylsulfinylbutyl GSL; 4MTB, 4-methylthiobutyl GSL; AITC: allyl isothiocyanate; I3C: indole-3-carbinol; I3M: indolyl-3-methyl GSL; DIM: 3,3′-diindolymethane; MAM: methylthioalkylmalate; AOP: 2-oxoglutarate-dependent dioxygenase. Full size image

In Arabidopsis, the MAM family contains three tandemly duplicated and functionally diverse members (MAM1, MAM2 and MAM3), and functional analysis demonstrated that MAM2 (absent in ecotype Columbia) and MAM1 catalyses the condensation reaction of the first and the first two elongation cycles for the synthesis of dominant 3 and 4 carbon (C) side-chain aliphatic GSLs, respectively40,41, while MAM3 is assumed to contribute to the production of all GSL chain lengths42. In B. rapa and B. oleracea, MAM1/MAM2 genes experienced independent tandem duplication to produce 6 and 5 orthologs respectively (Fig. 5b,c). The main GSLs in B. oleracea are 4C and 3C GSLs (progoitrin, gluconapin, glucoraphanin and sinigrin)43, while those in B. rapa are 4C and 5C GSLs (gluconapin and glucobrassicanapin)39 (Fig. 5a). Based on the results of expression and phylogenetic analyses, we found a pair of genes Bol017070 and Bra013007, which are the only orthologous genes showing high expression in B. oleracea but silenced in B. rapa (Fig. 5a). This expression difference most likely leads to greater accumulation of the 3C GSL anticancer precursor sinigrin in B. oleracea. Meanwhile, the expression level of MAM3 in B. rapa is much higher than in B. oleracea, explaining the accumulation of 5C GSL glucobrassicanapin in B. rapa. Other genes affecting specific anticancer GLS products are AOPs. Previously, research has reported four gene loci involved in the side-chain modifications of aliphatic GSLs in Arabidopsis. Two tandemly duplicated genes AOP2 and AOP3 catalyse the formation of alkenyl and hydroxyalkyl GSLs, respectively. When both AOPs are non-functional, the plant accumulates the precursor methylsulfinyl alkyl GSL. We identified three AOP2 genes in B. oleracea (Fig. 5d), but two are non-functional due to the presence of premature stop codons. In contrast, all three AOP2 copies are functional in B. rapa44. No AOP3 homologue has been identified in Brassica. This analysis supports GSL content surveys and explains why glucoraphanin is abundant in B. oleracea, but not in B. rapa.