The Ulva genome size is intermediate between those of sequenced genomes in the Chlorophyceae and Trebouxiophyceae (Figure 2). The number of predicted genes and gene families is markedly lower than in most Chlorophyceae, including the volvocine algae (Chlamydomonas, Gonium, Tetrabaena, and Volvox), but higher than in the Trebouxiophyceae and prasinophytes (Figures 2 and 3). The relative gene family sizes, however, are roughly equal between Ulva and volvocine algae when corrected for total genome size (Figure S2). A phylogenetic tree inferred from a concatenated alignment of 58 nuclear protein-coding genes (totaling 42,401 amino acids) supports a sister-group relationship of Ulva with the Chlorophyceae in the crown chlorophytes (Figure 3). This topology corroborates earlier phylogenetic hypotheses based on multigene organelle datasets (reviewed in []). The divergence of Ulva and the Chlamydomonas-Gonium-Volvox clade (Chlorophyceae, Volvocales) from their common ancestor coincided with substantial gain and loss of gene families in both lineages (Figure 3). Gain and loss, however, do not seem to be correlated with multicellularity.

Figure 3. Predicted Pattern of Gain and Loss of Gene Families during the Evolution of Green Algae and Land Plants
For each species, the total number of gene families, the number of orphans (genes that lack homologues in the eukaryotic data set), and the number of genes are indicated, as well as habitat and morphological characteristics. Maximum likelihood bootstrap values are indicated in black at each node. The number of gene families acquired or lost (values indicated in blue along each branch in the tree) was estimated using the Dollo parsimony principle.
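The Dollo parsimony principle mentioned in the caption assumes that a gene family can be gained only once but lost any number of times. The sketch below illustrates that idea on a toy example and is not the analysis behind Figure 3: the tree, the taxon labels (A to E), the family names, and the function names are all hypothetical. The single gain is placed at the most recent common ancestor of all species carrying the family, and a loss is counted for every maximal subtree below that ancestor in which the family is absent.

```python
# Toy Dollo parsimony on a rooted tree written as nested tuples.
# Leaves are strings; every name below is a hypothetical placeholder.

def leaves(node):
    """Return the set of leaf names under a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def dollo_events(tree, present):
    """Place one gain and the implied losses for a single gene family.

    present: set of leaf names in which the family was detected.
    Returns (gain_clade, lost_clades); each clade is a frozenset of leaves.
    """
    present = set(present)
    if not present:
        return None, []
    if not present <= leaves(tree):
        raise ValueError("presence set contains taxa that are not in the tree")

    # Gain: walk down to the most recent common ancestor of all carriers.
    node = tree
    while not isinstance(node, str):
        carriers = [child for child in node if leaves(child) & present]
        if len(carriers) != 1:
            break
        node = carriers[0]

    # Loss: below the gain node, every maximal subtree without any carrier.
    lost = []

    def walk(n):
        if not leaves(n) & present:
            lost.append(frozenset(leaves(n)))
        elif not isinstance(n, str):
            for child in n:
                walk(child)

    walk(node)
    return frozenset(leaves(node)), lost


toy_tree = ((("A", "B"), "C"), ("D", "E"))
families = {"fam1": {"A", "C"}, "fam2": {"A", "D"}, "fam3": {"E"}}
for name, carriers in families.items():
    gain, lost = dollo_events(toy_tree, carriers)
    print(name, "gained in ancestor of", sorted(gain),
          "| lost in", [sorted(clade) for clade in lost])
```

Summing such events per branch over all families would give per-branch gain and loss counts of the kind shown along the tree, under the stated assumptions.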

The genome size of U. mutabilis was estimated by flow cytometry and k-mer spectral analysis to be around 100 Mbp. In total, 6.9 Gbp of PacBio long reads were assembled into 318 scaffolds (98.5 Mbp), covering 98.5% of the estimated nuclear haploid genome (Figure 2 and Table S1). To increase the accuracy of the genome sequence at single-base resolution, the scaffolds were polished using PacBio and Illumina paired-end reads. We predicted 12,924 protein-coding genes, of which 91.8% were supported by RNA sequencing (RNA-seq) data. Analyses of genome completeness indicated that the genome assembly captures at least 92% of the eukaryotic BUSCO dataset. Analyses of pico-PLAZA core gene families yielded a completeness score of 0.968 for the protein-coding genes (Table S2). Annotation of repetitive elements resulted in 35% of the genome being masked. Among the identified repeats, 74% were classified as known or reported repeat families, with long terminal repeats (LTRs) and long interspersed elements (LINEs) being predominant, representing 15.3 Mbp and 9.3 Mbp, respectively (Table S3).
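The figures quoted in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only numbers stated above. The derived values (approximate long-read depth, LTR and LINE shares of the assembly) are not reported in the text and serve only as a consistency check. It also assumes that the 35% masked fraction refers to the 98.5 Mbp assembly, which the text does not state explicitly. Variable names are illustrative.

```python
# Values taken from the paragraph above.
estimated_genome_mbp = 100.0   # flow cytometry / k-mer estimate
assembly_mbp = 98.5            # total length of the 318 scaffolds
long_reads_gbp = 6.9           # PacBio long-read yield
predicted_genes = 12924
rnaseq_supported = 0.918       # fraction of genes with RNA-seq support
masked_fraction = 0.35         # assumed to be relative to the assembly
ltr_mbp, line_mbp = 15.3, 9.3

# Derived quantities (not reported in the text).
assembled_pct = 100 * assembly_mbp / estimated_genome_mbp
read_depth = long_reads_gbp * 1000 / estimated_genome_mbp
supported_genes = round(rnaseq_supported * predicted_genes)
masked_mbp = masked_fraction * assembly_mbp
ltr_pct = 100 * ltr_mbp / assembly_mbp
line_pct = 100 * line_mbp / assembly_mbp

print(f"assembled fraction of estimated genome: {assembled_pct:.1f}%")
print(f"approximate long-read depth: {read_depth:.0f}x")
print(f"genes with RNA-seq support: about {supported_genes}")
print(f"masked sequence: about {masked_mbp:.1f} Mbp")
print(f"LTR share of assembly: {ltr_pct:.1f}%, LINE share: {line_pct:.1f}%")
```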

Evolution of Multicellularity

Ulva develops from gametes or zoospores into a multicellular thallus consisting of three main cell types (rhizoid, stem, and blade cells). After a first division, the basal cell gives rise to a rhizoidal cell and the apical cell [20]. Subsequent morphogenesis is then governed by a progressive change in cell cycles. The growth rate of the basal cells decreases after a few cell cycles, so that the holdfast remains small, while the division of blade cells continues and becomes synchronized to the prevailing light:dark cycle [20]. The shape of the multicellular thallus in Ulva is therefore largely driven by how cell size and division are controlled, and many morphological mutants in U. mutabilis, including the slender mutant used in this study (see below), appear to have arisen from underlying changes in cell-cycle regulation [20].

Figure 4. Comparative Analysis of Transcription-Associated Proteins
(A) Heatmap of transcription factors comparing Ulva with a selection of green algae (Bathycoccus prasinos, bpr; Chlamydomonas reinhardtii, cre; Chlorella variabilis, CN64a; Gonium pectorale, gpe; Micromonas pusilla, mpu; Micromonas sp., m299; Ostreococcus lucimarinus, olu; O. tauri, ota and o809; Coccomyxa subellipsoidea, cvu; Volvox carteri, vca), streptophytes (Klebsormidium nitens, kni), land plants (Arabidopsis thaliana, ath; Oryza sativa, osa; Physcomitrella patens, ppa), and red algae (Chondrus crispus, ccr; Cyanidioschyzon merolae, cme).
(B) Maximum likelihood phylogeny of CO-like transcription factors that are expanded in Ulva. Roman numerals refer to the classification as in Khanna et al. [21].
(C) Examples of tandem distributions of Ulva CO-like genes (containing a CCT and B-box domain) and genes containing either a CCT or B-box domain on contigs 003, 053, and 154.
See also Data S1.

The evolution of a complex thallus morphology is often associated with expansions in gene families that are involved in cell signaling, transcriptional regulation, and cell adhesion [1]. The Ulva genome encodes 251 proteins involved in transcriptional regulation, a comparatively low number for a green alga, which is also reflected in the low fraction of such proteins encoded by the genome (1.94%, compared with an average of 2.66% in green algae). Ulva lacks 10 families of transcription factors (TFs) and two families of transcriptional regulators (TRs) that are present in other green algae (Data S1). Furthermore, the existing transcription-associated protein families are, on average, smaller than those in other green algae (Figure 4A).

Among the most remarkable gene families that have been lost are genes of the retinoblastoma (RB)/E2F pathway and associated D-type cyclins. Comparative genomic studies of volvocine algae have revealed that the co-option of the RB cell-cycle pathway is a key step toward multicellularity in this group of green algae [5–7]. Apart from implying that evolution toward multicellularity progressed along different trajectories in Ulva and the volvocine algae, the absence of D-type cyclins, RB, and E2F signifies that entry into the cell cycle and the G1-S transition are independent of these genes, as is the case in yeast. As no homologs of Cln 2/3, SBF, and Whi5, which mediate G1-S transition in yeast [22], are found in Ulva, we hypothesize that either a functionally analogous set of genes or an entirely different mechanism regulates Ulva S-phase entry. Interestingly, RB and E2F homologs were found in the transcriptome of the siphonous ulvophyte Caulerpa [18], so at present, it remains unclear how widely the RB/E2F pathway is conserved within the Ulvophyceae. Other aspects of the cell cycle are more in line with other green algae (Figure S4 and Data S1), except that the single CDKA homolog (UM001_0289) contains a modified cyclin-binding motif, PSTALRE, instead of the evolutionarily conserved PSTAIRE motif. While variations in this motif are not uncommon in eukaryotes, Ulva is the first member of the green lineage to have such a variation reported.
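As an illustration of how a single-residue variant of the PSTAIRE motif, such as PSTALRE, could be flagged in a protein sequence, the sketch below scans a sequence for windows that differ from the canonical motif by at most one substitution. The example fragment is made up for demonstration and is not the UM001_0289 sequence or any real CDKA sequence. The function name and the one-mismatch threshold are likewise assumptions.

```python
CANONICAL_MOTIF = "PSTAIRE"


def find_motif_variants(seq, motif=CANONICAL_MOTIF, max_mismatch=1):
    """Return (start, window, differences) for every window of len(motif)
    that differs from the motif by at most max_mismatch substitutions.

    differences is a list of (offset, expected_residue, observed_residue).
    """
    hits = []
    k = len(motif)
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        diffs = [(i, motif[i], window[i]) for i in range(k) if window[i] != motif[i]]
        if len(diffs) <= max_mismatch:
            hits.append((start, window, diffs))
    return hits


# Hypothetical fragment carrying the variant motif described in the text.
toy_fragment = "MEKGTYGVVPSTALREISLL"
for start, window, diffs in find_motif_variants(toy_fragment):
    changes = ", ".join(f"{exp}{off + 1}{obs}" for off, exp, obs in diffs) or "none"
    print(f"position {start + 1}: {window} (substitutions: {changes})")
```

A scan like this only locates candidate motifs. Whether a substitution affects cyclin binding would have to be established experimentally.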

Contrary to the expectation for a multicellular organism [23], few TF families are expanded in Ulva (Figure 4A). Notable exceptions are CONSTANS-LIKE (CO-like) TFs, of which Ulva has five genes, whereas all other sequenced algae encode between zero and two (Figures 4A and 4B). These CO-like TFs are characterized by one or two (group II or III) zinc-finger B boxes and a CCT protein domain. Both protein domains are involved in protein-protein interactions, and the CCT domain mediates DNA binding in a complex with HEME ACTIVATOR PROTEIN (HAP)-type TFs in Arabidopsis [21]. Ulva CO-like proteins form a single clade within other algal lineages (Figure 4B). In addition to the five CO-like TFs, functionally related proteins containing either (1) only a B-box domain similar to group V B-box zinc fingers [21] or (2) only a CCT domain and belonging to the CCT motif family (CMF) [24] are also expanded in Ulva. B-box zinc fingers and CMF proteins in angiosperms have been implicated in developmental processes such as photoperiodic flowering [25], regulation of circadian rhythms [26], and abiotic stress responses [27]. The control of light and photoperiod signaling is conserved in the green algae C. reinhardtii and Ostreococcus tauri [28, 29]. Moreover, the CO-like TFs are one of the families potentially involved in the establishment of complex multicellularity in green algae and land plants [30]. Genome-wide mapping of Ulva CO-like genes and functionally related genes indicates that the majority (60%) originated through tandem duplication in Ulva (Figure 4C). Although the functions of these proteins will need to be confirmed experimentally, the CO-like and CMF genes in Ulva could be involved in the integration of a multitude of environmental signals in a highly dynamic intertidal environment, a kind of environment to which the other sequenced green algae are not regularly subjected.

Figure 5. Comparative Analysis of Enriched and Depleted InterPro Domains in Ulva mutabilis
Significant differences relative to Chlamydomonas reinhardtii, Volvox carteri, and/or Gonium pectorale (Fisher’s exact test, false discovery rate [FDR]-corrected p < 0.05) are denoted with squares if significant in Ulva and Caulerpa and circles if significant in Ulva only. Z scores represent the number of IPR hits normalized by the total number of hits per species. See Table S7 for abbreviations used.

A total of 441 protein kinases were identified in the Ulva genome, representing about 3%–4% of all protein-coding genes. The largest subfamily of Ulva kinases has similarity to PKnB kinase (Data S1), a “eukaryotic-like” serine/threonine kinase originally discovered in bacteria. Around 20 of the PKnB kinases possess a transmembrane (TM) domain and an extracellular/adhesion domain, either Kringle (IPR000001), FAS1 (fasciclin-like; IPR000782), or Pectin lyase fold (IPR012334, IPR011050), and so represent good candidates for Ulva receptor kinases, with potential roles in environmental sensing and/or developmental signaling. By acting on Ulva cell wall components, for example, the pectin-lyase-fold domains may contribute to desiccation resistance [31] and to the growth and development of a multicellular thallus [32]. We note that although the three aforementioned extracellular domains are present in many green algae, including the green seaweed Caulerpa (Figure 5), it is unusual to see them linked to a kinase, and this coupling is only seen in a few other eukaryotic species. The Kringle-kinase domain combination, for instance, was initially discovered in animal receptor tyrosine kinase-like orphan receptors (RORs), which use Wnt signaling proteins as ligands and function in multicellular development, neuronal outgrowth, cell migration, and polarity [33]. Our analysis additionally finds this domain combination in Ulva and the unicellular prasinophyte green alga Ostreococcus. The Fasciclin-kinase combination is unique to Ulva, while the pectin-lyase-kinase combination is found only in the multicellular algae Ulva, Klebsormidium, and Ectocarpus. Based on the sequence similarity of family members and the close proximity of some family members on single DNA scaffolds, it is possible that the Ulva Kringle-TM-kinase gene families arose via divergent evolution from a common ancestor. The pectin-lyase-TM-kinase family proteins, on the other hand, are more divergent in sequence and structure (including kinase-TM-pectin-lyase proteins) and are more likely to have arisen by dynamic gene fusions or conversions.
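The enrichment test named in the Figure 5 caption (Fisher's exact test followed by FDR correction) can be sketched as follows. This is a minimal illustration and not the authors' pipeline. The domain counts are invented, the Benjamini-Hochberg procedure is assumed as the FDR method (the caption does not say which correction was used), and the helper names are hypothetical. Each domain is tested on a 2x2 table of proteins with and without the domain in two species.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # from largest p value to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented counts: (proteins with the domain, all proteins) in species A and B.
counts = {
    "domain_X": ((30, 12000), (5, 14000)),
    "domain_Y": ((12, 12000), (15, 14000)),
    "domain_Z": ((2, 12000), (25, 14000)),
}
pvals = [
    fisher_exact_two_sided(hit_a, tot_a - hit_a, hit_b, tot_b - hit_b)
    for (hit_a, tot_a), (hit_b, tot_b) in counts.values()
]
qvals = benjamini_hochberg(pvals)
for name, p, q in zip(counts, pvals, qvals):
    print(f"{name}: p={p:.3g}, q={q:.3g}, significant={q < 0.05}")
```

The pure-Python implementation keeps the example self-contained. In practice an established statistics library would normally be used for both steps.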

In addition to containing extracellular protein domains that are linked to intracellular kinases, Ulva also shows a significantly enriched diversity of the protein domains associated with the extracellular matrix (ECM) and cell surface relative to its sequenced sister taxa. Notable examples of these enriched ECM-associated domains include scavenger receptor cysteine-rich (SRCR) domain proteins (IPR001190, IPR017448), which are absent from land plants but present in animals and Volvocales [34]. In metazoans, the SRCR proteins have diverse roles that include the recognition of the pathogen-associated molecular patterns (PAMPs) that mediate bacterial interactions [35] and possibly encompass an early evolutionary role in cell-cell recognition or aggregation [36]. The germin (IPR001929) and RmlC-like cupin (IPR011051) domain folds are also among the ECM-associated domains and occur ubiquitously in streptophytes, where they are linked to the regulation of cell-wall properties such as extensibility and defense [37].