Strain and genome sequencing

The Gonium pectorale strain K3-F3-4 (mating type minus, NIES-2863 from the Microbial Culture Collection at National Institute for Environmental Studies, Tsukuba, Japan, http://mcc.nies.go.jp/) was used for genome sequencing. Gonium was grown in 200–300 ml VTAC media at 20 °C with a 14:10 h light–dark cycle using cool-white fluorescent lights (165–175 μmol m−2 s−1).

For next-generation sequencing and construction of a fosmid library, total DNA was extracted. Sequencing libraries were prepared using the GS FLX Titanium Rapid Library Preparation Kit (F. Hoffmann-La Roche, Basel, Switzerland) and the TruSeq DNA Sample Prep Kit (Illumina Inc., San Diego, CA, USA) and were run on both GS FLX (F. Hoffmann-La Roche) and MiSeq (Illumina Inc.) machines. Newbler v2.6 was used to assemble the GS FLX reads. A fosmid library was constructed in-house using vector pKS300. The fosmid library (23,424 clones) and BAC library (18,048 clones, Genome Institute (CUGI), Clemson University, Clemson, SC, USA) were end-sequenced using a BigDye terminator kit v3 (Life Technologies, Carlsbad, CA, USA) and analysed on automated ABI3730 capillary sequencers (Life Technologies).

Evidence-based gene prediction

Intron hints file generation was done through a two-step, iterative mapping approach using Bowtie/Tophat command lines and custom Perl scripts written by Mario Stanke as part of AUGUSTUS46 (available at: http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=IncorporatingRNAseq.Tophat). AUGUSTUS version 2.6.1 was selected because its algorithm has been successfully tuned to predict genes in Chlamydomonas and Volvox genomes, which contain high GC content46. Reads were first mapped to the genome assembly with Tophat version 2.0.2 (ref. 47) and the raw alignments were filtered to create an initial (intron) hints file, which was subsequently provided to AUGUSTUS during gene prediction. An exon–exon junction database was generated from the initial AUGUSTUS prediction via a Perl script. The twice-mapped reads (once to the genome and once to the exon–exon sequences) were then merged and filtered, and a final intron hints file was created. From this, the final gene prediction with AUGUSTUS was performed.
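
The step from filtered spliced alignments to an intron hints file can be illustrated with a minimal Python sketch. This is not the Perl pipeline referred to above. It assumes that junctions are supplied in the 12-column BED layout that TopHat writes to its junctions file, and the function name, the `min_reads` filter, the `b2h` source label and the `pri=4;src=E` attributes are illustrative choices rather than values taken from this study.

```python
def junctions_to_intron_hints(bed_lines, min_reads=2):
    """Convert junction BED records into GFF-style intron hint lines.

    Assumes 12-column BED with two blocks per junction (the flanking
    read overhangs); the score column holds the supporting read count.
    """
    hints = []
    for line in bed_lines:
        if not line.strip() or line.startswith("track"):
            continue  # skip blank lines and the BED track header
        f = line.rstrip("\n").split("\t")
        chrom, start, end = f[0], int(f[1]), int(f[2])
        reads, strand = int(f[4]), f[5]
        left, right = (int(x) for x in f[10].rstrip(",").split(","))
        if reads < min_reads:
            continue  # hypothetical filter: drop weakly supported junctions
        intron_start = start + left + 1  # BED is 0-based, GFF is 1-based
        intron_end = end - right         # last intron base, 1-based inclusive
        hints.append("\t".join([
            chrom, "b2h", "intron", str(intron_start), str(intron_end),
            str(reads), strand, ".", f"mult={reads};pri=4;src=E",
        ]))
    return hints
```

In a two-pass scheme of the kind described, the same conversion would be applied once to the genome alignments and again after the reads mapped to exon–exon junctions have been merged in.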

Pfam domain analysis

Diversity and abundance of Pfam domains were determined for all published green algae genomes. Chlorophyte genomes including Bathycoccus prasinos48, Chlamydomonas reinhardtii35, Chlorella variabilis49, Coccomyxa subellipsoidea C-169 (ref. 50), Micromonas pusilla CCMP1545 (ref. 51), Micromonas pusilla RCC299 (ref. 51), Ostreococcus tauri52, Ostreococcus lucimarinus53, Ostreococcus sp. RCC809 (US Department of Energy, Phytozome) and Volvox carteri (both versions 1 and 2; ref. 17) were searched using direct submission of Pfam A and Pfam B domains using Bioperl. Subsequent hits were counted to produce a matrix of Pfam domain diversity and abundance across green algae.
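
The counting step can be sketched as follows. This minimal Python example assumes the Pfam search results have already been reduced to one accession per domain hit for each species; the function name and input layout are hypothetical, and the original analysis used Bioperl rather than this code.

```python
from collections import Counter


def pfam_matrix(hits_by_species):
    """Build abundance and diversity summaries from per-species domain hits.

    hits_by_species: {species: iterable of Pfam accessions, one per hit}.
    Returns (abundance, diversity), where abundance[domain][species] is the
    number of hits and diversity[species] is the number of distinct domains.
    """
    counts = {sp: Counter(hits) for sp, hits in hits_by_species.items()}
    domains = sorted(set().union(*counts.values()))
    abundance = {d: {sp: counts[sp][d] for sp in counts} for d in domains}
    diversity = {sp: len(c) for sp, c in counts.items()}
    return abundance, diversity
```

Here abundance corresponds to the number of hits per domain in each genome, and diversity to the number of distinct domains found in that genome.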

Analysis of transcription-associated proteins

Transcription-associated proteins (TAPs) include transcription factors (which enhance or repress transcription) and transcription regulators (proteins that indirectly regulate transcription, such as scaffold proteins and proteins involved in histone modification or DNA methylation). We combined three sets of TAP classification rules for plants (PlantTFDB54, PlnTFDB55 and PlanTAPDB56) into a single set of classification rules for 96 TAP families. Conflicts between the three sets of rules were manually resolved using the rule that included more genes as transcription-associated proteins.

Each TAP family includes at least one and up to three mandatory domains. Families may include up to six forbidden domains (that is, a gene G cannot be in family F if domain D is present); not all families have defined forbidden domains. All mandatory and forbidden domains were represented by a full-length, global, Hidden Markov Model (HMM). Available HMMs were retrieved from the Pfam_ls database57,58. When HMMs were not available from the Pfam_ls database, custom HMMs were made using multiple sequence alignments from PlnTFDB55, and the HMM was calculated using HMMER version 3.0 (ref. 59) using ‘hmmbuild’ with default parameters and ‘hmmcalibrate --seed 0’.
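
The mandatory/forbidden logic amounts to a simple set test. The sketch below is illustrative only: the function name is hypothetical and the domain names are placeholders, not actual rules from the combined rule set.

```python
def family_matches(domains_in_gene, mandatory, forbidden=()):
    """Return True if a gene satisfies one TAP family rule.

    A gene qualifies when every mandatory domain is present and
    no forbidden domain is present.
    """
    present = set(domains_in_gene)
    has_all_mandatory = set(mandatory) <= present
    has_forbidden = bool(set(forbidden) & present)
    return has_all_mandatory and not has_forbidden


# Placeholder rule: family F requires DOMAIN_A and excludes DOMAIN_D.
rule_f = {"mandatory": ["DOMAIN_A"], "forbidden": ["DOMAIN_D"]}
gene_g = ["DOMAIN_A", "DOMAIN_D"]
in_family = family_matches(gene_g, **rule_f)  # False: forbidden domain present
```

A family without defined forbidden domains is handled by leaving `forbidden` empty.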

Gathering cutoff thresholds (GA) for the custom HMMs were set as the lowest score of a true positive hit using a ‘hmmscan’ search against several complete Chlorophyte genomes. Chlorophyte genomes including Bathycoccus prasinos48, Chlamydomonas reinhardtii35, Chlorella variabilis49, Coccomyxa subellipsoidea C-169 (ref. 50), Micromonas pusilla CCMP1545 (ref. 51), Micromonas pusilla RCC299 (ref. 51), Ostreococcus tauri52, Ostreococcus lucimarinus53, Ostreococcus sp. RCC809 (available on the DOE Phytozome website, version 10.1) and Volvox carteri17 were searched using ‘hmmscan’ to search the library of 103 domains against the predicted protein sequences. Analyses were replicated with both Volvox version 1 and version 2; however, as results were not qualitatively different, results from version 1 are provided (Supplementary Fig. 3). Subsequent hits were classified into a TAP family. Conflicts between multiple TAP families were resolved by assigning the gene to the TAP family with the highest score (Supplementary Table 1).
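
The two scoring rules in this paragraph (a cutoff set at the lowest true-positive score, and assignment to the highest-scoring family) can be sketched in Python. The example is a simplification under stated assumptions: it treats each candidate family as having a single score and a single cutoff, and all names are hypothetical.

```python
def gathering_cutoff(true_positive_scores):
    """GA threshold for a custom HMM: the lowest score among true positives."""
    return min(true_positive_scores)


def assign_family(hit_scores, cutoffs):
    """Assign a gene to the highest-scoring family that passes its cutoff.

    hit_scores: {family: score for this gene}; cutoffs: {family: GA threshold}.
    Returns None when no family passes its threshold.
    """
    passing = {fam: s for fam, s in hit_scores.items() if s >= cutoffs[fam]}
    if not passing:
        return None
    return max(passing, key=passing.get)
```

With this layout, a gene that matches two families is resolved by score alone, which mirrors the conflict rule stated above.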

Construction of protein families

Protein families were created using OrthoMCL60 with a variety of inflation values ranging from 1.2 to 4.0 in steps of 0.1 (Supplementary Figs 16–17). This analysis was performed using Chlorophyte genomes available on the DOE JGI Phytozome website, version 10.1 including Bathycoccus prasinos48, Chlamydomonas reinhardtii35, Chlorella variabilis49, Coccomyxa subellipsoidea C-169 (ref. 50), Micromonas pusilla CCMP1545 (ref. 51), Micromonas pusilla RCC299 (ref. 51), Ostreococcus tauri52, Ostreococcus lucimarinus53, Ostreococcus sp. RCC809 (available on the DOE Joint Genome Institute website) and Volvox carteri17. This analysis was repeated for both Volvox version 1 and Volvox version 2. The inflation value of 1.9 was used for both analyses for consistency and was chosen to have relatively large, coarser grained clusters that were robust to higher inflation values (Supplementary Figs 16–19). To avoid bias introduced by not including all genes for each species, genes not assigned to a gene family (singletons) were assigned to single gene families and included in all subsequent phylogenetic gene family analyses.
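
The treatment of singletons can be illustrated with a short Python sketch. It assumes clusters are held as a mapping from family name to member gene identifiers; the function name and the `singleton_` naming scheme are hypothetical.

```python
def add_singletons(families, all_genes):
    """Give every unclustered gene its own single-gene family.

    families: {family name: list of gene identifiers} from clustering.
    all_genes: every gene identifier that was submitted to clustering.
    """
    assigned = {gene for members in families.values() for gene in members}
    completed = dict(families)
    unassigned = sorted(set(all_genes) - assigned)
    for i, gene in enumerate(unassigned, start=1):
        completed[f"singleton_{i}"] = [gene]
    return completed
```

After this step every gene belongs to exactly one family, so per-species gene counts are not biased by genes that the clustering left out.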

A species tree was calculated by extracting OrthoMCL gene families containing only one copy in each species, for a total of 1,457 genes. The OrthoMCL run with an inflation value of 1.5 was chosen to use larger, coarser grained clusters, thus increasing the likelihood of capturing true 1:1:1 orthologues. This species tree included Volvox carteri version 2. These genes were independently aligned using Muscle version 3.8.31 (ref. 61) and concatenated. A phylogenetic tree was produced using RAxML version 8.0.20 (ref. 62) using the Protein Gamma model with automatic model selection on a per gene basis via partitions for each protein. A rapid bootstrapping analysis to search for the best-scoring ML tree was run with 100 bootstraps. The resulting species tree is consistent with previous results16,51,63,64,65 and had 100% bootstrap support at every node (Supplementary Fig. 20). This result is also consistent with numerous morphological characteristics supporting a closer relationship of Gonium and Volvox66.
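
Selecting families with exactly one copy in each species can be sketched as below. The example assumes gene identifiers follow a `species|gene` pattern (as in OrthoMCL-style labels); the function name and data layout are hypothetical.

```python
from collections import Counter


def single_copy_families(families, species):
    """Keep families with exactly one member from each listed species.

    families: {family name: list of 'species|gene' identifiers}.
    species: the full set of species that must each contribute one gene.
    """
    wanted = set(species)
    kept = {}
    for name, members in families.items():
        per_species = Counter(m.split("|", 1)[0] for m in members)
        one_each = all(n == 1 for n in per_species.values())
        if set(per_species) == wanted and one_each:
            kept[name] = members
    return kept
```

Each retained family would then be aligned separately and the alignments concatenated, with one partition per protein.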

Gene family evolution within the volvocine algae was analysed using Count version 10.04 (ref. 67) to perform several parsimony analyses including symmetric Wagner parsimony (each gene family may be gained or expanded multiple times and the gain penalty is equal to the loss penalty) and asymmetric Wagner parsimony (each gene family may be gained or expanded multiple times and the gain penalty is two times higher than the loss penalty). This analysis was repeated for both Volvox version 1 and version 2 genomes (Supplementary Tables 5–8).
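
The difference between the symmetric and asymmetric settings can be made concrete with a small dynamic-programming sketch. This is not the Count implementation: it is a minimal Sankoff-style calculation for one gene family on a tree written as nested tuples, it ignores any special treatment of the root, and the function and parameter names are hypothetical.

```python
def wagner_cost(tree, leaf_counts, gain=1.0, loss=1.0):
    """Minimum Wagner parsimony cost for one gene family.

    tree: nested tuples of leaf names, for example (("A", "B"), "C").
    leaf_counts: {leaf name: observed family size}.
    Each copy gained along a branch costs `gain`; each copy lost costs `loss`.
    """
    states = range(max(leaf_counts.values()) + 1)

    def edge(parent, child):
        diff = child - parent
        return gain * diff if diff > 0 else loss * -diff

    def table(node):
        # table(node)[s] = cheapest cost of the subtree if node has s copies
        if isinstance(node, str):
            return [0.0 if s == leaf_counts[node] else float("inf")
                    for s in states]
        child_tables = [table(child) for child in node]
        return [
            sum(min(edge(p, c) + t[c] for c in states) for t in child_tables)
            for p in states
        ]

    return min(table(tree))
```

For a family present in one leaf only, `gain=1, loss=1` explains the pattern with a single gain, whereas `gain=2, loss=1` makes a single gain exactly as costly as an ancestral presence followed by two losses.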

dN/dS analysis

During our OrthoMCL construction of protein gene families, we identified 6,154 clusters with exactly one copy in Chlamydomonas (version 5.3), Gonium and Volvox (version 2). The number of genes from other unicellular (non-Chlamydomonas) Chlorophyte species was ignored. These criteria are relatively strict, as they exclude any gene with a duplicate in any of the three species (copy number greater than one) and any gene that is not essential (no copy present in one of the species), leaving only 1:1:1 orthologues. Given the relatively high gene duplication rates in volvocine algae (data not shown), these strict criteria support an interpretation of 1:1:1 orthology. Genome-wide pairwise comparisons of dN, dS and dN/dS were calculated (Supplementary Fig. 21; Supplementary Table 11) using codeml in PAML (ML analysis68) based on translation-based nucleotide alignments (proteins were aligned using MUSCLE61).
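
A translation-based nucleotide alignment is typically obtained by threading each coding sequence onto its aligned protein, so that every amino acid column becomes a codon column. The sketch below shows that idea in Python; it is an assumption about the general technique, not the script used in this study, and the function name is hypothetical.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Map an ungapped coding sequence onto a gapped protein alignment row.

    Each residue consumes the next codon; each protein gap becomes a
    three-base gap. A trailing stop codon in `cds`, if present, is dropped.
    """
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(codons) < n_residues:
        raise ValueError("coding sequence is shorter than the protein")
    next_codon = iter(codons)
    return "".join(
        gap * 3 if aa == gap else next(next_codon) for aa in aligned_protein
    )
```

The resulting codon alignments keep the reading frame intact, which is what codon-based substitution models require.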

Prediction of lineage-specific genes

The phylostratigraphy method20 assumes Dollo’s parsimony (that is, it is more likely that a gene observed in two distant clades was present in the common ancestor and multiple independent gains are not possible). This provides an entry point for testing evolutionary hypotheses related to the age of genes and to quantify how much gene-level innovation has occurred along each phylogenetic branch. Old genes are classified in low phylostrata (present in distant species, PS1–PS7) and young genes are classified in higher phylostrata (for example, genus- or species-specific genes, PS8–PS9). The resolution of each phylostratum strictly depends on the availability of reliable outgroups (the availability of reliable genomic outgroups is relatively low in Chlorophyte algae). The phylogenetic classes were defined from those in each NCBI Taxonomy entry for Chlamydomonas, Gonium and Volvox, resulting in nine expected phylostrata for each species. All proteins were subjected to a BLASTP search with an E-value threshold of 0.001 against the NCBI nr database. Placement in phylostrata was derived from the taxonomic information of these hits for each protein, using the most distant hit, and following Dollo’s parsimony.
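
The placement rule (use the most distant hit) can be sketched as follows. The example assumes each protein's BLASTP hits have already been converted into root-to-tip taxonomic lineages; the function name and input layout are hypothetical.

```python
def phylostratum(query_lineage, hit_lineages):
    """Assign a protein to a phylostratum from the lineages of its hits.

    query_lineage: taxa from the root (PS1) down to the focal species.
    hit_lineages: one root-to-tip lineage per BLASTP hit.
    The most distant hit shares the fewest leading taxa with the query,
    so the stratum is the smallest shared depth (1-based). A protein with
    no informative hit falls in the youngest, species-specific stratum.
    """
    shared_depths = []
    for lineage in hit_lineages:
        shared = 0
        for query_taxon, hit_taxon in zip(query_lineage, lineage):
            if query_taxon != hit_taxon:
                break
            shared += 1
        if shared:
            shared_depths.append(shared)
    return min(shared_depths, default=len(query_lineage))
```

Under Dollo's parsimony, a single hit in a distant clade is enough to place the gene at the node shared with that clade, regardless of how many closer hits exist.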

Phylogenetic analyses

Unless otherwise stated, all phylogenetic analyses were performed using a custom pipeline of SATe version 2.2.7 (ref. 69) coupled with RAxML version 8 (ref. 62). Full gene protein sequences were passed to SATe using a FASTTREE tree estimation with a RAxML search after tree formation with a maximum limit of 10 iterations and the ‘longest’ decomposition strategy. Bootstraps were made on the SATe output alignment and tree using RAxML with automatic model selection, a rapid hill climbing algorithm (-f d) and 100 bootstrap partitions. Bipartition information (-f a) was obtained using the SATe output tree and RAxML bootstraps.

Chlamydomonas strains culture conditions

Wild-type Chlamydomonas reinhardtii 6145 and 21gr, and HA-CrRB (HA-MAT3::mat3–4, here referred to as HA-CrRB::rb), mat3–4 (here referred to as rb), and dp1 have been previously described9,10,11. Briefly, wild-type strains 6145 (MT−) and 21gr (MT+) are mating pairs that have been back crossed to eliminate the y1 mutation in 6145 (ref. 10). The RB knockout strain has been previously characterized as a null allele, and the knockout mutation is the rb allele9,11. The rb mutation can be complemented by an amino (N)-terminally tagged version of the gene that behaves identically to wild type. Previously, a knockout mutation in the Chlamydomonas DP1 gene, dp1, was identified and characterized10,11. All the strains were maintained on TAP plates. For phenotype analysis, the strains were grown synchronously in high salt media (HSM) under 14 h of 150 μE of light; samples were fixed hourly and examined by light microscopy10,11.

Cloning of Gonium pectorale RB and transformation into rb

A 3X haemagglutinin (HA)-tagged copy of the Gonium pectorale RB gene was cloned using InFusion Cloning (Clontech) to be driven by the Chlamydomonas RB promoter and terminator in a construct that includes an AphVIII selectable marker for Chlamydomonas transformation (Fig. 4; ref. 11). Gonium pectorale genomic DNA from K4F3 was used as a template and the genomic region of RB was amplified without its ATG start codon using the primers 5′-CAGATTACGCTACTAGATCTGCCGAAGCTGAACGTTTTACTGCG-3′ and 5′-CTCCGGCCGCGGTGCCTAATTTGCGCCGTACCGCCGGA-3′. These primers overlap with the 3X HA tag and 3′ terminator from the previously created HA-CrRB transformation clone that complements the rb mutation11. The HA-CrRB plasmid was amplified by inverse PCR with the primers 5′-TCTAGTAGCGTAATCTGGAACGTCATATGGATAGG-3′ and 5′-GCACCGCGGCCGGAGGT-3′. PCR products were gel purified with a QiaQuick gel extraction kit (Qiagen). Purified PCR fragments were fused by InFusion (Clontech) cloning based on overlaps in the amplified sequences and transformed into chemically competent DH5-alpha cells, after which the clone was confirmed by sequencing.
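
The overlaps that the fusion step relies on can be checked directly from the four primer sequences given above. The following Python sketch is an illustration only; the function names and the pairing of primers are assumptions made for this example. It reports how many bases at the 5′ end of each insert primer are the reverse complement of the 5′ end of the corresponding inverse-PCR primer.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def shared_five_prime_overlap(primer_a, primer_b):
    """Length of the longest 5' stretch of primer_a that is the reverse
    complement of the same-length 5' stretch of primer_b."""
    longest = 0
    for k in range(1, min(len(primer_a), len(primer_b)) + 1):
        if primer_a[:k] == revcomp(primer_b[:k]):
            longest = k
    return longest


# Primer sequences as listed in the text (5' to 3').
insert_fwd = "CAGATTACGCTACTAGATCTGCCGAAGCTGAACGTTTTACTGCG"
insert_rev = "CTCCGGCCGCGGTGCCTAATTTGCGCCGTACCGCCGGA"
vector_tag = "TCTAGTAGCGTAATCTGGAACGTCATATGGATAGG"
vector_term = "GCACCGCGGCCGGAGGT"

tag_overlap = shared_five_prime_overlap(insert_fwd, vector_tag)    # 17
term_overlap = shared_five_prime_overlap(insert_rev, vector_term)  # 15
```

Both ends of the amplified RB fragment therefore share a short terminal stretch with the linearized plasmid, which is the kind of homology that overlap-based fusion of PCR fragments depends on.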

Transformation of Chlamydomonas reinhardtii

The rb strain was transformed using glass beads11 with the HA-GpRB clone (above) and, as controls, with HA-CrRB and pSI103 (AphVIII selectable marker only), and transformants were selected on TAP plates supplemented with 20 μg ml−1 paromomycin11. Candidate strains were screened by growth morphology10,11, and then screened for expression by immunoblotting with an anti-HA antibody (Roche 3F10, high affinity11). Four independent strains expressing HA-GpRB and five independent strains expressing HA-CrRB were created. Control complementation of the rb mutation with HA-CrRB occurred at rates similar to previous results11. The presence of the rb mutation was confirmed by replica plating on TAP plates supplemented with 10 μg ml−1 emetine9,11.

Genetic analysis of HA-GpRB-expressing strains

Two lines expressing HA-GpRB were crossed to a dp1 null mutant10. Because both the HA-GpRB transgene and the dp1 mutation are linked to AphVIII, single tetrads were dissected. HA-GpRB was genotyped with a primer in the 3XHA tag, 5′-AGTGCTAACAGCATGTCTAGTTAC-3′, and a primer in the 5′ portion of GpRB, 5′-TGCGAACAACCGCTGCAGACCTTC-3′. The dp1 mutation was genotyped as previously described10.

Immunoblotting HA-GpRB and HA-CrRB strains complementing rb

Whole-cell lysates from strains were prepared, separated and immunoblotted11. Briefly, HA-GpRB and HA-CrRB were detected with an anti-HA high-affinity monoclonal antibody (clone 3F10, Roche), and blots were also probed with an anti-alpha-tubulin monoclonal antibody (Sigma), as previously described11. The expression levels of RB in HA-CrRB strains have been previously shown to be similar to wild-type Chlamydomonas expression levels11. The expression levels of RB in HA-GpRB strains are similar to, if not slightly below, those in HA-CrRB strains, suggesting that the observed colonial phenotype is caused not by overexpression of RB but by modification of the Gonium RB gene.

Measurement of cell or colony size distribution

The size of cells and groups of cells was measured with a Moxi Z automated cell sizer/counter using type ‘S’ cassettes (ORFLO Technologies). Sizing is based on the Coulter principle, as used previously with Chlamydomonas reinhardtii10,11.