The unicellular eukaryote Tetrahymena thermophila has seven mating types. Cells can mate only when they recognize cells of a different mating type as non-self. As a ciliate, Tetrahymena separates its germline and soma into two nuclei. During growth the somatic nucleus is responsible for all gene transcription while the germline nucleus remains silent. During mating, a new somatic nucleus is differentiated from a germline nucleus and mating type is decided by a stochastic process. We report here that the somatic mating type locus contains a pair of genes arranged head-to-head. Each gene encodes a mating type-specific segment and a transmembrane domain that is shared by all mating types. Somatic gene knockouts showed both genes are required for efficient non-self recognition and successful mating, as assessed by pair formation and progeny production. The germline mating type locus consists of a tandem array of incomplete gene pairs representing each potential mating type. During mating, a complete new gene pair is assembled at the somatic mating type locus; the incomplete genes of one gene pair are completed by joining to gene segments at each end of the germline array. All other germline gene pairs are deleted in the process. These programmed DNA rearrangements make this a fascinating system of mating type determination.

Tetrahymena thermophila is a single-celled organism with seven sexes. After two cells of different sexes mate, the progeny cells can be of any one of the seven sexes. In this article we show how this sex decision is made. Every cell has two genomes, each contained within a separate nucleus. The germline genome is analogous to that in our ovaries or testes, containing all the genetic information for the sexual progeny; the somatic or working genome controls the operation of the cell (including its sex). We show that the germline genome contains a tandem array of similarly organized but incomplete gene pairs, one for each sex. Sex is chosen after fertilization when a new somatic genome is generated by rearrangement of a copy of the germline genome. One complete sex gene pair is assembled when the cell joins DNA segments at opposite ends of the array to each end of one incomplete gene pair; this gene pair is thus completed and becomes fully functional, while the remaining sex gene pairs are excised and lost. The process involves programmed, site-specific genome rearrangements, and the physically independent rearrangements that occur at opposite ends of the selected gene pair happen with high reliability and precision.

Funding: This work is supported by the National Science Foundation, USA (MCB-1025069) to EO and (U.S. NSF IGERT DGE-02-21715) to Linda Petzold in support of MJL; Knowledge Innovation Program of the Chinese Academy of Sciences (KSCX2-EW-G-6-4) and National Natural Science Foundation of China (31071993) to MW; Tri-Counties Blood Bank Postdoctoral Fellowship to M.D.C. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

The T. thermophila germline mat locus was first described by Nanney et al. in 1953 [9] and remains the only locus known to control mating type specificity in this organism. These authors reported that the mat locus determines a spectrum of seven mating types (I–VII), one of which is stochastically and irreversibly expressed in each new somatic nucleus. Extensive field collections have revealed no additional mating types [10] . Two classes of germline mat alleles are known [10] – [13] . The mat-1-like alleles encode mating types I, II, III, V, and VI, while mat-2-like alleles encode mating types II, III, IV, V, VI, and VII [9] . All the strains used in this work are homozygous for the mat-2 allele of inbred strain B. Alternative DNA deletions, rather than epigenetic gene silencing, were proposed to be responsible for mating type determination [14] . The work reported here, made possible by the molecular identification of the mating type genes, has revealed a type of programmed DNA rearrangement in the somatic nucleus that assembles a gene pair of one mating type and deletes the rest.

T. thermophila is a ciliate that segregates germline and somatic functions into two nuclei with distinct genome structures: the diploid micronucleus (germline) and the polyploid macronucleus (somatic). Starvation induces mating (conjugation) between two cells of different mating types. During conjugation ( Figure S1 ) the parental somatic nucleus is destroyed while new somatic and germline nuclei are differentiated from a zygote nucleus. This differentiation includes extensive site-specific genome rearrangements, including fragmentation of the germline chromosomes, de novo telomere addition, and deletion of thousands of internal eliminated sequences (IESs) [7] . Mating type is also determined at this stage [8] .

Unicellular eukaryotes reproduce asexually, but most also have a sexual stage to their life cycle that increases genotypic variability. Sexual partners are usually morphologically indistinguishable and mating types, as part of a self/non-self recognition system, foster outbreeding. Mating types were first discovered by Sonneborn in the ciliate Paramecium aurelia [1] . This discovery initiated the field of microbial genetics, as mating types were subsequently found in bacteria and a diversity of microbial eukaryotes. The number of mating types and the mechanisms of mating type determination vary widely among unicellular eukaryotes [2] – [6] .

Analysis of the TM exons of twenty 120-fission strains ( Table S7 ; Texts S10 and S11 ) shows that MTB-TM exons undergo additional recombination after the resumption of vegetative multiplication. Highly significant differences between 0- and 120-fission cells are observed for the MTB-TM exons, whether one compares the number of haplotypes explained by a single joining event, or by recombination events involving more than two germline genes, or by gene conversions ( Table S7 ). PCR template switching is excluded as a spurious source of recombination in these results ( Text S9 ). We believe these events largely represent intragenic secondary recombination, distinct from the single, simple recombination events responsible for mating type determination.

To determine more precisely where the incomplete gene pairs are joined to full-length germline TM exons, we compared the sequences of TM exons from newly differentiated somatic nuclei to those of germline TM exon segments. We sequenced individual somatic TM exons from progeny that had not yet undergone the first cell division (exconjugants, Figure S1 stage 3). We constructed “collapsed alignments” to concisely represent all the polymorphisms in the somatic and germline nuclei ( Texts S6 – S8 ). Schematic representations of the complete set of sequenced exons are shown in Figure 6 . Somatic MTA2 and MTB3 genes, which are already complete in the germline, showed no evidence of any joining event. The TM exons of every other somatic mating type gene showed polymorphic nucleotide combinations not present in the germline genome ( Figure 6 ; Texts S7 and S8 ). A single, simple joining event connecting a truncated germline tm segment to the full-length germline MTA2-TM exon explains 98% of the somatic MTA-TM exons present in early progeny ( Table S7A ). The MTA join sites were mapped to a 269-bp segment near the start of the MTA2-TM exon. A single, simple event explains joining to the full-length germline MTB3-TM exon in 74% of the sequenced exons ( Table S7A ). This percentage varies from 42% for somatic MTB2 to 89% for somatic MTB6. The MTB join sites mapped to intervals distributed throughout the germline TM exon sequence. The number of distinct join sites may have been exaggerated if PCR template switching [29] reshuffled nucleotide diversity in these sequenced TM exons ( Text S9 ). These data confirm that many if not all of the joining events occur within the TM exon rather than exclusively between the TM exon and the mating type-specific segment. The frequency of novel nucleotides (not present in the germline) is less than one in 50,000 sequenced base pairs ( Figure 6 legend), showing that the joining events are highly precise.
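The logic of reading a join site from polymorphic nucleotides can be sketched in a few lines of Python. This is a simplified illustration, not the collapsed-alignment procedure of Texts S6 to S8: the function name `classify_join`, the category labels, and the toy sequences are all hypothetical, and the sequences are assumed to be pre-aligned, with `-` marking the missing part of a truncated tm segment.

```python
def classify_join(somatic, own_tm, full_length):
    """Classify a somatic TM exon by the germline origin of its diagnostic sites.

    All three arguments are aligned strings of equal length. A diagnostic site
    is a column where the truncated germline segment (`own_tm`) and the
    full-length germline exon differ and neither has a gap.
    Returns (kind, interval). For "single_join", interval is the pair of
    diagnostic columns that bracket the join site; otherwise it is None.
    """
    labels = []
    for col, (own, full) in enumerate(zip(own_tm, full_length)):
        if own == "-" or full == "-" or own == full:
            continue  # not a diagnostic site
        base = somatic[col]
        if base == own:
            labels.append((col, "own"))
        elif base == full:
            labels.append((col, "full"))
        else:
            return "novel_base", None  # base found in neither germline source
    # A switch is a change of germline origin between adjacent diagnostic sites.
    switches = [
        (labels[i][0], labels[i + 1][0])
        for i in range(len(labels) - 1)
        if labels[i][1] != labels[i + 1][1]
    ]
    if not switches:
        return "no_switch", None
    if len(switches) == 1:
        return "single_join", switches[0]
    return "complex", None


# Toy data (hypothetical): diagnostic sites fall at columns 1, 4 and 7.
FULL = "AAAAAAAAAA"
OWN_TM = "ACAACAAC--"

print(classify_join("ACAACAAAAA", OWN_TM, FULL))
print(classify_join("ACAAAAACAA", OWN_TM, FULL))
```

An exon that switches origin exactly once corresponds to a single, simple joining event, and the join can only be localized to the interval between the two flanking diagnostic sites. Exons with more than one switch are the kind that require more complex explanations, such as additional recombination or gene conversion.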

During differentiation of a new somatic nucleus a pair of intact MTA and MTB genes must be assembled from the germline genes. One possibility is that joining occurs between the ends of the mating type-specific segment and the start of the MTA2 and MTB3 full-length TM exons. If this were the case, all the progeny would have full-length TM exons identical to those of the MTA2 and MTB3 germline genes. Alternatively, joining could occur at internal locations within the germline TM exons. In this case, the somatic TM exons would contain novel combinations of the unique polymorphic nucleotides found in the germline tm segments. Somatic mating type gene pair sequences from the mature strains mentioned above, and the SB210 shotgun macronuclear genome sequence, were found to contain novel combinations of these polymorphic nucleotides (see below), suggesting that joining can occur within the germline TM exons.

The sequenced TM exons are from progeny that had not yet undergone their first division (see Figure S1 , stage 3, and Materials and Methods ). The top six lines represent the germline mating type gene pairs of SB210, shown in their germline order (from top to bottom). All TM exons are drawn to scale. The darker gray bars represent intact and truncated MTA TM exons, while the lighter gray bars represent truncated and intact MTB-TM exons. The mating type-specific segments are color-coded, as labeled, and are not drawn to scale as indicated by the double slash marks. The dashes beyond MTA2-TM and MTB3-TM indicate sequence adjacent to the mat locus, which is identical in all nuclei. Vertical bars of mating type-specific color within MTA and MTB TM exon segments represent the location of polymorphic nucleotides relative to the germline consensus sequence of each TM exon (the consensus sequence is shown in Text S6 and a complete list of polymorphisms is shown in Tables S5 and S6 ). As an example, the simplest possible germline origin of the most common somatic MTA6-TM and MTB6-TM exons is indicated by boxed regions within the germline mating type gene pairs and somatic exons. For each mating type, approximately ten MTA-TM exons and 30 MTB-TM exons were sequenced (see Texts S7 and S8 for details). Numbers to the left of MTA and to the right of MTB TM exons represent the number of times each combination of polymorphic nucleotides was found among the sequenced TM exons. *, location of a base not present in the germline; these changes could be due to either PCR errors or replication repair errors and occurred at a rate of 1 bp in 50 Kbp (see Texts S7 and S8 for details).

The six MTA TM exon segments of the germline SB210 mat locus were aligned, delineating the position at which each germline tm segment is truncated ( Figure 6 ) and revealing 59 polymorphic sites ( Table S5 ). The MTB TM exon segments were similarly examined and 52 polymorphic sites were found ( Table S6 ). With only one exception, none of the polymorphic nucleotides generate stop codons or reading frame shifts and most are unique to a particular gene pair. Unique polymorphic nucleotides within the germline TM exon segments allow us to deduce the germline origin of somatic MTA-TM and MTB-TM exon DNA.
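As an illustration of how polymorphic sites, and the subset unique to one gene pair, can be read off an alignment, the following Python sketch scans aligned sequences column by column. It is a minimal sketch under stated assumptions: the function names and toy sequences are hypothetical, `-` is assumed to mark positions absent from a truncated segment, and the actual analysis in Tables S5 and S6 may have been carried out differently.

```python
from collections import Counter


def polymorphic_sites(aligned):
    """Return 0-based alignment columns where the aligned sequences disagree.

    `aligned` maps a sequence name to an aligned string. Gap characters ('-')
    are ignored when comparing bases.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for col in range(lengths.pop()):
        bases = {seq[col] for seq in aligned.values() if seq[col] != "-"}
        if len(bases) > 1:
            sites.append(col)
    return sites


def unique_variants(aligned):
    """Map each sequence name to the columns where it alone carries its base."""
    result = {name: [] for name in aligned}
    for col in polymorphic_sites(aligned):
        counts = Counter(seq[col] for seq in aligned.values() if seq[col] != "-")
        for name, seq in aligned.items():
            if seq[col] != "-" and counts[seq[col]] == 1:
                result[name].append(col)
    return result


# Toy alignment (hypothetical): one full-length exon and two truncated segments.
toy = {
    "full_length": "ACGTACGTAC",
    "tm_a":        "--GTACCTAC",
    "tm_b":        "----ATGTAC",
}

print(polymorphic_sites(toy))
print(unique_variants(toy))
```

Variants carried by a single germline segment are the informative ones, because a somatic exon that carries such a variant must have derived that stretch of sequence from that segment.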

We identified homologs of the MTA and MTB genes in the somatic genome sequence of several additional species ( Figure 5 ). Somatic genome sequence is available for two Tetrahymena species that are within the same subgroup [27] as T. thermophila (T. malaccensis and T. elliotti) and two more distantly related species (T. borealis and T. pyriformis) (T. malaccensis, T. elliotti, T. borealis at the Broad Institute website, T. pyriformis strain GL by W. Miao, unpublished data). T. malaccensis and T. borealis have systems with six and seven mating types, respectively, and like T. thermophila, mating type determination is stochastic, without influence of the parental mating types [28] . The mating type system of T. elliotti is unknown. The same is true of T. pyriformis, where the GL strain is the sole representative of this species. This strain also lacks a germline nucleus and thus would be sterile if it could mate. BLASTN and TBLASTN searches using the sequence of the conserved TM exons led us to identify single-copy, head-to-head MTA and MTB homologs of approximately the same length for all four related species ( Text S4 ). The results of a phylogenetic analysis ( Figure 5 ) and Clustal Omega alignment ( Text S5 ) showed the mating type of the sequenced strain of T. elliotti to be most closely related to T. thermophila mating type III. Similarly, the mating type of the sequenced strain of T. malaccensis is most closely related to mt IV. Alignments of the predicted amino acid sequences are shown in Text S5 . For the remaining species, specific mating type relationships could not be recognized either because they carry a homolog of the mt I gene of the T. thermophila mat-1 allele, which has not yet been sequenced, or because the sequence divergence is too great. Neither the T. thermophila MTA nor the MTB protein shows similarity to any of the other ciliate mating type proteins deposited in GenBank, a total of 19 distinct proteins from four Euplotes species and one Blepharisma japonicum protein, as determined by BLASTP with expected value threshold = 10.

The mating type-specific segments of the germline gene pairs differ in size by up to 8.5 kb. This variation is due to the presence of IESs, germline-specific sequences that interrupt a contiguous region of somatic-destined sequence, within the array. By comparing somatic sequences to the germline genome sequence, we identified six IESs ( Figure 3 ; Table S4 ). Each was confirmed by cloning and sequencing PCR products from the germline and somatic nuclei (unpublished data). The IESs lie within introns in mating type-specific segments or in an intergenic region; they range in size from 299 to 5,989 bp. No other differences were found between the germline and somatic sequences in the mating type-specific segments. Additional germline-limited sequence separates adjacent mating type gene pairs in the germline array ( Table S4 ).

The mating type genes represent two gene families. Predicted proteins within the MTA family are of similar size (1,423–1,494 aa). Clustal Omega alignment [24] , [25] of the six predicted MTA proteins reveals their TM exons share 99.6% amino acid identity ( Text S2A ). Mating type-specific regions were compared by means of all-by-all pairwise alignments of every MTA mating type-specific amino acid sequence using BLASTP. On average, the alignments covered 98% (range 92%–100%) of the sequences, and showed 42% (range 38%–47%) sequence identity and 60% (range 58%–65%) sequence similarity (identical and conservative substitutions); expected values ranged from 1E-162 to less than 1E-200. Predicted proteins within the MTB family are also of similar size (1,733–1,749 aa). Clustal Omega alignment of the six MTB proteins shows their TM exons share 99.4% amino acid identity ( Text S2B ). Analogous pairwise alignments of every MTB mating type-specific amino acid sequence on average covered 99% (range 97%–100%) of the sequences, and showed 43% (range 41%–46%) sequence identity and 62% (range 60%–64%) sequence similarity; expected values were all less than 1E-200. The two protein families were compared by all-by-all BLASTP alignments of MTA versus MTB predicted amino acid sequences; in every case, the only significant match (expected value around 1E-08) was restricted to a ∼80 amino-acid cysteine-rich segment containing furin-like repeats, starting about 50 amino acids into the TM exon-encoded sequence. Clustal Omega alignment of the furin-like repeats within the 12 TM exons is shown in Text S3 . Cysteines at 12 positions and other amino acids at 14 positions are absolutely conserved among the furin-like repeats of the 12 TM exons. The function of cysteine-rich, furin-like repeat domains is not known, but they are found in some endoproteases and cell surface receptors [26] .

Using the above information, the somatic mat locus of each mating type was sequenced from mature mating type strains ( Tables S1 and S3 ) derived from a mating between strains SB210 mt VI and SB1969 mt II. The entire germline mat locus from SB1969 mt II was sequenced and found to be identical to that of SB210 ( Table S3 ). In the mature mating type strains every somatic gene pair has full-length MTA– and MTB-TM exons joined to a mating type-specific segment, an arrangement identical to that of the somatic mt VI gene pair. The TM exons of the other mating types revealed several single nucleotide polymorphisms when compared to the mt VI gene pair (see below), but otherwise are identical.

Southern blot analysis was carried out using whole-cell genomic DNA from a mature strain of each mating type (SB4208, SB4211, SB4214, SB4217, SB4220, and SB4223; see Table S1 ). The DNA was digested with PvuII restriction endonuclease and separated by pulsed-field gel electrophoresis. Black segments, mating type-specific segment of each gene pair; diagonally hatched segments, conserved TM exons; arrows, PvuII sites; thin black bars, probes; size (kb) shown is that of the relevant PvuII fragment in the somatic genome (the corresponding germline PvuII fragments are not visible due to differences in size and copy number).

A mating type was assigned to each germline gene pair segment by Southern blot analysis using probes from unique regions of each germline gene pair. Each probe was found to be mating type-specific, hybridizing to a single band from the somatic nucleus of one mating type ( Figure 4 ). This result clearly shows that only one mating type gene pair remains in the somatic nucleus. The order of the mating type gene pairs in the germline was identified as II – V – VI – IV – VII – III ( Figure 3 ).

The locus is a 91-kb tandem array of six incomplete, head-to-head mating type gene pairs, in the order II, V, VI, IV, VII, and III (order established as shown in Figure 4 ). Each gene pair begins to the left with the MTA conserved TM exon (diagonal lines) and ends with the MTB conserved TM exon (dark gray). Only the terminal genes (MTA2 and MTB3) have full length versions of their TM exons. The mating type-specific, somatic-destined segment for each mating type gene pair, which includes the 5′ MTA and MTB segments and the intervening upstream spacer region (putative promoter), is shown as a single thick colored bar. Between the TM exon segments of adjacent gene pairs, there is a small amount of germline-limited sequence (GLS; black). Several IESs are located within the mating type-specific segments (also black). Excluding IES sequence, the mating type-specific segments are of comparable size: II, 8,673 bp; V, 9,132 bp; VI, 9,352 bp; IV, 8,450 bp; VII, 8,277 bp; and III, 8,384 bp. Exact coordinates of all these features are given in Table S4 .

To identify the genes of the germline mat locus, we used the mt VI MTA6 and MTB6 gene pair sequence as query in a BLAST search of the SB210 germline genome sequence (Tetrahymena Comparative Sequencing Project, Broad Institute of Harvard and MIT, http://www.broadinstitute.org/ ). Multiple matching discontiguous segments were observed over a 91-kb region of the germline. The mating type-specific segments of MTA6 and MTB6 matched once in the middle of this region. Additional matches were due to the conserved TM exons of MTA6 and MTB6, each of which matched six times within this region. This led us to identify five additional gene pairs containing sequences homologous to those of the TM exons of MTA6 on the left and MTB6 on the right. The genes are arranged in a tandem array of six similarly oriented gene pairs, the number of mating types encoded by the mat-2 allele ( Figure 3 ). Sequence immediately flanking the mat locus is identical in the germline and somatic genomes. Before carrying out detailed analysis of the mat locus, we filled all sequence gaps in this region and corrected sequence errors ( Tables S2 and S3 ). In the finished sequence we found that each gene pair consists of an MTA- and an MTB-like gene. These are composed of a unique mating type-specific segment, and a terminal TM exon segment that is highly conserved among the MTA (or MTB) genes. The germline mat locus lacks a complete gene pair. The mat locus array begins and ends with the only complete genes within the array, later shown to be MTA2 and MTB3, respectively ( Figure 3 ). The TM exons of all the other mating type genes are truncated, indicated by the use of lower case “tm” (for example, MTA-tm or tm). Assembly of a somatic mating type gene pair requires joining of mating type-specific segments to the full-length copies of the MTA2– and MTB3-TM exons located at the ends of the array.

To determine whether other mating types express genes containing the TM exons shared by mts V and VI, we isolated RNA from starved, mature strains of each mating type ( Table S1 ). Northern blot analysis revealed that cells of every mating type have MTA– and MTB-like transcripts ( Figures 2 and S3 ). The length of the transcripts is similar to the lengths of the RNA-seq assembled transcripts, 4.8 kb for MTA6 (mt VI MTA) and 5.7 kb for MTB6 (mt VI MTB). These results, in combination with the RNA-seq results, support the hypothesis that all mating types have MTA and MTB genes consisting of two segments: one encoding a highly conserved TM segment found in all mating types and the other encoding a larger mating type-specific segment.

Each gene of the mt VI gene pair was deleted independently to investigate the functional relationship between the two genes. For both single knockouts, RT-PCR showed that removal of one gene did not abolish expression of the remaining gene ( Figure S4C ). Three independent MTB knockouts (MTB–) gave the same results as the gene pair knockout. No progeny were produced when MTB– cells were mixed with wt cells of a different mating type. The MTA knockout (MTA–) retained mating specificity but very little mating competence. It paired extremely poorly and rarely produced progeny (0.16% on average) when mated with wt cells of a different mating type. No pairs or progeny were detected when it was mated to cells of the same (mt VI) mating type. Identical results were obtained with three independent knockout clones.

If the MTA and MTB genes determine mating type, they may also be essential for mating. This was addressed by removing the entire somatic gene pair of mt VI (SB210) by homologous gene replacement ( Figure S4A and S4B ) [23] . The gene pair knockout (MT–) abolished the cell's ability to pair or produce progeny when mixed with starved wild-type (wt) cells of a different mating type or with cells of the same mating type. Identical results were obtained with three independent knockout strains. In contrast, control assays of mating between two wt strains of different mating types showed high levels of pair formation and produced abundant (>85%) progeny.

The MTA and MTB genes identified above are arranged head to head, are divergently transcribed ( Figure 1B ) and are predicted to code for unique proteins. The MTA gene (TTHERM_01087810, KC405257) is predicted to encode a 161-kD protein while the MTB gene (TTHERM_01087820, KC405257) is predicted to encode a 194-kD protein. Each terminal exon is unique in the somatic mt VI genome sequence and both are predicted to encode transmembrane (TM) helices. TM domain proteins that can localize to the cell surface could play a role in self/non-self recognition, since cell-cell contact is required to stimulate cells to mate [20] – [22] .

Whole-cell RNA was extracted from starved mature strains of mating types II through VII (SB4208, SB4211, SB4214, SB4217, SB4220, and SB4223; see Table S1 ). Probes from within each conserved TM exon were hybridized to Northern blots. The MTA6-TM probe hybridized to a ∼5-kb transcript (left panel), while MTB6-TM probe hybridized to a ∼6-kb transcript (right panel). RPT3, a 26S proteasome subunit P45 family protein (XP_001007748) expressed during starvation, was used as a loading control. The RPT3 probe hybridized to the expected ∼1.3-kb transcript. The complete blots are shown in Figure S3 .

(A) RNA-seq data from mt VI and mt V cells [18] mapped to a ∼300-kb region of the SB210 macronuclear reference genome (mt VI) (see Figure S2 ). The graph shows the number of RNA-seq reads (y-axis) from growing mt VI cells (orange, positive values), 3-h starved mt VI cells (blue, positive values) and 3-h starved mt V cells (red, shown as negative values) that mapped to the ∼300-kb region. Orange overlays blue. The box encloses a segment containing two genes with mating type-specific expression in starved cells and no expression in growing cells. x-axis: position within the 300-kb segment. (B) Transcripts (mt VI, blue) and transcript segments (mt V, red) were assembled from RNA-seq reads mapping to the boxed region in (A) and, for mt VI, from sequenced RT-PCR products. 5′ and 3′ untranslated regions are not included. The mt VI-derived transcripts correspond to a pair of divergently transcribed predicted genes (KC405257), now named MTA6 and MTB6, respectively. Thin connecting lines represent introns. Both transcripts are drawn to scale, where each tick mark on the scale represents 1 kb. Each gene contains a TM exon and furin-like repeats (*).

The genetically mapped mat locus [15] – [17] was assigned to a roughly 300-kb segment of a somatic chromosome sequence assembly ( Figure S2 ). As cells must be starved to mate, we assumed that a candidate mating type gene would be expressed in a mating type-specific manner during starvation and not expressed during growth. In a previous whole-transcriptome RNA-seq study [18] , mRNA was prepared and sequenced from starved SB4217 (mating type V or mt V) cells as well as from starved and growing SB4220 (mt VI) cells ( Table S1 ). To identify mating type candidate genes, we mapped the RNA-seq reads to the 300-kb segment of the mt VI somatic reference genome [15] , [19] . Two adjacent genes in this region showed mating type-specific expression in starved cells and no expression during growth ( Figure 1A ) making them good mating type gene candidates. We named these genes MTA and MTB. A transcript for each gene was assembled primarily from reads that mapped to the mt VI reference genome. Reads from mt VI covered both genes except for one small gap in MTA, which was filled in by cDNA sequencing (unpublished data). Northern blot analysis ( Figures 2 and S3 ) confirmed a single transcript for each mt VI gene. Only the terminal exons of MTA and MTB could be assembled from the mt V reads that mapped to the mt VI reference genome ( Figure 1B ). In addition, a partial transcript was assembled de novo from the mt V RNA-seq reads ( Text S1 ). Two thirds of this partial transcript has 99.9% identity with the terminal exon of mt VI MTA gene but the remainder is absent from the mt VI somatic reference genome and could encode a mating type-specific segment.
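The selection criterion described above (expressed in starved cells of one mating type, silent during growth, and absent from starved cells of another mating type) amounts to a simple filter over per-region read counts. The Python sketch below illustrates that filter only; the region names, the counts, and the `min_reads` cutoff are hypothetical and are not taken from the RNA-seq data set, and the counts are assumed to refer to mating type-specific sequence rather than to the shared terminal exons.

```python
def candidate_regions(counts, min_reads=10):
    """Return regions expressed only in starved mt VI cells.

    `counts` maps a region name to a dict of read counts with the keys
    "growing_VI", "starved_VI" and "starved_V". `min_reads` is an arbitrary
    illustrative cutoff separating "expressed" from "not expressed".
    """
    hits = []
    for region, c in counts.items():
        silent_in_growth = c["growing_VI"] < min_reads
        on_in_starved_vi = c["starved_VI"] >= min_reads
        off_in_starved_v = c["starved_V"] < min_reads
        if silent_in_growth and on_in_starved_vi and off_in_starved_v:
            hits.append(region)
    return sorted(hits)


# Hypothetical read counts for four regions of the mapped segment.
toy_counts = {
    "region_a": {"growing_VI": 0, "starved_VI": 250, "starved_V": 0},
    "region_b": {"growing_VI": 300, "starved_VI": 280, "starved_V": 310},
    "region_c": {"growing_VI": 2, "starved_VI": 120, "starved_V": 3},
    "region_d": {"growing_VI": 1, "starved_VI": 90, "starved_V": 95},
}

print(candidate_regions(toy_counts))
```

In this toy input, `region_b` is excluded because it is expressed during growth, and `region_d` because it is starvation-induced but not mating type-specific.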

Discussion

Our findings suggest that mating type determination in T. thermophila involves a remarkable type of programmed genome rearrangement. We have identified a pair of mating type genes that are arranged head-to-head. Each mating type is characterized by a similarly organized pair of somatic genes and each gene of the pair encodes a TM domain shared by all mating types. Starvation is required for mating and induces transcription of both genes. Both genes are required for wt levels of pair formation and progeny production. The germline genome contains an array of incomplete gene pairs, one for each mating type. During development of the somatic nucleus in progeny cells, the germline array undergoes rearrangement to assemble one complete gene pair and delete all others in the somatic chromosome. Thus, mating type determination occurs by deletion rather than by an epigenetic gene silencing mechanism. These findings account for the irreversibility of mating type determination. The mating type locus can be thought of as a multi-state developmental switch where the switch is stochastically and permanently set to one state in the somatic genome.

The removal of either or both genes caused a significant inhibition of pairing between cells of different mating types, suggesting the MTA and MTB genes are both fundamental for recognition of cells of a different mating type (allorecognition). This inhibition of pairing suggests that the gene products may be functioning cooperatively for allorecognition. In addition to allorecognition, the gene products could be distinguishing self to prevent homotypic pairing. If this were the case, homotypic pairing would be observed in the absence of one or both genes. This does not appear to be a function of the MTA and MTB genes because pairing between starved cells of the same mating type was not observed in our knockouts.

At least two events are required to assemble a complete somatic mating type gene pair from the mat germline array (see model shown in Figure 7). At the left end of the gene pair, the MTA-tm segment must be joined to the single-copy, full-length MTA2-TM exon located at the far left end of the array. At the right end of the same gene pair, the MTB-tm segment must join to the single-copy, full-length MTB3-TM exon located at the far right end of the array. The breakage and rejoining mechanism is highly precise. Since both joining events occur within translated exon segments, without this precision mating competence could be lost. Possible mechanisms include homologous recombination and precise nonhomologous end joining. The mechanism will become clearer once we experimentally determine which of the observed recombination events are essential to mating type determination and which are unrelated to this process. Regardless of the mechanism, an interesting question is how joining at opposite ends is coordinated to result in the assembly of a somatic gene pair. A stochastically selected germline gene pair may be epigenetically marked, its two ends cut, and full-length TM exons joined coordinately. Alternatively, each end could be processed independently, resulting in the deletion of one or more gene pairs from either end, until only one complete gene pair remains. Additional knowledge of the mechanism will be needed to understand how mating type frequencies are influenced by environmental conditions, such as temperature and nutritional state [30],[31].

Figure 7. Model proposing that homologous recombination assembles a single mating type gene pair during somatic differentiation. In this model, intramolecular recombination events are initiated at both ends of the germline array; subsequent resolution results in removal of intervening gene pairs by looping out and joining of a gene pair to the full-length TM exons at the ends of the array. Any number of gene pairs could be excised in a single recombination event; since the chromosomal product regenerates the recombination substrate, recombination steps can be reiterated until a single, complete gene pair remains, at which point the process has to stop. Sequestering or disabling the ability of side products to recombine again would minimize unproductive reversal of the process. Recombination events need not always involve a full-length TM exon; two internal tm exons could also be involved at intermediate steps. The recombination process is labeled “homologous recombination” for simplicity, but identical results could be obtained by highly precise non-homologous end-joining. Side products containing a discrete number of gene pairs, shown here as circular, could also be linear depending on the details of the recombination and repair mechanism. A related DNA rearrangement model of T. thermophila mating type determination, also involving recombination and alternative deletion in a tandem array of germline mating type genes, was proposed previously [14]. The key conceptual difference is that in the original model a unique segment was somatically attached at one end of an individual mating type gene, instead of attaching unique segments at both ends of a mating type gene pair, as reported here. https://doi.org/10.1371/journal.pbio.1001518.g007
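The iterative logic of this model can be made concrete with a toy simulation in Python. This is only a sketch under strong assumptions: recombination partners and the side (MTA or MTB TM exons) are drawn uniformly at random, which the model does not specify, and side products are never reintegrated. The function names are hypothetical, and the simulated outcomes say nothing about real mating type frequencies.

```python
import random

# Germline order of the gene pairs in the mat-2 array, as given in the text.
GERMLINE_ORDER = ["II", "V", "VI", "IV", "VII", "III"]


def one_recombination(pairs, rng):
    """Excise a contiguous block of gene pairs in one recombination event.

    Recombination between the MTA TM exons of pairs i < j removes pairs
    i..j-1 and keeps pair j; recombination between the MTB TM exons removes
    pairs i+1..j and keeps pair i. The uniform choice of side and partners
    is an assumption of this toy model.
    """
    i, j = sorted(rng.sample(range(len(pairs)), 2))
    if rng.choice(["MTA", "MTB"]) == "MTA":
        return pairs[:i] + pairs[j:]
    return pairs[: i + 1] + pairs[j + 1 :]


def determine_mating_type(pairs=GERMLINE_ORDER, seed=None):
    """Repeat recombination until one gene pair remains.

    Returns the surviving mating type and the number of events it took.
    """
    rng = random.Random(seed)
    remaining = list(pairs)
    steps = 0
    while len(remaining) > 1:
        remaining = one_recombination(remaining, rng)
        steps += 1
    return remaining[0], steps


if __name__ == "__main__":
    for seed in range(5):
        print(determine_mating_type(seed=seed))
```

Each event removes at least one gene pair and never removes all of them, so the loop always ends with exactly one pair after at most five events for a six-pair array. This mirrors the statement that the process has to stop once a single, complete gene pair remains.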

In addition to the single, simple recombination events associated with mating type determination, we have observed secondary recombination events in somatic TM exons, especially MTB TM exons. These events are particularly frequent in the MTB TM exons of mature cell lines (Table S7). As explained in Text S9, artifacts of PCR template-switching are excluded in these results. Since the majority of joined TM exons from 24-h exconjugants show no evidence of secondary recombination, these events are probably unrelated to mating type determination. Presumably they chiefly reflect recombination between multiple somatic chromosome copies carrying independently differentiated TM exons prior to the purification brought about by assortment during vegetative multiplication (Figure S1). A number of recombination events, most simply interpreted as gene conversions, have also been detected among MTB exon haplotypes. We believe that these MTB gene conversions are also due to the secondary recombination described above and are unrelated to Tetrahymena mating type determination, in part because gene conversions are found in only a small minority of the sequenced TM exons in 24-h exconjugants. In addition, gene conversion per se cannot result in the loss of intervening mating type gene pairs. Gene conversion is responsible for mating type switching in yeast, but no intervening DNA is lost in yeast mating type switching [32].

Programmed somatic DNA rearrangements are well known among the ciliates [33],[34]. In T. thermophila, approximately 6,000 IESs in the germline genome are excised during differentiation of a new somatic nucleus [35]. The deletions that join TM exons to mating type-specific segments differ in several important ways. IES excision is imprecise; precision is not required, as nearly all IESs are found in intergenic regions or within introns [36]. In contrast, the deletions involved in mating type determination are highly precise and occur within the coding segment of the TM exon. Furthermore, IES excision is maternally controlled; only sequences absent from the parental somatic genome are targeted for elimination [37],[38]. Mating type, on the other hand, is stochastically inherited; determination of mating type in each progeny cell occurs autonomously during the differentiation of the new somatic nucleus. Mating type-specific sequences absent from the parental somatic nucleus escape deletion by the IES excision mechanism and are retained in progeny somatic nuclei. Finally, preliminary experiments (unpublished data) indicate that mating type determination occurs several hours after excision of IESs within the mat locus. All these considerations lead us to conclude that these two processes, which occur in the differentiating somatic nucleus, proceed by different mechanisms. In mating type determination, DNA breakage and rejoining occur physically independently and precisely at both ends of one gene pair. This leads to the assembly of one complete gene pair and the excision of the other germline gene pairs from the somatic chromosome. To our knowledge, this type of programmed genome rearrangement is novel, at least in ciliate molecular biology.

The modular organization of the T. thermophila germline mat locus (Figure 3) in combination with rare unequal meiotic crossing-over between homologous germline TM/tm domains could facilitate rapid evolutionary change in the number of available mating types. This hypothesis is consistent with the existence of two T. thermophila germline mat allele classes specifying different numbers of mating types (five for mat-1 and six for mat-2). mat-1-like alleles carry mt I but are missing mts IV and VII. mat-2-like alleles are the opposite, carrying mts IV and VII in adjoining gene pairs while missing mt I. Using somatic genome sequence data we assigned a mating type to the sequenced strains of two other Tetrahymena species by virtue of their similarities to T. thermophila mating types. This suggests that a similar mating type system is conserved in multiple Tetrahymena species. If so, the mechanism proposed above could also explain the finding that the number of mating types described in species of the genus Tetrahymena is dynamic, ranging from 3 to 9 (reviewed in [39]). Using the strong sequence conservation observed at the TM exons, it may be possible to isolate and sequence mating type genes from many species of the genus Tetrahymena to investigate the evolution of their mating type system.
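How unequal crossing-over could change the number of gene pairs in a modular array can be shown with a minimal sketch. This is a hypothetical illustration only: the function `unequal_crossover`, the labels `P1` to `P5`, and the simplification that breakpoints fall exactly at gene-pair boundaries (rather than within the homologous TM/tm domains) are assumptions, and the labels do not correspond to the actual order of gene pairs in any mat allele.

```python
def unequal_crossover(array_a, array_b, break_a, break_b):
    """Exchange the right-hand parts of two tandem arrays.

    `break_a` and `break_b` are the numbers of gene pairs kept from the left
    of each array. When they differ, the two homologs are paired out of
    register, so one product loses gene pairs and the other gains them.
    Breakpoints at gene-pair boundaries are a simplifying assumption.
    """
    left_a, right_a = array_a[:break_a], array_a[break_a:]
    left_b, right_b = array_b[:break_b], array_b[break_b:]
    return left_a + right_b, left_b + right_a


if __name__ == "__main__":
    homolog = ["P1", "P2", "P3", "P4", "P5"]  # hypothetical gene-pair labels
    shorter, longer = unequal_crossover(homolog, homolog, 2, 3)
    print(shorter)  # one gene pair fewer than the parental array
    print(longer)   # one gene pair more than the parental array
```

An in-register exchange (equal breakpoints) leaves the gene-pair count unchanged, whereas an out-of-register exchange yields one shortened and one lengthened array, which is the kind of change in mating type number that the hypothesis invokes.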

T. thermophila is a model organism for eukaryotic biology [15]. Future research on this mating type system should advance our knowledge in several areas of biology. The biochemical functions of the MTA and MTB gene products are of interest for understanding the principles of self/non-self discrimination. The study of the genomic rearrangements employed for mating type determination can inform mechanisms of genome dynamics in other systems.