Plant genomes are complex and contain large amounts of repetitive DNA including microsatellites that are distributed across entire genomes. Whole genome sequences of several monocot and dicot plants that are available in the public domain provide an opportunity to study the origin, distribution and evolution of microsatellites, and also facilitate the development of new molecular markers. In the present investigation, a genome-wide analysis of microsatellite distribution in monocots (Brachypodium, sorghum and rice) and dicots (Arabidopsis, Medicago and Populus) was performed. A total of 797,863 simple sequence repeats (SSRs) were identified in the whole genome sequences of six plant species. Characterization of these SSRs revealed that mono-nucleotide repeats were the most abundant repeats, and that the frequency of repeats decreased with increase in motif length both in monocots and dicots. However, the frequency of SSRs was higher in dicots than in monocots both for nuclear and chloroplast genomes. Interestingly, GC-rich repeats were the dominant repeats only in monocots, with the majority of them being present in the coding region. These coding GC-rich repeats were found to be involved in different biological processes, predominantly binding activities. In addition, a set of 22,879 SSR markers that were validated by e-PCR were developed and mapped on different chromosomes in Brachypodium for the first time, with a frequency of 101 SSR markers per Mb. Experimental validation of 55 markers showed successful amplification of 80% SSR markers in 16 Brachypodium accessions. An online database ‘BraMi’ (Brachypodium microsatellite markers) of these genome-wide SSR markers was developed and made available in the public domain. The observed differential patterns of SSR marker distribution would be useful for studying microsatellite evolution in a monocot–dicot system. 
SSR markers developed in this study would be helpful for genomic studies in Brachypodium and related grass species, especially for the map based cloning of the candidate gene(s).

Copyright: © 2011 Sonah et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Microsatellites or simple sequence repeats (SSRs) are co-dominant, abundant, multi-allelic, and uniformly distributed over the genome, and can be detected by simple reproducible assays [1]. These features have made microsatellites the markers of choice for marker-assisted plant breeding, DNA fingerprinting of genetic resources, molecular mapping and map-based cloning of specific genes. Microsatellite markers have also been used in several studies to define conserved regions among related species [2]–[4]. Initially, SSR markers were developed from expressed sequence tags (ESTs) and bacterial artificial chromosome (BAC) end sequences in most plant species. However, whole genome sequencing has led to the identification of numerous SSR markers that are distributed over the entire genomes of rice and Arabidopsis. The mechanism of microsatellite evolution and their genome-wide distribution, however, are still not well studied in plants, mostly owing to the lack of genomic information. The recently sequenced genomes of Brachypodium [5], Populus [6], Medicago (www.jgi.doe.gov) and sorghum [7], along with the already well-characterized genomic sequences available for rice [8] and Arabidopsis [9], will facilitate comparative genomics studies in plants. Besides helping to understand genome organization, sequenced genomes can be used effectively for the generation of molecular markers and their cross-species utilization, specifically for those species where very little or no genomic information is available.

Thousands of SSR markers have been developed in rice, and their application in other grass species has been proven and well documented [10]–[12]. However, it is postulated that the Brachypodium genome would exhibit a higher level of collinearity to the genomes of temperate cereal crops than the rice genome does. Therefore, SSR markers developed in Brachypodium may be utilized more effectively in wheat than rice SSR markers. Moreover, several features such as a typical plant structure, growth habit, rapid generation time, self-pollination and a compact genome size (∼300 Mb) make Brachypodium an excellent model system for functional and structural genomic studies in cereals and grasses [13]–[14]. Several efforts have been made to make Brachypodium a more useful and convenient model system for genomic studies [14]. However, to develop any model system for genomic studies, a dense molecular genetic linkage map and genome-wide distributed molecular markers are required. Recently, a genetic linkage map of 139 marker loci for Brachypodium was developed using an F2 population derived from a cross between the diploid lines Bd3-1 and Bd21 [15]. The map was constructed using SSR markers derived from ESTs, BAC end sequences and information available on conserved orthologous sequences (COS) in other grass species. Currently, efforts are underway to develop a dense molecular genetic linkage map for Brachypodium using different types of markers. However, a genome-wide resource of a large number of informative molecular markers is required to supplement this effort.

The present study deals with a genome-wide comparative analysis of microsatellite distribution in the nuclear and chloroplast genomes of monocot and dicot species, the development of genome-wide SSR markers, and the validation of a subset of these new markers in Brachypodium.

Results and Discussion

Microsatellite distribution in monocot and dicot species

A total of 797,863 SSRs were identified among six plant genomes: three monocot (Brachypodium, sorghum and rice) and three dicot (Arabidopsis, Medicago and Populus) species (Table 1). Among the six genomes analyzed, the maximum number and frequency of SSRs were obtained from Populus followed by Medicago, whereas the sorghum genome had the lowest frequency. The frequency of SSRs was considerably higher among dicots than among monocots. Among monocots, the frequency of SSRs in the rice genome was nearly twice that in the sorghum and Brachypodium genomes (Table 1). Because the six selected species belong to very diverse groups of monocots and dicots, the distribution of SSR motifs with specific sequences was not uniform across these genomes. However, the overall pattern of SSR motifs of particular lengths was similar. Mono-nucleotide repeats dominated over other types of repeats in all six plant species. The frequency of SSRs decreased stepwise with increasing motif length (mono- to hexa-nucleotide repeats), except in Brachypodium, where the frequency of tri-nucleotide repeats was higher than that of di-nucleotide repeats (Figure S1). The share of mono-nucleotide repeats was lowest (43%) in sorghum and highest (79%) in Medicago. While mono-, di- and tri-nucleotide repeats accounted for the major proportion of SSRs, tetra-, penta- and hexa-nucleotide repeats contributed only a very small share. Their combined contribution reached a maximum of 5.4% in the sorghum genome, and a similar trend was observed for the other genomes studied in the present investigation (Table 1).

Table 1. Distribution of microsatellites with respect to motif length and genome size in monocot and dicot plant species. https://doi.org/10.1371/journal.pone.0021298.t001

Among the two types of mono-nucleotide repeats, (A/T)n was the most abundant in all the plant species while (G/C)n was comparatively scarce (Table 2). Within the mono-nucleotide category, the share of A/T repeats was highest (99%) in the Arabidopsis genome and lowest (78%) in the Brachypodium genome. In the di-nucleotide category, the distribution of SSRs among motif types was not uniform and the most frequent motif type differed between plant species. For example, AG/CT repeats were more frequent in Brachypodium and rice, with 50.7% and 41.9% frequency, respectively, whereas AT/AT repeats were more frequent in Populus (60.5%) and Medicago (59.9%). In rice, both AG/CT and AT/AT repeats dominated other di-nucleotide repeats. Interestingly, the CG/CG motif contributed less than 0.5% in dicots, whereas it accounted for 3.1%–7.0% of all di-nucleotide repeats identified in the monocots. The analysis of mono- and di-nucleotide repeats indicated that CG-rich motifs were least preferred in both monocot and dicot genomes. However, for tri-nucleotide repeats, AGC/CGT, AGG/CCT and CCG/CGG were observed more frequently in all the monocot species, whereas A/T-rich repeats, such as AAC/GTT, AAG/CTT and AAT/ATT, were preferred in dicots (Table 2). The frequency of tetra-, penta- and hexa-nucleotide repeats was very low in all the plant genomes investigated in the present study and their motif-wise distribution was not significant across the genomes.
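Motif classes such as AG/CT or AAG/CTT pool a motif with its reverse complement, since the same repeat can be read from either strand. A minimal sketch of how such pooling can be done is given below. It assumes, as an illustration only, that cyclic rotations of a motif (for example AG and GA) are also pooled and that the alphabetically smallest form names the class; the function names are hypothetical and are not taken from the software used in this study.

```python
# Hypothetical helper names; the pooling rule (rotations + reverse complement,
# alphabetically smallest form as the label) is an assumption for illustration.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Return the reverse complement of a DNA motif."""
    return motif.translate(COMPLEMENT)[::-1]


def rotations(motif):
    """Return all cyclic rotations of a motif, e.g. AG -> [AG, GA]."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def motif_class(motif):
    """Label a motif by its strand-independent class, e.g. CT -> 'AG/CT'."""
    motif = motif.upper()
    candidates = rotations(motif) + rotations(reverse_complement(motif))
    canonical = min(candidates)
    return canonical + "/" + reverse_complement(canonical)


if __name__ == "__main__":
    for m in ["CT", "TA", "GC", "GGC", "TTC"]:
        print(m, "->", motif_class(m))
```

Under these assumptions, CT and GA both fall into the AG/CT class, and GGC falls into CCG/CGG, so counts from both strands and all repeat phases are added together before frequencies are compared.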

Table 2. Frequency of different types of motifs in a class of microsatellites with mono-, di-, and tri-nucleotide repeats analysed in monocot and dicot plant genomes. https://doi.org/10.1371/journal.pone.0021298.t002

The dominant occurrence of repeat motifs of a particular sequence and length in plant genomes is the outcome of selection pressures acting on that specific motif during evolution. The molecular mechanisms for the origin of microsatellites are not completely understood. The most common mutational mechanism affecting microsatellites is replication slippage, a process involving addition or removal of one or more motif repeats; however, other mechanisms, such as unequal crossing over, nucleotide substitutions, or duplication events, have also been considered responsible for microsatellite variation [16]–[18]. These theories cannot, however, explain the species-specific accumulation of particular motif repeats observed in the present study. Other factors, such as codon preference, DNA replication and the mismatch repair system, as well as structural and functional attributes of genomes that are unique to a species or a particular taxon, may be responsible for the unique microsatellite distribution patterns in plant genomes. Moreover, the SSR length, motif structure and G/C content of a genome are considered to be factors influencing microsatellite evolution [19]–[21]. Polymorphism among SSRs is a repeat length polymorphism due to repeat elongation/shortening events, which indicates that such processes are important factors in molecular evolution. The repeat elongation/shortening processes also lead to an increase in biological complexity, which is a characteristic of biological evolution. It is known that SSRs within genes are substantially involved in the regulation of evolutionary processes, as SSRs in protein-coding regions can lead to a gain or loss of gene function.

Earlier, sequence variations in genomes, particularly in microsatellite distribution, were attributed to stabilization patterns and potential secondary structures, as well as to factors such as the mismatch repair enzymes [22]–[24]. All these theories, each of which holds a particular factor responsible for sequence preference, were proposed on the basis of very limited knowledge and without genome-wide information on a large variety of genomes. To date, very little work has been done to propose a genome-wide mechanism for the selection of microsatellite motifs with a particular sequence. These mechanisms may be further illustrated with available genomic resources, and the data presented in this paper should help in the understanding of microsatellite evolution in the genomes of plant species.

Microsatellite distribution in coding regions

Microsatellites were identified in the coding DNA sequences (CDS) of six plant species to study the pattern of distribution in the coding regions of monocots and dicots. A total of 36,585 SSRs were identified in 238,798 CDS, totalling about 269.3 Mb of sequence, for all six plant species included in this study. Interestingly, the frequency of SSRs observed in the CDS region (CDS-SSRs) of monocots was twice that observed in the CDS of dicots (Table S1). The highest frequency (203.7 SSR/Mb) of CDS-SSRs was identified in rice, followed by sorghum (181.1 SSR/Mb), whereas the lowest frequency (68.1 SSR/Mb) was observed in Populus. Tri-nucleotide repeats were found to be the most abundant microsatellites in the coding region of plant genomes (Figure 1) and contributed about 93% of SSRs in monocots and about 76% of SSRs in dicots. Such an accumulation of tri-nucleotide repeats in the coding regions was mostly due to the triplet nature of the codon. Mono-nucleotide repeats contributed about 2% of such SSRs in monocots and 14.2% in dicots; this variation may be due to the more frequent occurrence of A/T repeats in dicots. Moreover, G/C-rich repeats were much more frequent in the CDS region of monocots than in that of dicots (Table 3). In the category of mono-nucleotide repeats, A/T repeats dominated over G/C repeats in both monocots and dicots, except in rice, where G/C repeats contributed 53.1% of mono-nucleotide SSRs. Although G/C repeats were only slightly less frequent than A/T repeats in monocots, they were very infrequent in dicots, and as low as 3.4% in Arabidopsis (Table 3). Interestingly, in the category of di-nucleotide repeats, GC/CG repeats were predominant in monocots with an average of 49.9%, while they were completely absent in dicots. The AG/CT repeats accounted for an average of 41.1% of di-nucleotide SSRs in monocots and were also major contributors in dicots, with an average of 73.5% of di-nucleotide SSRs.

Tri-nucleotides with repeat motifs CCG/GGC were dominant only in monocots and contributed 51.5% of the total tri-nucleotide repeats identified in monocots, whereas only 1.9% of these repeats were present in dicots. Tri-nucleotides with repeat motifs AAG/CTT accounted for 29.5% of the total tri-nucleotide repeats in dicots. The microsatellite distribution pattern in the CDS region was thus found to be distinctive for the monocot and dicot species. Functional annotation of Brachypodium CDS sequences containing CG-rich repeats revealed that about 50% of these genes were involved in binding activity (Table S2).
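The link between the triplet codon and the excess of tri-nucleotide repeats in coding regions can be illustrated with simple arithmetic: a gain or loss of repeat copies keeps the reading frame only when the number of inserted or deleted bases is a multiple of three. The sketch below is an illustration of that reasoning, not part of the analysis pipeline of this study, and the function name is hypothetical.

```python
def preserves_frame(motif_len, repeat_change):
    """Return True if gaining or losing `repeat_change` copies of a motif
    of length `motif_len` changes the sequence by a multiple of 3 bases."""
    return (motif_len * abs(repeat_change)) % 3 == 0


if __name__ == "__main__":
    names = ["mono", "di", "tri", "tetra", "penta", "hexa"]
    for size, name in enumerate(names, start=1):
        # A single-copy change is the smallest possible slippage event.
        print(name, preserves_frame(size, 1))
```

For a single-copy change, only tri- and hexa-nucleotide motifs keep the frame; mono-, di-, tetra- and penta-nucleotide repeats would shift it, which is consistent with their scarcity in the CDS.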

Figure 1. Distribution of microsatellites in coding DNA sequences (CDS) of six plant species with respect to motif length. Microsatellites were identified with criteria of mono- to hexa-nucleotide motifs using the MISA software tool, and the minimum repeat unit was defined as 10 for mono-, 6 for di-, and 5 for tri-, tetra-, penta-, and hexa-nucleotides. https://doi.org/10.1371/journal.pone.0021298.g001
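The search criteria stated in the caption (at least 10 copies for mono-, 6 for di-, and 5 for tri- to hexa-nucleotide motifs) can be sketched with regular expressions. This is a simplified illustration of a perfect-repeat scan under those thresholds, not the MISA implementation: it ignores compound and interrupted repeats, and the names `MIN_REPEATS` and `find_ssrs` are hypothetical.

```python
import re

# Minimum copy number per motif length, as stated in the figure caption.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    size = len(motif)
    return not any(
        size % d == 0 and motif == motif[:d] * (size // d)
        for d in range(1, size)
    )


def find_ssrs(seq):
    """Return a sorted list of (start, motif, copies) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                copies = len(match.group(0)) // size
                hits.append((match.start(), motif, copies))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GG" + "A" * 12 + "CC" + "AG" * 6 + "T" + "CCG" * 5 + "TT"
    for start, motif, copies in find_ssrs(demo):
        print(start, motif, copies)
```

The primitive-motif check prevents a run such as twelve A residues from also being reported as a di-nucleotide (AA)6 repeat. Start positions are 0-based in this sketch.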

Table 3. Frequency of microsatellite motifs in coding DNA sequence of six plant species. https://doi.org/10.1371/journal.pone.0021298.t003

Monocots and dicots are thought to have diverged from a common ancestor approximately 200 million years ago [25]. In several comparative genomic studies, Arabidopsis and rice have been considered as models for dicots and monocots, respectively. Numerous interesting findings have emerged from comparisons of these two genomes; for example, rice genes are longer and more GC-rich than Arabidopsis genes [26]. Though Arabidopsis has the smallest genome among the dicot species, it is thought to have evolved by chromosomal duplication, while the rice genome, which is comparatively larger than the Arabidopsis genome, showed more duplication [26]–[29]. The GC-rich monocot genomes may have microsatellites with GC-rich motifs whereas dicots lack GC-rich motifs. The relationship between microsatellite evolution and chromosomal duplications has not been well studied. The duplicated regions are thought to be under different selection pressures than other regions, which may be a reason for the differences in motif preference and frequency between monocots and dicots. Such a biased selection of SSRs was observed in Populus, where most of the SSRs in the coding regions are missing from the duplicated chromosomal segments, mostly due to loss of the corresponding genes [30]. This emphasizes the role of microsatellites in gene and genome evolution. Although a systematic study on a number of genomes is required to draw any definite conclusions, recent developments in sequencing technology and the availability of an increasing number of genome sequences should provide a basis for the study of microsatellite evolution in plants [31].

SSR frequency in plant chloroplast genomes

A total of 337 SSRs were identified in the chloroplast genomes of the six plant species analyzed in this study. The highest frequency of SSRs was identified in the chloroplast genome of Populus, followed by Medicago and Arabidopsis (Table S3). Compared with dicots, monocots had very infrequent SSRs in the chloroplast genome. Most of the SSRs identified in the plant chloroplast genomes were mono-nucleotide repeats, which contributed about 92.5% of the total SSRs (Figure 2). In the mono-nucleotide category, A/T repeats contributed 97.4% of the total. As in the nuclear genomes, G/C repeats were found predominantly in the sorghum and Brachypodium chloroplast genomes. Although the number of di- and tri-nucleotide repeats identified in the chloroplast genomes is too small to compare patterns between monocots and dicots, in a broader context the chloroplasts of dicots were richer in di- and tri-nucleotide SSRs, which were otherwise lacking in monocots.

Figure 2. Distribution of microsatellites in the chloroplast genome of six plant species with respect to motif type. Microsatellites were identified with criteria of mono- to hexa-nucleotide motifs using the MISA software tool, and the minimum repeat unit was defined as 10 for mono-, 6 for di-, and 5 for tri-, tetra-, penta-, and hexa-nucleotides. https://doi.org/10.1371/journal.pone.0021298.g002

SSR frequency in the Brachypodium genome

A total of 51,875 SSRs were identified in the 271 Mb sequence of the Brachypodium genome, with a maximum of 30,573 mono-nucleotide repeats, followed by 10,625 di- and 9,407 tri-nucleotide repeats (Table 4). The remaining tetra-, penta- and hexa-nucleotide repeats together numbered only 1,270, which represented 2.5% of the total SSRs identified in the Brachypodium genome. Chromosome 4 of Brachypodium had the highest frequency of SSRs (197 SSRs/Mb), whereas chromosome 5 had the lowest (175 SSRs/Mb). The frequency of SSRs was thus almost uniform across chromosomes, and the overall frequency of SSRs in Brachypodium was 191 SSRs/Mb.
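The genome-wide figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text; the variable names are hypothetical, and the "remaining" class is simply what is left after subtracting the mono-, di- and tri-nucleotide counts from the total.

```python
# Counts and genome size as quoted in the text for Brachypodium.
total_ssrs = 51875
genome_size_mb = 271
counts = {"mono": 30573, "di": 10625, "tri": 9407}

# Overall density in SSRs per Mb.
density = total_ssrs / genome_size_mb

# SSRs not accounted for by mono-, di- and tri-nucleotide repeats.
remaining = total_ssrs - sum(counts.values())
remaining_share = 100 * remaining / total_ssrs

# Percentage share of each listed class.
shares = {name: 100 * n / total_ssrs for name, n in counts.items()}

if __name__ == "__main__":
    print(f"density: {density:.1f} SSRs/Mb")
    print(f"remaining: {remaining} ({remaining_share:.2f}%)")
    for name, pct in shares.items():
        print(f"{name}: {pct:.1f}%")
```

The density comes out at about 191 SSRs/Mb, and the remainder is 1,270 SSRs, or about 2.45% of the total, which the text rounds to 2.5%.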

Table 4. Chromosome-wide distribution of microsatellites in the B. distachyon genome. https://doi.org/10.1371/journal.pone.0021298.t004

Interestingly, the frequency of SSRs observed on the short arm of chromosome 5 was much lower than that on the long arm. This low frequency of SSRs on the short arm was common to all types of motifs (Figure S2). The short arm of chromosome 5 (Bd5s) has several features that distinguish it from the rest of the chromosomes [5]. These include a low gene density (roughly half that of the rest of the chromosomes), a high LTR retrotransposon density with the youngest intact Gypsy elements, and the lowest solo LTR density. These attributes may be responsible for the low frequency of microsatellites in Bd5s.