Sample collection and morphological studies

We examined 71 hair samples from 22 individuals of Choloepus hoffmanni, 2 individuals of C. didactylus, 10 individuals of Bradypus tridactylus, 17 individuals of B. variegatus, 12 individuals of B. pygmaeus and 8 individuals of B. torquatus. The sampled sloths came from French Guiana, Panama, Costa Rica and Brazil. Sampling was done in either the dry or the wet season; details of collection sites, sloth species and dates are given in Additional file 1. A small tuft of hair was taken from a greenish patch if the animal was visibly green, or from a darker patch if no greenish coloring was observed. The hair was cut with scissors and preserved in a plastic vial containing silica gel. Samples were stored in silica gel at ambient temperature until further processing, usually for one to three months. All hair samples were examined with a light microscope. If green algae were visibly present, they were photographed for further comparison, and the preserved samples were kept in silica gel as herbarium material. In addition to the sloth hair samples, we collected environmental samples from 12 locations on Barro Colorado Island (Additional file 2). These were scraped with a scalpel from visibly green patches on the surface of tree trunks or, in one case, on the surface of a metal pole. The greenish material was placed in a plastic vial containing silica gel and stored as described above.

DNA extraction, PCR, cloning and sequencing

DNA was extracted with phenol-chloroform [17] and further purified using the High Pure PCR Template Preparation Kit (Roche) according to the manufacturer's instructions. A 1.5 kb fragment of the 18S rRNA gene was amplified using the universal primers UNI7F and UNI1534R [18]. However, from some hair samples we obtained sequences only from other eukaryotes and not from green algae, even though green algae were clearly visible under the microscope. Therefore, the additional primers 501F (5' GGGTCTGGTTTTGAAATGAGG 3') and 1700R (5' CCGAAGTCTTCACCAGCACATC 3') were used to amplify a 1.2 kb fragment of the 18S rRNA gene from the green algae. Alternatively, a combination of universal and green algal specific primers was used to amplify the green algae present in the samples. Primers 107F (5' CGAATGGCTCATTAAAT 3') and UNI1534R were used in combination to amplify the green algae in sample BT54_10, which was collected in Brazil.
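As a quick sanity check on the green algal primers listed above, their length, GC content and a rough melting temperature can be computed directly from the sequences. The sketch below is a minimal Python illustration; the Wallace rule used for Tm is an assumption chosen for simplicity (the text does not say how the primers were designed), and it gives only a crude estimate for primers of 17 to 22 nt.

```python
# Primer sequences as given in the text (5' -> 3').
PRIMERS = {
    "501F": "GGGTCTGGTTTTGAAATGAGG",
    "1700R": "CCGAAGTCTTCACCAGCACATC",
    "107F": "CGAATGGCTCATTAAAT",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in an unambiguous DNA primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C).

    The rule is meant for short oligos (about 14 nt or less), so for these
    primers it is only a coarse estimate.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, Tm ~{wallace_tm(seq)} C")
```

A nearest-neighbor thermodynamic model would give more realistic Tm values; the Wallace rule is used here only because it needs nothing beyond base counts.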

PCR amplification was done using Taq DNA polymerase (ABgene) in 25 μl reactions. Bovine serum albumin (BSA, Fermentas, 1 mg/ml; 0.6 μl) was added to each 25 μl reaction to counteract potential PCR inhibitors carried over from the hair samples with the total DNA. The PCR program consisted of an initial denaturation of 3 minutes at 94°C, followed by 30 cycles of 1 minute at 94°C, 1 minute at 50°C and 2 minutes at 72°C, and a final extension of 5 minutes at 72°C. Products were run on a 1% agarose gel, and fragments of the desired length were cloned using the pGEM®-T Easy Vector System II cloning kit (A1380, Promega) according to the manufacturer's instructions. Positive colonies were picked with a toothpick, dipped into a PCR reaction mixture with Taq DNA polymerase (ABgene) and amplified using the vector primers T7 (5' TAATACGACTCACTATAGGG 3') and SP6 (5' ATTTAGGTGACACTATAGAA 3').
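Two quantities implied by the reaction setup can be worked out explicitly: the final BSA concentration in each 25 μl reaction and the total hold time of the thermal program. This is a minimal sketch assuming simple dilution (C1V1 = C2V2) and ignoring ramp times between temperatures, which depend on the thermocycler.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    minutes: float


# Thermal profile from the text: 3 min at 94 C, then 30 cycles of
# 94 C / 50 C / 72 C for 1, 1 and 2 min, then 5 min at 72 C.
INITIAL = [Step("initial denaturation", 94, 3)]
CYCLE = [Step("denaturation", 94, 1), Step("annealing", 50, 1), Step("extension", 72, 2)]
FINAL = [Step("final extension", 72, 5)]
N_CYCLES = 30


def hold_time_minutes(initial, cycle, n_cycles, final):
    """Total time spent at set temperatures, ignoring ramping between steps."""
    return (sum(s.minutes for s in initial)
            + n_cycles * sum(s.minutes for s in cycle)
            + sum(s.minutes for s in final))


def final_concentration(stock_conc: float, added_vol_ul: float, total_vol_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for C2; the result has the same units as stock_conc."""
    return stock_conc * added_vol_ul / total_vol_ul


if __name__ == "__main__":
    print(hold_time_minutes(INITIAL, CYCLE, N_CYCLES, FINAL), "min")  # 128 min
    # 1 mg/ml BSA stock expressed as 1000 ug/ml
    print(final_concentration(1000.0, 0.6, 25.0), "ug/ml BSA")
```

With these values the BSA ends up at about 24 μg/ml (0.6 μl × 1 mg/ml / 25 μl), and the program spends 3 + 30 × 4 + 5 = 128 minutes at its set temperatures.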

The PCR products from the clone libraries were purified using the Illustra GFX™ PCR DNA and Gel Band Purification Kit (GE Healthcare) or MultiScreen® PCR μ96 (Millipore). They were sequenced using primers SP6 and T7 or the primers used for PCR amplification (UNI7F, UNI1534R, 501F or 1700R), as well as the internal universal primers 384F (5' ACCACATCCAAGGAWGGCA 3') or 18S384F (5' GGCKACCACAUCCAAGGAWGGCA 3') designed for this study. Sequencing was run on a 3730xl DNA Analyzer (Applied Biosystems) or an ABI 3130xl Genetic Analyzer (Applied Biosystems).
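The internal primers 384F and 18S384F are degenerate: in the standard IUPAC code W stands for A or T and K for G or T, so each primer is a mixture of sequences. The Python sketch below expands such primers; it assumes the standard IUPAC ambiguity table and keeps the U printed in 18S384F as a single base rather than guessing how that position was synthesized.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes. U is kept as a single base so a
# primer written with U is expanded exactly as printed.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def degeneracy(primer: str) -> int:
    """Number of distinct sequences in the primer mixture."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n


if __name__ == "__main__":
    internal = {"384F": "ACCACATCCAAGGAWGGCA", "18S384F": "GGCKACCACAUCCAAGGAWGGCA"}
    for name, seq in internal.items():
        print(name, degeneracy(seq), expand_degenerate(seq))
```

384F expands to two sequences and 18S384F to four.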

Sequence alignment and phylogenetic analyses

The 18S rRNA gene sequences obtained in this work were manually corrected using the sequence analysis software Chromas 2.31 (Technelysium Pty Ltd). Vector sequence was removed, and all sequences shorter than 650 bp were excluded from further analysis. The resulting 426 sequences from hair samples and 78 sequences from environmental samples were compared against the NCBI nr database using the BLAST network service [19], and the results were parsed for taxonomic information.
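The 650 bp cut-off can be applied in a few lines of code. This is a minimal sketch, assuming the vector-trimmed sequences are held in FASTA format; the small parser is a hypothetical stand-in, not the tool used in the study.

```python
from typing import Iterable, Iterator

MIN_LENGTH = 650  # bp; shorter sequences were excluded from further analysis


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def length_filter(records, min_length: int = MIN_LENGTH):
    """Keep records whose sequence is at least min_length bases long."""
    return [(h, s) for h, s in records if len(s) >= min_length]


if __name__ == "__main__":
    demo = [">clone_a", "A" * 700, ">clone_b", "C" * 300, "C" * 300]
    print([h for h, _ in length_filter(read_fasta(demo))])
```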

To obtain a reliable alignment for further analyses, 43 sloth hair sequences that began at the 3' end of the 18S rRNA gene were removed. For the remaining 383 sequences, operational taxonomic units (taxa) were defined at 97% similarity (Additional file 3) with the furthest neighbor algorithm of the DOTUR program [22]. DOTUR was run on a Kimura 2-parameter corrected distance matrix calculated with PHYLIP version 3.6 [21] from an alignment made with MAFFT version 6 [20] (gap opening penalty 1.53, gap extension penalty 0.123). Coverage [23] was calculated for the whole dataset as C = (1 - n/N) × 100%, where C is the coverage percentage, n is the number of taxa (97% similarity) that appear only once, and N is the total number of sequences in the library. The estimated number of taxa (S_Chao1) and the Shannon diversity index, each with 95% confidence intervals, were calculated using DOTUR. Principal coordinate analysis followed by canonical analysis of the principal coordinate axes, including calculation of the canonical test statistic (t²), was performed with the program CAP [24, 25] using chi-square distances, and the results were visualized with SPSS 15. We chose the chi-square distance because it standardizes differences in scale and emphasizes changes in composition rather than changes in abundance; sequence abundances derived from PCR amplification are not true abundances and give only a compositional view. CAP determined the appropriate number of dimensions (m) to include in the canonical analysis and identified the taxa responsible for the multivariate patterns. The number of sequences in each 97% taxon was treated as one variable, and each sloth species collected from one locality was treated as one sample.
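The OTU and diversity calculations described above can be sketched with the Python standard library. This is an illustrative reimplementation, not DOTUR: it computes a Kimura 2-parameter distance (skipping gapped or ambiguous columns, an assumption), clusters sequences by furthest neighbor (complete linkage) at a 0.03 distance cut-off corresponding to 97% similarity, and evaluates the coverage formula, Chao1 and the Shannon index on the resulting OTU sizes. The bias-corrected Chao1 fallback used when there are no doubletons is an assumption, and no confidence intervals are computed.

```python
import math

TRANSITIONS = {frozenset("AG"), frozenset("CT")}
VALID = set("ACGT")


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences."""
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a != b:
            if frozenset((a, b)) in TRANSITIONS:
                transitions += 1
            else:
                transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    # Fails with a math domain error for saturated (too divergent) pairs.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def furthest_neighbor_otus(dist, cutoff=0.03):
    """Complete-linkage clustering of a square distance matrix.

    Two clusters merge only if every pair of their members is within cutoff.
    Returns a list of clusters, each a list of sequence indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > cutoff:
            break
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


def goods_coverage(otu_sizes):
    """C = (1 - n/N) * 100, with n singleton OTUs and N sequences in total."""
    n_singletons = sum(1 for s in otu_sizes if s == 1)
    return (1 - n_singletons / sum(otu_sizes)) * 100


def chao1(otu_sizes):
    """Chao1 richness estimate from OTU sizes."""
    s_obs = len(otu_sizes)
    f1 = sum(1 for s in otu_sizes if s == 1)
    f2 = sum(1 for s in otu_sizes if s == 2)
    if f2 == 0:
        return s_obs + f1 * (f1 - 1) / 2  # bias-corrected form (assumption)
    return s_obs + f1 ** 2 / (2 * f2)


def shannon(otu_sizes):
    """Shannon index H = -sum(p_i * ln p_i) over OTU proportions."""
    n = sum(otu_sizes)
    return -sum((s / n) * math.log(s / n) for s in otu_sizes if s > 0)
```

For example, OTU sizes of [5, 1, 1, 3] give a coverage of 80%, because 2 of the 10 sequences fall in singleton taxa.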

The newly determined green algal 18S rRNA gene sequences were compared with a broad selection of corresponding sequences from members of the Chlorophyta. The closest matching full-length 18S rRNA gene sequence for each algal clone, as identified by BLAST [19] searches, was also included. The selection of sequences was based on a phylogenetic tree comprising an expanded sample of more than 1500 green algal rRNA gene sequences, available in the 18S rRNA gene sequence database maintained in the ARB program [26]. This database was updated with all currently available 18S rRNA gene sequences from the Chlorophyta. Newly determined nearly full-length sequences (>1600 bp) as well as partial sequences from the clone libraries were added to the database using the ARB parsimony interactive tool. The alignment was refined by comparing the sequences with their closest relatives in the ARB database, based on base pairing across helices in the secondary structure models implemented in ARB. ARB generates a maximum parsimony (MP) tree from all sequences and all positions in the database as its reference tree, using a filter based on 50% base frequency across all species. A subset of 63 sequences representing the green algal classes Ulvophyceae and Trebouxiophyceae, together with five prasinophytes (with the prasinophyte sequence AF203402 as outgroup), was then exported from the ARB database for further analyses using the 50% base frequency filter. The final alignment of 68 taxa, including 23 algal sequences from sloth hair and 2 from tree bark, was 1690 nucleotides long. Of the aligned sites, 419 were parsimony informative and a further 185 were variable but not informative. An optimized maximum likelihood tree (see below) was then imported into ARB. To infer the phylogenetic positions of additional rDNA clones (nearly full-length and partial sequences) that were not used for tree construction, these clones were added to this tree using the parsimony interactive tool.
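The counts of parsimony-informative and variable sites follow standard definitions: a column is parsimony informative when at least two different bases each occur in at least two sequences, and variable but uninformative when it varies but at most one base occurs more than once. Below is a minimal Python sketch; its treatment of gaps, ambiguity codes and U (read as T) is an assumption and may differ from how PAUP* counted the 1690-column alignment.

```python
from collections import Counter

# Characters ignored when classifying a column (gaps and IUPAC ambiguity codes).
GAP_OR_AMBIGUOUS = set("-.?NRYSWKMBDHV")


def classify_sites(alignment):
    """Count constant, variable-uninformative and parsimony-informative columns.

    `alignment` is a list of equal-length aligned sequences. Columns that are
    entirely gaps or ambiguous are counted as constant.
    """
    counts = {"constant": 0, "uninformative": 0, "informative": 0}
    for column in zip(*alignment):
        bases = Counter(
            c for c in (b.upper().replace("U", "T") for b in column)
            if c not in GAP_OR_AMBIGUOUS
        )
        if len(bases) <= 1:
            counts["constant"] += 1
        elif sum(1 for n in bases.values() if n >= 2) >= 2:
            counts["informative"] += 1
        else:
            counts["uninformative"] += 1
    return counts


if __name__ == "__main__":
    print(classify_sites(["AAGT", "AAGC", "ACTC", "ACTA"]))
```

For the final alignment, the reported counts leave 1690 - 419 - 185 = 1086 columns that were neither informative nor variable.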

Maximum likelihood (ML) phylogenies were calculated using PAUP* version 4.0b10 [27] and GARLI v0.96 (http://www.nescent.org/wg_garli) [28] (Additional file 4 and Additional file 5). ModelTest 3.7 [29], used in conjunction with PAUP* 4.0b10, determined that the TrN+I+G model [30] provided the best fit to the data according to the Akaike information criterion (AIC), with estimated nucleotide frequencies (A = 0.254, C = 0.217, G = 0.281, T = 0.248), a rate matrix with six different substitution types, rate heterogeneity among sites following a gamma distribution (number of rate categories = 4, shape parameter = 0.60) and a proportion of invariable sites (pinvar) of 0.46. The ML tree obtained in PAUP* was used as the starting tree for three additional rounds of ML analyses to search for trees with smaller -ln likelihoods, but no tree with a better likelihood score was found. Bootstrap resampling with 1000 replicates was performed on the ML tree obtained in GARLI. Bayesian phylogenetic analysis was performed with MrBayes version 3.1.2 [31, 32] using the GTR+I+G model (rate matrix with six different substitution types, number of rate categories = 4, and nucleotide frequencies, shape parameter α and pinvar estimated from the data), with four Markov chains run for 2,000,000 generations and sampled every 100 generations; the first 25% of the sampled trees were discarded as burn-in, leaving 15,000 trees. Posterior probabilities were then calculated from two independent runs using the 50% majority-rule consensus of the retained trees. Minimum evolution (ME) [33], neighbor-joining (NJ) [34] distance and maximum parsimony (MP) analyses were done in PAUP* 4.0b10. ME distance trees were constructed with DNA distances set to maximum likelihood, and a heuristic search with 10 random input orders and tree bisection-reconnection (TBR) branch swapping was used to find the best tree. The best-scoring trees were held at each step. NJ phylogenies were constructed using the HKY85 model [35].
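Posterior probabilities from the MrBayes runs are clade frequencies among the retained trees. With 2,000,000 generations sampled every 100 generations, each run yields 20,000 samples, and discarding the first 25% leaves 15,000. The sketch below illustrates the burn-in and the 50% majority-rule step, assuming each sampled tree has already been reduced to a set of clades (frozensets of taxon names); parsing MrBayes tree files is not shown, and the function names are hypothetical.

```python
from collections import Counter


def post_burnin(samples, burnin_fraction=0.25):
    """Drop the first `burnin_fraction` of the sampled trees."""
    n_burn = int(len(samples) * burnin_fraction)
    return samples[n_burn:]


def clade_posteriors(trees):
    """Posterior probability of each clade = fraction of trees containing it.

    Each tree is a set of clades; each clade is a frozenset of taxon names.
    Trees from independent runs can simply be pooled in the input list.
    """
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}


def majority_rule(posteriors, threshold=0.5):
    """Clades kept in the majority-rule consensus (frequency above threshold)."""
    return {clade for clade, p in posteriors.items() if p > threshold}


if __name__ == "__main__":
    samples_per_run = 2_000_000 // 100
    print(len(post_burnin(list(range(samples_per_run)))))  # 15000
```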
In MP analyses, sites were weighted according to their retention index (RI) over an interval of 1-1000. The heuristic search for the best tree was the same as in the ME analyses. Bootstrap resampling was performed with 2000 replicates for NJ trees and with 1000 replicates for ME and MP trees.
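Site weighting by retention index rests on two per-site quantities: the parsimony length of a character on a tree, computed here with the Fitch algorithm, and the RI, defined as (g - s)/(g - m), where s is the observed length, m the minimum and g the maximum possible length. The Python sketch below computes both for unordered characters on a small rooted binary tree given as nested tuples. It is an illustration only and does not reproduce how PAUP* rescales RI values onto the 1-1000 weight interval.

```python
from collections import Counter


def fitch_steps(tree, states):
    """Minimum number of state changes for one character (Fitch algorithm).

    `tree` is a rooted binary tree as nested 2-tuples of leaf names;
    `states` maps each leaf name to its character state.
    """
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):
            return {states[node]}
        left, right = visit(node[0]), visit(node[1])
        common = left & right
        if common:
            return common
        steps += 1  # no shared state: one change is required here
        return left | right

    visit(tree)
    return steps


def retention_index(tree, states):
    """RI = (g - s) / (g - m) for an unordered character.

    m = number of observed states - 1 (minimum possible length);
    g = taxa - count of the most common state (maximum possible length).
    Returns None when g == m, i.e. for parsimony-uninformative characters.
    """
    counts = Counter(states.values())
    m = len(counts) - 1
    g = sum(counts.values()) - max(counts.values())
    if g == m:
        return None
    s = fitch_steps(tree, states)
    return (g - s) / (g - m)


if __name__ == "__main__":
    tree = (("A", "B"), ("C", "D"))
    print(retention_index(tree, {"A": "a", "B": "a", "C": "c", "D": "c"}))  # 1.0
```

A character that fits the tree perfectly gets RI = 1 and so the highest weight, while a fully homoplastic one gets RI = 0.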