Agassiz’s desert tortoise (Gopherus agassizii) is a long-lived species native to the Mojave Desert and is listed as threatened under the US Endangered Species Act. To aid conservation efforts for preserving the genetic diversity of this species, we generated a whole genome reference sequence with an annotation based on deep transcriptome sequences of adult skeletal muscle, lung, brain, and blood. The draft genome assembly for G. agassizii has a scaffold N50 length of 252 kbp and a total length of 2.4 Gbp. Genome annotation reveals 20,172 protein-coding genes in the G. agassizii assembly, and that gene structure is more similar to chicken than other turtles. We provide a series of comparative analyses demonstrating (1) that turtles are among the slowest-evolving genome-enabled reptiles, (2) amino acid changes in genes controlling desert tortoise traits such as shell development, longevity and osmoregulation, and (3) fixed variants across the Gopherus species complex in genes related to desert adaptations, including circadian rhythm and innate immune response. This G. agassizii genome reference and annotation is the first such resource for any tortoise, and will serve as a foundation for future analysis of the genetic basis of adaptations to the desert environment, allow for investigation into genomic factors affecting tortoise health, disease and longevity, and serve as a valuable resource for additional studies in this species complex.

Competing interests: AEK is commercially affiliated with Alice E. Karl and Associates and this affiliation does not alter our adherence to PLOS ONE policies on sharing data and materials.

Funding: Author AEK is employed by Alice E. Karl and Associates. This company provided support in the form of salary for author AEK but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.

Data Availability: All genomic and transcriptomic sequence files are available from the NIH-NCBI BioProject database (accession numbers PRJNA352725, PRJNA352726, and PRJNA281763). All genome assembly, transcriptome assembly, predicted protein, transcript, genome annotation, repeatmasker, phylogenetic trees, .vcf and GO enrichment files are available on Harvard Dataverse (doi: 10.7910/DVN/EH2S9K ).

This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

In addition to providing resources that can save threatened or endangered species from extinction, comparative genomics may reveal information about many traits relevant to human illnesses [ 15 ] and longevity [ 16 ]. Gopherus agassizii possesses many traits common to turtles, or chelonians, including longevity, as well as those specific to desert tortoises such as robustness to harsh environments. Genome assemblies exist for several chelonians: the Chinese softshell turtle, Pelodiscus sinensis [ 17 ], the green sea turtle, Chelonia mydas [ 17 ], and the western painted turtle, Chrysemys picta bellii [ 18 ]. Embryonic transcriptomes have been published for the red-eared slider turtle, Trachemys scripta [ 19 ], a blood transcriptome for the giant Galapagos tortoise, Chelonoidis nigra [ 20 ], and liver transcriptomes for the common musk turtle (Sternotherus odoratus), common snapping turtle (Chelydra serpentina), and African sideneck turtle (Pelusios castaneus) [ 21 ]. Recent studies have confirmed a sister-taxon relationship between chelonians (including turtles and tortoises, order Testudines) and archosaurs (including crocodilians and birds), which split ~250 Ma [ 17 , 18 ], and have estimated that the basal divergence of most modern chelonian lineages occurred ~100 Ma. Gopherus agassizii and C. picta bellii last shared a common ancestor ~70 Ma [ 22 ] ( Table 1 ), and comparisons using this more recent divergence may provide a better opportunity to understand the evolution of many chelonian-specific traits, such as longevity and adaptations to aquatic or terrestrial habitats, including hot deserts. To assist both conservation and comparative genomics, we report a whole genome assembly and deep transcriptomic sequence-based annotation for Agassiz’s desert tortoise, and provide insights from comparative and population genomic analyses.

Previous population genetic studies based on microsatellites, nucleotide sequences from mitochondrial and nuclear DNA, and RNA sequencing (RNA-Seq) data have laid the groundwork for assessing the distribution of genetic diversity across the range of G. agassizii [ 8 – 10 ]. Agassiz’s desert tortoise is a long-lived species that is slow to mature, with a maximum lifespan of over 50 years in the wild [ 11 ], making the long-term impacts on adult populations and conservation priorities difficult to assess. Genomic analyses of G. agassizii may help to understand key aspects necessary to its survival, such as immune system responses to diseases, including upper respiratory tract disease [ 3 ]. Functional trait analysis combined with genomics can advance knowledge of physiological [ 12 ], morphological [ 13 ], and life history variation among the six species of Gopherus [ 2 ]. Genomic information can also inform management decisions about habitat corridors and reproduction especially in the light of climate change [ 14 ]. A whole blood transcriptome assembly has been published for G. agassizii [ 8 ]. However, this represents only a portion of the total genetic information, and a complete genome assembly would better facilitate future analyses of polymorphism and the geographic distribution of variation for the species and its congeners.

Species at risk of extinction can benefit from human interventions, and effective management of threatened populations depends on an understanding of their genetic variation, including the effects of inbreeding, disease, and adaptation to local environments [ 1 ]. One of six species of desert tortoise estimated to have arisen in North America ~35 million years ago (Ma) [ 2 ], Agassiz’s desert tortoise (Gopherus agassizii) has been heavily impacted by habitat loss, a respiratory tract disease [ 3 , 4 ], and other anthropogenic factors [ 5 ]. For instance, in one area of the species’ range density declined from about 225 individuals/km² in 1979 to about 75 individuals/km² in 1992 [ 6 ]. Such declines prompted populations of G. agassizii north and west of the Colorado River to be listed as threatened under the U.S. Endangered Species Act [ 5 ]. Despite this protection, desert tortoise numbers continued to decline ~50% between 2004 and 2013 [ 7 ], warranting further conservation efforts. Effective management of G. agassizii populations would benefit from genomic resources, which have only recently been developed for the species.

Genomic and transcriptomic sequences are available from the NCBI BioProject (accession numbers PRJNA352725, PRJNA352726, and PRJNA281763). The genome assembly is available from NCBI BioProject PRJNA352726. Genome and transcriptome assembly, predicted protein, transcript, genome annotation, and annotated repeat files are available from the Harvard Dataverse (doi: 10.7910/DVN/EH2S9K ).

We identified fixed differences between the three species using the Filter command in SnpSift v4.0 [ 67 ], filtering by quality (≥ 30) and read depth (≥ 20). We filtered for only SNP variants within the unique variants file per taxon, excluding STRs and insertion-deletions. We cross-referenced these unique SNP variant files to orthology tables for human (Homo sapiens), chicken (Gallus gallus), and green anole (Anolis carolinensis) (see Genome Annotation Methods) for each Gopherus species (9 comparisons in total). Each set of cross-referenced orthologous Ensembl IDs were: 1) converted to Gene Ontology (GO) terms using g:Convert ( http://biit.cs.ut.ee/gprofiler/gconvert.cgi , accessed July 31, 2016) [ 68 ], to plot and analyze semantically with REViGO, and 2) entered into g:GOSt [ 67 ] to test for statistical enrichment of GO terms per Gopherus species. For enrichment analysis in g:GOSt we retained only significant results, used hierarchical sorting without filtering, and used the default significance threshold (g:SCS) correction for multiple hypothesis testing. For visualization in REViGO, we allowed medium similarity (0.7) for semantic overlap, clustered terms by ‘uniqueness,’ and used the default whole UniProt database for comparison of GO terms. Gene clusterings and GO-term enrichments were similar across reference taxon (human, chicken, or the green anole), so we only show results using orthology assignments from the green anole.

In 2011, divergence across genetics, ecology and behavior was used to recognize two different species of desert tortoise in the southwestern United States: Agassiz’s desert tortoise (G. agassizii), found in the Mojave Desert of California, Nevada, Utah and northwestern Arizona, and Morafka’s desert tortoise (G. morafkai), found in the Sonoran Desert of Arizona and Sonora, Mexico [ 60 ]. More recently, genetic data were used to characterize and describe the third desert species, Goode’s Thornscrub tortoise (G. evgoodei), which resides in the states of Sonora and Sinaloa, Mexico [ 61 ]. These three species are thought to have diverged nearly simultaneously ~5.5 Ma [ 8 ] and occupy diverse habitats, including the high desert of the Great Basin shrub steppe in Utah, to the Joshua tree forests of California, through the coastal desertscrub along the east coast of the Sea of Cortez and the lush tropical deciduous forests of Sinaloa, Mexico [ 62 ]. The three species of desert tortoise have likely adapted to different climatic regimes and landscapes. To find genomic regions responsible for species-specific desert tortoise adaptations, we analyzed fixed unique variants within three species of the desert tortoise species complex. We mapped previously published blood RNA (NCBI BioProject PRJNA281763) data from three individuals per species to our new G. agassizii reference genome using STAR v.2.5.0a [ 63 ], with default parameters for the two-pass method set to “basic” to find splice junctions. We added read groups and removed PCR duplicates using MarkDuplicates from Picard v1.140 ( http://broadinstitute.github.io/picard/ ). We then used the Split’N’Trim tool in the Genome Analysis Tool Kit pipeline (GATK v3.4–46) [ 64 ] to split reads into exon segments and clip sequences that overlap introns. The split, de-duplicated alignment files from each individual were then merged using Picard. We used the CallableLoci GATK tool to filter the genome for areas that met default coverage criteria for each alignment file, and concatenated, sorted and merged the output BED files using bedtools version 2.24.0 [ 65 ], with–d set to 20 to join nearby yet non-contiguous callable sites. Using the merged callable loci as a target file, we called variants on all nine alignments with FreeBayes v1.0.1 [ 66 ].

To identify protein-coding genes potentially responsible for adaptations in Agassiz’s desert tortoise, we calculated the ratio of non-synonymous to synonymous substitutions (Ka/Ks) in pairwise comparisons separately to western painted turtle (chrPic2), Chinese softshell turtle (pelSin1), green sea turtle (cheMyd1) and chicken. Syntenic genome alignments were generated as described above, and the Stitch Gene Blocks tool in Galaxy [ 54 ] was used to extract orthologs using the chicken reference annotation (Ensembl v82). We removed gene alignments with less than 50% coverage, ambiguous codons, and internal stop codons using AlignmentProcessor ( https://github.com/WilsonSayresLab/AlignmentProcessor ). Estimated Ka/Ks for the filtered gene alignments were obtained using KaKs_calculator2.0 [ 55 ], and we collected the RefSeq (western painted turtle, green sea turtle, Chinese softshell turtle) or Ensembl (chicken) IDs of accelerated genes (Ka/Ks > 1). These genes were enriched for non-synonymous and potentially functional substitutions resulting from amino acid changes [ 56 ]. A Bonferroni adjustment for multiple hypothesis testing [ 57 ] was used to identify genes with highly significant Ka/Ks > 1, which indicates positive selection. Many genes in chelonian genome assemblies lacked clear orthology to an organism in RefSeq. Thus, human orthologs were obtained by the ID Mapping of associated gene names in UniProt. Next, Ensembl BioMart [ 58 ] was used to for retrieve Gene Ontology (GO) terms. REVIGO [ 59 ] semantic clustering was used to visualize GO terms for biological processes, cellular components and molecular function. To compare GO terms, analyses used medium similarity (0.7) for semantic overlap, clustering of terms by ‘uniqueness’ and the default whole UniProt database.

Divergence at fourfold degenerate (4D) sites, which approximates the neutral mutation rate [ 49 ], was used to compare variation in DNA substitution rates of chelonians and the other sauropsids. We extracted 4D sites from the WGA based on the Ensembl v82 chicken gene annotations using msa_view in the PHAST v1.3 package [ 50 ]. To reconstruct the phylogeny of the 10 sauropsids included in the WGA using the 4D data, we used RAxML v8.2.3 [ 51 ]. We generated 20 maximum likelihood (ML) trees under the GTRCAT substitution model and conducted 500 bootstrap replicates to assess statistical support for the ML tree with the highest likelihood. We used the topology of the best ML tree from RAxML and obtained branch lengths in terms of substitutions per site by fitting a time-reversible model (REV) to the 4D site data using phyloFit in PHAST. We then estimated lineage-specific substitution rates in sauropsids using the penalized likelihood method implemented in r8s v1.8.1 [ 52 ] while placing constraints on nodal ages obtained from TimeTree [ 53 ]. The age of the ancestral sauropsids node was fixed at 280 Ma and the ancestral archosaur node at 245 Ma; these were the median age estimates. Estimates of divergence time for most chelonian families differed widely in the literature. Thus, we placed minimum and maximum age constraints on the nodes representing Cryptodira (encompassing all four represented chelonian families, 97.4 Ma to 250 Ma) and Drurocryptodira (excluding P. sinensis, 36 Ma to 183 Ma). The divergence time of Testudinae (G. agassizii) and Emydidae (C. picta bellii) was estimated (1) without constraint and (2) with constraints, which fixed the node age as the median estimate from TimeTree (74.2 Ma).

To understand patterns of molecular evolution across chelonians and other reptiles, the genome assemblies of Agassiz’s desert tortoise, western painted turtle, Chinese softshell turtle, green sea turtle, American alligator (allMis1), budgerigar (Melopsittacus undulatus, melUnd1), and green anole lizard (AnoCar2.0) were aligned to the reference chicken genome (galGal4) using LASTZ v1.0.2 [ 44 ]. We used chaining to form gapless blocks and netting to select the highest-ranked chains [ 45 ]. Pairwise alignments for zebra finch (Taenopygia guttata, taeGut3) and medium ground finch (Geospiza fortis, geoFor1) to chicken were downloaded from the UCSC Genome Browser. A guide tree representing the phylogenetic relationships of the included taxa [ 46 , 47 ] was used to construct a 10-way whole genome alignment (WGA) from the pairwise alignments with MULTIZ v11.2 [ 48 ]. The WGA was filtered to include alignment blocks containing at least nine out of the 10 species.

The Agassiz’s desert tortoise genome assembly was masked for repetitive elements, first with RepeatMasker v4.0.5 [ 41 ] using the complete library of known repeats available in RepBase [ 42 ]. The remaining unmasked regions of the desert tortoise genome were then searched for repeat families using RepeatModeler v1.0.8 [ 43 ]. This output of de novo repeats from desert tortoise was then combined with the first RepBase library to create a combined library, which was used in a final repeat-masking step to classify all annotated repeats in the genome. We also ran RepeatMasker on the western painted turtle genome using RepBase, which includes repeat libraries specific to Chrysemys picta bellii, in parallel for comparison to Agassiz’s desert tortoise. To estimate the amount of evolutionary divergence within repeat families of Agassiz’s desert tortoise, we generated repeat-family-specific alignments and calculated the average Kimura 2-parameter (K2P) distance from consensus within each family, correcting for high mutation rates at CpG sites with the calcDivergenceFromAlign.pl tool included with RepeatMasker. For comparison, the divergence profile for western painted turtle was obtained from the RepeatMasker server ( http://www.repeatmasker.org/species/chrPic.html , accessed August 10, 2015).

Gene predictions were functionally annotated by collecting the top hits from a BLASTP search of the UniProtKB/Swiss-Prot database. Protein predictions from the final gene models were assigned one-to-one orthology with proteins from the western painted turtle genome through reciprocal BLAST. First, BLASTP was used to search for the best hit for every predicted Agassiz’s desert tortoise protein in the western painted turtle proteome (evalue = 1E-6, max_target_seqs = 1), followed by BLASTP of each western painted turtle protein in the Agassiz’s desert tortoise proteome. We also used reciprocal best hits to identify orthologs of Agassiz’s desert tortoise in Chinese softshell turtle, chicken (Gallus gallus), green anole (Anolis carolinensis), and human proteomes. Clusters of homologous genes across Agassiz’s desert tortoise, western painted turtle, Chinese softshell turtle, chicken, American alligator (Alligator mississippiensis), green anole and human were identified using OrthoFinder v0.4 [ 40 ].

The RNA-Seq fastq files for skeletal muscle, brain and lung were filtered for adapters, quality score ≥ 30 and length ≥ 37 bp using Trimmomatic v0.27, and paired-end and single-end reads were assembled together into transcripts using Trinity v2.0.6 [ 36 ]. Previously published whole blood-based transcriptome data were also used (NCBI BioProject PRJNA281763) [ 8 ]. We used MAKER2 v2.31.8 [ 37 ] in multiple iterations for gene-finding, which incorporated: (1) direct evidence from the transcriptomes, (2) homology to proteins in the UniProtKB/Swiss-Prot database and the predicted proteome of C. picta bellii (NCBI BioProject PRJNA210179), and (3) ab initio predictions from SNAP (11/29/2013 release) [ 38 ] and Augustus v3.0.2 [ 39 ]. The first iteration of MAKER aligned the transcript and protein sequences to the assembly, producing draft gene models. Ab initio gene predictors benefit from the training of their Hidden Markov Models (HMM) to the species-specific genome, and we trained SNAP using a two-pass approach in MAKER. First, we ran MAKER a second time with SNAP using G. agassizii-specific HMMs generated from the initial MAKER iteration. Using the gene models from this output, we then generated an improved HMM for SNAP in a third MAKER iteration. In parallel, Augustus was trained by sampling the SNAP-improved MAKER gene models followed by optimization. A fourth and final run of MAKER was then performed, which incorporated the optimized HMM for Augustus, the final SNAP HMM, and the aligned protein and transcriptome data, resulting in the final set of gene model predictions.

DNA sequence reads were trimmed to eliminate nucleotide biases and remove adaptors with Trimmomatic v0.27 [ 24 ]. We retained all reads ≥ 37 bp with a quality score ≥ 28. Illumina sequencing errors were corrected using SOAPec v2.01, and overlapping reads from the 200 bp libraries were joined to form single-end reads using FLASH v1.2.8 [ 25 ]. We compared the outputs of different de Bruijn graph assemblers, including ABySS 1.5.2 [ 26 ], SOAPdenovo2 [ 27 ], and Platanus v1.2.1 [ 28 ] for both contig and scaffold assembly, in addition to SSPACE v3.0 [ 29 ] for scaffold assembly; we then closed gaps using the GapCloser v1.12 module from SOAP ( S1 Appendix ). The assemblies with the most scaffold contiguity (i.e., N50) and reasonable total length given the expected genome size were selected for further analysis. We evaluated completeness of the assemblies by their estimated gene content, using the Conserved Eukaryotic Genes Mapping Approach (CEGMA v2.5) [ 30 ], which calculated the proportion of 248 core eukaryotic genes present in the genome assembly, and Benchmarking Universal Single Copy Orthologs (BUSCO v1.22) [ 31 ], which calculated the proportion of a vertebrate-specific set of 3,023 conserved genes that were either complete, fragmented, or missing. We further improved the raw assembly by RNA scaffolding. In brief, assembled transcripts (described below) were searched for open reading frames (ORFs) and filtered to include only sequences that produced a significant BLAST hit to a protein sequence in the UniProtKB/Swiss-Prot database [ 32 ] using TransDecoder v2.0 [ 33 ] ( https://transdecoder.github.io/ ). This filtered gene set was mapped to the genome assembly using BLAT [ 34 ] and identified scaffolds were merged using L_RNA_scaffolder [ 35 ].

To confirm the taxonomic identity of the specimen, and to assure that it was not a hybrid of G. agassizii and G. morafkai [ 23 ], we followed described protocols [ 23 ] using 25 short tandem repeats and mtDNA sequence data for population assignment against a database of 1,258 specimens of Gopherus. We confirmed the specimen to be G. agassizii from the northern Mojave genetic unit, which is geographically bounded as east of the Ivanpah, California area and northwest of the Colorado River in Nevada, Arizona and Utah [ 12 ].

The Institutional Animal Care and Use Committee at Arizona State University reviewed and approved this study, and methods of transport, housing, and euthanasia were carried out in accordance with ethical guidelines in Protocol #13-1319R (to DFD). Specifically, an adult male G. agassizii collected in urban Las Vegas, Nevada under U.S. Fish and Wildlife Service recovery subpermit #FWSDTRO-1 was transported to Arizona under Nevada Department of Wildlife export permit #S37016, housed for two days and euthanized with an intracoelemic injection of sodium pentobarbital and phenytoin solution. Tissues were harvested under sterile conditions and either flash-frozen in liquid nitrogen or placed in RNAlater (Thermo Fisher Scientific) for DNA and total RNA extraction, respectively. DNA was extracted from liver and skeletal muscle using the DNeasy kit (QIAGEN) and total RNA was isolated from lung, brain and skeletal muscle with the total RNA protocol included in the mirVana miRNA Isolation Kit (Thermo Fisher Scientific). RNA quantity and quality was measured by spectrophotometry (Nanodrop) and Agilent 2100 Bioanalyzer. All sequencing was carried out on the Illumina HiSeq 2000 or 2500 platforms. Paired-end DNA libraries were constructed with targeted fragment sizes of 200 base-pairs (bp), 300 bp, and 1,000 bp using the Kapa Biosystems Library Prep Kit by the Collaborative Sequencing Center at the Translational Genomics Research Institute (Phoenix, AZ; S1A Table ). Mate-pair libraries were constructed and sequenced for DNA fragment sizes of 2 kbp, 4 kbp, and 10 kbp ( S1 Table ). Additionally, RNA-Seq libraries for brain, muscle and lung total RNAs were constructed and sequenced at the University of Arizona Genetics Core (Tucson, AZ; S1B Table ).

Results and discussion

Repeat analysis Repeat identification of the G. agassizii genome using known RepBase elements results in 31% of the genome masked (Table 4). Additional de novo repeat identification with RepeatModeler improves the annotations of several classes of repeats, resulting in 43% of the genome being assigned to repeats. These include a diverse array of transposable elements, which comprise 33% of the genome, including DNA transposons, long terminal repeat (LTR) elements, long interspersed nuclear elements (LINEs), and short interspersed elements (SINEs). The genomic proportion of repeats in Agassiz’s desert tortoise is greater than that for the western painted turtle (29%). Slight differences in the abundance of repeat classes exist between the analyzed chelonians, although relative abundances are similar except for DNA transposons, which are more abundant in Agassiz’s desert tortoise (12.4% versus 9.8% in the western painted turtle), as are LTR retrotransposons (7.7% versus 5.2%, respectively). The genomes of both Agassiz’s desert tortoise and the western painted turtle possess a large proportion of CR1 subfamilies at ≤ 5% K2P divergence from their consensus sequence (S1 Fig), suggesting they are recent inserts. Therefore, these elements comprise the most active and common retrotransposons in both chelonians and archosaurs [72,77]. The high percentage (8%) of unclassified interspersed repeats identified by RepeatModeler in the Agassiz’s desert tortoise’s genome, for which we could not find matches in either RepBase or GenBank, suggests that novel elements exist in the genomes of chelonians. PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Table 4. Repetitive content of the Agassiz’s desert tortoise (Gopherus agassizii) and the western painted turtle (Chrysemys picta bellii) genomes, estimated with libraries of known repeats (RepBase) and de novo repeat identification (RepeatModeler). https://doi.org/10.1371/journal.pone.0177708.t004

Slow DNA substitution rates in chelonians The filtered whole genome alignment (WGA) is just over 217 Mbp in length, including 15% gaps, with an average of 9.23 species represented in each alignment block. The RAxML analysis included full statistical support for the a priori expected relationships, including the monophyly of C. picta bellii and G. agassizii, P. sinensis as the outgroup to all other included chelonians, and a monophyletic archosauria. When analyzing 1,815,350 4D sites, branch lengths in terms of substitutions per site on the sauropsid phylogeny (Fig 4, S2 Fig) suggest limited neutral divergence since the most recent common ancestor of chelonians. In contrast, the avian clade has longer internal branches, consistent with faster evolutionary rates in birds compared to other archosaurs [72,77]. Incorporating different divergence times between Gopherus and Chrysemys (S5 Table) affects the absolute estimated rates but not relative results. These results are similar to previous findings that chelonians have accumulated fewer DNA substitutions over time than other sauropsid lineages [17,18] (S3 Fig, S7 Dataset). This may be due to the unusually long generation times of many chelonians, which correlate with lower substitution rates in reptiles [78]. PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Fig 4. Phylogeny of 10 sauropsids including Gopherus agassizii. Phylogeny shows the relationships of sauropsids, including four chelonians, with complete genome sequences. Branch lengths are given in terms of DNA substitutions per site derived from fourfold degenerate sites. https://doi.org/10.1371/journal.pone.0177708.g004