The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic signal renders the resulting tree topologies as suspect. As such, these data are likely inappropriate for investigating such ancient relationships.

The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic.

Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda.

We report herein sequences for an additional three entire millipede mitochondrial genomes representing the remaining major groups comprising the Helminthomorpha clade, the worm-like millipedes. We combine these genomes with existing sequence data to investigate, for the first time, the evolution of mitochondrial genomes across the Diplopoda using comparative genomic and phylogenetic methods. We also combine these data with 56 additional Ecdysozoa exemplars, spanning the diversity of this major clade, to ascertain the effects of adding three myriapod taxa in an attempt to strengthen the nodes containing these terminals. All of these analyses taken together provide an evolutionary framework for evaluating the appropriateness of using mitochondrial genome sequences to reconstruct deep evolutionary relationships across the arthropod tree of life and provide further examples of the shortcomings associated with mitochondrial phylogenomics.

Another source of error when using mitochondrial genomes to reconstruct deep evolutionary relationships stems from accelerated substitution rates. This leads to long-branch attraction (LBA) [39] , [40] resulting in strong outgroup dependent effects [30] , [41] . Methods to deal with this issue have been suggested and include the use of site-specific models of molecular evolution, increasing taxon sampling to reinforce weakly supported nodes, and removing problematic sites in the data matrix. The CAT model of molecular evolution accounts for site-specific heterogeneity [42] and has been shown to increase the efficacy of mitochondrial genome-based phylogenomics [31] , [43] . Talavera & Vila found that removing problematic portions of the alignment improved the resulting topologies [43] .

The use of full mitochondrial genome sequences to reconstruct deep relationships has been advocated since 1998 [20] . These data have been employed in various taxa including: salamanders [21] , [22] , gastropods [23] , echinoderms [24] , and arthropods [25] , including investigations of relationships among the various myriapods classes [26] , [27] . However, the utility of mitochondrial genomes in reconstructing deep phylogenies has been the subject of ongoing debate [28] – [30] . The phylogenetic signal present in mitochondrial genomes useful for the study of ancient relationships can be affected by a number of factors (recently summarized in Rota-Stabelli et al. [31] ). In particular, lineage-specific compositional heterogeneity, has been shown to affect the amino acids used in constructing proteins [32] , [33] . This issue is prevalent in ecdysozoans where the genomes tend to have a high A-T to G-C ratio, and, as a result, protein sequences show a bias towards codons that contain adenine and thymine bases [32] , [34] . Additionally, compositional heterogeneity can exist between the two strands wherein one strand is more A-T rich than the other [35] . If a region of the genome was inverted, the A-T to G-C bias of the opposite strand could cause a shift in the nucleotide sequence of the region towards the composition of the complementary strand [36] . Heterogeneity in A-T proportion and A-T to G-C bias between strands has been shown to confound phylogenetic inference when using mitochondrial genomes [33] , [37] , [38] .

Only recently have molecular phylogenetic techniques been used to reconstruct millipede phylogenies [6] – [18] , however, all but three of these studies focused on relationships at the generic or species level. Regier and Shultz investigated relationships with the Myriapoda using two nuclear protein-coding genes, but their results lacked support. Sierwald and Bond [6] represents the first attempt to reconstruct diplopod ordinal relationships using a total evidence approach by combining the molecular dataset of Regier and Shultz [17] , [18] with a morphological matrix from Sierwald et al. [19] . Pitz and Sierwald [13] investigated the familial relationships within the order Spirobolida. All of these higher-level studies suffer from limited taxon and locus sampling.

Arthropods comprise more of the Earth's nominal biodiversity than any other comparable eukaryotic group [1] , [2] with an estimated 3.7 million species in the tropics alone [3] . Despite their diversity and variety of life strategies, many arthropod groups remain woefully underrepresented in the scientific literature. Consequently, we know little about some of the most diverse and abundant animals on the planet. Even the relationships between many of the most basic groups (i.e., arachnids, myriapods, “crustaceans”, and insects) remain ambiguous. One such understudied taxon is the arthropod class Diplopoda, the millipedes. Millipedes are a highly diverse yet poorly studied group of some 12,000 described species organized into 16 orders. With estimates of global species richness ranging from ∼20,000 [4] to ∼80,000 [2] , [5] , many species remain to be discovered. Additionally, we know little concerning millipede higher-level relationships, ecology, behavior, physiology, and genomic composition. Due to relatively high degrees of morphological homogeneity and questionable homology of many somatic structures, millipede relationships are poorly resolved with fewer than half of all higher taxa delineated on the basis of phylogenetically defined apomorphic characters.

The peaks for each gene region skewed toward the terminals of both trees. As a result, most signal deep in the trees is confounded by noise. A) Myriapod BI tree converted to ultrametric. B) Ecdysozoan BI tree converted to ultrametric. These results indicate that mitochondrial protein-coding sequences are not appropriate for reconstructing deep arthropod relationships, even when the data is encoded as amino acid residues. The color scheme follows figures 2 and 3 .

The BI panarthropod analysis has poor resolution at some levels but recovered many of the higher groups with moderate support (pp >0.90 unless noted below). The following groups were recovered as monophyletic: Ecdysozoa, Panarthropoda, and Arthropoda, Myriapoda (pp = 0.80), Chilopoda, Lithobiomorpha, Diplopoda, Pycnogonida, Scorpiones, Araneae, Xiphosura, Insecta, Dicondylia, Pterygota, and Neoptera. The Pancrustacea is paraphyletic and the Chelicerata is polyphyletic as a result of the position of the mite Steganacarus magnus (Nicolet, 1855) which groups with the branchiuran Argulus americanus Wilson, 1902 (pp = 0.99). See figure 5D for all supported groupings.

The Panarthropoda trees are shown in figure 5C and 5D . The ML tree has a highly improbable topology and very poor support values at most nodes. Monophyletic Ecdysozoa, Panarthropoda, Arthropoda, and Myriapoda are recovered but lack support (BS <70). In general, Pancrustacea taxa are intermingled amongst the chelicerates. The Chilopoda (BS = 100), Lithobiomorpha (BS = 100), Diplopoda (BS = 100), and Symphyla (BS = 100) are monophyletic with strong support. Within the Chelicerata, the orders Pycnogonida (BS = 100), Xiphosura (BS = 100), Scorpiones (BS = 100), and Araneae (BS = 100) are monophyletic with strong support along with the Pedipalpi (Amblypygi + Thelyphonida; BS = 80). Within the Pancrustacea, a clade comprising most of the Hexapoda was recovered with strong support (BS = 97) but omits a number of taxa that are placed in other groups (e.g., in the chelicerate clade).

The following phylogenies were reconstructed using maximum likelihood and Bayesian inference of amino acid sequences. The ML trees were obtained using RAxML with 1000 random addition searches followed by 1000 boostrap replicates. The BI trees were obtained from two phylobayes runs consisting of 10000 cycles. The first 2000 cycles were discarded as burn-in. A) Myriapod ML tree, B) myriapod BI tree, C) ecdysozoan ML tree, and D) ecdysozoan BI tree. In the myriapod trees, millipedes taxa are colored green, symphylans are blue, and centipedes are red. In the ecdysozoan trees, outgroup taxa (non-arthropods) are colored black, myriapods are green, chelicerates are red, “crustaceans” and non-insect hexapods are blue, and insects are yellow.

The myriapod phylogenetic analyses are summarized in figure 5A and 5B . The maximum likelihood (ML) and Bayesian Inference (BI) analyses recovered similar topologies differing only in the placement of Ab. magnum and Narceus “annularis”. The positions of these two taxa are swapped in the two analyses rendering the Juliformia paraphyletic in the ML tree. Overall, the support values are not convincing in either case except for higher-level relationships. The Chilopoda, Pleurostigmophora, Symphyla, and Diplopoda are recovered as monophyletic with strong support in the ML analysis whereas the Pleurostigmophora, Symphyla, Diplopoda, Colobognatha + Polydesmida, and Juliformia + Callipodida are well supported in the BI tree.

Graphical summaries of the per site amino acid residue conservation score based on identity (AARCI) values for each myriapod gene alignment are shown in figure 3 . Pairwise t-tests comparing the AARCIs of each gene region both with and without a Bonferroni correction for multiple tests are summarized in table S2 . Of the 78 possible pairwise gene comparisons, 56 are significant in the Bonferroni corrected analysis (71.79%) while 65 are significant in the uncorrected analysis (83.33%). An analysis of variance (ANOVA) indicates differences in the AARCIs between gene regions (F = 67.887, df = 12, p = 2.2×10 −16 ). Pairwise comparisons of percent identity (%ID) based on AARCIs for each myriapod taxon using the concatenated dataset are summarized in figure 4 .

The mitochondrial genome syntenies are phylogenetically illustrated in figure 2 for all myriapods and the chelicerate Limulus polyphemus Müller, 1785, which is considered representative of the ancestral arthropod synteny. The phylogeny on which they are mapped is adapted from the only total evidence analysis of all diplopod orders [6] and mirrors the results obtained herein.

All known myriapod mitochondrial genomes comprise 13 protein-coding regions, two ribosomal subunits, 22 transfer RNAs, at least one non-coding region, and are approximately 15,000 bp in total length. The three genomes sequenced are similar in composition to those previously reported but contain a number of unique features ( figure 1 ). All coding regions are on a single strand in Appalachioria falcifera (Keeton, 1959) (Chilognatha: Helminthomorpha: Eugnatha: Merocheta: Polydesmida: Xystodesmidae) (GenBank accession #: JX437063). The regions of Abacion magnum Loomis, 1943 (Chilognatha: Helminthomorpha: Eugnatha: Nematophora: Callipodida: Abacionidae) (GenBank accession #: JX437062) are coded half on one strand and half on the other with overlap only in the tRNAs. Brachycybe lecontii Wood, 1864 (Chilognatha: Helminthomorpha: Colobognatha: Platydesmida: Andrognathidae) (GenBank accession #: JX437064) is similar to Ab. magnum but has a single translocation (ND1) causing a protein-coding region to be on the opposite strand in relation to surrounding regions. All three genomes have undergone tRNA translocations when compared to previously known myriapod sequences. In addition to the previously mentioned translocation of ND1 in B. lecontii, Ap. falcifera appears to have an entire side of the genome inverted.

Discussion

Genome synteny and features The mitochondrial genomes of Ap. falcifera, Ab. magnum, and B. lecontii were similar to the previously sequenced millipede genomes in terms of length, composition, and synteny (figure 1). The synteny of these novel genomes is similar to the juliform millipedes already sequenced with a few exceptions. The translocations of tRNAs is common throughout myriapods and also occurs between the presumably closely related juliform taxa previously sequenced [46], [47]. The genome of Ab. magnum is very similar to those of the Juliformia. This is not unexpected; the Callipodida group close to the Juliformia in Sierwald and Bond [6] and in the present study (figure 1). The translocation of ND1 in B. lecontii is the first example of a protein coding gene synteny change in the Diplopoda. The inversion of half of the mitochondrial genome in Ap. falcifera is a major rearrangement that is unprecedented in myriapods. Given how anomalous this particular rearrangement is, it was further assessed using novel primers to amplify sequences spanning the inverted section boundaries to confirm this finding. The synteny of Ap. falcifera is unique among the currently sequenced millipedes in that all genes appear to be on the same strand (i.e., all genes are transcribed in the same direction). The genomes of Ab. magnum and B. lecontii, and all previously analyzed millipedes, have two non-coding regions with most of the genes on either side of the circular genome transcribed in opposite directions and therefore on opposite strands. A mechanism has been previously proposed to explain how two major regions of the genome could have opposite directionality of sense strands [46]. Under this mechanism, the entire genome is duplicated creating a circular genome twice the size and containing two copies of all gene regions and two A-T rich regions (each containing a bi-directional transcription initiator and a bi-directional transcription terminator). Ancestrally, these genes were mixed in terms of which strand was the sense strand because the transcription of each strand could proceed uninterrupted all the way around the circular genome. With the duplication of the A-T rich region, transcription of each strand was terminated at the halfway point. After the duplication, one duplicate of each gene is lost resulting in two halves of the genome transcribed in opposite directions. The directionality of each half of the genome is determined by the position of the translation initiator and terminator sequences (the two remaining non-coding regions found in many millipedes). The process of gene loss is non-random, and, as Lavrov et al. [46] point out, could have major implications on the use of mitochondrial genome synteny to reconstruct phylogenies. Myriapod gene synteny is quite similar to that of Limulus polyphemus (figure 2). The lithobiomorph centipedes (Lithobius and Bothropolys) are identical but for a single tRNA translocation in Lithobius. The symphylan Symphylella sp. is remarkably different, but Scutigerella causeyae is very similar to Limulus. All millipedes surveyed have ND6 + CytB placements that differ from that of Limulus, and B. lecontii has a unique positioning of ND1, as mentioned above. These changes are likely associated with the strand specific nature of the opposing halves of millipede mitochondrial genomes. Because each half of diplopod mitochondrial genomes, except for that of Ap. falcifera, can only be transcribed in a single direction, those genes on the sense strand of one half of the genome were retained while those on the nonsense strand were lost. Because Ap. falcifera has a synteny more similar to that of the other millipedes than Limulus, taking into account the inversion, and all of the genes are on a single strand, we hypothesize the inversion is a secondary event. Ap. falcifera likely had a synteny similar to the remaining millipedes and underwent a second inversion event, thus loosing the second non-coding region. The presence of overlapping regions is not uncommon in taxa studied to date, including the existing myriapod genome sequences [26], [27], [44]–[47]. Recently, White et al. [23] found overlapping gene regions in all ten novel pulmonates (Mollusca: Gastropoda). Perseke et al. [24] found overlapping genes in the mt-genomes of the ophiurid echinoderms Ophiocomina nigra (Abildgaard, 1789) and Amphipholis squamata (delle Chiaje, 1828) and in the acorn worm Balanoglossus clavigerus (delle Chiaje, 1829) (Hemichordata: Enteropneusta). In the Panarthropoda, overlapping regions exist in the velvet worm Opisthopatus cinctipes Purcell, 1899 (Onychophora: Peripatopsidae) [49]. The A-T richness of the genomes reported here are close to the average arthropod level of ∼70% [45]. Ap. falcifera (64%) and Ab. magnum (66.6%) are lower than B. lecontii (76.6%), the highest millipede value to date, but fall within the established range of diplopod A-T contents 62.1% (Antrokoreana gracilipes) to 67.8% (Thyropygus sp.). Given the paucity of colobognathan genomes sequenced to date and the values of other myriapods (e.g., 72.6%– Scutigerella causeyae), it is difficult to say whether B. lecontii has abnormally high A-T richness. Other arthropods have even higher A-T content; Melipona bicolor (Lepeletier, 1836) (Pancrustacea: Insecta: Hymenoptera) has a value of 86.7% [50].

Protein Coding Region Statistics The amino acid residue conservation scores based on identity (AARCIs) suggest inherent differences in the evolution of the protein coding regions of myriapod mitochondrial genomes. The genes show varying levels of site-specific conservation (figure 3). These data indicate some genes with many highly conserved, high AARCI value sites (e.g., COI), whereas others have few conserved, low AARCI value sites (e.g., ATP8). Additionally, the mean values of AARCI scores differ between the coding regions (table S2). Of the 78 possible pairwise gene comparisons, 56 are significantly different with a Bonferroni correction, whereas 65 are significant in the uncorrected analysis (α = 0.05). An ANOVA comparing the protein coding gene regions show that they are not equal. The taxa also differ in their mean AARCI values across the genome as shown in figure 4, which illustrates the %ID of the amino acid residues for all pairwise comparisons. Because these taxa have been separated for as long as 504 MY [51], it is not surprising that the gene regions and taxa differ in per site amino acid conservation values. Despite the specific and vital functions performed by these mitochondrial genes, large portions appear to be under varying selection pressures between taxa. Determining whether variable regions are under relaxed or divergent selection will require many more taxa be sequenced to increase phylogenetic coverage. Portions of some genes do appear to be under strong stabilizing selection and probably represent functional domains or important structural regions of the final peptide.