Taxonomic sampling

We compiled molecular datasets based on transcriptomic data from 95 species (Supplementary Table 1), four of which are newly sequenced. Our dataset includes representatives of all arachnid orders, with the exception of two minor lineages, Palpigradi and Schizomida (the latter being widely accepted as the sister lineage of Uropygi14), for which genomic-scale data are still missing. We include sequence information from three of four living horseshoe crab species, and most importantly we expand taxon sampling within mites and ticks, including short-branched representatives of Sarcoptiformes (Acariformes) and Mesostigmata (Parasitiformes). The inclusion of short-branched Acariformes and Parasitiformes is of particular relevance because previous studies suggested that the chelicerate tree is prone to the effects of Long Branch Attraction (LBA) artifacts8,15, and long-branched mites and ticks are particularly common across previous datasets8.

Matrix assembly and model selection

To test chelicerate relationships, we generated three datasets using two alternative orthology selection strategies. For the first dataset (Matrix A) we expanded our legacy dataset2,16,17 to generate a superalignment including 45,939 amino acid positions (78.1% complete) from 233 loci. The genes in Matrix A were originally selected to maximise inclusion of known single-copy genes (to minimise the negative effects of hidden paralogy) as well as slowly evolving genes (to reduce the negative impact of saturation-dependent tree reconstruction artifacts, like LBA18). Matrix A is thus enriched in informational genes19, e.g. ribosomal proteins, that are frequently present in single copy and are characterised by a low rate of evolution20. For our second dataset (Matrix B) we used the OMA stand-alone software21 to de novo identify orthologs from our set of transcriptomes. Using OMA we generated a new set of 3982 loci based on retaining genes for which orthologs were present in at least in 35% or more of the 95 considered taxa. Each of these high occupancy loci was then concatenated into a final supermatrix that was trimmed to remove poorly aligned regions. The final, trimmed, supermatrix included 34,839 amino acid positions (86.4% complete); see Methods for details. The third dataset (Matrix C) was created by concatenating all genes from Matrix B that are not in Matrix A (see Supplementary Table 2 and Methods). Matrix C was then trimmed using the same software and parameters used for Matrix B. While Matrices A and B are not fully independent datasets, Matrices A and C (the latter including 30,552 amino acids) are, and Matrix C was assembled to better establish the extent to which similarities between the trees derived from Matrices A and B might have been caused by the presence of a common set of shared sites (Supplementary Table 2). In light of previous suggestions that chelicerate trees might be prone to LBA artifacts8,15, we used saturation plots to compare substitutional-saturation levels in these three matrices, as an increased level of saturation is a factor that can lead to the recovery of tree reconstruction artifacts18,22,23. Results of saturation-plot analyses provide us with an objective way to rank a-priori the expected relative phylogenetic utility of our datasets. The saturation plots (Fig. 1a) indicated that Matrix A, which was originally designed to exclude fast evolving genes, is less saturated (R2 = 0.74) than are Matrices B and C (R2 = 0.42 and 0.48, respectively). These latter two are in effect equally saturated: they display minimal differences in R2 values and regression lines of almost identical slope (Fig. 1a). These results are not unsurprising given that Matrices B and C were pooled from a very large set of orthologs that was not filtered to remove genes with a high rate of evolution, and Matrix C only differs from Matrix B in excluding a small (9.33%), random sampling of sites that are also present in Matrix A. Overall, these results indicate that while the strategy used to derive Matrices B and C allows homogeneously sampling genes from across genomes, it retains a greater number of substitutionally-saturated genes, and according to the considered criterion Matrix A should be considered the most reliable dataset. Matrices B and C were subjected to very stringent trimming protocols (to minimise missing data) implemented using the software Gblocks24 (see Methods for details). We performed a sensitivity analysis to test whether results of the analyses of Matrix B and C might have been biased by our site trimming strategy. To do so, we generated a fourth dataset (Matrix D) applying less stringent trimming criteria, and using a different alignment trimming software (trimAl25), to the untrimmed version of Matrix B. Matrix D includes 127,114 amino acids and is more than three times larger than Matrix B.

Fig. 1 Saturation plots and Bayesian results. a Saturation plots for Matrices A, B and C, illustrating that Matrix A has the lowest level of saturation. The dots represent the intersection between uncorrected genetic distances and the patristic distances obtained under ML for each pair of the 95 taxa. The lines of best fit show the general trend and direction of the data and above is shown the R2 and the slope of the regression line. Matrix A is represented in green, Matrix B in magenta and Matrix C in blue. Source data are provided as a Source Data file. b Schematic representation of the results of the CAT-GTR + G analysis of Matrix A. c Schematic representation of the results of the CAT-GTR + G analysis of Matrix B. d Schematic representation of the results of the CAT-GTR + G analysis of Matrix A, after Dayhoff-6 recoding. b–d Lineages are summarized at the ordinal level. Support values represent posterior probabilities, with lack of a number indicating maximum support. Silhouettes designed by ART Full size image

We performed model testing (using Bayesian Cross Validation26 on Matrix A) and found that the compositionally site-heterogeneous CAT-GTR + G model fits the data best: Cross-Validation Score = 21.1139 ± 8.39011 (in favour of CAT-GTR + G—against the second best fitting model GTR + G). This result was expected as CAT-GTR + G27 has been shown to possess a greater ability to adequately describe site-specific amino acid preferences, compared to other available models28. Accordingly, all our Bayesian analyses were performed using the compositionally site-heterogeneous CAT-GTR + G model. We also performed Maximum Likelihood (ML) analyses; however, CAT-GTR + G is not implemented in ML software. Accordingly, our ML analyses used LG + I + G4 (Matrix A and B) and LG + F + I + G4 (Matrix C), which emerged as the best fitting site-homogeneous models (according to ModelFinder29) for Matrices A, B and C, respectively, but only among the limited set of predefined empirical GTR models (see ref. 24 for a discussion) we tested in IQTree30. Matrix D proved too large to be analysed under CAT-GTR + G. Accordingly, this dataset was only analysed under ML in IQTree under the LG + F + I + G4 model (selected using ModelFinder). Given that for the ML analyses the overall best fitting model (CAT-GTR + G) could not be used (see above), and less fitting models had to be used instead, we suggest that when our ML and Bayesian results disagree, the latter should be preferred.

Phylogenetic patterns in Chelicerata

We found that CAT-GTR + G analyses of both Matrices A and B invariably support monophyletic Acari (Figs. 1b, c and 2). CAT-GTR + G analyses of Matrix C did not converge, perhaps because Matrix C is derived from Matrix B but excludes the less saturated genes from Matrix A (Supplementary Table 2). The less saturated Matrix A supports the horseshoe crabs as the sister of monophyletic Arachnida (Figs. 1b and 2), whereas the more saturated Matrix B nested horseshoe crabs within the arachnids (Fig. 1c). Analyses performed under CAT-GTR + G can model site-specific compositional heterogeneity, but lineage-specific compositional heterogeneity can also negatively affect phylogenetic results28,31. We analysed Matrix A, which we deem more reliable based on the saturation-plot analyses (see above), under the best fitting CAT-GTR + G model after Dayhoff-6 recoding, which mitigates lineage-specific compositional heterogeneity and allows CAT-GTR + G to achieve greater modelling adequacy28,31. Analyses of the Dayhoff-6 recoded version of Matrix A destabilised some of the relationships within Arachnida (Fig. 1d) and caused support values for some nodes to reduce significantly, indicating that some relationships in Figs. 1b, 2 might represent compositional attractions. Nonetheless, our CAT-GTR + G analyses of the Dayhoff-6 version of Matrix A confirmed support for monophyletic Acari and did not reject a possible relationship between the horseshoe crabs and Arachnida. On the contrary, a placement of the horseshoe crabs within Arachnida is rejected by our CAT-GTR + G analyses (with and without recoding) of Matrix A. The possibility that a placement of the horseshoe crabs within Arachnida8,9 might be artefactual is further suggested by the observation that this topology is recovered from the more saturated Matrix B (Fig. 1c – irrespective of the model used), and from all four datasets under the less fitting LG + I + G4 (Matrices A and B) and LG + F + I + G4 (Matrix C and D) models (Supplementary Fig. 1 and Supplementary Fig. 2). In contrast, the consistent support for monophyletic Acari across all matrices, models and coding strategies, suggests that this node most likely represent phylogenetic signal, rejecting the rival Poecilophysidea hypothesis in which Solifugae (sun spiders) is the sister group of Acariformes alone32,33,34,35, and a recently recovered clade in which Parasitiformes emerges as the sister group of Acariformes plus Pseudoscorpiones9. However, Matrix A resolves Solifugae as the sister of a monophyletic Acari, still possibly indicating a close, even if not exclusive, relationship between the sun spiders and the Acariformes. In contrast, our CAT-GTR + G analyses invariably found Pseudoscorpiones in a close association with Arachnopulmonata (either as its sister group or as the sister group of Scorpiones). Pseudoscorpions were unstable in our ML analyses and emerged either as the sister group of Euchelicerata to the exception of Acari in Matrix A, B, and D (Supplementary Figs. 1A, 1B and 2B, respectively) or as a poorly supported sister group of Acari in Matrix C (Supplementary Fig. 1C). We are unaware of potential morphological synapomorphies for Acari and Pseudoscorpiones to the exclusion of Solifugae (e.g. the free finger of the chelicera articulating ventrally is shared by all three groups11). In contrast, a potential morphological synapomorphy for Acari and Solifugae is a tight grouping of the epistomo-labral plate and discrete lateral lips around the anteriorly-positioned mouth, shared by this group to the exclusion of pseudoscorpions36, and this grouping has been recovered in some morphological cladistic analyses (e.g. extant taxa cladogram of ref. 37).

Fig. 2 Phylogenetic tree derived from the CAT-GTR + G analyses of Matrix A after exclusion of six unstable taxa (Calanus, Bothriurus, Liocheles, Pandinus, Superstitionia and Vietbocap). Removal of unstable taxa did not affect topological relationships but increased support levels at key nodes and improved convergence statistics. Support values represent posterior probabilities. Convergence statistics: Burnin = 3000; Total Cycles = 10,000 subsampling frequency = 20. Maxdif = 0.28; Minimal effective size = 63. Silhouettes designed by ART Full size image

Is Acari monophyletic?

Extensive lists of differences between acariforms and parasitiforms have been enumerated, and these entail numerous organ systems12. However, monophyletic Acari is still supported by more morphological apomorphies than any alternative set of relationships that have been suggested for both Acariformes and Parasitiformes. In addition, apomorphies supporting independent sets of sister group relationships for these taxa conflict with each other11. Morphologically, support for Acari has emphasized the gnathosomal feeding apparatus, a morphological unit at the front of the body composed of the chelicerae, the epistome, and the fused coxae of the pedipalps38. A gnathosoma has been variably regarded as shared by Acari and Ricinulei39, but that of Acari exhibits a closer association of the chelicerae and palps and greater mobility of the complex as a whole40. Several other putative autapomorphies of Acari39,41 are homoplastic with other arachnid orders (e.g. ingestion of solid food, aflagellate sperm, stalked spermatophores, an ovipositor), are reversals (e.g. absence of a pygidium), or have not been conclusively shown to be absent in Ricinulei (e.g. a hexapod prelarva). Some recent phylogenetic analyses of Chelicerata based on morphological characters have recovered monophyly of Acari when extant taxa are considered in isolation40, or when fossils are included42. Alternatively, inclusion of fossils can result in Ricinulei shifting to an ingroup position40, or either Solifugae or Ricinulei moving within a paraphyletic Acari37. Our analyses, including 10 species of Parasitiformes and 11 of Acariformes, converge on a sister-group relationship between these lineages, providing support for the monophyly of Acari based on genomic-scale datasets. Monophyly of Acari is the only current hypothesis for the relationships of the Parasitiformes and Acariformes that has consilient2,16 support by genomic and morphological data. Within Acari, Parasitiformes comprises four major lineages, Mesostigmata, Ixodida, Holothyrida, and Opilioacarida, whereas Acariformes is composed by Trombidiformes and Sarcoptiformes. All analyses show a relationship between Phytoseioidea and Dermanyssoidea within Mesostigmata, and between Argasidae and Ixodidae within Ixodida. Relationships within Sarcoptiformes, on the other hand, could not be resolved with confidence as some alternative positions for an internal branch result in different definitions of Oribatida, either containing or excluding the Astigmata (Supplementary Fig. 3). We acknowledge that some key groups of Acari are not yet included in this or other transcriptomic analyses, such as Opilioacarida and Holothryida.

Phylogenetic relationships within Arachnida

All chelicerate taxa traditionally classified as orders are consistently retrieved as monophyletic across our analyses. Within Arachnida, all our analyses recover a monophyletic Acari. The sister group of Acari, however, could not be clearly identified. The CAT-GTR + G analyses of Matrix A found Solifugae in this position (but with variable support levels; PP = 0.77 in Fig. 1b and PP = 1 in Fig. 2). In addition, despite the existence of some morphological support for a clade of Acari and Solifugae37, this clade was not recovered in the Dayhoff-6 recoded analysis of Matrix A. This suggests that grouping of Acari and Solifugae in our trees might represent a compositional artifact. In addition to that, the ML analyses of Matrix C suggested a close alliance between Acari and Pseudoscorpiones, rather than Solifugae, although support for this node is very low. All our analyses recover a monophyletic Tetrapulmonata (Araneae sister to Uropygi and Amblypygi) in alliance with Scorpiones (Arachnopulmonata) or to a clade composed by Scorpiones + Pseudoscorpiones. Therefore, our results mostly support a single origin of book lungs, a hypothesis underpinned by detailed structural similarity between scorpion and tetrapulmonate book lungs43, a shared ancestral whole-genome duplication44, and also recovered in previous phylogenomic analyses8,9. In most instances in which Arachnopulmonata is retrieved, Pseudoscorpiones is found as its sister group. Together, these results suggest a close relationship between pseudoscorpions and arachnopulmonates (Fig. 2). Considerations of model fit favour a close relationship between Pseudoscorpiones and Arachnopulmonata. All CAT-GTR + G analyses of Matrices A and B, including the analyses of the Dayhoff-6 recoded version of Matrix A (Fig. 1b–d), support a sister group relationship between Opiliones and Ricinulei, in most instances with maximum support. This clade has not been supported by any morphological analysis, although early taxonomists sometimes classified Ricinulei as opilionids. The closest modern hypothesis is one that unites these two lineages with Acari45. A clade composed of Opiliones as sister group to Ricinulei and Solifugae was recovered in a dataset of the 500 slowest evolving genes in ref. 8, noting that these taxa share a fully segmented opisthosoma and tracheae. In most cases our ML analyses likewise support a relationship between Ricinulei and Solifugae, although always with low support (Supplementary Fig. 1A, B). Consideration of model fit (see above) leads us to suggest that Ricinulei plus Opiliones might be more likely to be correct. Intraordinal relationships within the larger arachnid groups, such as in Opiliones or Araneae, are almost equivalent between different matrices and analyses.

Summary

We found support for clades that clarified key controversies in chelicerate phylogeny. Foremost among these is the alliance between mites and ticks, resulting in a grouping of arachnids with even more species than spiders. More broadly, our results suggest that the success of the arachnid order was most likely based on a single terrestrialisation event that happened after the last common ancestor of the horseshoe crabs diverged from the last common ancestor of Arachnida. Our analyses have also proposed some novel clades that need corroboration (such as a putative sister group relationship between Opiliones and Ricinulei, and an alliance between Pseudoscorpiones and Arachnopulmonata). We caution, however, that such lineages as Palpigradi and Opiliocarida remain unsampled. While further taxonomic sampling is needed to fully clarify the evolutionary history of chelicerates, consilience of genomic and morphological results marks a significant advance in our understanding of chelicerate evolution.