Sequencing, orthology assessment, and data processing

We collected between 0.06 and 3.4 M quality-trimmed paired-end raw reads per species, which assembled into 17,551 contigs on average. After comparing all contigs sequenced on the same lane against each other, we removed on average ~7.1% of contigs per species as potential cross-contaminants (Additional file 1: Table S1). On average, we successfully enriched 71% of the target DNA across all species. The mean base coverage depth of the on-target contigs (C_t) was 967× (Additional file 1: Table S1). When searching for the 195 target genes in the sequenced and assembled enriched DNA libraries, we found on average 139 target genes per species. In comparison, when searching the available transcript libraries [27], we found on average 187 target genes per species (Additional file 1: Tables S1 and S2). Additional sequencing information, for example, on the length of contigs corresponding to target genes and on the number of identified orthologs per species, is given in Additional file 1: Table S1 (enrichment data set) and Table S2 (transcriptomic data set). Additional results of all processing steps applied to the multiple sequence alignments (MSAs) at the nucleotide and amino acid levels (e.g., on alignment reliability, masking, and protein domain identification) are given in the Supplementary information.
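The text states only that contigs sequenced on the same lane were compared against each other before potential cross-contaminants were removed; the exact criterion is not given here. One common heuristic, sketched below purely as an illustration, assumes that precomputed all-versus-all similarity hits are available and flags the lower-coverage member of a near-identical pair from two different species. The identity threshold (98%) and coverage ratio (10×), as well as the `Contig`, `Hit`, and `flag_cross_contaminants` names, are hypothetical choices and not the study's settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contig:
    contig_id: str
    species: str
    coverage: float  # mean base coverage depth of the contig


@dataclass(frozen=True)
class Hit:
    """A similarity hit between two contigs from the same lane, assumed to come
    from a precomputed all-versus-all comparison."""
    query: str
    subject: str
    identity: float  # percent identity of the local alignment


def flag_cross_contaminants(contigs, hits, min_identity=98.0, min_cov_ratio=10.0):
    """Return IDs of contigs that look like cross-contamination.

    Hypothetical rule: if two contigs from different species are nearly identical
    and one has far lower coverage, the low-coverage contig is flagged.
    """
    by_id = {c.contig_id: c for c in contigs}
    flagged = set()
    for h in hits:
        a, b = by_id[h.query], by_id[h.subject]
        if a.species == b.species or h.identity < min_identity:
            continue  # same species or not similar enough: not suspicious
        low, high = sorted((a, b), key=lambda c: c.coverage)
        if low.coverage == 0 or high.coverage / low.coverage >= min_cov_ratio:
            flagged.add(low.contig_id)
    return flagged


contigs = [
    Contig("c1", "sp1", 500.0),
    Contig("c2", "sp2", 12.0),
    Contig("c3", "sp3", 300.0),
    Contig("c4", "sp1", 250.0),
]
hits = [Hit("c1", "c2", 99.5), Hit("c3", "c4", 99.0), Hit("c1", "c3", 85.0)]
print(flag_cross_contaminants(contigs, hits))
```

In this toy input, only `c2` is flagged; the near-identical pair `c3`/`c4` is kept because similar coverage does not indicate which species is the source (highly conserved genes can legitimately be near-identical across species).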

Phylogenetic analyses

Our phylogenetic inferences are based on enriched nucleotide sequence data of 95 apoid wasp and bee species and on transcriptomic sequence data of 79 apoid wasp and bee species. We added corresponding sequence data of nine outgroup species to root the inferred tree topology. The analyzed dataset comprises 94,869 amino acid sites and the 284,607 corresponding nucleotide sites (all three codon positions) of 195 single-copy protein-coding genes. To estimate divergence times, we applied an independent-rate molecular clock approach with ten validated fossil calibration points [30, 31] (Additional file 2: Figure S1 and Additional file 1: Table S9). Information on taxa with an unstable phylogenetic position (rogue taxa) is given in the Supplementary information and in Additional file 1: Table S3.
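The nucleotide datasets compared below differ only in which codon positions they retain. Assuming an in-frame nucleotide alignment whose length is exactly three times that of the amino acid alignment (as 284,607 = 3 × 94,869 suggests), the 1st+2nd-position dataset can be derived as in the following sketch. The function name and the dict-based alignment representation are illustrative assumptions, not the pipeline used in the study.

```python
def split_codon_positions(alignment):
    """Split an in-frame nucleotide alignment (dict: taxon -> sequence) into
    codon-position datasets: (all positions, 1st+2nd positions, 3rd positions)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    (n,) = lengths
    if n % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3 (in-frame codons)")
    # Keep the first two bases of every codon.
    pos12 = {t: "".join(s[i:i + 2] for i in range(0, n, 3)) for t, s in alignment.items()}
    # Keep only the third base of every codon.
    pos3 = {t: s[2::3] for t, s in alignment.items()}
    return alignment, pos12, pos3


aln = {"taxA": "ATGGCC", "taxB": "ATGGCT"}  # toy in-frame alignment (two codons)
full, pos12, pos3 = split_codon_positions(aln)
print(pos12, pos3)
```

Assuming no additional site filtering, a 1st+2nd-position dataset derived this way from the full supermatrix would contain 2 × 94,869 = 189,738 sites.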

We inferred largely congruent topologies under the maximum likelihood optimality criterion, irrespective of whether we analyzed the amino acid or the nucleotide sequence data (1st and 2nd codon positions only, or all three codon positions) (Additional file 2: Figures S2, S3 and S4), and almost all clades received high bootstrap support (Fig. 1). Our analyses confirm the monophyly of Apoidea, as previously suggested by analyses of morphological characters and molecular sequence data [24, 27, 32, 33] (Fig. 2; node 1). We estimate that Apoidea originated in the Early Jurassic, ca. 185 million years ago [Mya] (95% confidence interval [CI] 220–165 Mya; node 1), indicating that Apoidea are likely older than previously thought [27, 34,35,36]. We confirm Ampulicidae as the closest extant relatives of all remaining Apoidea. Our results show the family Crabronidae to be polyphyletic, consistent with earlier studies [12, 23,24,25, 27] (Fig. 2; node 2). Our study confirms the monophyly of each of the species-rich crabronid wasp subfamilies Astatinae, Bembicinae + Heterogynaidae, Crabroninae + Dinetinae, and Philanthinae, and of the apoid wasp family Sphecidae (classification according to Pulawski 2016). The phylogenetic placement of the species-poor crabronid wasp subfamily Mellininae, currently included in Crabronidae, differs between topologies inferred from different datasets. In the topology inferred from all codon positions at the nucleotide level (Fig. 2; node 3), Mellininae are recovered as the sister group of Sphecidae, although with low bootstrap support. In contrast, in the topologies inferred from the amino acid sequence data (Fig. 1c) and from only 1st and 2nd codon positions (Fig. 1b), Mellininae are the sister lineage of (Sphecidae + (Crabroninae + Dinetinae)). Peters et al. (2017) also inferred this latter placement of Mellininae, although with higher bootstrap support than in our study (Fig. 1d). We inferred Crabroninae and Dinetinae as sister groups, irrespective of the analyzed dataset (Fig. 1a–1c and Fig. 2; node 4). In contrast to Branstetter et al. (2017), we find the apoid wasp family Heterogynaidae nested within Nyssonini, a tribe of the crabronid wasp subfamily Bembicinae (Fig. 2; node 8), albeit with poor bootstrap support. Finally, we confirm the polyphyly of the species-rich crabronid subfamily Pemphredoninae, as suggested by Peters et al. (2017): it is an artificial group of three lineages. One of these lineages, comprising Stigmina, Pemphredonina, and Spilomenina, is inferred as the sister lineage of the crabronid wasp subfamily Philanthinae (Fig. 2; node 7). The remaining two lineages (i.e., Psenini + Odontosphecini, and Ammoplanina) constitute a paraphyletic grade leading to Anthophila. Because the taxon sampling of Peters et al. (2017) included neither Ammoplanina nor Odontosphecini, those authors inferred Psenini as the closest relatives of bees. With a more comprehensive taxon sampling, our study is thus the first to suggest that the closest relatives of Psenini are Odontosphecini and that Ammoplanina possibly represent the extant sister lineage of bees. Note, however, that our taxon sampling does not include representatives of the apoid wasp tribe Entomosericini, a lineage that could be even more closely related to bees than Ammoplanina. In any case, a close phylogenetic relationship between Ammoplanina and bees allows us to date the origin of the bee lineage more precisely. Specifically, we estimate that the lineage leading to extant bees diverged from the lineage leading to Ammoplanina in the Early Cretaceous, ca. 128 Mya (CI 148–108 Mya), a time period during which angiosperms rapidly radiated [11, 14, 36, 37] (Fig. 2; node 9).
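Bootstrap support, as reported throughout this section, is the percentage of bootstrap replicate trees that contain a given clade. The sketch below illustrates this calculation under simplifying assumptions: each replicate tree has already been reduced to a set of rooted clades (frozensets of taxon names), so no tree parsing is shown, and the four replicate trees are invented toy data rather than results of this study. Unrooted analyses would count bipartitions instead of rooted clades.

```python
def clade_support(replicate_clades, clade):
    """Percentage of bootstrap replicate trees that contain `clade`.

    Assumption: each replicate is given as a set of clades, each clade
    being a frozenset of taxon names."""
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)


# Invented replicate trees, for illustration only.
r1 = {frozenset({"Ammoplanina", "Anthophila"}), frozenset({"Psenini", "Odontosphecini"})}
r2 = {frozenset({"Ammoplanina", "Psenini", "Odontosphecini"}),
      frozenset({"Psenini", "Odontosphecini"})}
r3 = {frozenset({"Ammoplanina", "Anthophila"})}
replicates = [r1, r2, r3, r1]

print(clade_support(replicates, {"Ammoplanina", "Anthophila"}))
```

With 100 replicates, as used for Fig. 2, each replicate containing a clade adds exactly one percentage point to its support value.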

Fig. 1 Possible phylogenetic relationships of the major apoid wasp lineages and of bees (Anthophila) as inferred in the present investigation and by Peters et al. (2017). Members of the apoid wasp family “Crabronidae” are scattered across eight major clades; we combine Crabronidae: Dinetinae and Crabroninae into one clade, Crabronidae (marked by an asterisk). Numbers in brackets represent the number of taxa included in the analyses. The highlighted group names Astatinae (red), Bembicinae (yellow), and Mellininae (green) mark ambiguous sister group relationships, resulting in a total of three alternative tree topologies: (a) inferred from analyzing 284,607 nucleotide sites under a combined protein domain- and codon-based partitioning scheme, modeling 1st, 2nd, and 3rd codon positions separately; (b) inferred from the same nucleotide alignment under a combined protein domain- and codon-based partitioning scheme, modeling 1st and 2nd codon positions separately, with 3rd codon positions excluded; (c) inferred from analyzing 94,869 amino acid sites under a protein domain-based partitioning scheme; (d) inferred by Peters et al. (2017); and (e) inferred by Branstetter et al. (2017), including bootstrap support values.

Fig. 2 Maximum likelihood phylogenetic tree inferred from analyzing 284,607 nucleotide sites under a combined protein domain- and codon-based partitioning scheme, modeling 1st, 2nd, and 3rd codon positions separately. Support values are derived from 100 bootstrap replicates. Species marked by an asterisk (*) are rogue taxa. Two asterisks (**) mark the misplaced species Ammatomus sp. I, and three asterisks (***) mark the position of Stenotritidae. Circled numbers (nodes) indicate taxonomic groups of special interest described in the main text. Former classification according to W. J. Pulawski’s Catalog of Sphecidae “sensu lato”.

The results of Bowker’s matched-pairs test of symmetry indicate that the nucleotide dataset with all codon positions included (PF-NT-1,2,3) strongly violates the assumptions of global stationarity, reversibility, and homogeneity. In contrast, the amino acid dataset is significantly less affected by these model violations (Additional file 2: Figure S6). Since the trees inferred from the amino acid and the nucleotide data (using ML and Bayesian approaches) recovered virtually the same major clades, we conclude that a possible GC bias in individual species most likely did not affect our main results.
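Bowker’s test compares, for two aligned sequences, each off-diagonal cell n_ij of their 4×4 divergence matrix with its mirror cell n_ji; under stationarity, reversibility, and homogeneity the matrix is expected to be symmetric. The statistic is B = Σ_{i<j} (n_ij − n_ji)² / (n_ij + n_ji), compared against a chi-square distribution whose degrees of freedom equal the number of informative cell pairs. The following stdlib-only sketch computes it for one pair of sequences; it is an illustration, not the software used in this study, and the function names are hypothetical. In practice such tests are typically run on every pair of taxa in a supermatrix.

```python
import math
from itertools import combinations

NUCLEOTIDES = "ACGT"


def divergence_matrix(seq1, seq2):
    """4x4 matrix n[i][j]: number of aligned sites with nucleotide i in seq1 and
    nucleotide j in seq2. Gaps and ambiguity codes are skipped."""
    index = {nt: k for k, nt in enumerate(NUCLEOTIDES)}
    n = [[0] * 4 for _ in range(4)]
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in index and b in index:
            n[index[a]][index[b]] += 1
    return n


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df,
    using closed forms of the regularized upper incomplete gamma function."""
    if df == 0 or x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        return math.exp(-z) * sum(z ** k / math.factorial(k) for k in range(df // 2))
    tail = math.erfc(math.sqrt(z))
    tail += math.exp(-z) * sum(z ** (k - 0.5) / math.gamma(k + 0.5)
                               for k in range(1, (df - 1) // 2 + 1))
    return tail


def bowker_test(seq1, seq2):
    """Bowker's matched-pairs test of symmetry for two aligned DNA sequences.
    Returns (statistic, degrees of freedom, p-value)."""
    n = divergence_matrix(seq1, seq2)
    stat, df = 0.0, 0
    for i, j in combinations(range(4), 2):
        total = n[i][j] + n[j][i]
        if total > 0:  # cell pairs that are both empty carry no information
            stat += (n[i][j] - n[j][i]) ** 2 / total
            df += 1
    return stat, df, chi2_sf(stat, df)


print(bowker_test("AAAACCCC", "GGGGTTTT"))
```

A small p-value indicates an asymmetric divergence matrix, that is, a pair of sequences whose composition or substitution process departs from the assumptions of the usual time-reversible models.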

We applied Four-cluster Likelihood Mapping (FcLM) to assess whether confounding signal, due to compositional heterogeneity across taxa and/or to a non-random distribution of missing data in the amino acid and nucleotide supermatrices, could have influenced the phylogenetic tree inference. Specifically, we used FcLM to assess the possible relationships (a) of Ammoplanina, Psenini + Odontosphecini, Anthophila, and all remaining species in our dataset to each other (Hypothesis 1), and (b) of Mellininae, Sphecidae, Crabroninae + Dinetinae, and all remaining species in our dataset to each other (Hypothesis 2) (Additional file 3: Table 1). When testing the phylogenetic position of Ammoplanina, we found strong signal for Ammoplanina being the sister group of bees in the nucleotide sequence data. For the amino acid sequence data, however, the results were inconclusive, as we found support both for a sister group relationship of Ammoplanina and bees and for a sister group relationship of Ammoplanina and (Psenini + Odontosphecini). This ambiguity with respect to Ammoplanina is also reflected, to some extent, by the low bootstrap support for a sister group relationship of Ammoplanina and bees in the tree inferred from the amino acid sequence data. Permutation tests could not completely rule out confounding signal, which is more likely due to model violation caused by among-lineage heterogeneity than to a non-random distribution of missing data. However, when comparing the proportions of quartets after discounting those showing confounding signal, support for Ammoplanina being the sister group of bees remains higher than support for Ammoplanina being the sister group of Psenini + Odontosphecini (Additional file 1: Table S10, Hypothesis 1).
To gain further confidence that Hypothesis 1 reflects the actual evolutionary history of the group, we suggest increasing the taxon sampling within Pemphredoninae (Fig. 2, former classification).
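In FcLM, each quartet is formed by drawing one taxon from each of the four predefined clusters (for Hypothesis 1: A = Ammoplanina, B = Psenini + Odontosphecini, C = Anthophila, D = all remaining species). The log-likelihoods of its three possible topologies are converted into posterior weights, which place the quartet as a point in a triangle. The sketch below shows only the simple three-area partition, which assigns each quartet to the topology with the largest weight; full likelihood mapping additionally separates tree-like, network-like, and star-like signal, and the permutation tests mentioned above are omitted. The log-likelihood values are invented for illustration; in a real analysis they would come from ML software.

```python
import math
from collections import Counter

# The three unrooted topologies for clusters A, B, C, D.
# For Hypothesis 1, AC|BD corresponds to (Ammoplanina + Anthophila).
TOPOLOGIES = ("AB|CD", "AC|BD", "AD|BC")


def posterior_weights(lnls):
    """Posterior weights (equal priors) of the three quartet topologies,
    i.e. the quartet's barycentric coordinates in the mapping triangle."""
    top = max(lnls)
    w = [math.exp(l - top) for l in lnls]  # shift by the maximum to avoid underflow
    s = sum(w)
    return [x / s for x in w]


def three_area_summary(quartet_lnls):
    """Assign each quartet to the topology with the largest weight and return
    the proportion of quartets supporting each topology."""
    counts = Counter()
    for lnls in quartet_lnls:
        w = posterior_weights(lnls)
        counts[TOPOLOGIES[w.index(max(w))]] += 1
    n = len(quartet_lnls)
    return {t: counts[t] / n for t in TOPOLOGIES}


# Invented log-likelihoods (AB|CD, AC|BD, AD|BC) for three quartets.
quartets = [(-1000.0, -990.0, -1005.0), (-850.0, -849.0, -860.0), (-700.0, -702.0, -705.0)]
print(three_area_summary(quartets))
```

Reporting proportions of quartets rather than a single tree makes it visible when a minority of quartets consistently supports an alternative placement, as observed for Ammoplanina in the amino acid data.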

We also assessed the phylogenetic position of Mellininae via FcLM and found strong signal for Mellininae being the sister group of (Sphecidae + (Crabroninae + Dinetinae)), irrespective of whether we analyzed the amino acid or the nucleotide sequence data. This result is congruent with the tree inferred from the amino acid sequence data, but incongruent with the tree inferred from the nucleotide sequence data. While permutation tests revealed no confounding signal in the amino acid dataset, they did reveal such signal in the nucleotide dataset. This confounding signal is likely due to compositional heterogeneity among taxa and could have caused a model violation in the phylogenetic tree inference, possibly resulting in the misplacement of Mellininae as the sister group of Sphecidae. This possibly erroneous relationship appears in the ML tree inferred from the nucleotide supermatrix including all three codon positions (Fig. 1a). We therefore consider the placement of Mellininae as sister to (Sphecidae + (Crabroninae + Dinetinae)) the more credible hypothesis. For more information on the FcLM results, see Additional file 4: Figures 2, 3, 4 and 5, Additional file 3: Table 1 in the Supplementary file, and Additional file 1: Table S10.

Sister group of the bees

Our analyses reveal Ammoplanina as the possible extant sister group of bees (Fig. 2). Ammoplanina comprise ca. 130 species in ten genera [2]. All species of Ammoplanina are remarkably small, with a body length of 2–4 mm, and have a strongly reduced wing venation. Ammoplanina occur in the Holarctic and Ethiopian regions [38]. Assuming that Ammoplanina are the closest extant relatives of bees, it is conceivable that the most recent common ancestor of Ammoplanina and bees was small-bodied, an assumption that fits the characteristics of a previously described bee fossil, Melittosphex burmensis [39]. This supposedly earliest bee fossil, described from 99-million-year-old Myanmar amber, is also small (2.95 mm) and thus similar in size to species of Ammoplanina. The small body size of M. burmensis has been interpreted as an adaptation to the small size of flowers in the Early to Late Cretaceous [39]. In general, the origin and diversification of bees are assumed to have been strongly linked to the Cretaceous radiation of angiosperms [12, 27, 36, 39]. Most Cretaceous flowers preserved in Myanmar amber are 0.5–3.0 mm in size [40]. These small flowers were likely pollinated primarily by small flies, beetles, and thrips, as well as by small Hymenoptera [41], such as the predatory ancestors of bees and early bees.

In this regard, our analyses partly support an almost 50-year-old idea proposed by Sergei Ivanovitch Malyshev (1968) [42]. Malyshev hypothesized that bees are derived from pemphredonine wasps, an idea he deduced from the hunting behavior of some Pemphredoninae on flowers: most pemphredonine and all Ammoplanina wasps hunt flower-visiting thrips as food for their offspring [43, 58]. Our results allow the reconstruction of an evolutionary scenario for the transition from hunting Ammoplanina wasps to pollen-collecting bees. Species of Ammoplanina are specialized on thrips, which have been shown to often aggregate on flowers and to feed on pollen [44, 59, 60]. It can be assumed that the visual and potentially olfactory floral cues that the wasps used to locate their prey (flower-visiting thrips) could similarly have been employed by the proto-bee to locate pollen resources. The female wasp transports pollen-fed and pollen-covered thrips to the nest, which might have allowed the larvae of the wasp or early bee to switch from consuming thrips to obligatory pollen feeding. Malyshev (1968) also stated that, owing to the small size of their prey, pemphredonines must visit flowers and collect thrips repeatedly to provision a nest cell with a sufficient number of prey items. This implies that progressive provisioning was already practiced by the wasp-like ancestor of bees, a behavior that likely further facilitated the transition to pollen collecting, which is also progressive in bees. Thus, in light of our inferred phylogeny of apoid wasps, an evolutionary shift from predation on flower-visiting, pollen-feeding thrips in the common ancestor of Ammoplanina and bees to pollen feeding in the stem species of bees appears plausible.

Evolution of sociality in apoid wasps and bees

While most Apoidea are solitary, social behavior, such as communal nesting and eusociality, has evolved in some apoid lineages. In communal nesting, a group of females shares a nest in which each female lays and provisions her eggs; however, the females neither cooperate in provisioning nor form castes. In contrast, eusocial species live in colonies with overlapping generations and castes that show a division of labor (“queens” and “workers”) [18, 45].

The most widespread form of sociality (besides brood care) in Apoidea is communal nesting [17, 18]. In bees, communal nesting is known in Andrenidae, Apidae, Halictidae, Megachilidae, and Melittidae [17, 18], whereas among apoid wasps it occurs exclusively in some pemphredonine and philanthine wasps. In pemphredonine wasps, communal nesting is known in species of (1) Spilomenina (e.g., Spilomena socialis [10], Arpactophilus mimi [46], and species of the genus Xysma [46]) and (2) Stigmina (e.g., Carinostigmus [47]). In philanthine wasps, communal nesting is known in, for example, Cerceris antipodes [48], Cerceris rubida, and Trachypus petiolatus [49]. Among apoid wasps, Microstigmus comes (Pemphredoninae) is the only species considered eusocial, based on family groups inhabiting a nest, each with a single mated and size-structured reproductive female [50]. In contrast, ca. 10% of the known bee species are eusocial [17]. Traits such as intraspecific communication or nesting biology might have triggered the repeated evolution of social behavior in apoid wasps and bees. Solitary species, for example, require intraspecific communication primarily for finding and recognizing mates. Species that nest communally additionally need to exchange information within the group, for example, among females competing for reproductive dominance. Here, the relationship between cuticular hydrocarbon profile and ovarian activity informs both competitors and potential helpers about the reproductive potential of each female and can be used to establish dominance and induce helping behavior [45]. Finally, eusocial species need to communicate dominance hierarchies and coordinate division of labor [45]. Our phylogenetic results indicate that, within Apoidea, social behavior (communal nesting and eusociality) occurs exclusively in representatives of a single clade, which comprises bees as well as pemphredonine and philanthine wasps (Fig. 2; node 10).

The phylogenetically restricted occurrence of social behavior within Apoidea raises the question of whether a shared physiological, ecological, or morphological trait might have fostered its establishment. In this respect, the role of (chemical) communication in social and non-social species is highly relevant, particularly concerning (1) modes of communication (e.g., among/within sexes and among/within groups), (2) cuticular hydrocarbons (genetically, food-, nest-, and climate-driven [51, 52]), and (3) the origin of pheromones from different glands (e.g., postpharyngeal gland and Dufour’s gland [53, 54]). Further studies of these traits are therefore needed to provide deeper insights into the evolutionary origin of sociality in Apoidea [45].

Implications for the classification of Apoidea

Our phylogenetic analysis of Apoidea confirmed the polyphyletic status of “Crabronidae”, which comprise about 90% of all known extant apoid wasp species [4]. Two of the four currently recognized families of apoid wasps (i.e., Heterogynaidae and Sphecidae s. str.) are nested deeply within crabronid wasps (Fig. 2; nodes 3 and 8). We identified ten major clades that can be consistently distinguished within crabronid wasps across all inferred topologies. Given the still ambiguous sister group relationships of Astatinae, Bembicinae, and Mellininae, we suggest granting family rank to nine of these major clades (i.e., Ammoplanidae, Astatidae, Bembicidae, Crabronidae, Mellinidae, Pemphredonidae, Philanthidae, Psenidae, Sphecidae; see Fig. 1; nodes 1–9). By raising these nine clades, formerly ranked as subfamilies (and, in one case, as a subtribe), to family rank, we establish a natural system of Apoidea. Given the contradicting phylogenetic placements of Heterogynaidae in our study and in that of Branstetter et al. (2017), we conservatively refrain from altering the taxonomic rank of the family Heterogynaidae (e.g., by treating it as a subordinate group of Bembicidae) until more species and DNA sequence data of Heterogyna have been included in a phylogenetic analysis.

Since a phylogenetic system does not necessarily correspond to the traditional categories of the Linnaean hierarchy, conventions, for example those proposed by Wiley and Lieberman (2011), are required to handle newly inferred topologies. The Linnaean system of biological nomenclature has been in use since the eighteenth century and provides methods for arranging and ranking taxa to reflect their relative hierarchical position [55]. However, applying the Linnaean system of categories to a phylogeny can cause numerous problems. Considering the sister group relationship of bees and Ammoplanina, we have to keep in mind that the clade representing the bees has no formally ranked name in the current classification. Some authors refer to this group as Apiformes [56], others as Anthophila [34]. However, this clade comprises various subordinate lineages that have been granted taxonomic names of family rank (i.e., Andrenidae, Apidae, Colletidae, Halictidae, Megachilidae, Melittidae, Stenotritidae).

Here, we propose to raise Ammoplanina from subtribe to family rank. Although a Linnaean system of nomenclature does not seem to be applicable in this case, Wiley and Lieberman (2011) proposed a modern way of integrating Linnaean nomenclature with phylogenetic studies by minimizing taxonomic decisions and changes to existing classifications. Applied to the present study, this approach suggests that the large group of bees, which has no formal ranked name, should be referred to as Anthophila because of its well-known special status. However, raising Ammoplanina to family rank (Ammoplanidae) is needed to keep the system encaptic (i.e., to ensure that the sister group of a clade comprising multiple families does not hold a rank lower than family). Our new classification of apoid wasps is thus minimally redundant but maximally informative with respect to all newly proposed families [57]. As mentioned in the introduction, however, we were unable to study samples of Entomosericini, which belong to the former subfamily Pemphredoninae. The phylogenetic position of this lineage of apoid wasps therefore remains unclear. Until samples of Entomosericini are included in a future phylogenetic analysis, we suggest referring to this enigmatic lineage as Entomosericini incertae sedis.
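The encaptic rule stated above can be expressed as a simple check. The sketch below assumes a generalization of that rule (a sister group must not rank lower than the highest-ranked taxon nested in its sister clade) and uses an abbreviated, hypothetical list of Linnaean ranks; it only illustrates why Ammoplanina, as a subtribe, cannot be the formal sister of a clade containing seven bee families, whereas Ammoplanidae can.

```python
# Abbreviated list of Linnaean ranks, ordered from lowest to highest (assumption).
RANK_ORDER = ("subtribe", "tribe", "subfamily", "family", "superfamily")
LEVEL = {rank: i for i, rank in enumerate(RANK_ORDER)}


def sister_rank_ok(sister_rank, ranks_inside_clade):
    """Hypothetical encaptic check: the sister group must not hold a rank lower
    than the highest-ranked taxon nested inside the clade it is sister to."""
    highest_inside = max(ranks_inside_clade, key=LEVEL.__getitem__)
    return LEVEL[sister_rank] >= LEVEL[highest_inside]


# Anthophila: seven subordinate lineages of family rank.
bee_ranks = ["family"] * 7
print(sister_rank_ok("subtribe", bee_ranks))  # Ammoplanina as a subtribe
print(sister_rank_ok("family", bee_ranks))    # Ammoplanidae as a family
```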