There have been repeated shifts in social behavior in L. albipes

Whole-genome resequencing data identified a total of 2,655,960 genetic variants among 143 individuals of L. albipes that passed our quality filters (see Methods; Supplementary Data 1). We estimated the nucleotide diversity per site (π) to be 0.002, and we did not find differences in genetic diversity among social forms (Supplementary Table 1; two-sample t-test; t 4 = −0.961, p = 0.391). Next, to determine the extent of linkage disequilibrium (LD), we used 50 kb sliding windows across each genomic scaffold. As in honey bees5, LD decays rapidly with physical distance: we find a ~50% reduction in r2 within 250 bp (Supplementary Fig. 1), providing near SNP-level resolution in subsequent genome-wide association analyses. This pattern is consistent across all populations and likely reflects high levels of genetic variation found within the species (potentially maintained via extensive gene flow, panmictic mating, and/or by the high recombination rates typical of social Hymenoptera6).

Using these genomic data, we first asked if there was evidence of gene flow among behavioral forms, or if the social and solitary populations represent distinct genetic lineages. Specifically, we estimated the genetic structure among individuals from the six population samples. The mean F ST between social and solitary forms is 0.06 (mean permuted F ST = 0.0009 ± 0.0019; permutation test, p < 0.001), suggesting that there is, or recently has been, gene flow among behavioral groups. These estimates are similar to those from honey bee populations, where F ST ranges from 0.05–0.2 across different populations5. Next, we implemented a principal component analysis (PCA) using a set of LD-pruned SNPs. The results demonstrate that individuals largely cluster by population, but populations do not cluster by social behavior (Fig. 2a). A population tree mirrors these relationships, with social and solitary populations repeatedly clustering together rather than separating by social form (Fig. 2b). Taken together, these analyses indicate substantial gene flow and/or shared evolutionary history between behavioral forms within L. albipes. Furthermore, the relationships among social and solitary individuals suggest repeated shifts in social behavior within this species—perhaps as a result of local adaptation3,7. Ancestral state reconstructions for this group suggest that L. albipes descended from a social ancestor;2 therefore, this pattern likely reflects losses of eusociality in this species.

Fig. 2 Repeated shifts in social behavior within L. albipes. a Genetic principal component analysis of six populations: three social (blues and greens) and three solitary (reds and oranges). b Population tree. Each dot represents an individual (n = 143). Analyses were run using LD-pruned SNPs (n = 688,836). Both the PCA and the population tree demonstrate that social and solitary forms are not incipient species; instead, the multiple groupings of social and solitary populations are consistent with repeated shifts in social behavior within L. albipes, likely the result of local adaptation Full size image

Multiple genomic regions are associated with this social polymorphism

The observed lack of genetic structure between social and solitary forms of L. albipes coupled with the repeated shifts in social behavior allows for a population-genomic approach to identify genetic variants associated with behavioral variation. We used a genome-wide, mixed-model association test (GEMMA8) that explicitly models and accounts for genetic relatedness while testing for correlations between phenotype and genetic variants. First, no variant was consistently fixed among all social versus all solitary populations, suggesting that there is not a single, shared locus shaping variation in social behavior in this species but rather that the genetic underpinnings of this trait are complex. Concordantly, we found eight regions containing 194 SNPs passing a genome-wide, FDR-corrected significance threshold9 of 5 × 10−5 (which roughly corresponds to a raw-p threshold of 5 × 10−9; Fig. 3; Supplementary Data 2), suggesting that multiple regions throughout the genome contribute to intraspecific behavioral variation in L. albipes.

Fig. 3 SNPs associated with social behavior and their genomic location. a Manhattan plot of the genome-wide associations (n = 71 social, 72 solitary bees). Each point represents a single SNP and its -log10 p-value (FDR-corrected for multiple testing). Only SNPs with FDR < 0.2 are included. SNPs associated with syntaxin 1a are highlighted (yellow). b Multiple SNPs are associated with variation in social behavior in L. albipes (n = 194 SNPs), ~40% of these occur in regulatory regions neighboring coding exons (i.e., n = 45 within 5 kb upstream of the transcription start site; n = 28 within 1 kb downstream of the last codon) Full size image

Variants are enriched in both regulatory and coding regions

The candidate SNPs fall within 10 kb of 62 genes, and many of these differences are located in potential regulatory regions10. In fact, 40% of identified SNPs fall nearby genes, either 5 kb upstream of the transcription start site (n = 45) or 1 kb downstream of the last codon (n = 32), a significantly greater proportion of variants in these regions than expected by chance (hypergeometric test, p = 1.4 × 10−5). Moreover, 17 of these 194 SNPs, located in nine different genes, are predicted to be non-synonymous variants (Supplementary Table 2; hypergeometric test, p = 1.3 × 10−11), and eight variants occur at synonymous sites (hypergeometric test, p = 0.02). Similar proportions of coding to non-coding changes were observed in a comparison of stickleback freshwater and marine morphs11. Taken together, these results suggest that changes in both gene regulation and coding sequence play an important role in the social polymorphism within this species.

In total, we identified 62 candidate genes that either contained coding changes or were associated with nearby regulatory variants (within 10 kb). Gene ontology analyses performed on these candidates identified a significant overrepresentation of loci associated with neurotransmission (hypergeometric tests; SNARE binding, p = 0.001; regulation of neurotransmitter secretion, p = 0.02) and metabolism (TOR signaling, p = 0.02; negative regulation of insulin-like growth factor receptor signaling pathway, p = 0.004), among others (Supplementary Data 3). These gene functions suggest a link to behavioral variation, but because behavioral variation is tightly linked to environmental conditions in this species, some may also reflect associations driven by differences in climatic conditions. Overall, the mechanisms linked to facultative social behavior in halictids appear similar to those associated with obligate eusocial behavior in other species12.

Highly conserved genes, including syx1a, shape social behavior in bees and humans

Of the 194 SNPs associated with social behavior, 21 are clustered in or nearby six candidate genes implicated in human autism (Supplementary Table 3), a significant overrepresentation (hypergeometric test, p = 0.001). This result is consistent with the observation that genes whose expression level is correlated with social responsiveness in honey bee workers include genes implicated in human autism spectrum disorders (ASD)13. Thus, our results, which focus on genetic sequence variants rather than correlations with gene expression, independently support the idea, with data from a different species and a different social behavior, that a highly conserved set of genes or pathways may be involved in shaping social behavior across both insects and vertebrates, including humans14.

An intronic variant in syx1a reconstructs the expression differences between social forms

Our top candidate SNPs fall within a single peak on scaffold 28 (Fig. 3a). This peak contains variants exhibiting the strongest association with social behavior and includes seven SNPs clustered in regulatory regions surrounding an ASD-associated gene, syntaxin 1a (syx1a; Fig. 4a; Supplementary Table 4). Syx1a is a key component of the SNAP/SNARE complex and is critical for binding synaptic vesicles for their subsequent fusion and release of neurotransmitters at chemical synapses15. Changes in syx1a expression levels have been associated with social behavior in other species. For example, in migratory locusts, an increase in syx1a expression is associated with the transition from solitary to gregarious life cycles16; in honey bees, changes in syx1a protein expression are correlated with olfactory learning17; in mice, syx1a knockouts show unusual social behaviors and deficits in synaptic plasticity and memory formation18; and, in humans, syx1a has been repeatedly associated with ASD19,20,21. Thus, syx1a represents an exciting candidate for harboring causal mutations that contribute to the natural evolution of transitions in social behavior among halictid populations.

Fig. 4 Functional effects of regulatory SNPs in the syx1a locus. a Schematic of syx1a. Arrows denote locations of the seven SNPs associated with social behavior in L. albipes. Arrow colors illustrate F ST between social forms at each SNP. Two SNPs, labeled SNP 1 and 2 (red), have the highest F ST estimates, and are in strong LD with each other (r2 = 0.803), compared with the mean pairwise value among all seven SNPs (r2 = 0.18). b Quantitative PCR identified differences in syx1a brain gene expression between social forms. Expression values are plotted as the fold change of normalized syx1a expression levels. Social individuals (n = 4) have significantly higher levels of syx1a brain gene expression than solitary individuals (n = 5; t-test, t = −6.32, p = 0.0004). c Luciferase reporter assays test if candidate SNPs in the regulatory regions of syx1a affect enhancer activity (n = 5 replicates per group). Both tested regions show enhancer activity relative to control (Tukey’s honestly significant difference (HSD) post-hoc test, p < 0.05). There is no significant difference between social and solitary alleles of SNP 1 (Tukey’s HSD, p = 0.1773), but the social allele of SNP 2 drives ~1.5 times higher reporter expression than does the solitary allele (Tukey’s HSD, p < 0.0001), consistent with the qPCR assay (b) Full size image

First, to test if solitary and social bees show differences in syx1a expression, we used quantitative, reverse transcription PCR and found that bees from social populations have significantly higher levels of syx1a brain gene expression than those from solitary populations (Fig. 4b; t-test, t = −6.32, p = 0.0004). Second, to test if any of these seven SNPs—all of which are within 3.5 kb upstream of the start site (n = 4) or located in the first intron (n = 3)—affect syx1a gene expression, we measured divergence between social and solitary forms. As expected, F ST between social forms is elevated at each of these seven SNPs, ranging from 0.16–0.38 (vs. a genome-wide F ST = 0.06).

Two SNPs exhibited exceptionally high levels of F ST (0.38 and 0.37, respectively) between social and solitary forms (Fig. 4a). These SNPs are in strong LD with each other (r2 = 0.803) despite being separated by 4.14 kb (SNP 1; LALB_28:262159 and SNP 2; LALB_28:266299), compared with LD between the other five SNPs, which is similar to genome-wide patterns (Supplementary Fig. 2). Both of these SNPs are located in regions with predicted binding motifs22. Moreover, at both SNPs, the derived allele (by comparison with the outgroup genotype in L. calceatum)2 is present at higher frequency in social populations.