Abstract Repeated adaptation to a new environment often leads to convergent phenotypic changes whose underlying genetic mechanisms are rarely known. Here, we study adaptation of color vision in threespine stickleback during the repeated postglacial colonization of clearwater and blackwater lakes in the Haida Gwaii archipelago. We use whole genomes from 16 clearwater and 12 blackwater populations, and a selection experiment, in which stickleback were transplanted from a blackwater lake into an uninhabited clearwater pond and resampled after 19 y to test for selection on cone opsin genes. Patterns of haplotype homozygosity, genetic diversity, site frequency spectra, and allele-frequency change support a selective sweep centered on the adjacent blue- and red-light sensitive opsins SWS2 and LWS. The haplotype under selection carries seven amino acid changes in SWS2, including two changes known to cause a red-shift in light absorption, and is favored in blackwater lakes but disfavored in the clearwater habitat of the transplant population. Remarkably, the same red-shifting amino acid changes occurred after the duplication of SWS2 198 million years ago, in the ancestor of most spiny-rayed fish. Two distantly related fish species, bluefin killifish and black bream, express these old paralogs divergently in black- and clearwater habitats, while sticklebacks lost one paralog. Our study thus shows that convergent adaptation to the same environment can involve the same genetic changes on very different evolutionary time scales by reevolving lost mutations and reusing them repeatedly from standing genetic variation.

Author summary When organisms colonize a new environment in replicate, natural selection often leads to similar phenotypic adaptations. Such “convergent evolution” is known from both distant relatives, e.g., sea cows and whales adapting to an aquatic life, and from multiple populations within a species, but the causing genetic changes are rarely known. Here, we studied how a fish, the threespine stickleback, repeatedly adapted its color vision to living in red light–dominated blackwater lakes. Using multiple natural populations and a 19-y evolution experiment, we found selection on a blue light–sensitive visual pigment gene. One allele of this gene with a red-shifted light sensitivity facilitated repeated blackwater colonization. Surprisingly, two amino acid changes responsible for the red-shift have independently occurred 198 million years earlier, after the gene was duplicated in the ancestor of all spiny-rayed fish and modified into blue- and red-shifted gene copies. While other fish species today use these two gene copies to adapt to clear- and blackwater, stickleback have lost a copy and reevolved these mutations on different alleles of the same gene causing convergent adaptation to these habitats. Thus, we conclude that the same genetic changes can be responsible for convergent evolution on very different time scales.

Citation: Marques DA, Taylor JS, Jones FC, Di Palma F, Kingsley DM, Reimchen TE (2017) Convergent evolution of SWS2 opsin facilitates adaptive radiation of threespine stickleback into different light environments. PLoS Biol 15(4): e2001627. https://doi.org/10.1371/journal.pbio.2001627 Academic Editor: Nick Barton, The Institute of Science and Technology Austria, Austria Received: November 24, 2016; Accepted: March 6, 2017; Published: April 11, 2017 Copyright: © 2017 Marques et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: Aligned sequences have been deposited on the NCBI short read archive under accession SRP100209. Funding: National Research Council Canada http://www.nrc-cnrc.gc.ca (grant number NRC2354). This research was largely supported by an NSERC operating grant to TER. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. National Institute of Health https://www.nih.gov/ (grant number 3P50HG002568-09S1 ARRA).Received by DMK. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. National Institute of Health https://www.nih.gov/ (grant number 3P50HG002568). Received by DMK. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist. Abbreviations: dN/dS, ratio of non-synonymous to synonymous substitutions; EHH, extended haplotype homozygosity; MAF, minor allele frequency

Introduction Successful colonization of a new habitat requires adaptation to a multitude of different selection pressures. When similar habitats are colonized in replicate by different populations or species, phenotypic adaptation is often convergent [1], and this is most striking in adaptive radiations in multiple lakes or on several islands [2]. Whether a similar phenotypic adaptation is caused by selection on variants present in a shared ancestor due to recurrent mutation or due to changes in different genes, however, is still poorly understood [3, 4]. Only recently, genetic and population genomic studies have started to unravel the evolutionary mechanisms of phenotypic convergence [5–12]. An adaptive radiation with replicate habitat colonization is found among threespine stickleback (Gasterosteus aculeatus) inhabiting the Haida Gwaii archipelago, British Columbia, Canada. Since the retreat of the ice sheets 12,000 years ago, and likely before that [13–15], marine stickleback have colonized hundreds of freshwater habitats independently in different watersheds and adapted in predictable ways to highly divergent “ecological theatres” [16]. One major predictor of natural selection in the Haida Gwaii stickleback radiation is the spectrum of visible light [16]. Most Haida Gwaii lakes are either oligotrophic clearwater lakes featuring full-spectrum light to blue-shifted light with increasing depth, or they are dystrophic blackwater lakes, stained by dissolved tannins leading to a red-shifted light spectrum [16–18]. Blackwater lakes are extreme, almost “nocturnal” visual environments, as both downwelling short-wavelength light and almost all up- or sidewelling light is absorbed, leaving only downwelling red light in a small cone above the focal animal. Evolutionary adaptation to blackwater lakes in Haida Gwaii stickleback had consequences for multiple traits: stickleback have evolved larger body sizes and reduced lateral plates, both maximizing burst velocity and agility to escape from a predator on short reaction distance, and the former increasing postcapture resistance to predators [16]. Not only natural selection by predators, but also sexual selection interacts with light spectra: blackwater stickleback males have replaced red with black nuptial throat color, which maximizes contrast to the blue eye and against the background via reversed counter-shading [17]. And both traits, the black throat and blue eyes, are preferred by females choosing their mates [19]. Also, color vision was adapted to the blackwater light spectrum: double cones in the retina of stickleback from blackwater systems express only red light–sensitive photopigments instead of one red and one green light–sensitive photopigment, increasing the stickleback’s visual sensitivity to dominant red light [18]. Expression differences are heritable and replicated between independently colonized blackwater lakes [18], but the genetic mechanisms underlying these differences are still unknown. Here, we study the evolutionary history of color vision genes during adaptation to blackwater environments. To perceive color, vertebrates use a combination of membrane-bound photosensitive proteins, called visual opsins, that are expressed in cone cells in the retina (i.e., cone opsins) and have peak sensitivities at different wavelengths [20]. Vertebrates have evolved large opsin repertoires via gene duplication and divergence [21, 22]. Previous research showed that opsins have evolved in response to the visual environment by sequence or expression divergence (i.e., “spectral tuning,” [18, 23–30]). Adaptation of color vision has been found both between [31–34] and within species [35–37], sometimes without gene duplication [38] or without sequence divergence [39]. Remarkably, spectral tuning of opsins has led to convergent adaptation by recurrent mutations, leading to the same amino acid sequences in distantly related species, families, orders, or phyla [31, 40–43]. These observations together with extensive biochemical and mutagenesis study of opsin proteins have led to the identification of several “key site” substitutions [44–46], from which genotypes the light absorption phenotype can be predicted. Both functional experiments and the evolutionary history of opsins thus show that there are many different, functionally tractable “molecular roads” to color vision adaptation. Most ray-finned fish possess large repertoires of eight or more cone opsin genes, originating from a combination of ancient and recent lineage-specific gene duplication events, facilitating adaptation to a diversity of visual environments in aquatic systems [21, 22]. However, threespine stickleback have only four cone opsins: the UV sensitive SWS1, a single blue-sensitive SWS2, a single green-sensitive RH2, and a red-sensitive LWS [18, 22]. This is an impoverished repertoire compared to most other fish species, and when compared to relatives among spiny-rayed fish, two SWS2 paralogs and one RH2 paralog have been lost [22, 47]. Although we know of parallel and heritable expression differences between blackwater and clearwater habitats [18] and between marine and freshwater habitats [36], the targets of selection in the genome are unknown and it is unclear whether spectral tuning of amino acids is involved in adaptation to divergent freshwater habitats [6, 36]. Here, we assess whether and which cone opsin genes have experienced recent selection using two types of evidence. First, we use whole genomes from one oceanic and 27 freshwater fish from across the Haida Gwaii adaptive radiation and from coastal British Columbia, including 15 clearwater and 12 blackwater populations derived from 18 watersheds independently colonized by marine ancestors over the last 12,000 y [13]. Second, we use whole genomes from a selection experiment in which 100 adult stickleback from a blackwater lake were transferred to a barren clearwater pond, from which 11 individuals were resampled after 19 y [48]. Then, we screen cone opsin genes for amino acid variation with predictable effects on color vision and test whether selection on such coding changes or noncoding variation has facilitated the colonization of blackwater habitats. Finally, we compare the molecular mechanisms of adaptation to blackwater habitat among Haida Gwaii threespine stickleback to other blackwater-inhabiting fish species and therein uncover convergent evolution on vastly different time scales.

Discussion Our study reveals that threespine stickleback have adapted wavelength sensitivity through selection at the SWS2 locus. A single red-shifted SWS2 allele has been favored across the adaptive radiation and blackwater lakes, most of which were colonized in replicate from marine ancestors and are almost exclusively inhabited by individuals with this red-shifted SWS2 allele. The evolution of a red-shifted SWS2 opsin thus likely facilitated the colonization of blackwater lakes and subsequent establishment in this extreme habitat. Tannin-stained blackwater is characterized by a red-shifted light spectrum with reduced transmission of short wavelengths [18]. A blue-sensitive opsin spectrally tuned to a longer wavelength will thus increase an individuals’ ability to detect any residual short wavelength light in blackwater. Such an adaptation mechanism is supported by genomic signatures of selection, by functional effect predictions and by genotype-environment associations across the adaptive radiation and in the selection experiment. Also, previously observed phenotypic differences [18] support this mechanism: cones expressing SWS2 in blackwater stickleback from some of these and other populations had an absorption spectrum red-shifted by ~10 nm [18], a stronger shift than is explainable by alternate chromophore use. The combined results from our study and Flamarique et al. [18] thus show that threespine stickleback adapted visual perception to blackwater habitats by spectral tuning of SWS2 key sites, causing a higher sensitivity to the remaining short wavelength light, and increased expression of LWS in double cones, maximizing the sensitivity to background light. Remarkably, the molecular mechanism underlying this recent adaptation in threespine stickleback recapitulates the duplication and divergence of SWS2 around 198 million years ago in the spiny-rayed fish ancestor [22, 53]. Amino acid replacements at SWS2 key sites are identical and thus convergent between the threespine stickleback SWS2 alleles and the SWS2A and SWS2B paralogs, respectively [18, 47, 53]. Such convergent spectral tuning at key sites of cone opsins has been found previously but exclusively at larger evolutionary timescales, such as between damselfish species [40], between butterflies and vertebrates [41], butterflies and bees [42], humans and poeciliid fish [31], or across the animal kingdom [43]. Convergent spectral tuning between such vastly different time scales—on one side, a microevolutionary, intraspecific level and on the other side, a 198 million-y-old duplication-divergence process—has not yet been shown to our knowledge. Not only the mechanism of spectral tuning at SWS2 is convergent, but also the environmental context: two other fish species inhabiting both tannin-stained blackwater and clearwater habitats, bluefin killifish and black bream Acanthopagrus butcheri, show expression divergence between the red-shifted SWS2A and blue-shifted SWS2B paralogs [24, 26, 37, 61]. Bluefin killifish populations occur either in blackwater or clearwater habitats [24, 37], and black bream live in clearwater as juveniles and migrate to blackwater habitats where they spend their adult life [26, 61]. In both species, the red-shifted SWS2A cone abundance is higher in blackwater habitats and the blue-shifted SWS2B cone abundance is higher in clearwater habitats, which is due to reduced SWS2B expression and an increased SWS2A expression relative to SWS2B, respectively [24, 61]. While the two key amino acids at sites 97 and 109 each are convergent between bluefin killifish paralogs and stickleback alleles living in either blackwater or clearwater (Fig 8, [37, 47]), black bream has substituted these with other amino acids but still shows the same function for the two paralogs (red- and blue-shift) [26] and thus ecological, phenotypic, and functional convergence. Adaptation to blackwater environment via SWS2 spectral tuning and therein improved visual capacities can have a multitude of consequences for survival and reproduction. The light environment in blackwater lakes is limited to downwelling, red-shifted light and thus visual detection of predators and prey is much reduced, leading to short action and reaction distances. Any improved detection of prey or predators, for example, via increased sensitivity to color contrast at residual short wavelengths, would be favored by natural selection. Bluefin killifish from blackwater environments indeed showed increased color contrast attention towards blue objects in blackwater [62]. Increased color contrast attention might also be favored by sexual selection: the “blue morph” in bluefin killifish is more abundant in blackwater, and blue males are preferred by individuals raised in stained water [63, 64]. Similarly, stickleback inhabiting blackwater systems have lost red nuptial throat color and instead show black throats, contrasting with the background and with blue eyes, and these two traits are under sexual selection by choosy females [17, 19]. Spectral tuning of blue-sensitive SWS2 may thus be under both natural and sexual selection in threespine stickleback and other blackwater-adapted fish species. Our selection experiment confirmed that the segregating SWS2 alleles are favored by selection in different light environments: after only 13 generations in a clearwater habitat, the red-shifted SWS2 allele associated with the blackwater sweep haplotype decreased in frequency while the alternate blue-shifted allele swept to high frequency, indicating a reverse, ongoing sweep in clearwater habitats. The direction of change is in line with both functional predictions and genotype–environment association across the adaptive radiation. The evolutionary change of 1.12 haldanes estimated from this allele-frequency shift is much larger than the change in feeding morphology or predator defense morphology traits, which show a mean change of 0.22 haldanes over 12 generations [48]. This could arise from inherent differences between genotype- and phenotype-based estimates, such as the increased variation due to a complex genetic basis and environmental effects in phenotypic estimates [65]. Change in allele frequency at SWS2 by 27% is comparable to the strongest relative changes in trait means: gill raker length was reduced by 43%, lateral plate three height by 18%, and lateral plate two frequency and dorsal spine length by 16% [48]. Selection on color vision thus led to similarly rapid or slightly faster evolutionary change compared to other, feeding and predator defense–related traits; and genetic and phenotypic change was in the direction predicted from independently evolved populations across the adaptive radiation, recapitulating the same habitat contrast. Repeated use of the same red-shifted SWS2 haplotype during replicated adaptation to blackwater lakes in Haida Gwaii suggest that these adaptive mutations have been present as standing genetic variation in the marine population prior to colonization. Indeed, the marine individual in our dataset is heterozygous for all SWS2 amino acid polymorphisms, confirming the presence of both red-shifted sweep and blue-shifted nonsweep haplotypes in a marine population (Fig 2). Also, freshwater populations outside Haida Gwaii, such as one individual from mainland British Columbia in our study (Table 1, Figs 5 and 7) and the reference genome, a female freshwater stickleback from Alaska, carry the same red-shifted SWS2 haplotype, while another mainland individual carries the blue-shifted SWS2 haplotype. Maintenance of two spectrally tuned opsin alleles as standing genetic variation might be a “microevolutionary rescue” solution to the loss of multiple functionally divergent SWS2 paralogs in the lineage leading to threespine stickleback, which could explain convergent evolution with the ancestral SWS2 paralogs. To maintain such divergent SWS2 alleles, more complex selection scenarios than selection in blackwater might be necessary, including disruptive or fluctuating selection or selection for a red-shifted allele in other freshwater habitats than blackwater lakes. Visual spectra in freshwater habitats rapidly change with depth, dissolved organic particles, and other biophysical properties, making more complex scenarios likely. Further study of cone opsin variation in additional freshwater and marine populations, taking ecological knowledge of visual environments into account, may provide better insight into the maintenance of variation and repeatability of adaptation to light spectra. By combining population genomic data, functional genomic analysis and a selection experiment, we have uncovered the genetic mechanisms underlying repeated adaptation of color vision to divergent visual environments in threespine stickleback. This mechanism of adaptation is convergent at the molecular, functional, and ecological level with other fish species that have used 198 million-y-old paralogs to adapt to similar blackwater environments. Convergent evolution at the same gene happened thus at vastly different timescales, involving two mechanisms: repeated de novo mutation, leading to convergent amino acid changes, and the reuse of standing genetic variation for repeated adaptation in an adaptive radiation. Our study thus supports the emerging view that mechanisms underlying adaptive evolution are often highly repeatable and likely predictable [4] and that evolutionary tinkering with the same, constrained toolset can lead to convergent adaptation, both within species and between distantly related groups.

Materials and methods Ethics statement Stickleback collection followed guidelines for scientific fish collection in British Columbia, Canada, under Ministry of Environment permits SM09-51584 and SM10-62059. Collections in Naikoon Provincial Park and Drizzle Lake Ecological Reserve were carried out under park use permits: 103171, 103172, 104795, and 104796. Sampling, sequencing, and variant calling Among the more than 100 stickleback populations previously studied from the Haida Gwaii archipelago [16], a subset of 25 populations was chosen to comprise the full range of biophysical attributes of the freshwater habitats on Haida Gwaii, including water spectra, lake area, bathymetry, and predation regime. Stickleback from these 25 freshwater sites, two freshwater sites in coastal British Columbia [66], and one marine site were collected between 1993 and 2012, using minnow traps or recovery from salmon stomachs (marine sample, Table 1, for coordinates, see [16, 66]; coordinates BKW2: 53.375089°N, −130.177378°W). We selected one to four individuals per population for whole-genome resequencing. From the selection experiment [48], we chose 12 fish from the source population Mayer Lake and 11 from the population introduced into Roadside Pond (equivalent to “Mayer Pond” in [48]), the latter sampled in 2012, 19 y or approximately 13 generations after the release of 100 Mayer Lake fish, assuming a generation time of 1.5 y being intermediate between Mayer Lake (2 y generation time) and Roadside Pond (1 y generation time). In total, 58 individuals, 56 females and two males, were resequenced to 6x depth using paired-end Illumina reads as described in [6] at the Broad Institute. We aligned reads to the Broad S1 reference [6] using BWA v0.5.9 [67] with parameters -q 5 -l 32 -k 2 -o 1 and recalibrated base qualities using the GATK v1.4 tools CountCovariates and TableRecalibration [68], with read group, quality score, cycle, and dinucleotide covariates in the recalibration model. This resulted in 2,992,040,331 aligned and recalibrated reads. Variants were called using GATK’s UnifiedGenotyper for each chromosome separately and all 58 individuals combined, with default parameters for SNP and indel calling, respectively. We removed variants with quality normalized by depth ≥ 2, read position rank sum test value ≥ −20, and allele-specific strand bias ≤ 200, using GATK’s VariantFiltration from the dataset. We recalibrated variants using the GATK’s VariantRecalibrator and ApplyRecalibration with a VQSR-LOD cutoff of 98.5%. Filtered and recalibrated variants were lifted over to an improved ordering of scaffolds in the reference stickleback genome [51] using Picard v2.2.1 (http://broadinstitute.github.io/picard). Also, we realigned base quality recalibrated reads to this improved reference using samtools v1.3 [69] and BWA v0.7.12 with the same alignment parameters as above, in order to enable read-backed phasing and read-based genotype likelihood–based analyses (see below), resulting in 2,935,821,595 aligned reads, which have been deposited on the NCBI short read archive under accession SRP100209. We obtained a set of high-quality SNPs by removing all variants failing variant recalibration, variants with quality < 45 and with a mean depth > 9.51 (= average mean depth plus 1.5 times the interquartile range of the mean depth distribution), variants with less than four reads of each allele, variants with more than two alleles, and indels, using bcftools v1.3.1 [69]. This dataset was partitioned by chromosome, and males (individuals in populations Banks 70, Laurel) were removed from the sex chromosome XIX partition. SNPs were further split into an “adaptive radiation” and “selection experiment” SNPs partition. The “adaptive radiation” SNP partition contained one randomly picked individual from each of the 28 populations except the transplant population Roadside Pond (Table 1) in order to perform further analyses with equal sample size for all natural populations. The “selection experiment” SNP partition contained all 12 and 11 individuals from Mayer Lake and Roadside Pond, respectively. In both adaptive radiation and selection experiment SNP datasets, genotypes with less than four reads and sites with more than 50% missing genotypes were removed using vcftools v0.1.15 [70], resulting in 15.3% and 16.2% missing genotypes, respectively. Both the adaptive radiation and selection experiment SNP datasets were phased and missing genotypes imputed with the read-backed phasing algorithm implemented in SHAPEIT v2.r790 [71]. Phase-informative reads covered 9.3% of all heterozygote genotypes and 32.7% of all graph segments. Population genomic analyses We scanned the genomic regions containing the four cone opsins for signatures of selective sweeps by using variation across the whole genome to identify outlier regions. We computed two haplotype-based statistics, integrated haplotype score iHS and H12 [49, 50], for the phased adaptive radiation SNPs. These statistics have been developed to detect signatures of incomplete hard and soft selective sweeps, based on extended haplotype homozygosity around an allele under selection compared to its alternate allele (iHS, [49]) or based on the haplotype frequency spectrum expected under a selective sweep (H12, [50]). Applied to the adaptive radiation dataset, these statistics will capture selective sweeps shared by multiple members of the adaptive radiation. iHS for each SNP in the genome was computed in selscan v1.1.0b [72] with default parameters and standardized in 5% allele frequency bins. In addition, we calculated the percentage of absolute iHS values > 2 in nonoverlapping 10 kb windows with more than 10 iHS estimates [49]. H12 was computed in bins spanning 81 SNPs using scripts published alongside the definition of H12 [50]. We used an outlier approach to identify significant iHS and H12 regions. We identified the top 0.1% genome-wide outliers for SNP-iHS, window-iHS, and H12 in recombination rate bins (<0.5, 0.5–2, 2–3.5, 3.5–5, >5 cM/Mb) because of the sensitivity of these statistics to variation in recombination rate. Local recombination rates were estimated from the “FTC x LITC”-cross recombination map published in [51] with a cubic splines smoothing approach described in [73]. For a significant outlier region indicating a selective sweep centered on opsins SWS2 and LWS, we used the top H12 estimates to identify the haplotype under selection or “sweep haplotype.” We visualized haplotype structure around a selective sweep in both adaptive radiation and selection experiment datasets using the extended haplotype homozygosity (EHH) statistic [74] calculated in selscan with default parameters. We further traced the evolution of the selective sweep region around SWS2 and LWS in the selection experiment. For this, we computed population differentiation (F ST ) between Mayer Lake and Roadside Pond as well as nucleotide diversity (π) and Tajima’s D (T D ) in each population across the genome and linkage disequilibrium (r2) in the selective sweep region for the unphased selection experiment SNPs. We first estimated the folded two-dimensional site-frequency spectrum (2D-SFS) from genotype likelihoods at all sites, from aligned reads with mapping-quality ≥ 17 and bases with quality ≥ 17 using angsd v0.911 [75, 76]. Using this 2D-SFS, we computed π and T D in 10 kb nonoverlapping as well as 10 kb wide, 2 kb step sliding windows across the genome for each population using angsd [76]. Then we estimated population allele frequencies with angsd and used them with the 2D-SFS to compute F ST in 10 kb nonoverlapping as well as 10 kb wide, 2 kb step sliding windows across the genome using realSFS from the angsd software package [76, 77]. As for iHS and H12, we identified the top 0.1% outliers among nonoverlapping windows, based on the genome-wide distribution of π and T D in recombination rate bins. We computed linkage disequilibrium (r2) across the selective sweep region for SNPs with minor allele frequency (MAF) ≥ 5% and maximum 20% missing data in both populations using vcftools. We identified synonymous- and nonsynonymous variation in the coding sequence of the four cone opsins SWS1, SWS2, RH2, and LWS in both the adaptive radiation and selection experiment SNPs. For the adaptive radiation SNPs, we tested whether coding variation at cone opsins was associated with the sweep haplotype identified above by using both alleles at each nonsynonymous SNP and chi-square tests with Bonferroni-corrected p-values. We also estimated the mean ratio of pairwise sequence divergence at synonymous and nonsynonymous sites (mean pairwise dN/dS) for all pairs of haplotypes in the adaptive radiation dataset using PAML v4.8 [78] and following the approach by Yang and Nielsen [79]. This statistic measures the relative frequency of segregating amino acid polymorphism to silent mutations [80]. We assessed whether any of the four cone opsins showed an unusually high frequency of amino acid changes by computing the distribution of mean pairwise dN/dS for all functional amino acid–coding genes on assembled chromosomes in our dataset (n = 17,846 genes). Genotype–environment association We tested whether genetic variation at the four cone opsins was associated with variation in light spectrum, using three environmental proxies of light spectrum, percent light transmission at 400 nm (T400), lake depth in meters, and log-transformed lake area in square meters. We assigned SNPs to up- and downstream regulatory regions, introns, exons, and 3′/5′-untranslated regions of each cone opsin gene using SnpEff v4.2 [81] and combined all SNP alleles per gene into a single multidimensional scaling (MDS) coordinate in R v3.3.1 [82]. We used the MDS coordinate as a response variable in a general linear model with three predictor variables: T400, lake depth, and log-transformed lake area. To test more specifically for an association with blackwater environment, we repeated the general linear model analysis with a categorical light transmission variable “clearwater” for lakes with T400 > 74% and “blackwater” with T400 ≤ 74%, following [16]. Significance of effects was determined after Bonferroni-adjustment for multiple testing. We qualitatively assessed which SNPs are most strongly correlated with blackwater habitat from single-SNP chi-square tests for each gene-associated SNP. For the selection experiment populations Mayer Lake and Roadside Pond, we estimated allele frequencies based on genotype likelihoods with angsd v0.911 [75] using raw-aligned reads of mapping quality > 17 for sites with quality > 17 and the GATK genotype likelihood model [68]. We computed allele-frequency changes for all variable sites in the genome and the empirical quantiles for absolute allele-frequency changes at SNPs surrounding SWS2 in Mayer Lake–based MAF bins of width 0.05. We also computed evolutionary change in haldanes at SWS2 key sites following equation 1 in [65], with raw allele frequency mean, standard deviations, and a generation time of 12.7 as input. Furthermore, we estimated the expected selection coefficient under a pure selection model, following equation 3.2 in [83], assuming incomplete dominance h = 0.5 and using a per generation allele-frequency change by dividing the observed allele-frequency change by 12.7 generations. Evolutionary history of SWS2 We reconstructed the evolutionary history of the cone opsin associated with a selective sweep, SWS2, using a Bayesian phylogenetic approach implemented in MrBayes v3.2.6 [84] and the same evolutionary model and run parameters as in [47]. For phylogenetic reconstruction, we used the two most common threespine stickleback SWS2 haplotypes, one associated and the other not associated with the sweep haplotype; an SWS2 sequence of blackspotted stickleback ([85], SRA-accession DRR013347); and both SWS2A and SWS2B paralogs from shorthorn sculpin ([47], genbank accession KM978046.1), medaka ([56], genbank accessions AB223056.1, AB223057.1), and bluefin killifish ([53], genbank accessions AY296737.2, AY296736.1). To get the blackspotted stickleback SWS2 sequence, we aligned whole-genome sequence data with a subset of the threespine stickleback reference sequence spanning the SWS2 coding sequence and 1 kb sequence up- and downstream of the start and stop codon, respectively, using bowtie2 v2.2.3 [86] with parameters -N 1 and -L 20, called genotypes using the GATK v3.5 tool UnifiedGenotyper [68] with default parameters and genotype likelihood model “BOTH,” converted the variants to FASTA format using the GATK-tool FastaAlternateReferenceMaker, and reintroduced missing sites using bedtools v2.25.0 [87]. We then aligned threespine stickleback and blackspotted stickleback SWS2 with the SWS2A and SWS2B paralogs of the other species manually in BioEdit v7.2.5 [88] and clipped the alignment to codons present in all sequences. We created the same alignment of codons for all 116 phased threespine stickleback haplotypes in our dataset to compute a neighbor-joining network using standard parameters in PopART v1.7 (http://popart.otago.ac.nz). We tested whether selection on the threespine stickleback SWS2 haplotype lead to a rapid accumulation of amino acid–changing mutations compared to synonymous mutations (“positive selection”) on this haplotype. We used the phylogeny from above, estimated branch-specific dN, dS, and dN/dS and performed a branch-site test for positive selection with a subset of the phylogeny containing only stickleback SWS2 and other species’ SWS2A paralogs [60, 89], using PAML v4.8 [78] and the threespine stickleback’s sweep-associated haplotype as the foreground branch. Following [47], we also tested for gene conversion between SWS2A and SWS2B paralogs in the threespine stickleback ancestor by calculating dS between threespine stickleback SWS2 haplotypes and shorthorn sculpin SWS2A and SWS2B paralogs in 30 bp sliding windows with 1 bp step size in DnaSP v.5.10.01 [90]. After excluding gene conversion, we assessed whether amino acid substitutions at opsin key sites in the stickleback SWS2 haplotypes paralleled the divergence of ancestral SWS2A and SWS2B paralogs. We also added a partial coding sequence of black bream SWS2A and SWS2B paralogs ([26], genbank accessions DQ354580.1, DQ354581.1) to assess potential molecular and functional convergence with two other species inhabiting both clear- and blackwater (black bream, bluefin killifish).

Acknowledgments We thank Bruce Deagle, Craig B. Lowe, Shannon D. Brady, Jason Turner, Kerstin Lindblad-Toh, and the Broad Genomics Platforms for help with sequences and samples.