In this article, we first introduce the framework for flexible, multivariate, multilocus simulation. We then validate the individual‐based simulations under Wright‐Fisher assumptions and evaluate expected results for simulations under a simple single landscape multilocus selection model. We then implement two examples of more realistic multilocus selection across complex landscapes with multiple loci and alleles linked to multiple environmental gradients. Our first example shows how multiple environmental variables can influence multilocus variation, while keeping the genotype space relatively simplistic (three loci). The second example shows how a suite of loci under different selection strengths can be simulated with varying gene flow in a landscape genetics framework, while keeping the environmental space relatively simplistic (one gradient environment).

Here, we incorporate multilocus selection into CDPOP using a linear additive model (Falconer & Mackay, 1996 ; Wade, Winther, & Goodnight, 2001 ), which enables the simulation of a broad range of genotype‐environment associations. In the context of CDPOP, we define multilocus selection as multiple loci responding to univariate or multivariate environmental variables, impacting the local fitness of individuals with a given multilocus genotype. If selection is strong and is acting on a small set of loci with large effects, this could result in pronounced differences in allele frequencies across environments. However, if selection is distributed across a large number of loci, each with weak individual effects, multilocus selection will produce more subtle signals of allele frequency differentiation across environments (Le Corre & Kremer, 2003 ). These weaker signatures of selection, indicative of processes such as selection on standing genetic variation, are common patterns of natural selection in real populations (Bay et al., 2017 ; Le Corre & Kremer, 2012 ; Messer & Petrov, 2013 ; Pritchard & Di Rienzo, 2010 ) and are primarily what we sought to simulate in this new module.

While these simple models of natural selection have provided an important baseline for understanding adaptive landscape genetics, such as validation with theory (Landguth et al., 2012 ), comparing methods for detecting genotype‐environment associations (Forester et al., 2015 ; Jones et al., 2013 ), optimizing and informing conservation research (Creech et al., 2017 ; Landguth & Balkenhol, 2012 ; Landguth et al., 2017 ), and understanding the emergence of reproductive isolation in landscapes (Cushman & Landguth, 2016 ; Landguth, Johnson, & Cushman, 2015 ), the module cannot model more than two loci under selection. More flexible models linked to theory are needed to better represent complex genetic variation in real systems. For example, adaptive traits often have a complex genetic basis that interacts with selection strength, gene flow, drift, and mutation rate (Yeaman & Whitlock, 2011 ) in a multivariate environmental context; however, our ability to simulate these processes across many adaptive and neutral loci (e.g., hundreds to thousands) in a landscape genetic context has lagged.

This program note introduces a multilocus selection module that has now been implemented in both CDPOP and CDMetaPOP. This module uses a linear additive model to more flexibly parameterize selection across any number of loci and environmental or landscape variables. In combination with existing features, this new module will allow users to evaluate the contributions and interactions between gene flow, demography, and selection‐driven processes across complex, multivariate environmental and landscape conditions.

Our landscape genetic simulation framework takes two current forms: CDPOP (Landguth & Cushman, 2010 ) and CDMetaPOP (Landguth et al., 2017 ). Both programs are individual‐based, spatially‐explicit, landscape demographic and genetic (‘demogenetic’) forward‐in‐time simulators. They simulate changes in neutral and/or selection‐driven genotypes through time as a function of individual‐based movement, complex spatial population dynamics, and multiple and changing landscape drivers modelled across continuous space. CDMetaPOP was a branch from CDPOP and initially developed for species living in dynamic riverine landscapes (‘riverscapes’). However, CDMetaPOP can simulate a range of spatially‐explicit processes that describe metapopulation theory across land, river, and seascapes, as well as accommodate simulations involving up to hundreds of thousands of individuals making it more computationally efficient than CDPOP. Both programs can be parameterized with empirical genetic, demographic, environmental, and/or spatial data sets.

The programs CDPOP (Landguth & Cushman, 2010 ) and CDMetaPOP (Landguth, Holden, Mahalovich, & Cushman, 2017 ) are tools for individual‐based simulation of mating, dispersal, and selection as functions of user‐specified landscape resistance and selection processes (see Box 1 for a detailed comparison of the two programs). A module introduced in 2012 (Landguth, Cushman, & Johnson, 2012 ) enabled CDPOP to model natural selection based on spatial environmental fitness surfaces for one or two diallellic loci under directional selection (i.e., Gavrilets, 2004 ; Wright, 1932 ). That version of CDPOP required a spatial environmental fitness surface for each genotype in the one‐ or two‐locus selection model. For example, in the single diallelic locus model, three relative fitness surfaces would be specified for the three genotypes ( AA , Aa , and aa ) from the two alleles, A and a , whereas nine surfaces would be required for the two‐locus model. Selection in this module is implemented through differential survival of offspring as a function of the relative fitness of the offspring's genotype at the location where the dispersing individual settled.

2 MATERIALS AND METHODS

2.1 Simulation program The spatial module of multilocus selection is built upon the existing framework of the individual‐based landscape genetics program, CDPOP (Landguth & Cushman, 2010). CDPOP simulates genetic exchange and population dynamics for spatially referenced individuals on a resistance surface, where mating and dispersal events are a probabilistic function of effective or ecological distance between locations. As mentioned previously, past versions of CDPOP (e.g., Landguth et al., 2012) modelled natural selection via spatial environmental fitness surfaces for either one or two diallelic loci under directional selection (Gavrilets, 2004; Landguth et al., 2012; Wright, 1932), where the genotype‐dependent viability of offspring was a function of their location on the fitness landscape. Here, we extend this approach to include options for simulating multilocus adaptive variation (i.e., more than two loci under selection) and multivariate environmental selection. Furthermore, this module can also be implemented within the branched program, CDMetaPOP (see Box 1; Landguth et al., 2017).

2.2 Modelling multilocus selection association with environmental gradients 1996 2001 F*) of an individual with a given multilocus genotype in an additive manner described by Equation (1) This new module incorporates multilocus selection from linear regression models (Falconer & Mackay,; Wade et al.,), enabling the extension of landscape genomics analyses to explicitly and fully investigate adaptive evolution in complex landscapes. As with previous versions of CDPOP, the user specifies the genotype for each individual at the initial time step (i.e., number of loci and number of starting maximum alleles per locus). Now, the user also has the option of choosing any number of loci and alleles under selection, as well as any number of environmental variables that affect fitness for each allele. In this regression model, alleles at multiple loci associated with multiple environmental variables affect the local fitness () of an individual with a given multilocus genotype in an additive manner described by Equation 1 The first term, b 0 , provides the intercept of the linear model. The summation terms refer to the n environmental variables, and a number of alleles considered at l loci in an individual. Finally, for a given value of an environmental variable (X i ), we calculate the b ijk effects of alleles A jk , on fitness. Since statistical linkage will be generated in additive models whenever two or more loci experience simultaneous selection, we do not include a separate term for linkage disequilibrium here (Wade et al., 2001). Physical linkage is not included. A fitness value, F, between 0 and 1 is obtained by rescaling Equation 1 by , where and are the maximum and minimum, respectively, calculated from Equation 1 for all possible genotype‐by‐environment combinations. Rescaling the lowest fitness to 0 ensures there are no negative fitness values. and are calculated before simulations begin on the entire genotype‐by‐environmental space given the user defined b ijk and X i . Within the simulation workflow, CDPOP implements selection through differential survival (1 − F) of an offspring given the relative fitness from Equation 1 at the location on the landscape where the dispersing individual settles. We provide a spreadsheet with CDPOP (betaFile_General.xlsx) that allows users to investigate the fitness impact of beta values in a simple two‐locus, two‐allele model with one or two environmental variables. Users can also specify the genetic basis of local adaptation by modifying the environmental variables to reflect antagonistic pleiotropy (alternate alleles favoured in different environments) or conditional neutrality (alleles favoured in one environment but neutral in another; Anderson, Lee, Rushworth, Colautti, & Mitchell‐Olds, 2013; Yoder & Tiffin, 2017). Because values for the environmental variables are spatially‐explicit and can have very different scales of variability, we require that a standardization (z‐score) is performed for each environmental variable (e.g., elevation, precipitation, land‐use categories).

2.3 Validation with global differential reproductive success Validation of the new multilocus selection module required simulations of allele frequency change compared with the theoretical allele frequency change shown by Wright (1935) for both the single and double diallelic locus selection models. The theoretical equations and derivations can be found in Appendix 1 (i.e., Landguth et al., 2012) and are referred to herein as the Wright‐Fisher model. For the single diallelic locus selection model, we used Equation 1, , and set the average effects, b 111 = 10 and b 112 = −10. X 1 was a uniform spatial selection surface (i.e., Wright, 1935 assumption) with all values of 1. Thus, the genotypes AA, Aa, and aa had relative fitness values of 1.0, 0.5, and 0.0, respectively. For the double diallelic locus selection model, we used Equation 1, , and set b 111 = 10, b 112 = −10, b 121 = 10, and b 122 = −10 with the uniform spatial selection surface, X 1 , having all values of 1. Thus, the nine genotypes had relative fitness values of 1.0, 0.5, and 0.0 for groupings (AABB, AABb, AaBB), (AAbb, AaBb, aaBB), and (Aabb, aaBb, aabb), respectively. For both the single and double diallelic locus selection model simulations, the Wright‐Fisher model was assumed (i.e., random mating, sexual reproduction with both female and male with replacement, offspring randomly disperse until a constant population is reached that has an equal sex ratio, no mutation, and nonoverlapping generations) with one exception: each mated pair produced two offspring to ensure a constant population size. We simulated individual genetic exchange across 100 nonoverlapping generations among 1,000 randomly spatially located individuals in a 1,024 × 1,024 gridded landscape for each selection model. All simulated populations contained an additional 50 diallelic neutral loci. The change in allele frequency for p 1 , the allele frequency for A, was produced to compare with the results for the theoretical change in allele frequency via the Wright‐Fisher selection models (Appendix 1). We ran 50 Monte Carlo replicates to assess variability in p 1 for each simulation.

2.4 Expectations with spatially‐variable differential reproductive success as a function of a single environmental variable Our next task was to determine how spatially‐variable selection implemented in the new simulation framework would affect spatial patterns of allele frequency. By removing a Wright‐Fisher assumption, we stepped away from true theoretical validation as described in the previous section and moved into assessing whether the simulations met expectations. First, we conducted the single diallelic selection model as shown previously with the same simulation parameters, but replaced the uniform spatial selection surface with a spatially‐variable selection surface. Using a 1,024 × 1,024 gridded raster, we created a categorical landscape that included an upper triangle with values of 1, a lower triangle with values of −1, and diagonal cells with a value of 0 (Figure 1a). Using Equation 1, , we now set the average effects, b 111 = 10 and b 112 = −10. Thus, individuals with the genotype AA would have a relative fitness values of 1.0 in the upper triangle area and 0.0 in the lower triangle area. Conversely, genotype aa would have relative fitness values of 0.0 in the upper triangle and 1.0 in the lower triangle. Individuals with the genotype Aa would have a relative fitness value of 0.5 regardless of where they settled on this landscape. To evaluate this spatial‐selection simulation scenario, we ran three simulations which varied how we initialized the genotypes by starting the simulations with (i) only AA, (ii) only aa, and (iii) random assignment. At the end of the simulations, we plotted the distribution of each genotype (AA, Aa, and aa) on the landscape. We therefore would expect simulations to produce (i) only AA individuals in the upper triangle, (ii) only aa individuals in the lower triangle, and (iii) Aa individuals occurring anywhere on the landscape, with occurrences of AA and aa in only the upper and lower triangle, respectively. Figure 1 Open in figure viewer PowerPoint Spatial selection landscapes with 1,000 simulated individuals. Values range from 1 to −1 represented by white to black, respectively. (a) Diagonal; (b) gradient and (c) habitat

2.5 Expectations with spatially‐variable differential reproductive success as a function of multiple environmental variables Our next set of simulations included selection on multiple loci and alleles as a function of multiple environmental gradients, and with restricted isolation‐by‐distance dispersal. Here, we considered three environmental variables that affect fitness as shown in Figure 1, with three loci and two alleles per locus operating in the selection process. Our first environmental variable was the previously described categorical landscape (Figure 1a). The second environmental variable was a gradient landscape with continuous values ranging from 1 to −1 from the North‐South (Figure 1b). The third environmental variable represented a fragmented landscape with equal proportion of values for one (e.g., favoured habitat) and −1 (e.g., nonfavoured habitat) created in the program QRULE (Gardner, 1999; H = 0.5 and p = .5; Figure 1c). Expanding Equation 1 to consider this example for three spatially variable selection gradients, three loci, and two alleles per locus requires 18 average effect sizes to be included (or three environmental variables * three loci * two alleles = 18 effect sizes), producing 27 possible genotypes (or [(a + 2 − 1)!/(2! * (a − 1)!]l where a is the possible alleles per locus [2 in this example] and l is the possible number of loci [three in this example]). For simplicity, we set b 111 = 10 and b 112 = −10 for the first locus and environmental variable, X 1 , b 221 = 10 and b 222 = −10 for the second locus and environmental variable, X 2 , and b 331 = 10 and b 332 = −10 for the third locus and environmental variable, X 3 , where X 1 , X 2 , and X 3 are the diagonal, gradient, and habitat variables shown in Figure 1, respectively. The remaining betas were set to 0. We plotted the genotype‐environment‐fitness relationship for this spatially complex example in which three loci are associated with three environmental variables (Figure 2). For illustrative purposes and because we cannot visualize the full dimensional space, we held the first environmental variable constant at X 1 = 1 and third environmental variable constant at X 3 = 1. We allowed the second environmental variable to vary across its continuous space, X 2 = [−1, 1]. Equation 1 is calculated for each possible genotype by environment combination given the betas specified above and rescaled based on and (60 and −60, respectively) to achieve values between 0 and 1. Thus, using these simplified effect sizes, the genotypes that have the first allele present in each locus should be favored in areas that have values of 1 (i.e., upper triangle, towards north in the gradient landscape, and habitat patches). Unlike the previous panmictic movement simulations, we restricted movement of the individuals in these simulations to follow an inverse‐square probability function constrained to a 25% maximum threshold of the entire landscape. We initialized all genotypes randomly at the start of the simulations. To evaluate the results of this simulation, we plotted individuals on the landscape coded by their respective copies of the first allele at each of the three loci under selection. We therefore would expect to see individuals with homozygous copies of each locus to be more successful in landscapes patterned towards “1”: upper triangle, towards the North, and in habitat patches. Figure 2 Open in figure viewer PowerPoint X 1 = 1 (diagonal landscape in Figure X 3 = 1 (habitat map in Figure X 2 = [−1, 1] (gradient landscape in Figure b 111 = 10, b 112 = −10, b 221 = 10, b 222 = −10, b 331 = 10, b 332 = −10, and 0 otherwise) and rescaled based on = 60 and = −60 to achieve fitness values between 0 and 1 Genotype‐environment‐fitness landscape for the spatially complex example in which three loci are associated with three environments. For illustrated purposes, we held the first environmental variable constant at= 1 (diagonal landscape in Figure 1 a) and third environmental variable constant at= 1 (habitat map in Figure 1 c). We allowed the second environment to vary across its continuous space= [−1, 1] (gradient landscape in Figure 1 b). Equation 1 is calculated for each possible genotype‐by‐environment combination given the betas specified (= 10,= −10,= 10,= −10,= 10,= −10, and 0 otherwise) and rescaled based on= 60 and= −60 to achieve fitness values between 0 and 1