Stabilising selection without cooperative binding

We consider a pair of transcription factors, labelled 1 and 2, that have $K_1$ and $K_2$ targets, respectively. A fraction β of the binding sites are at shared target genes, so that the number of binding sites at genes co-regulated by the pair is $\beta(K_1 + K_2)$, as illustrated in Figure 1. Loss-of-function mutations occur at binding sites at rate $u_l$, and back mutations, which restore a functional binding site at a target, occur at rate $u_g$. An individual incurs a fitness penalty s, where 0 ≤ s < 1, for each non-functional binding site, and fitness is assumed to be multiplicative across loci. The fitness of an individual that lacks $i \le K_1 + K_2$ of its required binding sites is therefore $w_i = (1-s)^i$. The fitness landscape associated with our model thus has a single peak at i = 0, and each transcription factor binding site that is lost reduces fitness by an additional factor (1−s). Empirical estimates of the strength of selection on transcription factor binding sites indicate that typically Ns ∼ 10 [18], suggesting that s is small. We assume that s is the same for all binding sites, an assumption that we relax in the Methods section.

We consider a population of N asexual individuals. The evolution of the population can be described by keeping track of the relative abundance of each "Hamming class" [19–21]. Hamming class i corresponds to those individuals who currently lack i transcription factor binding sites. We denote the frequency of individuals in Hamming class i by $x_i$. In an infinitely large population, the evolution of Hamming class i is then described by the differential equations [20, 21]

$$\dot{x}_i = \sum_{j=0}^{K_1+K_2} \frac{w_j}{\bar{w}}\, x_j P_{ij}, \tag{1}$$

where $\bar{w} = \sum_{i=0}^{K_1+K_2} w_i x_i$ is the mean fitness, and $P_{ij}$ is the probability that a genotype lacking j functional binding sites mutates to a genotype lacking i functional binding sites (see Methods). Previous work [19–21] has shown that, when rates of forward and back mutation are identical ($u_l = u_g$), the equilibrium solution to Equation 1 is a binomial distribution. In the more general case of a finite population with $u_l \neq u_g$, we find that the equilibrium continues to be well approximated by a binomial distribution, with mean $(K_1 + K_2)a_s$. The term $a_s$ is the probability that a binding site is non-functional in a randomly chosen individual at equilibrium. It depends on the strength of selection against non-functional binding sites, s, the population size, N, and the rates of forward and back mutation, $u_l$ and $u_g$ (see Methods and [20, 21]).
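As an illustration of these dynamics, the following minimal Python sketch iterates a discrete-generation analogue of Equation 1 (selection followed by mutation) for an infinite population until it settles. The mutation matrix used here is an assumption made for illustration and is not the form given in the Methods: each functional site is lost independently with probability $u_l$, and each non-functional site is regained independently with probability $u_g$, per generation. The function names, the small number of sites and the inflated mutation rates are likewise hypothetical, chosen only so that the sketch runs quickly.

```python
from math import comb


def binom_pmf(n, k, p):
    """Probability of k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mutation_matrix(K, u_l, u_g):
    """P[i][j]: probability that a genotype lacking j sites mutates to one lacking i.

    Assumes (for illustration only) that sites mutate independently of one another.
    """
    P = [[0.0] * (K + 1) for _ in range(K + 1)]
    for j in range(K + 1):
        for lost in range(K - j + 1):  # functional sites that are lost
            p_lost = binom_pmf(K - j, lost, u_l)
            for regained in range(j + 1):  # non-functional sites that are regained
                P[j + lost - regained][j] += p_lost * binom_pmf(j, regained, u_g)
    return P


def equilibrium(K, s, u_l, u_g, generations=3000):
    """Iterate x_i <- sum_j (w_j / w_bar) x_j P_ij, starting with no sites lost."""
    w = [(1 - s) ** i for i in range(K + 1)]
    P = mutation_matrix(K, u_l, u_g)
    x = [1.0] + [0.0] * K
    for _ in range(generations):
        w_bar = sum(wi * xi for wi, xi in zip(w, x))
        selected = [wi * xi / w_bar for wi, xi in zip(w, x)]
        x = [sum(P[i][j] * selected[j] for j in range(K + 1)) for i in range(K + 1)]
    return x


# Hypothetical toy parameters (not those of the paper).
K = 10
x_eq = equilibrium(K, s=0.05, u_l=0.01, u_g=0.005)
a = sum(i * xi for i, xi in enumerate(x_eq)) / K  # mean fraction of sites lost
max_gap = max(abs(xi - binom_pmf(K, i, a)) for i, xi in enumerate(x_eq))
print(f"fraction of sites lost: {a:.4f}; largest gap from binomial: {max_gap:.2e}")
```

Under these assumptions the Hamming-class frequencies settle onto a binomial distribution whose per-site probability plays the role of $a_s$. The sketch ignores genetic drift, so it does not reproduce the finite-population form of $a_s$.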

The equilibrium distribution above describes how stabilising selection determines the frequencies of functional binding sites in a population. The associated mean fitness for a pair of transcription factors that do not bind cooperatively is $\bar{w} = (1 - a_s s)^{K_1+K_2}$ (see Methods), and the mean fitness contribution of each binding site is $1 - a_s s$. We are typically concerned with the case in which $u_l, u_g \ll s$. In this case, when 2Ns > 1, $a_s$ can be approximated by

$$a_s \approx \frac{1}{2Ns}\,\frac{u_l}{u_l + u_g} + \frac{u_l}{s}$$

and otherwise by

$$a_s \approx \frac{u_l}{u_l + u_g} \tag{2}$$

(see Methods). These equations have an intuitive interpretation. When 2Ns > 1, the first term describes the effect of genetic drift, which tends to push the system towards its neutral equilibrium, $a_0 = u_l/(u_l + u_g)$, and the second term describes the effect of selection. In the limit N → ∞, $a_s$ equals $u_l/s$, which is the standard result for the frequency of a deleterious allele in an infinite population under mutation-selection balance. When 2Ns < 1, evolution is nearly neutral and drift dominates, so the system is close to the neutral equilibrium $a_0$.
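The two regimes of Equation 2 can be written as a single function. The following Python sketch is one way to do so; the function name `a_equilibrium` is hypothetical, and the piecewise switch at 2Ns = 1 simply mirrors the approximation above rather than any smoother form that may be derived in the Methods.

```python
def a_equilibrium(N, s, u_l, u_g):
    """Approximate probability that a binding site is non-functional (Equation 2).

    Assumes u_l, u_g << s, as in the text. The switch at 2Ns = 1 is the
    piecewise approximation, so the value jumps at that boundary.
    """
    a0 = u_l / (u_l + u_g)  # neutral equilibrium
    if 2 * N * s > 1:
        # drift term plus mutation-selection balance term
        return a0 / (2 * N * s) + u_l / s
    return a0


# Default parameter values quoted in the Figure 2 legend.
print(a_equilibrium(N=10**4, s=1e-3, u_l=2e-7, u_g=1e-7))
# A nearly neutral case (2Ns < 1) returns the neutral equilibrium a0.
print(a_equilibrium(N=10**2, s=1e-3, u_l=2e-7, u_g=1e-7))
```

As N grows with s fixed, the drift term vanishes and the function approaches $u_l/s$, in line with the infinite-population limit described above.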

Stabilising selection with cooperative binding

Here we modify our model to account for cooperative regulation by a pair of factors. This allows us to ask when cooperative regulation is favoured by evolution. A mutation that results in cooperative binding between a pair of transcription factors has two effects on the fitness of a transcriptional circuit. For a target that is regulated by both transcription factors, we assume that cooperative binding mitigates the effects of deleterious mutations at transcription factor binding sites [7–9]. This results in a reduced fitness penalty for a mutation at the $\beta(K_1 + K_2)$ binding sites at shared targets, so that (1−s) is replaced by (1−hs) for some constant 0 ≤ h ≤ 1. However, there are also $(1-\beta)(K_1 + K_2)$ binding sites at targets that are regulated by only one or the other of the transcription factors. We assume that cooperative binding of the transcription factors causes pleiotropic mis-regulation at these targets, since the other transcription factor, which has no binding site of its own there, is now recruited through its physical interaction with the first transcription factor. This results in a fitness penalty t at each of the $(1-\beta)(K_1 + K_2)$ sites that are not co-regulated. Fitness is again assumed to be multiplicative, so that the cost of pleiotropy associated with cooperative binding is $(1-t)^{(1-\beta)(K_1+K_2)}$.

Provided $u_l, u_g \ll 1$, genes that are co-regulated and genes that are not co-regulated have equilibrium distributions described by independent binomial distributions with per-site probabilities $a_{hs}$ and $a_s$ respectively, which are approximated by Equation 2 (substituting hs for s where appropriate, see Methods). We can now specify the conditions for the invasion of cooperative gene regulation. A mutation resulting in cooperative binding between a pair of factors will be favoured if the expected fitness of the mutant is greater than the equilibrium mean fitness. Using the expressions for mean fitness given above, this occurs when

$$(1 - a_s s)^{\beta(K_1+K_2)} < (1-t)^{(1-\beta)(K_1+K_2)}\,(1 - a_s hs)^{\beta(K_1+K_2)}.$$

Assuming $t, s \ll 1$, this expression can be simplified to give

$$\beta > \frac{t}{t + s\,a_s(1-h)}.$$

This means that, when the fraction of binding sites at shared targets, β, is greater than a threshold depending on s, h, t and $a_s$, a mutation that results in cooperative binding can invade a population at equilibrium.
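The simplification can be made explicit. The following is a sketch of the intermediate steps, assuming only that t and s are small enough for $\ln(1-x) \approx -x$ to hold for each factor: take logarithms of the invasion condition, divide by $K_1 + K_2$, expand to first order, and collect the terms in β.

```latex
\begin{align*}
\beta \ln(1 - a_s s) &< (1-\beta)\ln(1-t) + \beta \ln(1 - a_s h s) \\
-\beta\, a_s s &< -(1-\beta)\,t - \beta\, a_s h s
  && \text{(to first order in } s \text{ and } t\text{)} \\
\beta\, s\, a_s (1-h) &> (1-\beta)\,t \\
\beta &> \frac{t}{t + s\, a_s (1-h)}
\end{align*}
```

The total number of binding sites, $K_1 + K_2$, cancels at the first step, which is why the threshold depends only on the fraction of shared sites and not on their absolute number.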

Similarly, a mutation that results in the loss of cooperative binding, in a population where it is present, will be favoured when

$$(1 - a_{hs} s)^{\beta(K_1+K_2)} > (1-t)^{(1-\beta)(K_1+K_2)}\,(1 - a_{hs} hs)^{\beta(K_1+K_2)}.$$

Again assuming $t, s \ll 1$, this expression can be simplified to give

$$\beta < \frac{t}{t + s\,a_{hs}(1-h)},$$

so that, when the fraction of binding sites at shared targets, β, is less than a threshold depending on s, h, t and $a_{hs}$, a mutation that results in loss of cooperative binding can invade a population at equilibrium.

Since the first expression in Equation 2 is monotonically decreasing in s, and the second expression is independent of s, it is always true that $a_{hs} \geq a_s$, i.e. populations that have cooperative binding accumulate more deleterious mutations, which result in weaker transcription factor binding sites, than populations that lack it. The threshold for loss of cooperative binding therefore never exceeds the threshold for its gain, and there is a range of β for which neither a population that lacks cooperative binding nor a population that has it can be invaded by mutations that gain or remove cooperative binding, respectively. In this range the evolutionary dynamics of the system are bistable, and we expect to find some genes that are regulated by pairs of transcription factors that act cooperatively and some that are not.

Using the expression for $a_s$ given in Equation 2, and recalling that $a_0 = u_l/(u_l + u_g)$ is the neutral equilibrium in a system dominated by drift, the threshold value of β above which selection favours a mutation causing cooperative binding, in a population that lacks it, is given by

$$\beta > \begin{cases} \dfrac{2Nt}{2Nt + a_0(1-h)} & \text{if } 2Ns > 1 \\[2ex] \dfrac{t}{t + s\,a_0(1-h)} & \text{otherwise.} \end{cases} \tag{3}$$

Similarly, the threshold value of β below which selection favours a mutation resulting in loss of cooperative binding in a population that has it, is given by

$$\beta < \begin{cases} \dfrac{2Nth}{2Nth + a_0(1-h)} & \text{if } 2Nhs > 1 \\[2ex] \dfrac{t}{t + s\,a_0(1-h)} & \text{otherwise.} \end{cases} \tag{4}$$
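Equations 3 and 4 are straightforward to evaluate numerically. The Python sketch below is one possible transcription; the function names are hypothetical, and the parameter values in the usage example are the defaults quoted in the Figure 2 legend.

```python
def gain_threshold(N, s, h, t, a0):
    """Equation 3: value of beta above which cooperative binding can invade."""
    if 2 * N * s > 1:
        return 2 * N * t / (2 * N * t + a0 * (1 - h))
    return t / (t + s * a0 * (1 - h))


def loss_threshold(N, s, h, t, a0):
    """Equation 4: value of beta below which loss of cooperative binding can invade."""
    if 2 * N * h * s > 1:
        return 2 * N * t * h / (2 * N * t * h + a0 * (1 - h))
    return t / (t + s * a0 * (1 - h))


# Default parameter values from the Figure 2 legend.
u_l, u_g = 2e-7, 1e-7
a0 = u_l / (u_l + u_g)  # neutral equilibrium
params = dict(N=10**4, s=1e-3, h=0.1, t=1e-4, a0=a0)

beta_gain = gain_threshold(**params)
beta_loss = loss_threshold(**params)
print(f"gain threshold: {beta_gain:.3f}, loss threshold: {beta_loss:.3f}")
print(f"bistable for beta between {beta_loss:.3f} and {beta_gain:.3f}")
```

With these values the two expressions evaluate to roughly 0.77 for gain and 0.25 for loss, so any pair of factors whose shared fraction falls between them lies in the bistable range under this approximation.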

These equations allow us to make a number of observations about the evolution of cooperative gene regulation (Figure 2, and see Methods). Beginning with Equation 3 for a population lacking cooperative binding, we see that when N and/or s is large, so that 2Ns > 1, the threshold fraction of shared targets β above which cooperative binding becomes advantageous is independent of the strength of selection s (Figure 2a). However, the threshold decreases as the mutation-buffering effect of cooperative binding increases (i.e. as h decreases, Figure 2b). As population size N increases, selection becomes more efficient and the threshold value of β increases (Figure 2c). Finally, the threshold also increases with the cost of pleiotropy t (Figure 2d). In contrast, when N and/or s is small, so that 2Ns < 1, drift dominates and the threshold fraction of shared targets β is independent of population size N (Figure 2c). However, the threshold decreases with the strength of selection s (Figure 2a), because when drift dominates the number of deleterious mutations is at the neutral equilibrium, and increasing s increases the impact of each mutation on overall fitness.

Figure 2. Evolutionary parameters that permit the evolution of gene regulation by cooperative transcription factors. Threshold fraction of shared targets for gain (black) and loss (red) of cooperative binding to be advantageous in a population at equilibrium under stabilising selection. The black line shows the value of β above which a new mutation that results in cooperative binding will invade a population that lacks cooperative binding. The red line shows the value of β below which a mutation resulting in loss of cooperative binding will invade a population that has cooperative binding. For values of β that lie in the gray region, the dynamics are bistable: a population with cooperative binding will retain it, and one without cooperative binding will not gain it. The threshold fraction of shared targets varies with (top left) the strength of selection, s, (top right) the strength of cooperativity in reducing the effects of deleterious mutations, 1/h, (bottom left) the cost of pleiotropy, t, and (bottom right) the population size, N. Lines show our analytic results (Equations 3 and 4), and points show the results of $10^5$ replicate Monte-Carlo simulations. Parameter values (unless stated otherwise) are $u_l = 2 \times 10^{-7}$, $u_g = 10^{-7}$, $K_1 + K_2 = 100$, $s = 10^{-3}$, $h = 10^{-1}$, $t = 10^{-4}$ and $N = 10^4$.

Similarly, from Equation 4 for a population with cooperative binding, we see that when N and/or hs is large, so that 2Nhs > 1, the threshold fraction of shared targets β below which cooperative binding becomes disadvantageous is independent of the strength of selection s (Figure 2a). As before, the threshold decreases as the mutation-buffering effect of cooperative binding increases (i.e. as h decreases, Figure 2b), and it increases with population size N (Figure 2c) and with the cost of pleiotropy t (Figure 2d). In contrast, when N and/or hs is small, so that 2Nhs < 1, drift dominates and the threshold fraction of shared targets β is independent of population size N (Figure 2c), but decreases with the strength of selection s (Figure 2a). The bistable region is largest when s is large and h is small, and for intermediate values of N and t, as shown in Figure 2. As this analysis demonstrates, there is a broad range of possible evolutionary outcomes and, crucially, cooperative binding can evolve under a wide range of circumstances despite the deleterious pleiotropic effects associated with physical interactions among transcription factors.

Adaptation of transcriptional circuits under positive selection

When cooperative binding is present under stabilising selection, transcription factor binding sites at co-regulated genes are better able to tolerate mutations (i.e. $a_{hs} > a_s$). Under positive selection for a novel expression phenotype this may speed adaptation, since greater mutational robustness generates greater genetic diversity (Figure 3a) [22]. This may occur, for example, when adaptation involves a change in the transcription factor that regulates a target gene [7–9, 11], through turnover of transcription factor binding sites [23–25]. We use our model to quantify the extent to which cooperative binding among transcription factors accelerates the adaptive rewiring of transcriptional circuits under positive selection.

Figure 3. A schematic cartoon of rewiring with (left) and without (right) cooperative binding. Selection favours a change in the regulation of target genes from the red TF to the green TF. Rewiring requires an initially deleterious mutation at the red binding site before a green binding site can be acquired. The fitness of the different states is shown on the left-hand side for each case. The reduction in fitness of the intermediate state is smaller when cooperative binding is present than when it is absent.

We study adaptive change that involves replacement of an existing transcription factor by a new one that confers higher fitness. We assume that the target gene must first suffer an initially deleterious mutation at its existing binding site before a newly adaptive binding site can be acquired (Figure 3) [8, 9, 11]. The newly adaptive binding site arises, at rate $u_r$, from binding sites that have already mutated. The expected waiting time for such a gene to produce a newly adaptive binding site therefore depends on the number of binding sites in the population that harbour a deleterious mutation, which is proportional to $a_s$ when cooperativity is absent and to $a_{hs}$ when it is present. Since $a_{hs} > a_s$, this number is greater when cooperative binding is present than when it is absent.

The ratio of waiting times before a newly adaptive binding site arises, $t_r^{*}/t_r$ (for populations without, $t_r^{*}$, or with, $t_r$, cooperative binding), quantifies the degree to which cooperative binding of transcription factors accelerates adaptation under positive selection. This ratio is given by $a_{hs}/a_s$ (Figure 4, see Methods). As Figure 4 shows, provided Ns > 1 (i.e. provided deleterious mutations at binding sites are not nearly neutral), rewiring of transcriptional circuits is significantly accelerated by cooperative binding among transcription factors. Thus, a population that has cooperative binding among transcription factors under stabilising selection can also experience an accelerated rate of adaptation.
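The ratio $a_{hs}/a_s$ can be evaluated directly from the approximation in Equation 2. The Python sketch below does this across a range of selection strengths; the function names are hypothetical, and because it relies on the piecewise approximation rather than on the full expressions in the Methods, the ratio changes abruptly at 2Ns = 1 and 2Nhs = 1.

```python
def a_equilibrium(N, s, u_l, u_g):
    """Approximate probability that a binding site is non-functional (Equation 2)."""
    a0 = u_l / (u_l + u_g)  # neutral equilibrium
    if 2 * N * s > 1:
        return a0 / (2 * N * s) + u_l / s
    return a0


def waiting_time_ratio(N, s, h, u_l, u_g):
    """Ratio of waiting times without versus with cooperative binding, a_hs / a_s."""
    return a_equilibrium(N, h * s, u_l, u_g) / a_equilibrium(N, s, u_l, u_g)


# Parameter values from the Figure 4 legend; s is varied to span Ns < 1 and Ns > 1.
N, h, u_l, u_g = 10**4, 0.1, 2e-7, 1e-7
for s in (1e-6, 1e-5, 1e-4, 1e-3, 1e-2):
    ratio = waiting_time_ratio(N, s, h, u_l, u_g)
    print(f"Ns = {N * s:g}: waiting-time ratio = {ratio:.2f}")
```

When both 2Ns and 2Nhs exceed 1, the two terms of $a_{hs}$ are each 1/h times those of $a_s$, so the ratio in this approximation is 1/h. When 2Ns < 1, both quantities equal $a_0$ and the ratio is 1.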

Figure 4. Cooperative binding accelerates adaptation under positive selection. The ratio of waiting times before the arrival of novel adaptive binding sites for populations without ($t_r^{*}$) and with ($t_r$) cooperative binding. Provided Ns > 1, cooperative binding reduces the adaptation time up to 10-fold, compared to populations that lack cooperative binding. The line shows our analytic expression, and points show the result of $10^5$ replicate Monte-Carlo simulations. Parameter values are $u_l = 2 \times 10^{-7}$, $u_g = 10^{-7}$, $K_1 + K_2 = 100$, $h = 10^{-1}$, $t = 10^{-4}$, $N = 10^4$ and $u_r = 10^{-7}$.

Cooperative binding and the fraction of shared targets in yeast

Our model predicts that, under stabilising selection, cooperative binding will be favoured when the fraction of targets shared by a pair of transcription factors exceeds a certain threshold. In order to test this prediction, and to get some idea of the degree of overlap that is required for cooperative binding to arise in natural systems, we inspected pairs of transcription factors in Saccharomyces cerevisiae. A total of 186 pairs are reported as participating in cooperative binding [26], based on a combination of ChIP-chip data, transcription factor knockout data, and direct experimental evidence. Using the set of genes regulated through a transcription factor binding site for a total of 204 yeast transcription factors [27, 28], we determined the fraction of overlapping targets, β, for all pairs of transcription factors (Figure 5). It is important to note that studies that systematically look for cooperative gene interactions typically take into account the number of targets shared by a gene pair. Therefore, to minimise the risk of circularity in our analysis, we used separate datasets to determine cooperative gene interactions and to determine regulatory targets. The mean fraction of overlapping targets for pairs identified as participating in cooperative binding (0.21) was 10-fold greater than the mean fraction for pairs that do not bind cooperatively (0.02), a highly statistically significant difference ($p < 2 \times 10^{-16}$, Wilcoxon test). This supports the prediction of our population-genetic analysis, and it suggests that a sizeable overlap in targets is required before cooperative binding becomes advantageous.
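For concreteness, the sketch below shows one way the shared fraction β could be computed for a single pair of factors from their target-gene sets. It assumes the model's definition, in which each shared gene carries one binding site for each factor, so that $\beta = 2S/(K_1 + K_2)$ for S shared genes. The procedure actually applied to the yeast datasets may differ, and the function name and gene labels are hypothetical placeholders rather than real data.

```python
def shared_fraction(targets_1, targets_2):
    """Fraction of a pair's binding sites that lie at shared target genes.

    Assumes one binding site per factor per target gene, so a shared gene
    contributes two of the K_1 + K_2 sites.
    """
    total_sites = len(targets_1) + len(targets_2)
    if total_sites == 0:
        raise ValueError("at least one factor must have a target")
    shared_genes = targets_1 & targets_2
    return 2 * len(shared_genes) / total_sites


# Hypothetical target sets for two factors (placeholder gene labels).
tf1_targets = {"gene_a", "gene_b", "gene_c"}
tf2_targets = {"gene_b", "gene_c", "gene_d", "gene_e"}
print(f"beta = {shared_fraction(tf1_targets, tf2_targets):.3f}")
```

Applying such a function to every pair of factors would give two samples of β, one for cooperative pairs and one for the remainder, which could then be compared with a rank-based test.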

Figure 5. Fraction of targets that are shared between pairs of transcription factors in S. cerevisiae [26–28]. (left) The fraction of targets that are shared among pairs of transcription factors that lack cooperative binding and (right) the fraction of targets that are shared among transcription factors that bind cooperatively. The fraction of targets that are shared is larger among cooperative factors ($p < 2 \times 10^{-16}$, Wilcoxon test).

Cooperative binding in the yeast sex determination network

The ability of cooperative transcription factors to facilitate adaptation also has empirical support from observations in the sex determination networks of different yeast species [7–9]. The acquisition of a protein-protein interaction between the mating factor MATα2 and Mcm1 was able to buffer the deleterious effects of mutations that strengthened Mcm1 binding sites [7]. Prior to the emergence of the protein-protein interaction, sex-determining genes were activated only in the presence of Mcm1 and MATa2 together [7]. The buffering effects of the protein-protein interaction allowed Mcm1 binding sites to acquire strengthening mutations, such that sex-determining genes became activated by Mcm1 alone. As a result, MATa2 became redundant and was lost [7]. The outcome was a significant upstream reorganisation of the yeast sex determination network without the need for any parallel changes to the downstream output of the network. Similar patterns, in which acquisition of cooperative binding between transcription factors is followed by changes to the regulation of their shared targets, are observed across the yeast transcriptome [8], and support the prediction of our analysis of positive selection on transcriptional networks.