Open-ended evolution on hard fitness landscapes from VCSPs

December 26, 2018 by Alexandru Strimbu

There is often interest among the public and in the media about evolution and its effects for contemporary humans. In this context, some argue that humans have stopped evolving, including persons who have a good degree of influence over the public opinion. Famous BBC Natural History Unit broadcaster David Attenborough, for example, argued a few years ago in an interview that humans are the only species who “put halt to natural selection of its own free will”. The first time I read this, I thought that it seemed plausible. The advances in medicine that we made in the last two centuries mean that almost all babies can reach adulthood and have children of their own, which appears to cancel natural selection. However, after more careful thought, I realized that these sort of arguments for the ‘end of evolution’ could not be true.

Upon more reflection, there just seem to be better arguments for open-ended evolution.

One way of seeing that we’re still evolving is by observing that we actually created a new environment, with very different struggles than the ones that we encountered in the past. This is what Adam Benton (2013) suggests in his discussion of Attenborough. Living in cities with millions of people is very different from having to survive in a prehistoric jungle, so evolutionary pressures have shifted in this new environment. Success and fitness are measured differently. The continuing pace of changes and evolution in various fields such as technology, medicine, sciences is a clear example that humans continue to evolve. Even from a physical point of view, research shows that we are now becoming taller, after the effects of the last ice age faded out (Yang et al., 2010), while our brain seems to get smaller, for various reasons with the most amusing being that we don’t need that much “central heating”. Take that Aristotle! Furthermore, the shape of our teeth and jaws changed as we changed our diet, with different populations having a different structure based on the local diet (von Cramon-Taubadel, 2011).

But we don’t even need to resort to dynamically changing selection pressures. We can argue that evolution is ongoing even in a static environment. More importantly, we can make this argument in the laboratory. Although we do have to switch from humans to a more prolific species. A good example of this would be Richard Lenski’s long-term E-coli evolution experiment (Lenski et al., 1991) which shows that evolution is still ongoing after 50000 generations in the E-coli bacteria (Wiser et al., 2013). The fitness of the E. coli keeps increasing! This certainly seems like open-ended evolution.

But how do we make theoretical sense of these experimental observations? Artem Kaznatcheev (2018) has one suggestion: ‘hard’ landscapes due to the constraints of computational complexity. He suggests that evolution can be seen as a computational problem, in which the organisms try to maximize their fitness over successive generations. This problem would still be constrained by the theory of computational complexity, which tells us that some problems are too hard to be solved in a reasonable amount of time. Unfortunately, Artem’s work is far too theoretical. This is where my third-year project at the University of Oxford comes in. I will be working together with Artem on actually simulating open-ended evolution on specific examples of hard fitness landscapes that arise from valued constraint satisfaction problems (VCSPs).

Why VCSPs? They are an elegant generalization of the weighted 2SAT problem that Artem used in his work on hard landscapes. I’ll use this blog post to introduce CSPs, VCSPs, explain how they generalize weighted 2 SAT (and thus the NK fitness landscape model), and provide a way to translate between the language of computer science and that of biology.

A constraint satisfaction problem (CSP) is a mathematical question which asks for a set of objects whose state must satisfy certain given constraints. A CSP can be represented by 3 sets: a set of variables X, a set of domain values D that can be taken by the variables and a set of constraints C. In biological terminology: X is the set of loci, D is the space of alleles, and C is the epistatic interaction between genes (alleles at loci). Each constraint takes some variables from X and defines a relation that needs to be satisfied. The number of variables taken by each constraint represents the arity of that constraint. The arity of a CSP instance is the max arity over all its constraints. This is analogous to the degree of epistasis in biology. Binary (i.e. arity 2) constraints are like pairwise epistasis, and higher arity constraints are like higher-order epistasis. We can also represent binary constraints in the form of a graph with an edge between any two genes that interaction, Artem calls this a gene-interaction network. For constraints with higher arity, we will have hypergraphs where edges can join multiple variables (See Table 1).

A classic example of a CSP is the problem of map colouring, in which the variables represent some countries, the domain values are some colours and the constraints define that the countries neighbouring each other should be coloured differently. Note that this problem is related with the 4-colour theorem which states that we only need 4 colours for a map colouring. However, the latter assumes a planar graph, while the former could be extended to a generalized graph.

A concrete example would be the problem of colouring the map of Romania. The variables are represented by Romania’s regions (T – Transylvania, MU – Muntenia, MO – Moldavia, D – Dobruja, C – Crisana, MA – Maramures, B – Banat, O – Oltenia) (X = {T, MU, MO, D, C, MA, B,O}), the domain is represented by some colours (for example D = {Red, Yellow, Blue}), and the binary constraints represent 2 regions that are neighbours, so they should have different colours (C = {C ≠ T, C ≠ MA, C ≠ B, MA ≠ T, MA ≠ MO, MO ≠ T, MO ≠ MU, MO ≠ DO, DO ≠ MU, MU ≠ T, MU ≠ O, O ≠ T, O ≠ B, B ≠ T}). The constraint graph and a possible colouring are in the figures below:





Valued constraint satisfaction problems (VCSPs) are a generalization of CSPs in which we do not require to satisfy all the constraints, but to maximize the weight of satisfied constraints. Each constraint has a weight, and when that constraint is satisfied, we get a reward equal to that weight. The objective is to find solutions that maximize this reward. In a biological setting, we think of the total reward associated with a variable assignment as the fitness of the corresponding genotype.

An example of a VCSP instance would be a version of the map colouring problem presented above. Suppose that we only have two colours to work with. This means that we do not have enough colours for a proper colouring of the entire map. In other words, the CSP instance has no solution.

Instead, we care more about some 2 regions to be coloured differently than others. In the case of the map of Romania, each constraint will also be given a weight, for example (C = {(C ≠ T, 2), (C ≠ MA, 5), (C ≠ B, 5), (MA ≠ T, 1), (MA ≠ MO, 5), (MO ≠ T, 2), (MO ≠ MU, 5), (MO ≠ DO, 4), (DO ≠ MU, 5), (MU ≠T, 1), (MU ≠ O, 5), (O ≠ T, 2), (O ≠ B, 5), (B ≠ T, 1) }). In words: we prefer adjacent regions to be coloured differently, but we care 5 times more about C and MA having different colours than we do about MA and T having different colours.

Since we only have two colours, and needed 3 to colour the map before, we might consider recolouring yellow by red in our quest for a good solution. This would give us the solution in the figure at above right with a reward of 38.

But this isn’t a particularly good colouring. Had we chosen to replace yellow by blue instead then we could achieve a reward of 40.

But this still wouldn’t be the best that we can do.

In fact, with this particular set of weight functions, a good solution has weight 41 and looks like (B – blue, C – red, MA – blue, MO – red, T – blue, MU – blue, DO – red, O – red). You can see it in the figure at above right.

Computational (VCSP) terms Evolutionary terms Variable Locus Single variable assignment Gene All variable assignment Genotype Constraints Gene interactions Constraint weight Gene interaction’s fitness contribution Constraint graph Gene interaction network Arity Degree of epistasis Total Reward Fitness

Table 1: Terminology translation between Computer Science and Biology

The reason that I want to focus on VCSPs is because they can express the reduction that Artem used when proving that the NK-model is PLS-complete. He focused on a special kind of VCSPs: weighted 2-SAT, and showed how they can be encoded within the NK-model. Thus, he showed that a weighted 2-SAT instance with variables x 1 ,…, x n , clauses C 1 , …,C m , and positive integer costs c 1 , … c m can be encoded in the NK-model by building a landscade with m + n loci,with the first m labelled b 1 , …, b m , and the next n labelled x 1 , …, x n . Kaznatcheev (2013,2018) defined the corresponding fitness effect on the locus as:

We can use similar tricks to encode VCSPs in the NK model. Although it would be easier if we generalized the NK model by not demanding that the number of constraints is equal to the number of variables.

Also, while Artem discusses hard families of weighted 2SAT instances, I will focus on specific hard instances of VCSPs and their fitness landscape. In this way, I can find out more about the complexity of evolution, especially the rate at which these populations increase in fitness and if they manage to find local peaks. Of course, to do this, I will need to build a framework for simulation which I will discuss in my next blog post. Please do come back and read more when I do!

What I foremostly want from the project will be to get to know more things about how research is done and to find if this is something that I want to do as a career. Also, I want to learn new things about evolution, how it works and how I can use what I learned so far at university to model it. Last, but not least, I hope to be able to get some interesting results from my research and that what I do will be useful for Artem and for others in this field. Thus, I aim this project in service of answering: can theoretical computer science produce open-ended evolution?

References

Benton, A. (2013). Sir David Attenborough is wrong, humans are still evolving. Filthy Monkey Men: September 16.

Kaznatcheev, A. (2013). Complexity of evolutionary equilibria in static fitness landscapes. arXiv: 1308.5094.

Kaznatcheev, A. (2018). Computational complexity is an ultimate constraint on evolution. bioRxiv: 187682.

Lenski, R.E., Rose, M.R., Simpson, S.C., Tadler, S.C. (1991). Long-term experimental evolution in Escherichia coli. I. Adaptation and divergence during 2,000 generations. The American Naturalist, 138(6): 1315-1341.

von Cramon-Taubadel, N. (2011). Global human mandibular variation reflects differences in agricultural and hunter-gatherer subsistence strategies. PNAS 108(49): 19546 – 19551.

Wiser, M. J., N. Ribeck, and R. E. Lenski. 2013. Long-term dynamics of adaptation in asexual populations. Science 342: 1364-1367.

Yang, J., Benyamin, B., McEvoy, B.P., Gordon, S., Henders, A.K., Nyholt, D.R., Pamela A Madden, P.A., Heath, A.C., Martin, N.G., Montgomery, G.W., Goddard, M.E., & Visscher, P.M. (2010). Common SNPs explain a large proportion of the heritability for human height. Nature Genetics 42:565-569.