Approaching cultural evolutionary dynamics in this model starts from the premise that cultural information is algorithmic. The term algorithm is used in a general sense to mean a procedure or recipe that consists of an input problem (which may be empty) and an organized series of steps that results in a solution (Christian and Griffiths, 2016; Mayfield, 2013). Computer programs are algorithmic because they provide a set of specific instructions for transforming a given input into an output that can then be stored. Sorting algorithms, for instance, take a list of randomly ordered elements and transform this into a new sequence based on some predefined order (such as numerical or lexicographic). Similarly, cultural information stored in recipes, grammars, and motor sequences is algorithmic; a set of mental instructions can be used to cook tomato soup, learn a language, and produce Oldowan flakes (Arthur, 2009; Charbonneau, 2015).

In this sense, cultural information exists as generative procedures inside the minds of individuals, and is manifest in populations as observable behaviors or tangible artifacts. Through a repeated process of production and learning, individual minds are causally linked across space and time to form traditions (Ferdinand, 2015; Kirby and Hurford, 2002; Morin, 2016; Sperber, 1996). Thinking of cultural information as algorithmic allows us to formulate constraints on cultural evolutionary dynamics in terms of solutions and input problems. Solutions exist as the physical manifestations of culture and input problems are the specific functional challenges.

Cultural evolutionary dynamics can therefore be modeled as a process of searching and sampling the space of both solutions and problems. If a search process is biased to find solutions that better approximate a given input problem, then we can think of this optimization process as cultural adaptation. Alternatively, if the search process seeks out novel input problems for a specific solution, then this process of repurposing solutions is a form of cultural exaptation. To capture these processes, a model is constructed in which solutions and problems are represented as binary strings of \(N\)-length. Modeling solutions and problems in this way affords (potentially) unbounded searches over solution and problem spaces.

Cultural adaptation

Input problems are a key constraint on cultural evolutionary dynamics via cultural adaptation: a search optimization process over the space of solutions that results in an improved fit between a solution and an input problem. Limitations on the design of a solution exist in the form of functional constraints, i.e., how well adapted a solution is at solving a problem. This refers to the specification of the input problem (building a solution for cutting meat) and the ways in which it constrains the possible outcomes by creating an adaptive target (useful meat cutting solutions need to induce a certain level of shear stress).

Cultural adaptation is conceptualized here as a process of improving the fit between an input problem and a solution string. The Levenshtein distance (LD) allows us to formally measure this fit:

$${\mathrm{LD}}_{s,p}(i,j)=\left\{\begin{array}{ll}{\mathrm{max}}\,(i,j)&{\text{if}}\ {\mathrm{min}}\,(i,j)=0,\\ {\mathrm{min}}\left\{\begin{array}{l}{\mathrm{LD}}_{s,p}(i-1,j)+1\\ {\mathrm{LD}}_{s,p}(i,j-1)+1\\ {\mathrm{LD}}_{s,p}(i-1,j-1)+{1}_{({s}_{i}\ne {p}_{j})}\end{array}\right.&{\mathrm{otherwise}},\end{array}\right.$$ (1)

where \({\mathrm{LD}}_{s,p}(i,j)\) is the distance between the first \(i\) elements of solution \(s\) and the first \(j\) elements of problem \(p\). As such, the Levenshtein distance between two strings tells us the minimum number of single-element edits (insertions, deletions, or substitutions) required to transform one string into the other. Fewer transformations between a problem and a solution act as a proxy for higher levels of optimization: solution-problem mappings needing more transformations are less optimized than those needing fewer. A fully optimized solution therefore corresponds to \({\mathrm{LD}}(s,p)=0\), as the solution and its input problem are identical.
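Equation (1) translates directly into a dynamic-programming routine. A minimal sketch in Python (illustrative only, not the authors' implementation):

```python
def levenshtein(s, p):
    """Minimum number of single-element edits (insertions, deletions,
    substitutions) needed to transform solution s into problem p."""
    m, n = len(s), len(p)
    # dp[i][j] holds the distance between s[:i] and p[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all of s[:i]
    for j in range(n + 1):
        dp[0][j] = j          # insert all of p[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == p[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

# A fully optimized solution is identical to its problem: LD = 0.
print(levenshtein("0101", "0101"))  # 0
print(levenshtein("0101", "0111"))  # 1
```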

Cultural exaptation

Cultural exaptation was defined earlier as a process where solutions used for one problem are repurposed to solve a novel problem. It was also argued that this can be framed as a search process over the space of possible problems. One recurrent observation is that exaptation normally occurs in domains where there are functional overlaps between the original and novel input problems (Arthur, 2009; Mastrogiorgio and Gilsing, 2016). Co-opting technology in such a way also implies that repurposed solutions are to some extent optimized for solving their original input problem. In some senses, Viagra was well-designed for the purpose of inducing vasodilation; it just happened to be better suited for encouraging blood flow to certain regions as opposed to others.

Introducing a high-dimensional problem space is required if we are to simulate the process of cultural exaptation as a search process over the space of problems. The problem space here forms a connected graph consisting of all possible permutations of an \(N\)-length binary string. Connected nodes represent problems that differ from one another by a single Levenshtein distance (e.g., \(001\) is a neighbor of \(00\), \(011\), and \(0010\), but not \(111\)). Movement through this space is therefore restricted: agents only have the option of moving to a single neighboring problem at a given time-step (see Fig. 1).
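The neighborhood structure of the problem space can be made concrete by enumerating all strings one Levenshtein edit away from a given problem (the function name `neighbors` is ours):

```python
def neighbors(problem):
    """All problems one Levenshtein edit away from `problem`:
    single-bit substitutions, insertions, and deletions."""
    out = set()
    n = len(problem)
    for i in range(n):                      # substitutions
        for b in "01":
            cand = problem[:i] + b + problem[i + 1:]
            if cand != problem:
                out.add(cand)
    for i in range(n + 1):                  # insertions
        for b in "01":
            out.add(problem[:i] + b + problem[i:])
    for i in range(n):                      # deletions
        out.add(problem[:i] + problem[i + 1:])
    return out

# Matches the example in the text: 001 neighbors 00, 011, and 0010,
# but not 111 (which is two substitutions away).
nbrs = neighbors("001")
print("00" in nbrs, "011" in nbrs, "0010" in nbrs, "111" in nbrs)
# True True True False
```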

Fig. 1 An illustration of movement through a small portion of the problem space. Here, an agent starts at input problem \(10\), and moves to other problems within the space (black directional lines). Gray lines and problems represent possible problems an agent could move to given their current input problem.

Structure of model

All model runs took place over \(100\) generations and each generation comprised \(10\) time steps. A single run was initialized with a fixed population of agents (\(N=100\)) who were randomly assigned to input problems of length \(\ell (p)=2\) and provided with randomly generated starting algorithms that produced solutions in the range \(\ell (s)=[2,4]\). Agents in the model attempt to solve their current input problem by searching for possible solutions. As previously described, the success of any given fit between a solution and a problem is operationalized as the Levenshtein distance.

At each time step, an agent generates a pool of possible solutions from asocial and social sources (see Fig. 2). On the basis of the optimization strength (\(\lambda\)), as well as the current input problem, one of these solutions is then adopted and assigned to an agent’s memory as their stored solution. The exploration threshold (\(\Theta\)) interacts with the current fit between a solution and an input problem to determine whether or not an agent moves to a new problem. If the solution is well-fitted to the problem, then an agent will remain with the current input problem. Otherwise, if the solution is a poor fit for the problem, then an agent will relocate and attempt to solve a new problem. Following 10 time steps, all agents in a population die and their currently stored solution is inherited by newly created offspring agents (i.e., a 1:1 replacement rate). Crucially, this reflects the intergenerational transfer of cultural information, as inherited solutions also undergo transmission from parent-to-child, which additionally means these solutions are subject to simplicity constraints during reconstruction (see section Social transmission mechanisms).

Fig. 2 The process an individual agent performs within a single time-step. Step 1: Agents use a series of mechanisms to indirectly alter a solution via the underlying algorithm. The stored algorithm refers to the graph that currently occupies an agent’s memory (as determined by the previous time step). This stored algorithm is acted upon by three asocial mechanisms: modification, invention, and subtraction. Asocial mechanisms can only make single modifications per time-step. Transmission occurs when an agent receives an algorithm from another agent within the population. Step 2: Each of these mechanisms generates a pool of solutions by translating the algorithm into a bit string (i.e., a solution). Step 3: Which of these solutions is adopted depends on whether optimization is biased or stochastic (as determined by \(\lambda\)). If optimization is biased, an agent compares each solution to their input problem and chooses the one with the best fit (otherwise, if the choice is stochastic, then a solution is randomly chosen from the pool). Step 4: An agent uses their current solution-problem mapping to decide whether or not they consider a novel problem (as determined by the exploration threshold \(\Theta\)). Step 5: This movement is restricted to local problems (i.e., those that differ from the current input problem by single-edit substitutions).

Topology of the problem space

In the model, input problems are procedurally generated and stored on the basis of the movements by individual agents. The topology of this problem space is decomposable into three general properties: the difficulty of specific input problems, the size of the problem space, and the interconnectedness between problems.

Differences in difficulty reflect a general observation that not all problems are equal in terms of tractability. Getting from the Earth to the Moon requires solutions that are orders of magnitude more complex than fishing for termites with a stick (unless the termites happen to be on the Moon). Difficulty, in this sense, is indirectly referencing constraints on the search process over solutions, i.e., termite fishing is easier to learn and more readily innovated than a Moon-capable rocket. For this model, longer input problems \(\ell (p)\) increase the number of permutations in the space of possible solutions, which translates into a more computationally intensive search process for finding an optimal solution. Furthermore, even in instances where there are two problems of the same length, one problem can be more predictable than the other. By containing computable regularities, predictable problems are more amenable to concise descriptions than less predictable counterparts (for fuller formal treatments, see: Cover and Thomas (2012); Li and Vitányi (2008)).

Computational constraints are also relevant for our second topological property: that the size of the problem space grows as a function of \(\ell (p)\). Enumerating all possible permutations for \(\ell (p)=4\) results in a smaller space (\(16\) problems) than when \(\ell (p)=10\) (\(1024\) problems). Whereas input problem difficulty acts as a computational constraint on the search process over solutions, the size of the problem space is a computational constraint on searching across problems: exhaustively traversing a problem space becomes less tractable as the size increases. To illustrate, two maximally distant problems in \(\ell (p)=4\) (e.g., \(0000\) and \(1111\)) are closer to one another than two maximally distant problems in \(\ell (p)=8\) (e.g., \(00000000\) and \(11111111\)).

Finally, the third topological property recognizes that the relatedness between input problems introduces a source of path-dependency. As movement through this space is restricted to single edit jumps, the current input problem limits what problems will be considered in the immediate future. For instance, a problem of \(0100\) is nearer to \(0101\) than \(0111\) in terms of the minimal number of substitutions required to transform one problem into another. Movement between input problems of different lengths is additionally constrained by a fixed probability. In particular, movement to a longer problem has a fixed probability of \(P({\mathrm{Longer}})\,=\,0.3\), which can be thought of as a cost on unconstrained movement towards increasingly longer input problems. Without this cost, movement through the problem space would be heavily biased towards longer input problems (because there are generally more longer input problems than problems of the same or shorter length).
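The length-dependent movement cost can be sketched as follows. This is one possible reading of the rule (the exact sampling procedure is not spelled out in the text): a move to a longer neighboring problem succeeds with probability \(P({\mathrm{Longer}})=0.3\), while moves to problems of the same or shorter length always succeed.

```python
import random

P_LONGER = 0.3  # fixed probability of successfully moving to a longer problem

def move(current, candidate, rng=random):
    """Sketch of the movement rule: moves to a longer problem succeed
    with probability P_LONGER; moves to same-length or shorter
    problems always succeed (agent stays put on a failed move)."""
    if len(candidate) > len(current):
        return candidate if rng.random() < P_LONGER else current
    return candidate

# Moving to a same-length neighbor always succeeds:
print(move("10", "11"))  # 11
```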

Representing solutions

Solutions in this model represent technological artifacts and are generated using directed graphs (for a similar approach, see Enquist et al. (2011)). This approximates two features of technological solutions: the cultural artifact (a bit string) and the underlying algorithm (a graph). Graphs were initially constrained so that agents start with solutions of lengths \(\ell (s)=[2,3,4]\). Formally, a graph \(G\) consists of a triple \((V,E,\Omega )\) where \(V\) is the set of nodes \(v\in V\), \(E\) is the set of edges \(e\in E\), and \(\Omega\) is a function mapping every edge to an ordered set of values \(\Omega :E\to {\mathbb{N}}\). Each node comprises a value in the interval \([0,1]\) and each edge is assigned a bit of either \(0\) or \(1\). A single bit is derived from the average of the two nodes connected by an edge, rounded to the nearest integer. As edges are directed, any node can connect to another node within \(V\). Two exceptions apply: no loops (i.e., nodes cannot connect to themselves) and no duplicate edges (i.e., a given directed connection can only exist once). \(\Omega\) arranges the set of edges to produce the bit string (the solution) and is determined by an ordinal value (Fig. 3).
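The graph-to-bit-string translation can be sketched as follows, assuming nodes are stored as an id-to-value dictionary and edges as an \(\Omega\)-ordered list (our representation, not necessarily the authors'):

```python
def solution_from_graph(node_values, edges):
    """Translate a directed graph into a bit string (the solution).
    node_values: dict mapping node id -> value in [0, 1].
    edges: list of (u, v) pairs, already in the order given by the
    edge-ordering function Omega. Each bit is the average of the two
    connected node values, rounded to the nearest integer
    (ties round up)."""
    bits = []
    for u, v in edges:
        avg = (node_values[u] + node_values[v]) / 2
        bits.append(str(int(avg + 0.5)))
    return "".join(bits)

nodes = {0: 0.9, 1: 0.7, 2: 0.2}
edges = [(0, 1), (1, 2), (2, 0)]   # Omega-ordered
# Averages: 0.80 -> 1, 0.45 -> 0, 0.55 -> 1
print(solution_from_graph(nodes, edges))  # 101
```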

Agents have access to two general processes for generating solutions: asocially (via mechanisms of innovation) and socially (via within group transmission).

Asocial generative mechanisms

Generating solutions refers to the introduction of novelty and diversity into a population via asocial mechanisms. In this model, changes to a solution are done indirectly via the graph-based procedure, with agents having access to three general mechanisms for innovating (see Fig. 2):

Invention introduces a new bit by creating and then connecting a new node to an existing one. New nodes are assigned a randomly generated value in the range \([0,1]\).

Modification changes a pre-existing solution by connecting two existing nodes to form a new edge.

Subtraction shortens a bit string by randomly removing an existing edge and ensures that innovation is not unidirectional.

A general assumption is that these generative mechanisms are restricted: Agents can only create or remove a single bit. Imposing such limitations approximates the idea that innovations are often introduced via limited experimentation within a restricted search space. Some solutions are easier to discover than others because they require less time and resources to produce (given a starting state). Similar notions are present in Tennie and colleagues’ Zone of Latent Solutions (Tennie et al., 2009): here, solutions that are reachable via asocial means have a high probability of being independently (re-)invented.
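Under the same node-dictionary-plus-edge-list representation assumed above (ours, not necessarily the authors'), the three asocial mechanisms can be sketched as single-bit operations on the graph:

```python
import random

def invent(nodes, edges, rng=random):
    """Invention: add one bit by creating a new node (random value
    in [0, 1]) and connecting it to an existing node."""
    new_id = max(nodes) + 1
    anchor = rng.choice(sorted(nodes))
    nodes[new_id] = rng.random()
    edges.append((anchor, new_id))

def modify(nodes, edges, rng=random):
    """Modification: add one bit by connecting two existing nodes
    with a new edge (no loops, no duplicate edges)."""
    ids = sorted(nodes)
    candidates = [(u, v) for u in ids for v in ids
                  if u != v and (u, v) not in edges]
    if candidates:
        edges.append(rng.choice(candidates))

def subtract(nodes, edges, rng=random):
    """Subtraction: remove one bit by deleting a random edge."""
    if edges:
        edges.pop(rng.randrange(len(edges)))
```

Each call changes the resulting bit string by at most a single bit, matching the restriction on limited experimentation described above.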

Social transmission mechanisms

Transmission is the movement of information between individuals and corresponds to how individuals learn from others via observation and teaching (Gergely and Csibra, 2006). Two types of social transmission are present in this model: a vertical transmission process of inheritance and a within-group process of horizontal transmission. Vertical transmission happens at each generation (every \(10\) time-steps) and takes place between a parent agent (who dies) and a newly created child agent. The choice of \(10\) time-steps is arbitrary, but it does capture the finite lifetimes of individuals and recognizes that the contributions of a single individual in a given generation are generally circumscribed (especially when considering long timescales). As the name suggests, within-group transmission takes place between individuals at a given generation, and involves one agent learning a cultural algorithm from another randomly selected agent within the population.

Both forms of transmission are indirect (agents transmit algorithms, not solutions), reconstructive (solutions are generated using the underlying algorithm), and biased (reconstructions are biased towards efficient representations). This aligns with the general idea that transmission is an inductive process guided by both the input data and the prior cognitive biases of learners (Chater and Vitányi, 2003; Culbertson and Kirby, 2016; Griffiths and Kalish, 2007; Kirby et al., 2007). Transmitting an algorithm is thus analogous to learning a recipe or procedure and is instantiated here as a process of reconstructing the shortest path between nodes. Dijkstra’s algorithm is used to construct a graph distance matrix \(({d}_{ij})\) containing the distance from each node \({v}_{i}\) to node \({v}_{j}\). The shortest path is one which visits all connected nodes in the shortest number of steps. This assumes graphs are directed with equally weighted edges and that the starting point is the first node in the graph (as determined by \(\Omega\), see section Representing solutions).
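Because all edges are equally weighted, Dijkstra's algorithm reduces to breadth-first search. A sketch of computing one row of the distance matrix \((d_{ij})\) (the adjacency-list representation is our assumption):

```python
from collections import deque

def bfs_distances(adj, start):
    """Shortest-path distances from `start` in a directed graph with
    equally weighted edges (where Dijkstra's algorithm reduces to
    breadth-first search). adj: dict mapping node -> successor list."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:          # first visit = shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(bfs_distances(adj, 0))  # {0: 0, 1: 1, 2: 1, 3: 2}
```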

Strength of optimization ( \(\lambda\) )

Optimization is modeled as an individual-level decision making process over the pool of solutions derived from social and asocial sources. The goal for agents is to find a solution that improves the fit with the current input problem. One advantage of the approach used here is that it explicitly builds a bridge between individual-level processes and population-level outcomes (Derex et al., 2018). This presents a notable departure from some recent cultural evolutionary models of cumulative culture in which individual-level processes are ignored in favor of solely focusing on the population-level distribution of cultural traits (Enquist et al., 2011; Lewis and Laland, 2012).

Manipulating the strength of optimization (\(\lambda\)) allows us to directly investigate the extent to which this decision-making process is biased or stochastic. The current model examined the following parameter values for \(\lambda\): \([0.0,0.2,0.5,0.8,1.0]\). When the strength of optimization is at maximum (\(\lambda =1.0\)), agents choose a solution based solely on its ability to optimally solve the current input problem. The pool from which these solutions are chosen is restricted to the currently stored solution and variants generated via asocial or social means. A maximally biased choice is one where an agent compares the Levenshtein distance between the input problem (\(p\)) and each solution (\(s\)) in the pool \(X\), selecting the most optimized one:

$$\mathop {\text{min}} \limits_{s\in X}{\mathrm{LD}}(s,p)$$

As the strength of optimization decreases, stochastic factors play an increasingly prominent role in determining which solution is adopted. If the strength of optimization is \(\lambda =0.0\), the process of choosing solutions is purely stochastic, i.e., there is no preference for solutions based on the Levenshtein distance, whereas \(\lambda =0.8\) means that, on average, \(80 \%\) of an agent’s productions will be biased and \(20 \%\) will be stochastic.
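The mixed biased/stochastic choice can be sketched as a single random draw per decision (this mixing scheme is our assumption; the text only fixes the average proportions):

```python
import random

def levenshtein(s, p):
    """Standard dynamic-programming edit distance (Eq. (1))."""
    prev = list(range(len(p) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(p, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def choose_solution(pool, problem, lam, rng=random):
    """With probability lam the choice is biased (best Levenshtein
    fit to the problem); otherwise it is stochastic (uniform draw
    from the pool)."""
    if rng.random() < lam:
        return min(pool, key=lambda s: levenshtein(s, problem))
    return rng.choice(pool)

pool = ["0101", "0001", "1111"]
print(choose_solution(pool, "0101", lam=1.0))  # 0101
```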

Exploration threshold ( \(\Theta\) )

An exploration threshold (\(\Theta\)) is introduced to capture how the level of optimization limits exploration of the problem space. This aims to model situations where solutions resist repurposing due to pressures on maintaining existing functionalities. Agents consider alternative problems if the (normalized) Levenshtein distance between a solution and the current input problem is above this threshold:

$${\text{Repurpose}}\,=\,\left\{\begin{array}{ll}{P}_{\text{pos}},&{\text{if}}\ {}_{{\mathrm{norm}}}{\mathrm{LD}}(s,p)\ > \ \Theta \\ {\text{Stay}},&{\text{otherwise}}.\end{array}\right.$$ (2)

where \({P}_{\text{pos}}\) is the set of possible problems an agent can explore in a localized region of the problem space. Possible problems are those which are a single Levenshtein distance from the current input problem. Exploration of alternative problems takes place when the \({}_{{\mathrm{norm}}}{\mathrm{LD}}\) of the current solution \(s\) and input problem \(p\) is greater than the exploration threshold \(\Theta\). The following parameter values were examined: \(\Theta =[0.2,0.4,0.6,0.8,1.0]\).

Considering a range of parameter values allows us to investigate how the strength of optimization interacts with these different thresholds. When the threshold is high (e.g., \(\Theta =0.8\)), exploration of the problem space is restricted to a narrow range of poorly optimized solutions, as agents only repurpose solutions for solution-problem mappings with an \({}_{{\mathrm{norm}}}{\mathrm{LD}}\ >\ 0.8\). Having a high \(\Theta\) makes it relatively easy for optimization dynamics to inhibit the rate of exploration: a minimal amount of optimization is required to maintain the current function of a solution. Conversely, lower thresholds (e.g., \(\Theta =0.2\)) encompass a wider range of possible fits, as agents now repurpose solutions for mappings with a \({}_{{\mathrm{norm}}}{\mathrm{LD}}\ >\ 0.2\). Due to the demands for more optimized solutions, and the increased rates of exploration, low thresholds make it difficult to maintain an existing function.
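Equation (2) can be sketched as a threshold test on the normalized distance. Dividing by the length of the longer string is our assumption, as the normalization is not stated explicitly in the text:

```python
def levenshtein(s, p):
    """Standard dynamic-programming edit distance (Eq. (1))."""
    prev = list(range(len(p) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(p, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def norm_ld(s, p):
    """Levenshtein distance normalized to [0, 1] by the length of
    the longer string (our assumption)."""
    return levenshtein(s, p) / max(len(s), len(p))

def should_repurpose(s, p, theta):
    """Eq. (2): consider a neighboring problem only when the current
    solution-problem fit is worse than the exploration threshold."""
    return norm_ld(s, p) > theta

print(should_repurpose("0000", "0101", theta=0.4))  # True  (norm LD = 0.5)
print(should_repurpose("0000", "0101", theta=0.6))  # False
```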

Solution complexity \({H}_{L}(S)\)

Solution complexity is measured as the product of Shannon Entropy (Cover and Thomas, 2012; Shannon and Weaver, 1949) and the length of a solution:

$${H}_{L}(S)=-\sum\limits_{i=1}^{n}P({S}_{i})\,{\mathrm{log}}_{2}P({S}_{i})\,\ell (S)$$ (3)

where \({S}_{i}\) is a binary value found within a solution, \(P({S}_{i})\) is the probability of value \(i\) given a solution string \(S\), and \(\ell (S)\) is the length of the solution. \({H}_{L}(S)\) is therefore the average per-element information within a solution string, scaled by that string’s length. In this sense, \({H}_{L}(S)\) acts as a proxy for solution complexity: lower \({H}_{L}(S)\) strings are less complex than ones with a higher \({H}_{L}(S)\).
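Equation (3) can be computed directly from the bit frequencies of a solution string. A sketch:

```python
from collections import Counter
from math import log2

def solution_complexity(s):
    """H_L(S): Shannon entropy of the bit distribution in a solution
    string, multiplied by the string's length (Eq. (3))."""
    n = len(s)
    h = -sum((c / n) * log2(c / n) for c in Counter(s).values())
    return h * n

print(solution_complexity("0101"))  # 4.0 (uniform bits: 1 bit/element x length 4)
# The measure cannot distinguish ordered from irregular strings:
print(solution_complexity("00001111") == solution_complexity("01101001"))  # True
```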

This assumes complex solutions are longer strings where the distribution of bits is close to uniform (i.e., \(1\) bit) and provides a relatively simple way of capturing simple solutions (i.e., those that are closer to \(0\) bit). However, a well-recognized limitation of this approach is that it fails to discriminate between strings of equal length, where one forms a highly ordered sequence (e.g., \(01010101\) or \(00001111\)) and the other approximates an algorithmically irregular sequence (e.g., \(01101001\)).