Human language provides a uniquely flexible and expressive system of communication, but we are not the only species capable of communication. Signaling behavior is found throughout nature: Virtually every species has a means of communicating about, for example, the presence of food or predators, its potential as a mate, or the presence of a competitor. However, only human communication displays such open‐endedness (Tomasello, 2010) in the number of signals learned, the contexts in which they are elicited, and the responses they effect. This flexibility arises because the basic building blocks of human language—words—are learned socially, by observing word use by others. In contrast, the articulatory form of signals in the vast majority of animal communication systems is, as far as we know, not socially learned. For instance, among our closest relatives, the form of alarm calls (distinctive calls used to warn conspecifics of the presence of particular types of predators) is thought to be largely genetically determined (Fedurek & Slocombe, 2011); even among the other apes, while the decision to employ a given call may be intentional (Slocombe et al., 2010), there is only limited evidence for any flexibility or group‐level variation in the form of those calls (Crockford, Herbinger, Vigilant, & Boesch, 2004; Seyfarth & Cheney, 1986). There are, of course, exceptions; for instance, many bird species are capable of both learning and innovating songs (Podos, Huber, & Taft, 2004). However, vocal learning in birds (and in other animals where it has been observed, such as cetaceans, elephants, and bats: Janik, 2014; Poole, Tyack, Stoeger‐Horwath, & Watwood, 2005; Boughman, 1998) is most probably a case of convergent evolution, rather than a reflection of ancestral cognitive capacities shared via the extremely distant common ancestor of all vocal‐learning species.

Innately specified communication systems are presumably the product of natural selection (Maynard Smith & Harper, 2003). As such, the major questions concern the nature of the evolutionary route to signaling, as well as the selective pressures involved. The emergence of learned communication, on the other hand, is less well understood. First, we might ask when and why a learned system would replace an innately specified signaling system (Lachlan, Janik, & Slater, 2004; Ritchie & Kirby, 2006). Second, socially learned communication systems are potentially shaped by an entirely different set of pressures. In a learned communication system, unlike its innate equivalent, natural selection cannot directly tune the structure of the signaling system; rather, socially learned signaling systems are shaped by the processes through which they are learned and used (see the literature review in the next section). Their functional properties are then determined by the nature of the learning and usage mechanisms involved (which are themselves potential targets for biological evolution). Understanding the nature of these mechanisms is crucial to understanding when and how a learned communication system such as human language might evolve.

Our focus in this study is therefore: What are the necessary social and psychological adaptations which allow populations to develop, via processes of learning and use, functional learned communication systems? Semiotic experiments (such as those surveyed by Galantucci & Garrod, 2011) demonstrate that human subjects can rapidly bootstrap communicative conventions across a range of modalities and interactive conditions. Moving beyond the laboratory, the recent emergence of indigenous sign languages (e.g., Nicaraguan Sign Language and Al‐Sayyid Bedouin Sign Language: Senghas, Senghas, & Pyers, 2005; Sandler, Meir, Padden, & Aronoff, 2005) is a compelling reminder that functional communication systems are able to self‐organize in human populations in the absence of any explicit, centralized coordination. Presumably, the same mechanisms also underlie the development of other human signaling conventions, including all other human languages. Identifying these mechanisms—which must be particular to human cognition and interaction—will shed light on what enables Homo sapiens to be such a fundamentally communicative species.

The emergence of functional learned communication has been studied across a number of seemingly loosely related disciplines, including Classical and Evolutionary Game Theory (e.g., Lewis, 1969; Nowak, Krakauer, & Dress, 1999; Skyrms, 2010), Artificial Life (e.g., Steels & Loetzsch, 2012), Cognitive Science (e.g., Barr, 2004), and Evolutionary Linguistics (e.g., Oliphant, 1996; Smith, 2002). The assumptions made and conclusions drawn in these various fields regarding the prerequisites for functional communication appear on the surface to be quite different, if not mutually incompatible. To cut through this rather confusing mesh of approaches, models, and results, we have created a framework to replicate a representative selection of the approaches just outlined. Having done this, we then identify a basic framework—an urn model—which strips the individual models back to the simplest set of common underlying mechanics.

We then employ an additive approach to this framework: We first add the characteristic features of each model in terms of interaction and learning. None of these basic instantiations reliably leads to optimality; as such, we then investigate which particular mechanisms do. By adding each mechanism in isolation, we are able to identify exactly which are responsible for driving the behavior of each model. The subsequent direct comparison reveals that the apparent diversity of mechanisms driving the emergence of functional learned communication is overstated in the literature; in fact, the same fundamental processes underpin all of the current accounts.

The rest of this study is organized as follows. In Section 2 we discuss the issues of conventionality and optimality, and how these are tackled in models drawn from the various disciplines mentioned above. We first motivate and then describe the exemplar‐style framework which we have used for our model replications in Section 3, before discussing each replication in more detail in Section 4, along with the adjustments we have made for comparison and which aspects of each model are necessary for the development of optimal communication. In Section 5 we propose that three fundamental principles—the creation and propagation of referential information, a bias against ambiguity, and a mechanism leading to information loss—determine whether any system is able to bootstrap functional communication. Finally, this leads into a discussion in Section 6 regarding how these principles can help us interpret the various theories of the emergence of communication outlined above.

2 Past approaches to the emergence of functional learned signaling

For any form of communication to be functional, it must be conventional; in particular, there must be consensus within a population about how signals are produced and interpreted. Conventions are widespread in human populations and extend far beyond the communicative domain (we have, for instance, conventions about what to wear to work, which side of the road to drive on to get there, what hours one should work, and what language is appropriate at work).

In his classic study, Lewis (1969) analyzes convention as a type of game‐theoretic coordination problem: Two or more agents have a choice of behaviors, and coordinating those behaviors leads to mutual benefit. Even working under the assumption that such coordination provides a mutual benefit, the mechanisms leading to the establishment of conventions are not immediately obvious. Lewis proposed a critical role for common knowledge (Lewis, 1969, p. 56): All agents are aware of a set of propositions, each agent knows that every other agent also knows those propositions, and so on recursively ad infinitum. Populations of agents can employ this knowledge to create conventions by making rational choices targeting maximal individual payoffs. However, in the case of these simple conventions (where an atomic choice is made from an unordered set of alternative behaviors, such as the side of the road we drive on), De Vylder (2008) shows that such sophisticated reasoning is unnecessary: Whenever agents strongly amplify observed behavior, population‐wide agreement on a single convention is assured. (This is where agents sample from each other's behavior, and the suite of behaviors is represented as a ranked probability distribution. In strongly amplified copying, the ratio of likelihoods between any two successively ranked behaviors is strictly increased in favor of the more highly ranked one.)1
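To make this amplification dynamic concrete, the following minimal sketch (our illustration, not De Vylder's own implementation; the population size, sample size, and exponent are arbitrary choices) has each learner estimate the population's behavior from a small sample and then copy from a sharpened version of that distribution:

```python
import random
from collections import Counter

def amplify(probs, alpha=3.0):
    """Strongly amplify a behavior distribution: raising probabilities to a
    power alpha > 1 strictly increases the likelihood ratio between any two
    successively ranked behaviors in favor of the higher-ranked one."""
    weights = {b: p ** alpha for b, p in probs.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}

def simulate(n_agents=50, n_behaviors=2, n_rounds=2000, alpha=3.0):
    # Each agent starts with a random preferred behavior (e.g., which
    # side of the road to drive on).
    population = [random.randrange(n_behaviors) for _ in range(n_agents)]
    for _ in range(n_rounds):
        learner = random.randrange(n_agents)
        # The learner estimates the population's behavior from a small
        # sample of observations: no common knowledge is required.
        sample = Counter(random.choice(population) for _ in range(5))
        amped = amplify({b: c / 5 for b, c in sample.items()}, alpha)
        behaviors, weights = zip(*amped.items())
        population[learner] = random.choices(behaviors, weights)[0]
    return Counter(population)

print(simulate())  # typically all 50 agents end up sharing one convention
```

With any exponent greater than 1 the population reliably collapses onto a single convention; with alpha = 1 (faithful copying) fixation still occurs eventually via drift, but far more slowly.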

However, being conventional is not enough to ensure a functional communication system. Lewis's (1969) Signaling Game is the canonical problem in the emergence of learned communication. In its most basic form, the Signaling Game involves a single signaler and a single hearer. The signaler must communicate one of two possible world‐states to the hearer using two available signals, and the hearer has a choice of two possible responses. Each world‐state has a corresponding "matched" response which triggers a mutual payoff; mismatched responses provide no payoff. Lewis showed that even this simple game has several Nash equilibria, where a Nash equilibrium is any state in which no player can improve her payoff by unilaterally changing strategy, so that play settles into a global stasis where no further change occurs. Only two of these Nash equilibria are optimal strategies, designated by Lewis as signaling systems, which guarantee that the hearer will select the appropriate response based only on the signaler's signal. The others—pooling equilibria—are stable but non‐optimal strategies, arising, for example, when the signaler sends the same uninformative signal for every world‐state and the receiver always chooses the action with the greatest average payoff. This ambiguity of signal‐to‐response mapping will be referred to below as homonymy.
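The minimal game is small enough to enumerate exhaustively. A short sketch (assuming equiprobable world-states and a shared payoff of 1 for matched responses) can check every pure-strategy pair for the Nash property, recovering the two optimal signaling systems alongside the stable pooling equilibria:

```python
import itertools

STATES = SIGNALS = ACTIONS = (0, 1)

def payoff(sender, receiver):
    # Expected shared payoff over equiprobable world-states: sender maps
    # states to signals, receiver maps signals to actions; a matched
    # response (action == state) pays 1, a mismatch pays 0.
    return sum(receiver[sender[s]] == s for s in STATES) / len(STATES)

# All pure strategies: functions from states to signals, and from
# signals to actions.
senders = list(itertools.product(SIGNALS, repeat=len(STATES)))
receivers = list(itertools.product(ACTIONS, repeat=len(SIGNALS)))

for snd, rcv in itertools.product(senders, receivers):
    u = payoff(snd, rcv)
    # Nash: no unilateral deviation strictly improves the shared payoff.
    is_nash = (all(payoff(alt, rcv) <= u for alt in senders)
               and all(payoff(snd, alt) <= u for alt in receivers))
    if is_nash:
        kind = "signaling system" if u == 1.0 else "pooling equilibrium"
        print(f"sender={snd} receiver={rcv} payoff={u:.1f}  {kind}")
```

Restricting attention to pure strategies is a simplification (the full game also has mixed equilibria), but it exhibits the pattern at issue: the run prints the two optimal signaling systems and four stable pooling equilibria.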

Moving beyond the simplest scenario of two world‐states, two signals, and two responses results in a drastic increase in the number of possible system states, and in the chance of multiple partial pooling equilibria: stable states that mix informative strategies with pooled, non‐informative ones (see Fig. 1). As such, in addition to being conventional, a functional communication system must be at least somewhat informative; in Lewis's terms, it must allow the hearer to select the correct response with greater than chance frequency. The most functional systems are optimal: Such systems require that every world‐state map to at least one signal, and that each of those signals be unambiguously associated with a matched response. Identifying how conventional, optimal, learned signaling systems develop has therefore become the benchmark problem in this field; the existence of many non‐optimal stable states, even under Lewis's assumptions of rational behavior, suggests that the reliable development of optimal signaling occurs via some other means. In the sections that follow, we review the proposals made in various fields as to what that mechanism might be.

Figure 1. An example of a partial pooling equilibrium. The first two states, signals, and actions are pooled and mutually uninformative: The speaker always produces Signal A and the receiver always uses Action 1. State 3, however, leads to the informative Signal C and hence Action 3.
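The figure's configuration can also be written out directly. In this small sketch (labels follow the figure; the receiver's response to the unused Signal B is an arbitrary choice), the pooled portion conveys nothing while State 3 is perfectly communicated:

```python
# Fig. 1's partial pooling equilibrium as explicit mappings.
sender   = {1: "A", 2: "A", 3: "C"}    # states 1 and 2 pool onto Signal A
receiver = {"A": 1, "B": 1, "C": 3}    # Signal B is unused; its mapping
                                       # here is an arbitrary choice

states = (1, 2, 3)
success = sum(receiver[sender[s]] == s for s in states) / len(states)
print(f"expected success: {success:.2f}")
# 0.67: above chance (0.33), so the system is informative,
# but below the 1.00 that an optimal signaling system would achieve.
```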

2.1 Payoff‐based accounts

Game‐theoretic accounts are driven by the idea of a payoff, which is instantiated through either increased evolutionary fitness (e.g., Nowak et al., 1999) or reinforcement learning (e.g., Skyrms, 2010), where individuals modify their behavior in response to the payoffs they receive.

Nowak et al.'s model involves the natural selection of cultural variants (Boyd & Richerson, 1985) and rests on two assumptions: First, the fitness of individuals (i.e., the number of their offspring) is determined by their communicative success within a population; second, the resulting children learn from their parents (with some error), thereby inheriting their communication system via social learning. Numerical simulations show that, under these conditions, while some populations evolve optimal systems, many stabilize at partial pooling equilibria, with some signals being associated with more than one meaning. However, these suboptimal states occur less often as the rate of learning error increases: Error knocks the systems out of previously stable states. Note that the mechanism at play here—natural selection—is the same as that invoked to explain the evolution of signaling systems which are not socially learned; the only difference is that the mechanism through which behaviors are inherited is social learning, rather than genetic transmission.

Skyrms (2010) surveys a type of reinforcement learning devised by Roth and Erev (1995). Many forms of reinforcement learning (e.g., Bush & Mosteller, 1953) are essentially "memoryless": Each available behavior can be completely characterized by a single (probabilistic) value which describes its current state, and any learning experience simply recalculates this value using a parameterized function. Roth–Erev reinforcement, on the other hand, models memory as a collection of tokens which gradually accumulate over a learner's lifetime; as such, calculating both behavior and the effects of learning must take into account not just the relative proportions of memory tokens, but their absolute counts as well (experiences in later life contribute relatively little to the overall store of tokens, whereas early experiences have larger effects). Skyrms (2010) motivates the Roth–Erev model by showing that learners in a non‐signaling scenario—one where they must learn to modify their behavior to maximize their expected returns when presented with an initially unknown distribution of world‐states—are able to escape pooling equilibria by using Roth–Erev reinforcement learning, whereas parametric forms of reinforcement learning (Bush & Mosteller, 1953) are not. Incorporating this into the basic Lewis signaling game, Skyrms (2010) describes a simple iterative strategy: Each time a pair of agents successfully communicates, the associations involved are strengthened by both.

A proof of convergence to optimality using Roth–Erev learning in the minimal signaling game (two world‐states, two signals, and two responses) is given in Beggs (2005). However, Barrett (2006) shows that including more states, signals, or responses immediately increases the possibility of non‐optimal equilibria, and that simple reinforcement no longer leads to guaranteed convergence on optimal signaling. Barrett (2006) goes on to propose two solutions to this problem: the addition of negative reinforcement (also known as punishment, the term we shall use henceforth), where unsuccessful associations are decremented, and forgetting (which is further investigated in Barrett & Zollman, 2009). Both strategies greatly increase the likelihood that optimal signaling develops, and guarantee it for certain parameter regimes. Also using Roth–Erev reinforcement (but looking at pragmatic implicatures rather than signaling games), Franke and Jäger (2012) investigate the effects of lateral inhibition: After successful communication, competing associations are dampened. With this effect included, they show via simulation that optimal states are reached far more quickly. We draw attention to the role of lateral inhibition here, as it plays an important role in several other models described below.
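To make these mechanisms concrete, the following sketch (our own minimal rendering with arbitrary parameter values, not the exact update schemes of the papers cited above) implements Roth–Erev style urn learning in a three-state signaling game, with punishment and lateral inhibition included as optional extras:

```python
import random
from collections import defaultdict

class UrnAgent:
    """Roth-Erev style learner: choices are sampled in proportion to
    accumulated tokens, so early experiences retain a lasting influence."""
    def __init__(self, options, init=1.0):
        self.options = list(options)
        # One urn per conditioning context: the current world-state for a
        # speaker, or the received signal for a hearer.
        self.urns = defaultdict(lambda: {o: init for o in self.options})

    def choose(self, context):
        urn = self.urns[context]
        return random.choices(self.options,
                              [urn[o] for o in self.options])[0]

    def update(self, context, choice, success,
               reward=1.0, punish=0.3, inhibit=0.1):
        urn = self.urns[context]
        if success:
            urn[choice] += reward                  # positive reinforcement
            for o in urn:                          # lateral inhibition:
                if o != choice:                    # dampen the competitors
                    urn[o] = max(urn[o] - inhibit, 0.01)
        else:
            urn[choice] = max(urn[choice] - punish, 0.01)   # punishment

# A dyadic signaling game: matched responses pay off for both agents.
N = 3
speaker, hearer = UrnAgent(range(N)), UrnAgent(range(N))
recent = []
for _ in range(20000):
    state = random.randrange(N)
    signal = speaker.choose(state)
    action = hearer.choose(signal)
    success = (action == state)
    speaker.update(state, signal, success)
    hearer.update(signal, action, success)
    recent = (recent + [success])[-1000:]
print(f"success rate over the last 1,000 rounds: {sum(recent)/len(recent):.2f}")
```

With punish and inhibit set to 0, this reduces to plain Roth–Erev reinforcement, which in games of this size can settle on partial pooling; enabling either mechanism makes convergence on near-optimal signaling far more reliable, in line with the results summarized above.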

2.2 Interaction‐based accounts

The program of artificial life research set out in Steels (2012) and the neural network populations of Barr (2004) place a critical emphasis on the fundamental roles of feedback and alignment: Agents interact with each other multi‐modally, and repeated attempts at local alignment ultimately lead to a globally functioning communication system.

As part of a larger program to investigate the evolution of language, Steels (2012) shows that multi‐modal negotiations between embodied robotic agents situated in a complex environment lead to the development of multiple levels of language‐like structure. The seminal Naming Game described in Steels and Loetzsch (2012) is a core element of this process. A population of agents is situated in an environment containing a number of objects. Pairs of randomly chosen agents are presented with a limited context of objects observable by both parties, with one particular target object chosen for a designated signaler to attempt to communicate to the hearer. The agents then execute a scripted series of actions: A signal is sent, the hearer's interpretation is checked, and the intended referent is indicated by the speaker in the event of failure. Both agents then potentially adjust their internal representations by strengthening and weakening associations, with these weight adjustments determined by the particular scenario (success or failure) which has just occurred; these updates can be carried out by the speaker, the hearer, or both agents involved in the interaction. In addition, agents possess the ability to innovate terms for previously unseen objects, chosen from a very large signal space. This is a potentially critical difference from the other models discussed in this section, which assume limited signal spaces, and in which processes which eliminate homonymy are critical to establishing optimal signaling. In contrast, a typical Naming Game simulation involves an initial stage in which the number of terms for any given object explodes, before a single term wins out for each, as the result of gradual lateral inhibition of competing terms. One consequence is that homonymy is very rare: "Optimality" for these games tends to be defined not in terms of successful communication, but by when the lexicon is reduced to a minimal size.

De Vylder and Tuyls (2006) show that, as with the simple conventions in De Vylder (2008), convergence on a minimal, unambiguous, conventional lexicon is guaranteed if agents utilize a strongly amplifying imitation function (as described in Section 2). Baronchelli (2010) further shows that hearer update (i.e., hearers updating their internal representations based on the success or failure of an interaction) is critical in the development of optimal lexicons, while speaker update plays a lesser role.

Barr (2004) looks at the role of common knowledge in the emergence of conventional communication. Employing populations of interacting agents (both neural network based and simpler association based), he showed not only that common knowledge (about the signaling behavior of the population as a whole) was unnecessary, but that population‐wide convergence on a single system was significantly more likely when agents used only the information from individual interactions. The neural network model is rather sophisticated, but it includes a type of parametric reinforcement learning similar to that of Bush and Mosteller (1953) (outlined in Section 2.1). Also included is a form of lateral inhibition (although described as a mutual exclusivity bias) which acts to promote one‐to‐one signal/meaning mappings. Barr's simulations reliably lead to states of optimal signaling. In a second set of results, Barr aims to counter a possible objection to his neural networks: As they sample over time from the whole population, they could be argued to be accruing a type of common knowledge. To this end, he uses a modified association‐based model (based on Steels, 1997) which employs a "stay/switch" strategy—agents stick to successful strategies, with some chance of switching to less successful ones. This model includes a type of memory whereby agents can be restricted to knowledge of their last n interactions. In the end, both types of population, neural network and stay/switch, reliably arrived at global convergence on an optimal system.2 A further observation was that stay/switch populations proved more efficient at developing globally optimal signaling when their memories were highly restricted, providing another strong counterexample to common knowledge‐based explanations.

2.2.1 Reinforcement versus feedback learning: An aside

It is worth clarifying the differences between reinforcement and feedback accounts, as they share much in common. A further complication is that the models in Barr (2004) and, to a lesser extent, Steels and Loetzsch (2012) are described at different times in terms of both reinforcement and feedback learning. One of the main factors distinguishing reinforcement (in its classic form) from feedback involves the availability of referential information: how agents come to associate meanings with signals, both when signals are sent and when they are interpreted. In classic signaling games, referential information is, strictly speaking, irrelevant: Mutually available "meanings" are split in two, with world‐states perceivable by the speaker and actions taken by the receiver. However, as every state has a single matched action which triggers a payoff event, we can (for the sake of direct comparison) temporarily overlook this distinction and treat matched state/action pairs as directly equivalent to meanings in the other models.

In reinforcement accounts, the equivalent of referential information is only made available after a successful interaction, and it is provided by the environment: Signaler and receiver know that the intended and interpreted meanings have coincided because they receive reinforcement from the environment; more subtly, in the event of failure, the absence of positive reinforcement informs each party that his or her choice has been unsuccessful. In feedback learning, on the other hand, the environment cannot provide this information. Instead, the agents themselves must furnish it via "pointing" behavior: simple social interactions, presumably via another modality, which are able to resolve reference. As such, although there is a near equivalent to the reinforcement described above, it is analyzed in terms of the interaction between the agents: The receiver must point at its interpreted referent, providing Interpretation Feedback, and the speaker must either indicate whether the receiver has selected the correct meaning (what we shall term Yes/No Feedback) or provide richer information by indicating its intended referent (henceforth Referential Feedback). With Yes/No Feedback, then, the situation resembles reinforcement learning in that full referential information is only made available after communicative success.

The real difference between reinforcement learning and Yes/No Feedback learning is seen after failure. In reinforcement learning, the speaker can only know that his or her intended signal/meaning association was unsuccessful; similarly, the hearer is only aware that his or her interpreted association failed. This is also true for Yes/No Feedback, but due to the availability of Interpretation Feedback, extra information (about how the hearer interpreted the signal) is reliably available to the speaker; in reinforcement learning, this information is only available after successful communication. Referential Feedback plays a similar role, as it provides full information about the speaker's intended referent to the hearer; again, with reinforcement learning this is only available after success. It is the availability of this extra information in feedback learning that allows for more subtle strategies than in reinforcement learning. With reinforcement learning, agents must somehow promote successful associations and inhibit failed ones. This remains the case with feedback learning, but speakers and hearers also have reliable sources of referential meaning which are independent of communicative success. How this information is used, of course, depends on the particular model.

Interestingly, then, although Barr (2004) describes what must be a feedback model—in that alignment is verified through interaction—the interaction itself is placed in a black box. Because of this, the model uses an exact equivalent of the reinforcement dynamic: The extra information potentially available is not actually used. In models such as that of Steels and Loetzsch (2012), the interaction has a more fine‐grained realization which is incorporated into the model, making the extra sources of information potentially usable. The question, then, is to determine what role those extra sources of information actually play; this will be dealt with in Section 4.
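As a concrete point of comparison, here is a sketch of a single Naming Game-style round with Referential Feedback. This is a minimal rendering under our own simplifying assumptions (a shared, fixed context; score-based lexicons; winner-take-all production), not the exact script of Steels and Loetzsch (2012):

```python
import random

def naming_game_round(speaker, hearer, objects, delta=0.1):
    """One Naming Game-style round. Lexicons are dicts mapping
    object -> {word: score}. With Referential Feedback, the speaker
    indicates the target on failure, so the hearer can still learn."""
    target = random.choice(objects)
    lex = speaker.setdefault(target, {})
    if not lex:
        # Innovation: coin a fresh word from a very large signal space.
        lex[f"w{random.randrange(10**9)}"] = 1.0
    word = max(lex, key=lex.get)              # winner-take-all production
    # Interpretation: the object the hearer most associates with the word.
    scores = {o: hearer.get(o, {}).get(word, 0.0) for o in objects}
    guess = max(scores, key=scores.get)
    success = guess == target and scores[guess] > 0
    if success:
        for agent in (speaker, hearer):
            agent[target][word] = agent[target].get(word, 0.0) + delta
            for w in agent[target]:           # lateral inhibition of
                if w != word:                 # competing words
                    agent[target][w] = max(agent[target][w] - delta, 0.0)
    else:
        # Referential Feedback: the hearer now knows the intended referent
        # and adopts the word for it despite the failed interpretation.
        entry = hearer.setdefault(target, {})
        entry[word] = entry.get(word, 0.0) + delta
    return success

# Usage: repeated rounds between random pairs drive the population
# toward a single shared word per object.
population = [dict() for _ in range(10)]
objects = [f"obj{i}" for i in range(5)]
for _ in range(20000):
    spk, hrr = random.sample(population, 2)
    naming_game_round(spk, hrr, objects)
```

Deleting the failure branch leaves only success-driven updates, which is effectively the reinforcement dynamic; what work the extra referential information actually does is exactly the question taken up in Section 4.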

2.3 Observational learning accounts

A third strand of work has focused on the evolution of communication via iterated learning (Kirby, 2001)—repeated cycles of production and observational learning, often but not always with population turnover. In generational turnover models, one generation of learners learns from behavior produced by the previous generation, and goes on to produce behavior which is observed and learned from by the next; alternatively, new agents acquire a signaling system by observing the existing population produce and/or interpret signals, then replace an older member of the population, implementing a gradual turnover of the population. This observational learning paradigm typically de‐emphasizes the role of communicative interaction (see e.g., Oliphant, 1996; Smith, 2002): Agents are assumed to be unmodified by any further interaction after an initial phase of learning, and signaling conventions can therefore only develop during this initial stage of sampling and learning. For this reason, these models place a critical emphasis on the learning process itself. Furthermore, unlike the reinforcement and interaction‐based models discussed above, observational learning models typically do not include any referential uncertainty: Learners learn from observing meaning‐signal pairs, rather than from signals produced in some context which leaves the intended meaning unclear.

The models in Smith (2002) investigate how individual learning biases shape the evolution of signaling systems in populations through iterated learning. In this study, learners are modeled as simple associative networks which adjust their association weights after each learning exposure according to a particular learning rule; Smith varies these learning rules parametrically, to explore both the properties of learning at the individual level and the consequences of these individual‐level processes for the signaling systems which develop in populations. Smith used three criteria to classify the effect of each learning rule: whether it produced agents capable of (a) learning, (b) maintaining, and (c) constructing optimal signaling systems, the rules satisfying each criterion being a strict subset of those satisfying the previous one. A property shared by all constructor‐type rules is an implicit bias against homonymy in the form of lateral inhibition. Learners with such a bias are less likely to successfully learn homonymous meaning‐signal mappings, and over many episodes of learning this bias eliminates homonymy entirely, leading to optimal signaling. Learning rules which are neutral with respect to homonymy are sometimes capable of constructing functional signaling, but usually converge on suboptimal pooling equilibria. In contrast, biases against synonymy alone do not contribute toward the development of optimal systems, although they are required for the learning of optimal systems under certain assumptions about the relative size of the meaning and signal spaces (K. Smith, 2004).

Oliphant and Batali (1996) adopt an alternative, rational approach within the observational learning framework, which they dub obverter. Their work starts from the observation that the rational approach to signaling is to maximize the chance of being correctly understood, while rational receivers will attempt to maximize the chance of correct interpretation. Obverter signalers leverage this fact by calculating which signal is most likely to be correctly interpreted as their intended meaning, based on the observed reception behavior of the population; similarly, obverter reception involves identifying which meaning is most likely to be signaled using the received signal, again based on observations of the population's production behavior. Oliphant and Batali first show that when agents have perfect information about the signaling behavior of the population (e.g., through unlimited observation of the production and reception behavior of that population), the communicative accuracy of the population will necessarily increase with every new generation of learners who apply the obverter approach to production and reception, eventually leading to convergence on an optimal system. Numerical simulations show that approximating this perfect knowledge, by estimating the population's signaling behavior from a limited number of observations, is still sufficient for populations to reliably reach optimal communication.
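The obverter calculations themselves are compact. In this sketch (our rendering; the counts-based frequency estimation is our assumption), production consults the population's observed reception behavior, and reception consults its observed production behavior:

```python
import numpy as np

def obverter_produce(meaning, reception_counts):
    """Pick the signal most likely to be interpreted as `meaning`.
    reception_counts[s, m] tallies how often the observed population
    interpreted signal s as meaning m."""
    totals = reception_counts.sum(axis=1, keepdims=True)
    p_m_given_s = reception_counts / np.maximum(totals, 1)   # P(m | s)
    return int(np.argmax(p_m_given_s[:, meaning]))

def obverter_receive(signal, production_counts):
    """Pick the meaning most likely to have prompted `signal`.
    production_counts[m, s] tallies how often the observed population
    produced signal s for meaning m."""
    totals = production_counts.sum(axis=1, keepdims=True)
    p_s_given_m = production_counts / np.maximum(totals, 1)  # P(s | m)
    return int(np.argmax(p_s_given_m[:, signal]))

# Toy observations over 3 meanings x 3 signals: noisy but unambiguous.
production = np.array([[8, 1, 1],
                       [0, 9, 1],
                       [2, 0, 8]])
reception = production.T.copy()   # here reception mirrors production
print(obverter_produce(0, reception))   # -> 0
print(obverter_receive(1, production))  # -> 1
```

With unlimited observations these frequency estimates converge on the population's true behavior, the case Oliphant and Batali treat analytically; with limited observations they are noisy approximations, which their simulations show to be sufficient in practice.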