Abstract Polymorphism has fascinated evolutionary biologists since the time of Darwin. Biologists have observed discrete alternative mating strategies in many different species. In this study, we demonstrate that polymorphic mating strategies can emerge in a colony of hermaphrodite robots. We used a survival and reproduction task where the robots maintained their energy levels by capturing energy sources and physically exchanged genotypes for the reproduction of offspring. The reproductive success was dependent on the individuals' energy levels, which created a natural trade-off between the time invested in maintaining a high energy level and the time invested in attracting mating partners. We performed experiments in environments with different densities of energy sources and observed variation in the mating behavior when a robot could see both an energy source and a potential mating partner. The individuals could be classified into two phenotypes: 1) foragers, who always choose to capture energy sources, and 2) trackers, who keep track of potential mating partners if their energy level is above a threshold. In four out of the seven highest fitness populations in different environments, we found subpopulations with distinct differences in genotype and in behavioral phenotype. We analyzed the fitnesses of the foragers and the trackers by sampling them from each subpopulation and mixing them in different ratios in a population. The fitness curves for the two subpopulations crossed at about 25% of foragers in the population, showing the evolutionary stability of the polymorphism. In one of those polymorphic populations, the trackers were further split into two subpopulations: strong trackers and weak trackers. Our analyses show that the population consisting of three phenotypes also exhibited several polymorphic evolutionarily stable states. 
To our knowledge, our study is the first to demonstrate the emergence of polymorphic evolutionarily stable strategies within a robot evolution framework.

Citation: Elfwing S, Doya K (2014) Emergence of Polymorphic Mating Strategies in Robot Colonies. PLoS ONE 9(4): e93622. https://doi.org/10.1371/journal.pone.0093622 Editor: Stephen R. Proulx, UC Santa Barbara, United States of America Received: July 18, 2013; Accepted: March 6, 2014; Published: April 9, 2014 Copyright: © 2014 Elfwing, Doya. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: These authors have no support or funding to report. Competing interests: The authors have declared that no competing interests exist.

Introduction "If you come to any more conclusions about polymorphism, I should be very glad to hear the result: it is delightful to have many points fermenting in one's brain, and your letters and conclusions always give one plenty of this same fermentation." - Charles Darwin, letter to Joseph Hooker, 1846 [1] Polymorphism has fascinated evolutionary biologists since the time of Darwin [2], [3]. Polymorphism is defined as the existence of more than one distinct phenotype of a species occupying the same habitat at the same time [4], [5]. Polymorphism does not include continuous variations, but only discrete variations or, in the case of continuous traits such as body size and color, strongly bimodal or multimodal distributions of phenotype variation. The existence of more than one distinct phenotype of a species demands an explanation, because the theory of natural selection predicts that the fittest phenotype should drive the other, less fit phenotypes to extinction. In general, polymorphism is maintained if the “fitness curves” of the polymorphic phenotypes intersect, where the crossover point is an evolutionarily stable state, realizing a polymorphic evolutionarily stable strategy (ESS) [6], [7]. Common features of the evolution and the maintenance of behavioral polymorphism include: 1) that time or resources can be invested in more than one activity that contributes to the fitness; 2) that the individuals have rules about how to allocate time and resources among the alternative activities; and 3) that there is a trade-off between the activities that contribute to the fitness, i.e., the time and resources invested in one activity could instead have been invested in another [8]. Frequency-dependent selection [7], [9], [10] is considered the most important explanation for the maintenance of polymorphism in a population. 
Frequency-dependent selection occurs when the fitness of the phenotypes depends on their frequencies in the population, and the fitness curves intersect at a crossover frequency where the phenotypes are equally successful. The study of alternative mating strategies (or alternative reproductive behaviors) [11]–[13] is the area of biological research most closely related to this study. Different mating tactics have been observed in a wide variety of species, both in males (e.g., [14]–[16]) and in females (e.g., [17]–[19]). However, there are relatively few cases where the differences in mating behavior have been confirmed to have a genetic basis [20]–[26], and even fewer studies that have suggested equal average reproductive success of alternative phenotypes, i.e., shown crossing of their fitness curves [21], [22], [27]. The use of robot evolution experiments to study biological phenomena has gained traction in recent years [28], as a complementary approach to biological studies and theoretical models. In comparison to biological studies, robot evolution has the advantage that the evolution of hundreds of generations of robot controllers can be completed within hours or days. The experiments can easily be repeated for different parameter settings and environmental conditions, which allows for quantitative testing and analysis of robustness and stability. In comparison with theoretical and numerical models, robot models can capture the often complex physical interactions between the agent and the environment, including other agents. Floreano and Keller, with different co-authors, have used robot evolution experiments to investigate the emergence and reliability of communication [29]–[32], to quantitatively test Hamilton's rule for the evolution of altruism [33], and to test the influence of genetic architecture and mating frequency on the division of labor in social insect societies [34]. 
A distinctive feature of our previously proposed embodied evolution framework [35] is that there is no explicit fitness function or algorithm for selecting individuals for recombination and mutation. Instead, offspring can only be created by the physical exchange of genotypes between two mating robots. In general, the choice of selection method requires careful consideration when using artificial evolution to study ESSs. A strong theoretical assumption underlying ESS analysis is that the population is infinitely large. Fogel et al. [36]–[38] demonstrated in simulation experiments, using the Hawk-Dove game, that for finite populations the results differed, at best, significantly from the theoretical ESS values and, at worst, bore no resemblance to the ESS. They therefore questioned the usefulness of ESSs for explaining real biological phenomena in populations of limited size. In response, Ficici and Pollock [39] showed that the difference between the theoretical ESS predictions and the observed simulation results could be accounted for by the two selection methods used by Fogel et al. The purpose of this study is to demonstrate that evolutionarily stable alternative mating strategies can emerge in a small robot colony without any predefined mating preferences, as a result of the trade-off between the resources spent on energy conservation and the resources spent on courtship of mating partners. Because alternative mating strategies are a natural precursor to the evolution of sexual dimorphism, this line of research has the potential to increase our understanding of the emergence of different sexes. To investigate the ecological conditions for the evolution of alternative mating strategies, we performed artificial evolution experiments, in simulation, with a small colony of Cyber Rodent robots [40] using our proposed embodied evolution framework [35]. 
We performed the experiments in simulation because running hundreds of generations of evolution in hardware is infeasible. In previous work [35], we showed that behaviors learned and evolved in simulation have similar performance when transferred to the hardware setting. Each individual interacted in small groups of four robots. During its lifetime of 288 seconds, an individual experienced three periods of group interaction, with the participants in each group selected randomly. We placed the four robots in an arena (2.5×2.5 m) with 4 to 16 energy sources. The robots were equipped with two wheels, an infrared port for the exchange of genotypes, and a camera that could detect energy sources and the tail-lamps and faces of other robots (Figure 1). The robots could execute three basic behaviors: foraging, waiting (for a potential mating partner), and mating. These behaviors were learned by reinforcement learning [41]. The mating strategy, i.e., the selection of basic behaviors, was controlled by a linear neural network, and the five neural network weights were adapted by the evolutionary process (Figure 2). From a biological point of view, the population consisted of simultaneous hermaphrodites, which could produce offspring by mating (i.e., by exchanging genotypes with a mating partner). For each of the individuals involved in a mating event, the probability of producing offspring was linearly dependent on the individual's internal energy level (see Methods and [35]). This created a trade-off in which, in relative terms, an individual could maximize its fitness by maximizing either the frequency of mating events or its energy level at the mating events.

Figure 1. Two physical robots with six energy sources. The Cyber Rodent robots used in the experiments were equipped with infrared communication for the exchange of genotypes and cameras for visual detection of energy sources (blue), tail-lamps of other robots (green), and faces of other robots (red). https://doi.org/10.1371/journal.pone.0093622.g001

Figure 2. The neural network controller. The control architecture consisted of a linear artificial neural network. The output of the network was the sum of the five network inputs, weighted by the five evolutionarily tuned neural network weights. In each time step, if the output was less than or equal to zero, the foraging module was selected; otherwise, the mating module was selected. The basic behaviors were learned by reinforcement learning with the aid of evolutionarily tuned additional reward signals and meta-parameters. The foraging module learned a foraging behavior for capturing energy sources. The mating module learned both a mating behavior for the exchange of genotypes, used when a face of another robot was visible, and a waiting behavior, used when no face was visible. https://doi.org/10.1371/journal.pone.0093622.g002
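The selection rule in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the argument names, and the example weights are all hypothetical, and the inputs are assumed to be already scaled. The only details taken from the text are the five inputs (bias, energy, and the three visual inputs), the use of −1 for a target that is not visible, and the threshold at zero.

```python
from typing import Optional, Sequence

NOT_VISIBLE = -1.0  # input value used when a visual target is not visible


def select_behavior(
    weights: Sequence[float],
    energy: float,
    energy_source: Optional[float],
    tail_lamp: Optional[float],
    face: Optional[float],
) -> str:
    """Return 'foraging', 'mating', or 'waiting' for one time step.

    The three visual arguments are assumed to be pre-scaled inverse
    distances, or None when the target is not visible.
    """
    def visual(x: Optional[float]) -> float:
        return NOT_VISIBLE if x is None else x

    # Five inputs: constant bias, energy, energy source, tail-lamp, face.
    inputs = [1.0, energy, visual(energy_source), visual(tail_lamp), visual(face)]
    output = sum(w * x for w, x in zip(weights, inputs))

    if output <= 0.0:
        return "foraging"  # foraging module selected
    # Mating module selected: mate if a face is visible, otherwise wait.
    return "mating" if face is not None else "waiting"


# Hypothetical weights, chosen only to illustrate an energy-dependent switch.
example_weights = (-0.5, 1.0, -0.2, 0.3, 0.4)
high_energy = select_behavior(example_weights, 1.0, 0.5, 0.5, None)
low_energy = select_behavior(example_weights, 0.2, 0.5, 0.5, None)
```

With these illustrative weights, the same visual scene leads to waiting at high energy and foraging at low energy, which is the kind of energy-threshold behavior attributed to trackers.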

Discussion In this study, we demonstrated that polymorphic mating strategies can emerge in a small robot colony under homogeneous evolutionary conditions, without a selection scheme or an explicit fitness function that promotes a certain outcome. Our study is, to our knowledge, the first to demonstrate the emergence of polymorphic ESSs within a robot evolution framework. This gives further evidence that artificial robot evolution (for an overview see [28]) can be a feasible and valuable approach for investigating hypotheses about biological phenomena. This study also illustrated the importance of specific details of the genetic algorithm and the structure of the genotype. A condition for the evolution of the polymorphic ESSs consisting of foragers and trackers was the small proportion of the genotype controlling the mating strategy, in combination with the relatively low crossover rate. The mating strategy was controlled by only 5 of the 51 genes, located at the beginning of the genotype, and with a crossover rate of 0.1 there was only a 0.8% probability that an offspring would have a mating strategy controlled by a mixture of genes from both parents. Much more frequent mixing of the mating strategy genes would have made it more difficult, or even impossible, to evolve and maintain separate genetic traits in the same population. An assumption underlying evolutionary game theory [7] is that the payoffs that agents receive are without noise. It is therefore very encouraging that polymorphic ESSs could emerge in our experiment with a small population size and with large variances in the performance of similar, and even identical, individuals. The lifetime learning of the basic behaviors by reinforcement learning introduced additional stochasticity. Even in the last part of the evolutionary process, a few individuals failed to capture any energy sources or engage in any mating activity. 
This usually happened because the individual got trapped in a corner of the environment and failed to learn how to navigate out of it. The forager strategy in the evolved polymorphic populations can be seen as a cheater strategy. To achieve high fitness, a forager relies on all the other individuals in the environment (i.e., trackers) adjusting their behaviors according to the trajectory of the forager. The forager therefore avoids the cost of searching for mating opportunities. There exists a rich literature on the potential for cheating in hermaphrodite mating systems (for an overview see [44]). Usually, cheating refers to the attempt of individuals to take on the male role rather than the female role in mating encounters, to avoid the cost of producing offspring. The study most similar to ours was conducted by Rold et al. [45]. They co-evolved a population of predefined male and female robots. The robots, as in our experiments, remained alive by capturing energy sources and reproduced by physical mating, which consisted of touching a robot of the opposite sex. The only difference between males and females was that the males remained reproductive throughout their lives, while the females became non-reproductive for a fixed period of time after a reproductive mating event. In their experiment, the offspring were not the result of a genetic exchange between mating robots. Instead, the males and females were evolved separately, with the number of reproductive mating events used as the fitness objective. The evolved behaviors of the males and females had distinct differences, and their behaviors corresponded to observed behaviors of males and females in biological studies. Interestingly, the evolved behaviors of the males and females also matched the behavior of the foragers and trackers, respectively, in our study. The males opportunistically ate all the food they could find while looking for reproductive females. 
The reproductive females were less active and adopted a mating strategy of waiting for males to mate with them. This gives some support to the hypothesis that polymorphic mating strategies, which emerge from the basic trade-off between the resources spent on energy conservation and the resources spent on courtship of mating partners, are a precursor of sexual dimorphism. In our experiments, polymorphism could arise because the foragers and trackers optimized, in relative terms, different parts of the fitness function (Equation 1). The foragers maximized their own energy level by spending all their lives foraging for energy sources, except when a potential mating partner was directly visible, while the trackers maximized the mating frequency by spending a considerable part of their lives waiting for potential mating partners. The evolution of “proto-sexes” is a research avenue we plan to explore in future work. In the current experimental setup, both the sender and the receiver can produce offspring at the mating events, and the cost of mating is equal whether offspring are produced or not (see Methods). A more biologically plausible setup would be one in which only one of the agents, e.g., the receiver, took on the female role and also bore the main cost of producing offspring. The goal would then be to investigate whether, and if so under which conditions, a breeding system with distinct male and female roles could evolve from an initial population of hermaphrodites without any predefined mating preferences, and maybe even more exotic breeding systems such as androdioecy (males and hermaphrodites) and gynodioecy (females and hermaphrodites).
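The 0.8% mixing probability quoted earlier in this Discussion can be checked with a short calculation. The sketch below assumes that the one-point crossover cut is drawn uniformly from the 50 positions between adjacent genes, which the text does not state explicitly; the function and parameter names are illustrative.

```python
def strategy_mixing_probability(
    n_genes: int = 51,
    n_strategy_genes: int = 5,
    crossover_rate: float = 0.1,
) -> float:
    """Probability that an offspring mixes strategy genes from both parents.

    Assumes one-point crossover with the cut point chosen uniformly among
    the positions between adjacent genes (an assumption, not from the text).
    """
    cut_points = n_genes - 1                # 50 possible cut positions
    mixing_cuts = n_strategy_genes - 1      # 4 cuts fall inside the 5-gene block
    return crossover_rate * mixing_cuts / cut_points


probability = strategy_mixing_probability()  # 0.1 * 4 / 50
```

Under this assumption the result is 0.1 × 4/50 = 0.008, which agrees with the 0.8% stated in the text. Only a cut that falls strictly inside the block of five strategy genes at the start of the genotype splits that block between the two parents.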

Methods Four Cyber Rodent mobile robots [40] were placed in a 2.5×2.5 m arena, together with 4, 6, 8, 10, 12, 14, or 16 energy sources (Figure 1). The task of the robots was to survive by maintaining their internal energy level through foraging for energy sources, and to reproduce through the physical exchange of genotypes by infrared communication. We performed the experiments in a simulation environment, developed to mimic the features of the real Cyber Rodent hardware platform. The robots were equipped with a camera system with color blob detection, used to extract the distances and relative angles to the nearest energy source (blue), the nearest tail-lamp of another robot (green), and the nearest face of another robot (red). Mimicking the real robotic hardware, the field of view of the simulated vision system was set to . Within an angle range of , the robots could detect energy sources up to 2 m, tail-lamps up to 1.5 m, and faces up to 1 m. Outside this range, the detection capability decreased linearly down to 0.2 m for the maximum angles. Each evolution experiment ran for 1000 generations, and we ran 10 evolution experiments for each energy source density. To be able to conduct robot evolution experiments with only four robots, we used time-sharing, with a subpopulation of 20 individuals within each robot. Each individual in a subpopulation took control of the robot, in random order, for three time-sharings of 400 time steps, i.e., the total lifetime was 1200 time steps. An individual had a maximum internal energy level ( ) of energy units. Each time step, the energy level decreased by unit and a capture of an energy source increased the energy level by units. At birth, an individual had an internal energy level of units. If an individual's energy was depleted, then the individual died and was removed from its subpopulation. When a robot captured an energy source, the energy source disappeared from its current position and reappeared in a new, randomly selected position. 
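The time-sharing figures above, combined with the 288-second lifetime given in the Introduction, imply a rough real-time budget for one evolution experiment. The following sketch works through that arithmetic. It assumes that a simulated time step corresponds to the same duration on hardware and that every individual lives its full lifetime, so the totals are an upper bound; the constant and function names are illustrative.

```python
LIFETIME_SECONDS = 288        # lifetime stated in the Introduction
TIME_SHARINGS = 3             # periods of control per individual
STEPS_PER_TIME_SHARING = 400  # time steps per period
SUBPOPULATION_SIZE = 20       # individuals sharing one robot
GENERATIONS = 1000            # generations per evolution experiment


def time_budget() -> dict:
    """Real-time cost of one experiment, assuming full lifetimes for all."""
    lifetime_steps = TIME_SHARINGS * STEPS_PER_TIME_SHARING
    seconds_per_step = LIFETIME_SECONDS / lifetime_steps
    # The four robots run in parallel, so one robot's schedule sets the pace.
    seconds_per_generation = SUBPOPULATION_SIZE * LIFETIME_SECONDS
    hours_per_experiment = seconds_per_generation * GENERATIONS / 3600
    return {
        "lifetime_steps": lifetime_steps,
        "seconds_per_step": seconds_per_step,
        "seconds_per_generation": seconds_per_generation,
        "hours_per_experiment": hours_per_experiment,
    }


budget = time_budget()
```

Under these assumptions a time step lasts 0.24 s, one generation takes 96 minutes, and a single 1000-generation experiment would take about 1600 hours (roughly 67 days) of continuous robot time, which is consistent with the stated reason for running the evolution in simulation.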
We did not apply an explicit fitness function or a centralized selection process. Instead, offspring were created by mating: the individuals controlling the robots could create offspring by a physical exchange of genotypes through infrared communication. The infrared communication ports were located slightly to the right of center at the front of the Cyber Rodent robots, directed straight forward. In the simulation environment, the maximum range of the communication was set to 1 m and the angle range was set to . An individual could initiate the infrared communication by executing a predefined action selected by the mating behavior. For a mating event to be successful, both robots had to be within each other's mating range before and after the individuals controlling the robots executed the actions of their currently selected reinforcement learning modules. The probability, for each of the two individuals involved in a mating event, of producing offspring was linearly dependent on the individual's energy level ( ). A reproductive mating event created two offspring in the individual's subpopulation by applying one-point crossover with a probability of 0.1. The genes of the two newly created genotypes were then mutated with probability of , by adding a value drawn from a Gaussian distribution with zero mean and a standard deviation of 0.1. After all individuals in a subpopulation had survived for a full lifetime or died prematurely, a new subpopulation was created by randomly selecting a fixed number, i.e., 20 in our experiments, of the offspring produced during the last generation. 
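The reproduction step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code. The success probability is taken to be the energy level divided by the maximum energy (passed in as `energy_fraction`), which is one possible linear form. The mutation rate of 0.1 is a placeholder because the value is missing from this text. The crossover rate of 0.1 is the value given in the Discussion, and the mutation standard deviation of 0.1 is from the text.

```python
import random
from typing import List, Sequence


def reproduce(
    own_genotype: Sequence[float],
    partner_genotype: Sequence[float],
    energy_fraction: float,
    rng: random.Random,
    crossover_rate: float = 0.1,  # value given in the Discussion
    mutation_rate: float = 0.1,   # ASSUMED placeholder, missing in the text
    sigma: float = 0.1,           # mutation standard deviation from the text
) -> List[List[float]]:
    """Return two offspring genotypes, or an empty list if mating fails.

    energy_fraction is energy / maximum energy in [0, 1]; using it directly
    as the success probability is an assumption.
    """
    if rng.random() >= energy_fraction:
        return []  # mating event did not produce offspring

    child_a, child_b = list(own_genotype), list(partner_genotype)

    # One-point crossover: swap the tails after a random cut point.
    if rng.random() < crossover_rate:
        cut = rng.randint(1, len(child_a) - 1)
        child_a[cut:], child_b[cut:] = child_b[cut:], child_a[cut:]

    # Gaussian mutation applied independently to each gene.
    for child in (child_a, child_b):
        for i in range(len(child)):
            if rng.random() < mutation_rate:
                child[i] += rng.gauss(0.0, sigma)
    return [child_a, child_b]


rng = random.Random(0)
parent_a, parent_b = [0.0] * 51, [1.0] * 51
offspring = reproduce(parent_a, parent_b, energy_fraction=1.0, rng=rng)
```

Each individual in a mating pair would call such a function with its own energy level, so one mating event can produce offspring for both, one, or neither of the participants.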
The genotype consisted of 51 real-valued genes: 1) 5 genes controlling the mating strategy by encoding the weights of the top-level neural network that selected basic behaviors (Figure 2); 2) 42 genes determining the parameters of the additional reward signals for the basic behaviors in the form of potential-based shaping rewards [46]; and 3) 4 genes determining the meta-parameters of the reinforcement learning algorithm. The five-dimensional input to the neural network consisted of a constant bias of 1, the individual's internal energy, and the inverse distances to the nearest energy source, tail-lamp, and face. The sensory inputs were linearly scaled to a range of . If a visual target was not visible, the corresponding input value was set to −1. In each sensory-motor cycle (time step), the output of the neural network determined which of the two reinforcement learning modules was selected. If the output was greater than zero, the mating module was selected; otherwise, the foraging module was selected. After a successful mating event, whether it produced offspring or not, an individual could not select the mating module again until it had captured an energy source or until time steps had passed. During this time, the tail-lamp was turned off. In the case when only an energy source and a tail-lamp were visible, the energy threshold for the selection of the mating module was given by Equation 11, which depended on the distance to the closest energy source and the distance to the closest tail-lamp. To derive the average energy threshold, we computed the mean of the threshold over 676 combinations of the two distance inputs (26 equidistant values between 0 and 1 for each of the two sensory inputs). The reinforcement learning modules learned their behaviors from scratch in each generation with the aid of evolutionarily tuned potential-based shaping rewards and meta-parameters. 
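The body of Equation 11 is not preserved in this text. As a hedged reconstruction, the energy threshold can be derived directly from the linear network described above. All symbols below are assumed notation rather than the authors' own: weights w_0 to w_4, scaled energy input E, and scaled inputs x_e, x_t, x_f for the energy source, tail-lamp, and face. The derivation also assumes a positive energy weight w_1 and that the 26 equidistant values include both endpoints 0 and 1.

```latex
% Assumed notation (not from the source): weights w_0..w_4, scaled energy E,
% energy-source input x_e, tail-lamp input x_t, face input x_f.
\begin{align}
  y &= w_0 + w_1 E + w_2 x_e + w_3 x_t + w_4 x_f, \\
  x_f &= -1 \quad \text{(no face visible)}, \\
  y > 0 \;&\Longleftrightarrow\; E > E_{\mathrm{th}}(x_e, x_t)
      = -\frac{w_0 + w_2 x_e + w_3 x_t - w_4}{w_1}
      \quad (\text{assuming } w_1 > 0), \\
  \bar{E}_{\mathrm{th}} &= \frac{1}{676} \sum_{i=0}^{25} \sum_{j=0}^{25}
      E_{\mathrm{th}}\!\left(\tfrac{i}{25}, \tfrac{j}{25}\right)
      = E_{\mathrm{th}}\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right).
\end{align}
```

The last equality holds because the threshold is linear in both distance inputs and the grid is symmetric around 1/2. If w_1 were negative, the inequality would reverse and the mating module would be selected below the threshold rather than above it.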
The foraging module executed a foraging behavior using the relative angle and the distance to the nearest energy source as state variables. The mating module executed either a mating behavior or a waiting behavior, depending on the current sensory inputs. If a face of another robot was visible, the mating behavior was executed using the relative angle and the distance to the nearest face as state variables. Otherwise, the waiting behavior was executed using the relative angle and the distance to the nearest tail-lamp as state variables. The behaviors were learned by the Sarsa reinforcement learning algorithm [47], [48] with tile coding [48] and potential-based shaping rewards [46]. The global reward for the reinforcement learning modules was set to +1 for a successful mating event and +1 for the capture of an energy source; otherwise, the reward was set to 0. The additional experiments, conducted to investigate the evolutionary stability of the emerged polymorphic ESSs, were performed in the same manner as the evolution experiments. The only difference was that, in each generation, the subpopulations were created by randomly selecting the genotypes of the different phenotypes from the final generation of the evolution experiment according to the predefined phenotype ratios. For a detailed description of our embodied evolution framework and algorithm specifics, see [35].
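The stability analysis based on predefined phenotype ratios amounts to locating where two measured fitness curves cross and checking the direction of the crossing. The sketch below shows one simple way to do this with linear interpolation. It is an illustration only: the function name is hypothetical, and the numbers are invented so that the curves cross at 25% foragers, echoing the value reported in the Abstract. They are not data from the experiments.

```python
from typing import Optional, Sequence, Tuple


def fitness_crossover(
    forager_share: Sequence[float],
    forager_fitness: Sequence[float],
    tracker_fitness: Sequence[float],
) -> Optional[Tuple[float, bool]]:
    """Find the first crossing of two sampled fitness curves.

    Returns (share_at_crossing, is_stable), or None if the curves do not
    cross. The crossing is stable when foragers do better than trackers
    below it and no better above it, so selection pushes the population
    back toward the crossing from either side.
    """
    diffs = [f - t for f, t in zip(forager_fitness, tracker_fitness)]
    for i in range(len(diffs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 != 0 and d0 * d1 <= 0:
            # Linear interpolation of the zero of the fitness difference.
            t = d0 / (d0 - d1)
            share = forager_share[i] + t * (forager_share[i + 1] - forager_share[i])
            return share, d0 > 0
    return None


# INVENTED illustrative values, not results from the paper.
shares = [0.1, 0.2, 0.3, 0.4]
foragers = [1.3, 1.1, 0.9, 0.7]
trackers = [1.0, 1.0, 1.0, 1.0]
result = fitness_crossover(shares, foragers, trackers)
```

For the invented values above, the interpolated crossing lies at a forager share of 0.25 and is classified as stable, because foragers have the higher fitness when rare and the lower fitness when common.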

Author Contributions Conceived and designed the experiments: SE KD. Performed the experiments: SE. Analyzed the data: SE. Wrote the paper: SE KD.