Numerical simulations are carried out on an L × L square lattice with N = L² vertices and uniform connectivity ⟨z⟩ = 4. The results shown were obtained for communities of N = 10000 individuals. The equilibrium results, including the frequency of cooperation f_c and the average strategy ⟨x⟩, are averaged over the last 10% of time steps after a transient period of 300 time steps. This procedure is repeated for 100 independent random realizations of the game considered. The frequency of cooperation (or cooperation level) f_c, used to evaluate the system performance, is defined as the fraction of C actions among all actions taken by all individuals during one time step t. As 8TN actions are taken in the T-round IPD games during each time step, f_c is calculated via

f_c = (1/(8TN)) Σ_{i=1}^{N} w(i),

where w(i) denotes the number of C actions during the interactions between agent i and its four neighbors.
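The definition above can be sketched in a few lines of code. This is an illustrative helper, not taken from the source; the counts w(i) are assumed to be collected elsewhere during the T-round games.

```python
import numpy as np

def cooperation_frequency(w, T):
    """Frequency of cooperation f_c for one time step.

    w : array of length N, where w[i] is the number of C actions
        observed in the interactions between agent i and its four
        neighbors during the T-round IPDs of this time step.
    T : number of rounds per pairwise IPD.
    """
    N = len(w)
    # 8TN actions are taken per time step in total
    return np.sum(w) / (8 * T * N)
```

For example, if every recorded action is C, each w[i] equals 8T and the function returns f_c = 1.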

As shown in Figure 1(a), with a small fraction (p_s = 0.05) of PSO-inspired shills introduced, the average cooperation level of the original population is greatly promoted, even though the conflict between individual interests and collective welfare becomes much fiercer as the temptation to defect b increases from 1 to 2. This should be largely attributed to the global search ability of PSO, through which the shills can find more successful strategies and spread them to their neighbors. In such a case, the temptation to defect b has a negligible effect on the equilibrium value of f_c, which in fact only diminishes from 0.944 for b = 1 to 0.933 for b = 2. On the other hand, although spatial reciprocity and iterated interactions favor the emergence and maintenance of cooperation to some extent, they cannot guarantee a desirable level of cooperation in more defection-prone environments as the game intensity b grows. As expected, and as shown by the red circles in Figure 1(a), we observe a gradual decline of f_c, which finally drops below the initial cooperation level (f_c0 = 0.5) as b approaches its maximum limit of 2. It is also important to note that the direct reciprocity emerging from iterated interactions does affect the evolutionary outcome, as we find that similar results cannot be obtained for the one-shot PD even with b = 1.
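The role of the temptation parameter b can be made concrete with a single-round payoff function. The weak-PD parameterization used below (R = 1, T = b, S = P = 0, common in spatial PD studies) is an assumption; the exact payoff matrix is defined in the Methods section of the source.

```python
def pd_payoff(my_move, opp_move, b):
    """Payoff to the focal player in one PD round, assuming the
    weak-PD parameterization R = 1, T = b, S = P = 0 (an assumption,
    not stated in this section)."""
    if my_move == 'C':
        return 1.0 if opp_move == 'C' else 0.0  # reward R or sucker's payoff S
    else:
        return b if opp_move == 'C' else 0.0    # temptation T or punishment P
```

Under this parameterization, raising b from 1 to 2 only increases the gain from exploiting a cooperator, which is exactly the pressure the shills are shown to counteract.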

Figure 1 Comparison of evolution characteristics between the cases with (p_s = 0.05) and without shills (p_s = 0). (a): average level of cooperation as a function of b. (b): average Markov strategy of the population as a function of b. (c): average payoff per action as a function of b. (d): evolution of the cooperation level over time step t for b = 2. While the spatial structure and iterated interactions promote cooperation in PD games compared with one-shot games in a well-mixed population, soft control with a small proportion (p_s = 0.05) of shills can further enhance cooperation even in strongly defection-prone environments, driving the average strategy toward TFT. All trajectories are averaged over 100 independent realizations of the game considered, with parameter setting L = 100, T = 50, ω = 0.95 and maximum evolution generation G_max = 300.

As each individual adopts a stationary Markov strategy (p_0, p_c, p_d) (see Methods), we examine the average state ⟨x⟩ = (⟨p_0⟩, ⟨p_c⟩, ⟨p_d⟩) reached by the population in the equilibrium in Figure 1(b). Its elements denote the average probability that a random individual cooperates in the first stage of the IPD and the conditional probabilities that it cooperates when the opponent's last move was C or D, respectively. For p_s = 0.05 and 1 < b < 2, the individuals reach a consensus on a Markov strategy approximately equal to (1, 1, 0), which can be viewed as TFT. TFT has proven to be the simplest and most successful strategy in the IPD game: one cooperates initially, then punishes defectors and rewards cooperators in the successive rounds with "an eye for an eye". Hence, desirable social welfare is always achieved in this case, as shown by the blue squares in Figure 1(c). On the contrary, the original population without shills converges to a worse strategy, in which one cooperates with smaller probabilities as b grows. As a consequence, the average payoff an individual receives per action diminishes from 0.9 to 0.4, as shown by the red circles in Figure 1(c).

To better understand the evolution process, we plot the cooperation frequency as a function of the time step t in Figure 1(d) for both cases. Owing to the random initialization of strategies, both evolutionary curves start from f_c = 0.5, i.e., an equal probability for each agent to cooperate or defect in the first generation. As in most of the literature15,59, the evolution follows a pattern of endurance and expansion: defectors exploit the random arrangement at the early stage and thus make the greatest profits at the expense of cooperators. This enables defectors to spread across the population until only small clusters of cooperators remain, so a decline of f_c is observed up to a critical time step. Restarting from this lowest point, where an ordered distribution of cooperators and defectors has formed, clusters of cooperators begin to expand until a new equilibrium between cooperation and defection is reached, and we see the cooperation level rise and gradually converge to a stable state.
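A stationary Markov strategy of the form described above can be sketched as follows. The function name and representation are illustrative; the exact strategy encoding is given in Methods.

```python
import random

def markov_move(strategy, opp_last):
    """One move of a stationary Markov strategy (p0, pc, pd).

    strategy : (p0, pc, pd) -- probability of cooperating in the first
               round, after the opponent's C, and after the opponent's D.
    opp_last : None (first round), 'C', or 'D'.
    """
    p0, pc, pd = strategy
    if opp_last is None:
        p = p0
    elif opp_last == 'C':
        p = pc
    else:
        p = pd
    return 'C' if random.random() < p else 'D'

# The consensus strategy (1, 1, 0) reported in Figure 1(b) behaves
# deterministically as tit-for-tat:
TFT = (1.0, 1.0, 0.0)
```

With TFT, the probabilistic rule collapses to deterministic play: cooperate first, then mirror the opponent's last move.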

It is also found that, without soft control, the outcome of a single random run of the simulation depends more strongly on the initial strategy profile and its distribution, as shown in Figure 2. For p_s = 0, the stationary outcomes of five runs diverge significantly from the average performance in Figure 2(b), whereas this dramatic deviation is removed by the proposed soft-control mechanism with p_s = 0.05 in Figure 2(a). This finding is easy to understand from the evolutionary dynamics of the two cases. On the one hand, since the players in the original population adopt the updating rule of unconditional imitation (see the Methods section for the detailed definition), strategies are produced only by copying existing ones and no new tactics are created; the strategy diversity without soft control therefore depends heavily on the initial strategy profile. As a consequence, the outcome of one random simulation is a poor representation of the evolutionary dynamics, being highly sensitive to randomness. Yet, as more runs are carried out, the result gradually converges to a value that approximates the average cooperation level for a specific parameter setting. On the other hand, with PSO-inspired soft control, new strategies are adaptively generated to maximize the potential payoff by simultaneously considering the most profitable strategy in the past and the best strategy of the neighbors, which makes the strategy variety less dependent on the randomness introduced by initialization. In sum, while promoting cooperation in strongly hostile environments, the proposed approach reduces the impact of random factors by adding to the strategy diversity of the population.
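The unconditional-imitation rule used by the normal agents can be sketched as below. Variable names are illustrative; the precise tie-breaking convention is an assumption (the source defines the rule in Methods).

```python
def unconditional_imitation(payoffs, strategies, neighbors, i):
    """Updating rule for normal agents: agent i copies the strategy of
    the highest-payoff player in its neighborhood, or keeps its own
    strategy if no neighbor earned strictly more (assumed tie-breaking).
    """
    best = i
    for j in neighbors[i]:
        if payoffs[j] > payoffs[best]:
            best = j
    return strategies[best]
```

Because this rule only copies strategies that already exist, it can never introduce a new one, which is exactly why the diversity of the original population is fixed at initialization.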

Figure 2 Evolution of the cooperation level over time steps with and without soft control in 5 random runs for b = 2. (a): p_s = 0.05, ω = 0.95. (b): p_s = 0. The thicker blue trajectories indicate the average performance of the five simulation runs and the rest represent the outcomes of the 5 independent runs. The red dotted line in panel (a) shows the average cooperation frequency when noise is included in the strategy-updating process of the shills. It can be observed that without soft control the outcome of a single run depends on randomness, with a large standard deviation. On the contrary, soft control not only leaves the outcome undisturbed by randomness, but is also robust to noise in the updating process.

The effect of shills in boosting cooperation can be partly understood by comparing the evolution of cooperation frequencies for the whole population with that of the individuals within the neighborhood of shills. Focusing on Figure 3, we make two important findings: firstly, while both cases experience a decreasing phase of cooperation due to the invasion of defectors in the early stage, the individuals adjacent to shills do maintain a higher level of cooperation; secondly, however, as cooperative behavior spreads with continuing evolution, the existence of shills hampers its further expansion. These results demonstrate that, although shills facilitate the propagation of cooperation by exploring the strategy space, especially in the endurance period when most individuals are defectors, the current parameter setting (p_s = 0.05, ω = 0.95) cannot guarantee a globally best strategy for shills in the equilibrium. Since the shills imitate the updating mechanism of PSO, the results can be interpreted as follows: unlike the normal agents, who greedily switch to the best strategy within the neighborhood or remain unchanged, shills conduct an effective search for potentially better strategies in a continuous three-dimensional space, which helps increase the fraction of cooperation among shills and their neighbors in the early stage. On the other hand, since ω = 0.95 balances the weight between one's best strategy in the history and the most profitable strategy within the neighborhood, shills take more history information into account when updating strategies, and this information plays a smaller role in payoff improvement in the current situation.
As a result, while the average strategy of the whole population converges approximately to TFT, the shills end up with a reaction rule ⟨x⟩ = (0.58, 0.55, 0.26), as shown in Figure 3(b), which makes the average payoff of individuals within the shills' neighborhood lower than the average level of the whole system. Hence, we argue that, while shills facilitate the propagation of cooperative behavior across the network, relying more on history information (ω = 0.95) does not contribute to the maintenance of cooperation within their neighborhood.
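One way to picture the ω-weighted update for shills is as a convex combination of the two attractors described above. This is only a sketch under that assumption; the exact PSO-inspired update (including any velocity term) is defined in Methods.

```python
import numpy as np

def shill_update(personal_best, neighborhood_best, omega):
    """Illustrative PSO-inspired strategy update for a shill.

    The new Markov strategy is pulled toward the shill's most
    profitable strategy in its own history (weight omega) and toward
    the best strategy found in its neighborhood (weight 1 - omega),
    then clipped to valid probabilities in [0, 1].
    """
    p = np.asarray(personal_best, dtype=float)
    n = np.asarray(neighborhood_best, dtype=float)
    new = omega * p + (1.0 - omega) * n
    return np.clip(new, 0.0, 1.0)
```

Under this sketch, ω = 0.95 keeps the shill close to its own history, while ω = 0 makes it adopt the neighborhood best outright, which matches the behavior discussed around Figures 4 and 5.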

Figure 3 Evolution characteristics of shills. (a): evolution of the frequency of cooperation for different individuals for p_s = 0.05, ω = 0.95 and b = 2. The blue line represents the frequency averaged over the whole population, while the red dotted curve corresponds to the average cooperation of individuals within the neighborhood of shills. (b): average strategy of shills in the equilibrium as a function of the temptation to defect b. While shills promote the propagation of cooperative behavior across the network, relying more on history information (ω = 0.95) cannot maintain cooperation within their neighborhood.

This reasoning is fully supported by the results shown in Figure 4, where we plot the dependence of the frequency of cooperation on both ω and p_s for b = 2. For p_s < 0.25, any 0 < ω < 1 induces a higher level of cooperation than the average value f_c = 0.4556738 achieved by the original population, while for p_s > 0.25 the promotion of cooperation is guaranteed only when ω does not exceed a threshold ω_c. This should be attributed to the different impacts of ω and p_s on the evolutionary outcome. Firstly, for a fixed value of ω, adding more shills does not necessarily enhance cooperation further: there exists a moderate boundary value of p_s that yields the best promotion of cooperation. When p_s is small (e.g., p_s = 0.05), the cooperation level is enhanced jointly by the few shills and the majority of normal agents; the shills act as pioneers by exploring potentially profitable strategies, and the normal agents within the neighborhood of shills spread the successful ones by unconditional imitation. Although incorporating more shills adds to the strategy diversity of the fundamental population, it also slows down strategy propagation, because the proportion of normal agents is reduced. It therefore becomes more difficult for the population to reach a consensus without effective diffusion of strategies by enough normal agents. For this reason, as p_s grows from 0.05 to 0.4, it takes the network much longer to stabilize and the stationary state appears as an equal-amplitude oscillation, as shown by the red line in Figure 5(a). However, the amplitude of the oscillation decreases as more neighborhood information is used, i.e., when ω = 0 is chosen. It is also worth noting that our results verify the judgment made by the authors of ref. 48, who predicted a critical number of shills for achieving the best soft-control goal.
Secondly, for a given fraction p_s of shills, smaller values of ω lead to stronger boosts of cooperation than larger ones, as shown in both Figure 4 and Figure 5(b), suggesting that in such a dynamic environment it benefits the population as a whole to learn from others. In particular, ω = 0 always results in global cooperation, i.e., f_c = 1. This differs from the case in ref. 6, where all players follow the same PSO updating rule and the most significant benefits are obtained with ω = 0.99 in scenarios strongly unfavorable for cooperation. The reason becomes clear when we compare our model of soft control with an optimization process using PSO. When searching for an optimal solution to a particular problem in the feasible space, each particle faces a static environment in terms of the fitness function, i.e., each strategy corresponds to a fixed payoff. In the evolutionary game scenario, however, the payoff each individual receives depends on its own strategy as well as the strategy of its opponent. In other words, the best strategy in the history is not necessarily a good choice for the current situation. In such a case, history information plays a smaller role in steering the shills' strategy updates toward the highest promotion of cooperation, whereas the strategies in the neighborhood provide more useful information. As a consequence, assigning higher weights to the collective knowledge of the neighbors via a small ω proves an effective way of inducing cooperation in such defection-prone environments.
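The contrast between a static fitness landscape and a coevolving one can be made concrete with a minimal sketch. The weak-PD payoff values below (R = 1, T = b, S = P = 0) are an assumption, not taken from the source.

```python
def payoff(my_move, opp_move, b=2.0):
    """One-round PD payoff under the assumed weak-PD parameterization
    R = 1, T = b, S = P = 0."""
    return {('C', 'C'): 1.0, ('C', 'D'): 0.0,
            ('D', 'C'): b,   ('D', 'D'): 0.0}[(my_move, opp_move)]

# Unlike a standard PSO fitness function, the value of the same action
# changes with the opponent: 'C' earns 1 against a cooperator but 0
# against a defector, so a "personal best" recorded against one
# opponent need not remain best against another.
```

This is the sense in which each strategy corresponds to a fixed payoff in standard PSO but not in the evolutionary game, motivating the small-ω preference.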

Figure 4 Dependence of the cooperation frequency on p_s for different values of ω with b = 2. Each curve with markers shows the frequency of cooperation in the equilibrium as a function of p_s for a different ω, and the blue dotted line represents the average cooperation level f_c = 0.456738 achieved by the original population without shills. It can be observed that: (i) for all 0 < ω < 1, cooperation cannot be further promoted by adding more shills; (ii) for all p_s < 0.4, smaller ω facilitates the promotion of cooperation compared with larger values. We conclude that assigning higher weights to the collective knowledge of the neighborhood swarm is the better choice in strategy updating for cooperation enhancement, whereas simply sticking to one's own history memory results in a low cooperation level. The results are averaged over 100 independent realizations and the parameter setting is: L = 100, T = 50, G_max = 300.