We start by presenting the stationary overconfidence level f_O and bluffing level f_B as a function of the resource-to-cost ratio r/c, obtained on a square lattice, as shown in Fig. 1(a). Increasing r/c does not noticeably change f_O, but it decreases f_B, especially when the system bias δ is relatively large. Raising δ also significantly reduces the overconfidence level f_O for a moderate punishment probability (p = 0.5). Note that positive values of δ induce extra conflicts and thus increase the chances of centralized sanctions. The value of r/c therefore has little impact on the stabilization of overconfidence, regardless of whether punishment is rare or frequent, while the bluffing level f_B slightly decreases as r/c increases. Importantly, the results for a regular random graph with k = 4 are in accordance with those for the translation-invariant square lattice. Thus the structure of interactions does not play a prominent role as long as the average degree k is identical. The value of k itself, however, can play a decisive role in the evolution of the deception profile. To explore this effect, we investigate the impact of r/c on f_O and f_B under extreme conditions (p = 1, δ = 1) on homogeneous networks with different degrees (k = 4, 8, 16). As shown in Fig. 1(b), overconfidence almost goes extinct irrespective of the values of r/c and k, showing that sufficiently strong sanctions can effectively reduce the general level of overconfidence. Meanwhile, f_B drops sharply to a minimum value as r/c increases when k = 4, in contrast to larger degrees such as k = 8 or k = 16. In other words, having more available neighbors partially offsets the effect of punishment on boasters.
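As a reference for the homogeneous topology of panel (a), the k = 4 neighborhood structure of a periodic square lattice can be sketched as follows (a minimal illustration; the site indexing i = x + L*y is our own convention, not taken from the original model):

```python
def square_lattice_neighbors(L):
    """Neighbor lists for an L x L square lattice with periodic
    boundary conditions.  Each site i = x + L*y has exactly k = 4
    neighbors (von Neumann neighborhood), matching the homogeneous
    topology used for Fig. 1(a)."""
    nbrs = []
    for i in range(L * L):
        x, y = i % L, i // L
        nbrs.append([
            ((x + 1) % L) + L * y,   # right neighbor (wraps at x = L-1)
            ((x - 1) % L) + L * y,   # left neighbor
            x + L * ((y + 1) % L),   # lower neighbor (wraps at y = L-1)
            x + L * ((y - 1) % L),   # upper neighbor
        ])
    return nbrs

nbrs = square_lattice_neighbors(10)
```

A regular random graph with the same k = 4 keeps this degree but randomizes the links, which is why the two panels in Fig. 1(a) agree.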

Figure 1 Stable overconfidence level f_O and bluffing level f_B as a function of the resource-to-cost ratio r/c for different values of p and δ on homogeneous networks, as indicated in the legends. Data presented in Panel (a) are obtained on square lattices with periodic boundary conditions, while results depicted in Panel (b) are obtained on regular random graphs with different degrees (k = 4, 8, 16) when optimal punishment is applied (p = 1, δ = 1). Other parameters: .

We next evaluate the impact of the punishment probability p and the system bias δ on the general overconfidence level f_O and bluffing level f_B (see Fig. 2). Besides homogeneous networks, summarized in Fig. 2(a,b), we also explore the possible impact of interaction heterogeneity by considering BA scale-free networks, shown in Fig. 2(c). To avoid additional effects we use the same average degree as for the random graph in Fig. 2(a). At any given value of δ, increasing the punishment rate p slightly reduces both f_O and f_B. Meanwhile, for any given p, both f_O and f_B decrease monotonically with δ, signalling that δ plays a decisive role in restraining deceptive behaviours (both overconfidence and bluffing). This behaviour stems from the fact that a large δ ensures frequent conflicts between competing players, which reveal their real abilities. In the opposite extreme, negative parameter values δ < 0 inhibit conflicts, which results in a prompt fixation into a high-overconfidence, high-bluffing deception profile (this case is not shown in the figures). Moreover, a common trait of the color maps, independently of the applied topology, is that f_B always evolves to a higher level than the corresponding f_O, highlighting that natural selection favours a higher bluffing level than overconfidence level, all other factors being equal. Furthermore, the comparison of Fig. 2(a,c) illustrates that network heterogeneity apparently elevates the average bluffing intensity f_B, and that the heterogeneity of the interaction topology helps to restrain overconfidence for relatively large δ values. Interestingly, increasing k in homogeneous networks is capable of lifting the bluffing level f_B while slightly reducing overconfidence f_O (see also Fig. 1(b)).
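The color maps of Fig. 2 amount to a two-parameter sweep with run averaging. A minimal organizational sketch, where `simulate(p, delta)` is a hypothetical placeholder for one full Monte Carlo run (the actual dynamics are not reproduced here):

```python
import numpy as np

def sweep_plane(simulate, p_values, delta_values, runs=10):
    """Average the stationary (f_O, f_B) pair over independent runs
    for every point of the (p, delta) grid, as used for the color
    maps of Fig. 2.  `simulate(p, delta)` stands in for one complete
    Monte Carlo simulation and must return a stationary (f_O, f_B)."""
    f_O = np.empty((len(delta_values), len(p_values)))
    f_B = np.empty_like(f_O)
    for i, delta in enumerate(delta_values):
        for j, p in enumerate(p_values):
            samples = np.array([simulate(p, delta) for _ in range(runs)])
            f_O[i, j], f_B[i, j] = samples.mean(axis=0)  # run average
    return f_O, f_B

# Toy stand-in dynamics for demonstration only: deception levels that
# decrease with delta and p, qualitatively mimicking the reported trend.
fO, fB = sweep_plane(lambda p, d: (1.0 - d, 1.0 - p),
                     p_values=[0.0, 1.0], delta_values=[0.0, 1.0], runs=3)
```

The averaging over independent runs is what smooths each pixel of the color maps; the toy `simulate` above only illustrates the plumbing.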

Figure 2 Color maps depicting the overconfidence level f_O (left column) and the bluffing level f_B (right column) on the punishment probability (p) - system bias (δ) plane. Data presented in Panel (a) are obtained on a regular random graph with k = 4, data in Panel (b) on a regular random graph with k = 8, while results depicted in Panel (c) are obtained on a BA scale-free network with . Note that δ < 0 immediately leads to f_O → 1 and f_B → 1 regardless of the applied topology (not shown). Other parameters: r/c = 2.5, .

To better understand the possible influence of the sanctioning mechanism on the evolution of the deception profile (α, β), we monitor the time evolution of α and β values on a square lattice without and with punishment (shown in Fig. 3(a) and Fig. 3(b), respectively). Figure 3(a) shows how the probability distribution f(α, β) of profile pairs evolves in time in the absence of punishment, when only the imitation of deception profiles is possible. It can be observed that small β values die out first, signalling that boasting is most favored by natural selection. Later, when only large β values are present, players who apply higher α values become more successful. As a result, the whole population becomes trapped in a large-(α, β) pair after a sufficiently long relaxation (t = 100000 MC steps). In fact, once fixation occurs the evolutionary process stops. Here, f_O and f_B can then be determined by averaging over the final states that emerge from different initial conditions. We conclude that a high-α, high-β combination survives when only imitation is at work, which is in accordance with our previous observations14. However, fixation never happens when sanctions shape the evolution (see Fig. 3(b)). In the early stage almost half of the population is punished, hence low-α, low-β combinations form the majority of the f(α, β) distribution. Later, as time passes, a dynamic balance emerges between low-α, low-β combinations and moderate-α, high-β pairs. The specific position of the latter depends on the actual values of the δ and p parameters. In general, punishment plays a “shunting” role here, undermining the stabilization of overconfidence and bluffing in the whole population. Importantly, these results hold for any homogeneous network besides the square lattice.
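The snapshots of Fig. 3 and the fixation test described above can be sketched as follows; the bin count and tolerance are illustrative assumptions, not the paper's actual measurement code:

```python
import numpy as np

def profile_distribution(alpha, beta, bins=10):
    """Normalized frequency distribution f(alpha, beta) of the
    population's deception profiles, i.e. one snapshot of Fig. 3."""
    h, _, _ = np.histogram2d(alpha, beta, bins=bins,
                             range=[[0.0, 1.0], [0.0, 1.0]])
    return h / h.sum()

def is_fixed(alpha, beta, tol=1e-9):
    """Fixation: every player carries the same (alpha, beta) pair,
    after which pure imitation dynamics can no longer change the state."""
    return np.ptp(alpha) < tol and np.ptp(beta) < tol

rng = np.random.default_rng(0)
a = rng.uniform(0.0, 1.0, 1000)   # initial alpha values of the population
b = rng.uniform(0.0, 1.0, 1000)   # initial beta values
f = profile_distribution(a, b)
```

Recording `profile_distribution` at successive MC steps reproduces the panels of Fig. 3, while `is_fixed` flags the absorbing state that occurs only without punishment.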
For strongly heterogeneous networks, sometimes more than one α − β pair can survive around strong hubs even without punishment, which is in agreement with related works where other player-specific profiles evolved41,48.

Figure 3 Time evolution of the α − β profile, as obtained on square lattices (a) without punishment (p = 0) and (b) with punishment (p = 1). From top left to bottom right we present the distribution of (α, β) pairs at different MC steps, as indicated. The comparison illustrates that punishment undermines the fixation of overconfidence and bluffing. Other parameters: , (a) r/c = 3, δ = 0; (b) r/c = 3, δ = 1.

Having established the significant impact of sanctions on the evolution of the deception profile, we next turn to the targets of such punishments. More precisely, we ask whether punishment is indeed directed at genuinely weak players on homogeneous networks with different k values. For this reason we measure separately the average real capability of players who are punished and of those who are not; the ratio of these averages is denoted by R_ability. Similarly, we measure the average payoff of the same subclasses and denote their ratio by R_payoff. These ratios are depicted in Fig. 4(a) for regular random networks with gradually increasing degree k. Evidently, R_ability < 1 indicates that, on average, players with lower real abilities γ are punished more frequently. At the same time, R_payoff < 1 highlights that this small-γ group benefits less than their higher-ability opponents. As the degree of nodes increases, both R_ability and R_payoff rise unambiguously, showing that additional connections narrow the real-capability and payoff gaps between punished and unpunished players. In other words, punishment is directed principally towards those who are genuinely weak, but this selective impact gradually weakens as players gain more neighbors. Furthermore, for a deeper insight, it is worth studying the influence of real capability on the evolution of overconfidence and bluffing. Note that the real ability γ of each player remains unchanged during updating. We denote by R_O([a, b]) the average overconfidence level of those individuals whose γ values lie in the interval [a, b], divided by the average over the whole population. For simplicity, we divided the [0, γ_max] interval into 10 subclasses. Similarly, R_B([a, b]) denotes the corresponding ratio of the bluffing level for the same subpopulation. Our results for a k = 4 random regular graph are summarized in Fig. 4(b).
The plot clearly shows that both R_O and R_B increase with γ and exceed 1 once γ > 0.5. Note that homogeneous networks with other k values show a similar tendency. Thus we conclude that players with high ability tend to evolve towards higher levels of both overconfidence and bluffing, because they have a higher chance of collecting resources without conflict. Furthermore, if conflict is inevitable and competitors must reveal their real abilities, these players still have a higher chance of winning.
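Given per-player arrays of real ability, payoff, punishment flags, and deception levels, the ratios of Fig. 4 can be computed as in this sketch (the array names and the binning helper are our own; the 10-bin division of [0, γ_max] follows the text):

```python
import numpy as np

def subgroup_ratios(gamma, payoff, punished):
    """R_ability and R_payoff of Fig. 4(a): the mean real ability
    (mean payoff) of punished players divided by the same mean over
    players who escaped punishment."""
    R_ability = gamma[punished].mean() / gamma[~punished].mean()
    R_payoff = payoff[punished].mean() / payoff[~punished].mean()
    return R_ability, R_payoff

def binned_level_ratio(gamma, level, gamma_max=1.0, n_bins=10):
    """R_O (or R_B) of Fig. 4(b): the mean overconfidence (bluffing)
    level inside each real-ability bin, divided by the population
    mean.  Empty bins yield nan."""
    edges = np.linspace(0.0, gamma_max, n_bins + 1)
    idx = np.clip(np.digitize(gamma, edges) - 1, 0, n_bins - 1)
    pop_mean = level.mean()
    return np.array([level[idx == b].mean() / pop_mean
                     for b in range(n_bins)])

# Synthetic illustration: abilities spread evenly over [0, 1], the
# lower half punished, and deception levels proportional to ability.
gamma = (np.arange(100) + 0.5) / 100
R_a, R_p = subgroup_ratios(gamma, 2.0 * gamma, punished=gamma < 0.5)
r = binned_level_ratio(gamma, gamma)
```

With these synthetic inputs, R_ability < 1 and the binned ratio exceeds 1 only in the upper ability bins, mirroring the qualitative pattern reported above.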

Figure 4 Some representative ratios plotted as histograms. Panel (a) depicts the R_ability and R_payoff ratios as a function of degree k on regular random networks. Here R_ability (R_payoff) is the ratio of the average real capability (average payoff) of punished individuals to that of unpunished individuals. Panel (b) depicts R_O and R_B as a function of the real-ability interval on regular random networks with k = 4. Here R_O denotes the average overconfidence level of individuals whose real capability falls in the corresponding interval, divided by the population average f_O. In a similar fashion, R_B is the average bluffing level of the same subpopulation divided by the population average f_B. Other parameters: p = 1, δ = 0.8, .

Lastly, it is instructive to investigate the impact of the upper limits α_max and γ_max on the evolution of f_O and f_B. Keeping γ_max = β_max = 1, α_max > 1 means that excessive overconfidence intensity is allowed for competitors. Conversely, γ_max > 1 with α_max = β_max = 1 implies that the real abilities of players are significantly higher compared to the evolving α or β values. We note that β_max > 1 is not considered, since extravagant boasting could easily be recognized from the facts. For an appropriate comparison, f_O is normalized by the upper limit, f_O/α_max, when α_max > 1 is applied. As demonstrated in Fig. 5(a), the possibility of sanctioning results in a drastic reduction of the normalized overconfidence level as α_max is increased. This suggests that punishment can effectively restrain excessive overconfidence, although it is unable to decrease the bluffing level significantly. Without punishment (p = 0), however, raising α_max gives rise to intensive conflicts that help competitors to recognize each other's real capabilities. Therefore, f_O/α_max and f_B decrease monotonically with α_max and finally converge to 0.5, which equals the initial value of the average bluffing intensity. We stress that the results presented in Fig. 5(a) are robust and remain valid for other interaction topologies. Increasing γ_max drives the evolution toward “neutral drift”, because peer biases such as overconfidence and bluffing become of secondary importance in resource competitions when real abilities dominate. Importantly, however, f_O and f_B may fluctuate heavily in heterogeneous networks, showing that the presence of strong hubs can significantly influence the evolution of both overconfidence and bluffing.
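A quick numerical sanity check of this normalization, assuming (consistently with the convergence to 0.5 noted above) that f_O is divided by α_max and that initial α values are drawn uniformly on [0, α_max]:

```python
import numpy as np

rng = np.random.default_rng(42)

def normalized_f_O(alpha, alpha_max):
    """Normalized overconfidence level f_O / alpha_max, making runs
    with different upper limits alpha_max directly comparable."""
    return alpha.mean() / alpha_max

# Initial profiles: alpha drawn uniformly on [0, alpha_max] (our
# assumption), so the normalized level starts near 0.5 for every
# choice of alpha_max, just like the initial bluffing intensity.
initial_levels = [normalized_f_O(rng.uniform(0.0, am, 200_000), am)
                  for am in (1.0, 2.0, 4.0)]
```

This is only a consistency check of the comparison scheme; the reported decay of f_O/α_max under sanctions comes from the full simulation, not from this sketch.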