The dynamic system represented by the game Rock, Paper, Scissors (RPS) is both a physical reality in the animal world (specifically amongst three species of side-blotched lizards1) and an important paradigm for assessing the degree of rational decision making inherent within non-cooperative environments across species (e.g.2,3). In a typical RPS game, participants simultaneously reveal a three-alternative choice: Rock, Paper or Scissors. The winner (and loser) is determined by the rule that Rock wins over Scissors (the Scissors are ‘blunted’), Paper wins over Rock (the Rock is ‘covered’) and Scissors wins over Paper (the Paper is ‘cut’). The specific relationships between elements (non-transitive dominance relations4) dictate a crucial aspect of the game space, in that there is no singular strategy that guarantees success (or evolutionarily stable strategy; Maynard-Smith & Price, 1973, cited in3). As such, when the game is played in a potentially infinitely recursive manner, the various responses may enjoy periods of temporary dominance. The lack of a definitive strategy becomes particularly apparent when children can often serve as formidable opponents to adults5.

One aspect of the game that has received particular interest is the unique Nash equilibrium of RPS5, where “an equilibrium point is a pair of strategies that are best replies to each other, a best reply being a strategy that maximizes a player’s payoff, given the strategy chosen by the other player” (3, p. 140). During RPS, players should adopt a mixed-strategy equilibrium wherein multiple items are played stochastically2. In other words, each of the three items should be played in an unpredictable order but with equal probability (33.33%). The disadvantages of not following this strategy are made clear by6, who show that if a computer opponent plays one item more often than another (e.g., Rock) then human participants will play the appropriate counter-item with increased frequency (e.g., Paper). Under this scheme, the computer opponent would be said to be playing a strategy that could be dominated, and would serve as an example of irrational (not to mention evolutionarily unsound3) decision making. Therefore, precisely constructed game environments such as RPS supply researchers with a baseline against which deviations from rational decision-making may be observed and predicted. As7 (p. 55) state: “equilibrium strategies specified by game theory provide a precise yardstick to quantify how human behaviors deviate from the normative predictions for rational players. One can therefore try to identify factors responsible for the discrepancy between the normative predictions and observed behaviors”.
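
The logic of the mixed-strategy equilibrium can be illustrated with a minimal sketch. The scoring of +1 for a win, 0 for a draw and -1 for a loss, the function names, and the biased mix (Rock 50%, Paper 25%, Scissors 25%) are assumptions made purely for illustration and are not taken from the studies cited above.

```python
from fractions import Fraction

# Item order is chosen so that each item beats the one before it (cyclically).
ITEMS = ("Rock", "Paper", "Scissors")


def payoff(mine: str, theirs: str) -> int:
    """Illustrative scoring (an assumption): +1 win, 0 draw, -1 loss."""
    diff = (ITEMS.index(mine) - ITEMS.index(theirs)) % 3
    return {0: 0, 1: 1, 2: -1}[diff]


def expected_payoff(mine: str, opponent_mix: dict) -> Fraction:
    """Expected score of always playing `mine` against a mixed opponent."""
    return sum(p * payoff(mine, theirs) for theirs, p in opponent_mix.items())


# Equilibrium opponent: every item played with probability 1/3.
uniform = {item: Fraction(1, 3) for item in ITEMS}

# Hypothetical biased opponent that overplays Rock.
biased = {
    "Rock": Fraction(1, 2),
    "Paper": Fraction(1, 4),
    "Scissors": Fraction(1, 4),
}

# Against the uniform mix no item does better than any other.
uniform_scores = {item: expected_payoff(item, uniform) for item in ITEMS}

# Against the biased mix, the counter-item to Rock has the highest expectation.
biased_scores = {item: expected_payoff(item, biased) for item in ITEMS}
best_reply = max(ITEMS, key=lambda item: biased_scores[item])
```

Under these assumptions every item has an expected score of zero against the uniform mix, so no reply can exploit it, whereas overplaying Rock gives Paper a positive expectation.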

The first indication of irrationality in RPS comes from the observation that specific items in game systems may be naturally favoured. Decision making can be influenced by saliency8, where primary salience refers to selecting responses that more readily come to mind (cf. availability heuristic9) and secondary salience refers to adopting a strategy whereby one assumes the opponent is operating on the basis of primary salience. Evidence of primary salience in RPS comes from10, who reported that across 300 rounds, participants selected Rock 35.66%, Paper 32.12% and Scissors 32.23% of the time. A similar bias for Rock was also reported by4, with Rock 36%, Paper 33% and Scissors 32%. Therefore, there might be a simple influence at the item level that draws individuals away from adhering to a mixed-strategy equilibrium and, ultimately, rationality3. Similarly, secondary salience might also contribute to RPS performance according to the observations of6: my opponent is predominantly playing Rock so I will predominantly play Paper. However, such strategies cannot be evolutionarily stable due to the recursive nature of the game: an opponent may eventually adjust their strategy according to your overplaying of Paper.

Ref. 4 also showed that participants implement an apparently successful rule-based strategy: if I win then I stay with my current item; if I lose then I shift to a new item (‘win-stay, lose-shift’; see also7, for a similar strategy in monkeys). In contrast to the memorial and cognitive demands of the mixed-strategy equilibrium (frequency counting three-alternative choices across hundreds of rounds to ensure that each item is played 33.33% of the time), tendencies to adopt rules like ‘win-stay, lose-shift’ become “psychologically plausible for human subjects with bounded rationality” (4, p. 5). Principles such as Thorndike’s Law of Effect (Thorndike, 1911, cited in11) and the matching law (Herrnstein, 1961, cited in6), where the proportion of responses matches the degree of reinforcement, seem to fit well with an account where participants maintain their current course of action in the light of success (‘win-stay’) but change their current course of action in the light of failure (‘lose-shift’; see also12, for a similar criterion of progress in the context of problem solving). Although necessary on the basis of limited human cognition, predictable consequences as a function of winning (reinforcement; stay) or losing (punishment; switch) also run the risk of being dominated. Furthermore, neural activity associated with reinforcement and punishment in the context of RPS trial outcome is found throughout the cortex and to a much larger degree than previously thought, with additional specific areas distinguishing between win and loss (accumbens, caudal ACC and transverse temporal region) and also between stay and switch (medial frontal cortex and caudate13). The cortex-encompassing activity associated with trial outcome has the required distributed nature to impact on numerous cognitive processes.

However, switch heuristics such as win-stay lose-shift remain underspecified at the level of item selection and two additional categories of response change suggest themselves (after4). First, participants may choose to downgrade their response across trials, defined as selecting the item in trial n + 1 that would have been beaten by their item at trial n (e.g., Rock followed by Scissors; also ‘descending’14 or ‘left-shift’15). Alternatively, participants may choose to upgrade their response across trials, defined as selecting the item in trial n + 1 that would have beaten their item at trial n (e.g., Rock followed by Paper; also ‘ascending’14 or ‘right-shift’15). Due to the cyclical nature of the relationships between items (see Fig. 1a) it would be possible to repeat the strategy of upgrading or downgrading across multiple consecutive trials. Of additional importance are the cognitive implications of draw trials, which are also currently underspecified. It may seem reasonable to assume that draw trials should be less arousing than win or loss trials and so have less of an impact on subsequent performance, but a recent study provided no evidence that this was the case13.
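
The stay, upgrade and downgrade categories defined above can be expressed compactly with modular arithmetic over the item cycle. The following is a minimal sketch; the function name `classify_transition` and the item ordering are illustrative assumptions rather than part of any cited analysis.

```python
# Items are ordered so that the next item in the cycle beats the current one:
# Paper beats Rock, Scissors beats Paper, Rock beats Scissors.
ITEMS = ("Rock", "Paper", "Scissors")


def classify_transition(item_n: str, item_n_plus_1: str) -> str:
    """Label the change between consecutive trials.

    A step of 0 around the cycle is a stay, +1 is an upgrade (the new item
    would have beaten the old one), and +2 is a downgrade (the new item
    would have been beaten by the old one).
    """
    step = (ITEMS.index(item_n_plus_1) - ITEMS.index(item_n)) % 3
    return ("stay", "upgrade", "downgrade")[step]


def classify_sequence(plays: list) -> list:
    """Label every consecutive pair in a sequence of plays."""
    return [classify_transition(a, b) for a, b in zip(plays, plays[1:])]


# Repeated upgrading walks once around the full cycle.
cycle_labels = classify_sequence(["Rock", "Paper", "Scissors", "Rock"])
```

Because the relationships between items are cyclical, three consecutive upgrades (or three consecutive downgrades) return the player to the starting item.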

Figure 1 (a) Schematic showing the cyclical nature of upgrading or downgrading responses in Rock, Paper, Scissors, (b) Graph showing strategy at trial n + 1 as a function of item selection at trial n, (c) Graph showing strategy at trial n + 1 as a function of outcome of trial n, (d) Graph showing the strategy adopted between trial n + 1 and n + 2 as a function of the strategy adopted between trial n and n + 1.

To further investigate the heuristics underlying RPS performance, human participants played 225 rounds of RPS with a computer opponent operating according to the mixed-strategy equilibrium. Human participants were not made aware of the specific strategy of the computer at the time of testing, given that the absence of instruction is important in understanding real-world decision making where information also tends to be incomplete6. Response proportions across consecutive trials were examined in terms of the item selected at trial n (Rock, Paper, Scissors), the outcome at trial n (win, lose, draw) and the strategy subsequently deployed at trial n + 1 relative to n (stay, upgrade, downgrade). The distribution of responses across three trials was also examined in terms of the strategy deployed between trial n and n + 1 (after4) and between trial n + 1 and n + 2 (stay, upgrade, downgrade). Any interactions revealed between these levels would undermine the view of human decision making as rational and, importantly, define the item-based and outcome-based conditions under which such violations could be predicted.
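
One way to picture the outcome-based analysis described above is to tabulate, for each outcome at trial n, the proportion of stay, upgrade and downgrade responses at trial n + 1. The sketch below is not the analysis code used in the study. The simulated player (win-stay, lose-upgrade, random after a draw), the random seed and all function names are hypothetical and serve only to generate data with a known structure. Only the 225 rounds and the equal-probability computer opponent follow the description in the text.

```python
import random
from collections import Counter

ITEMS = ("Rock", "Paper", "Scissors")  # each item beats the one before it
OUTCOMES = ("win", "lose", "draw")
STRATEGIES = ("stay", "upgrade", "downgrade")


def outcome(mine, theirs):
    """Outcome of a round from the player's point of view."""
    step = (ITEMS.index(mine) - ITEMS.index(theirs)) % 3
    return ("draw", "win", "lose")[step]


def strategy(item_n, item_n_plus_1):
    """Strategy at trial n + 1 relative to the item played at trial n."""
    step = (ITEMS.index(item_n_plus_1) - ITEMS.index(item_n)) % 3
    return STRATEGIES[step]


def hypothetical_player(prev_item, prev_outcome, rng):
    """Toy rule (an assumption): win-stay, lose-upgrade, draw-random."""
    if prev_outcome == "win":
        return prev_item
    if prev_outcome == "lose":
        return ITEMS[(ITEMS.index(prev_item) + 1) % 3]
    return rng.choice(ITEMS)


def simulate(rounds=225, seed=0):
    """Play the toy player against an equal-probability computer opponent."""
    rng = random.Random(seed)
    plays, outcomes = [], []
    item = rng.choice(ITEMS)
    for _ in range(rounds):
        computer = rng.choice(ITEMS)  # each item with probability 1/3
        result = outcome(item, computer)
        plays.append(item)
        outcomes.append(result)
        item = hypothetical_player(item, result, rng)
    return plays, outcomes


def conditional_proportions(plays, outcomes):
    """Proportion of each strategy at trial n + 1, given the outcome at n."""
    counts = Counter()
    for n in range(len(plays) - 1):
        counts[(outcomes[n], strategy(plays[n], plays[n + 1]))] += 1
    table = {}
    for out in OUTCOMES:
        total = sum(counts[(out, s)] for s in STRATEGIES)
        table[out] = {
            s: (counts[(out, s)] / total if total else 0.0) for s in STRATEGIES
        }
    return table


plays, outcomes = simulate()
table = conditional_proportions(plays, outcomes)
```

For this toy player the table should show stay responses only after wins and upgrade responses only after losses. A player following the mixed-strategy equilibrium would instead be expected to show roughly equal proportions in every cell, which is the baseline against which interactions between outcome and strategy can be assessed.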