Humans excel in their ability to cooperate with unrelated individuals1, but there is also enormous variation among other species in their cooperative tendencies. A recent major research focus is to understand the mechanistic basis of cooperative behaviour, particularly the cognitive and physiological processes underlying decision-making2,3,4. Vertebrate brain structures involved in social decision-making are highly conserved. Most importantly, all vertebrates have a so-called social decision-making network, which consists of the social behaviour network and the mesolimbic reward system5,6. This network appears to be highly sensitive to the dopaminergic system5,7, making dopamine a prime candidate for the modulation of cooperative behaviour.

Dopamine (DA) is a neurotransmitter involved in a variety of neurochemical and neurohormonal actions that affect and modulate animal behaviour and cognition4,6. It is involved in reward and risk assessment, behaviour reinforcement8,9 and anticipatory responses to reward-associated stimuli8, as its release signals the outcome of an action as appetitive or aversive10,11. Thus, DA is key to associative learning12. First, DA signals the delivery of an unexpected outcome (reward or punishment), which is usually preceded by or paired with specific stimuli13. Later, through repeated encounters, individuals learn to associate the outcome with the preceding stimuli, and the dopaminergic response progressively shifts to these earlier, event-predicting stimuli rather than to the outcome itself11,14,15,16. This gradually enables animals to anticipate outcomes in current interactions by recalling previously learned associations, which results in appropriate decision-making17. Moreover, DA signalling is depressed (DA transmission decreases momentarily) whenever events run contrary to the prediction and the expected outcome fails to occur18. This decrease may elicit a distinct behavioural response: for example, in humans, the omission of an expected reward can lead to emotional distress19, while in other mammals, birds and teleost fish it may induce aggressive behaviour20,21,22. Nevertheless, signalling environmental changes is key for learning and decision-making, as such signals allow for an evaluation of the behavioural adjustments needed to achieve the expected outcome once again23. As such, anticipation is crucial for deciding between the different courses of action available18, as different options entail uncertain final outcomes. A prime context in which correct anticipation is crucial is investment-based cooperation between unrelated individuals.
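
The shift of the dopaminergic response from the outcome to the predictive stimulus, and the dip that follows an omitted outcome, are often formalised as a reward prediction error. The sketch below is one such illustration and not a model used in this study: it assumes a single cue followed by a single outcome, an unpredictable cue onset (baseline expectation of zero) and no temporal discounting. The function names, the learning rate `alpha` and the reward magnitude are hypothetical choices made for illustration only.

```python
def simulate_cue_reward(n_trials=200, alpha=0.1, reward=1.0):
    """Simulate repeated cue -> outcome pairings.

    Returns the learned value of the cue and the per-trial prediction
    errors at cue onset and at outcome delivery.
    """
    value_of_cue = 0.0  # learned expectation of the outcome given the cue
    cue_errors, outcome_errors = [], []
    for _ in range(n_trials):
        # Cue onset is assumed unpredictable (baseline expectation 0),
        # so the error at the cue equals the cue's current learned value.
        cue_error = value_of_cue - 0.0
        # Error at outcome time: what arrived minus what was expected.
        outcome_error = reward - value_of_cue
        # Move the expectation a fraction alpha towards the outcome.
        value_of_cue += alpha * outcome_error
        cue_errors.append(cue_error)
        outcome_errors.append(outcome_error)
    return value_of_cue, cue_errors, outcome_errors


def omission_error(value_of_cue):
    """Prediction error when an expected outcome fails to occur."""
    return 0.0 - value_of_cue
```

Under these assumptions, the error at outcome time starts at the full reward magnitude and shrinks towards zero over trials, while the error at cue onset grows towards the reward magnitude. Calling `omission_error` with the learned value after training gives a negative error, which corresponds to the momentary decrease in DA transmission described above.
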
The classic game-theoretic model used to describe such cooperation is the iterated prisoner’s dilemma24. In this 2-player game, mutual cooperation yields higher payoffs than mutual defection, but defecting yields a higher payoff than cooperating, independently of the partner’s action. Thus, there are incentives both to cooperate and to defect, and an individual’s best decision will depend on the partner’s previous strategy25. Similar conflicting incentives exist in many other potentially cooperative interactions26. A good example is the marine cleaning mutualism involving the Indo-Pacific bluestreak cleaner wrasse Labroides dimidiatus. As summarised elsewhere27, these territorial cleaner fish remove ectoparasites from visiting ‘client’ reef fish. Interactions are best described as a repeated game; clients are estimated to visit cleaning stations typically 5–30 times per day, with maximal estimates above 100 visits28. A conflict of interest exists because cleaners prefer to eat client mucus, which constitutes cheating as it is detrimental to the client. Cheating is visible to human observers because clients jolt their bodies in response to cleaner wrasse mouth contact27. As a consequence of cleaner wrasse food preferences, clients have to make cleaners feed against their preference to obtain a good service. How this is achieved depends on the clients’ strategic options in this repeated game. For predatory clients, the mere threat of reciprocation (trying to eat a cheating cleaner) is apparently enough to ensure high service quality, while non-predatory client species either punish cleaners through aggressive chasing or leave and switch to a different cleaner for their next inspection, which constitutes the threat of departure27,29.
In response, cleaners flexibly adjust their cheating frequency to a variety of parameters, which include clients’ control mechanisms, the presence of bystanders, the presence of a co-inspecting cleaner partner, the client’s value as a food source and the cleaner’s own physiological state27,30,31,32. Furthermore, cleaners can improve their service quality by providing a form of physical contact (known as tactile stimulation or massage) to clients, touching them with their pectoral and (especially) pelvic fins. Cleaners use tactile stimulation in a variety of contexts, but usually when the outcome of the interaction is uncertain: to build relationships with new clients, to reconcile after a cheating event, to prolong interactions with clients about to leave and as a pre-conflict management strategy with predators33,34. Clients apparently benefit from receiving tactile stimulation, as it lowers baseline and acute stress levels (i.e. cortisol levels35). Thus, in marine cleaning mutualisms, partners use two elements of behavioural negotiation to resolve the conflict over cooperative payoffs: a) threats (reciprocity or departure) and b) tactile stimulation, which encourages clients to stay at cleaning stations29. Overall, game theory has successfully been used to predict and explain partner control mechanisms in this system36. Regarding cleaner wrasses’ behavioural adjustments, game models should consider how physiological constraints (for example, the existence of stressed cleaners32) may limit the expression of some of these decision rules.
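
The payoff structure of the iterated prisoner’s dilemma referred to above can be made concrete with a small sketch. The payoff values below are conventional textbook numbers chosen for illustration, not estimates from the cleaning mutualism, and the two strategies (`tit_for_tat`, `always_defect`) are well-known examples rather than strategies attributed to cleaners or clients.

```python
# Illustrative payoffs (assumed values). Keys are (own move, partner move),
# with "C" for cooperate and "D" for defect; values are the own payoff.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def defection_dominates(payoff):
    """True if defecting pays more than cooperating against either move."""
    return all(payoff[("D", p)] > payoff[("C", p)] for p in "CD")


def tit_for_tat(partner_history):
    """Cooperate first, then copy the partner's previous move."""
    return partner_history[-1] if partner_history else "C"


def always_defect(partner_history):
    """Defect regardless of what the partner did."""
    return "D"


def play(strategy_a, strategy_b, rounds, payoff=PAYOFF):
    """Play a repeated game and return the total payoffs of both players."""
    moves_a, moves_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        # Each strategy only sees the partner's past moves.
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        total_a += payoff[(a, b)]
        total_b += payoff[(b, a)]
        moves_a.append(a)
        moves_b.append(b)
    return total_a, total_b
```

With these assumed payoffs, defection pays more in any single round, yet over ten rounds two `tit_for_tat` players each earn more than an `always_defect` player earns against `tit_for_tat`. This is one way to see why, in a repeated game, the best decision depends on how the partner responds to previous behaviour.
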

Here, we aimed to investigate the relevance of the dopaminergic system for cleaners’ service quality during cleaning interactions and how these individuals respond to changes in perception elicited by shifts in DA levels. Only a few studies have examined the role of the DA system in the modulation of fish behaviour, focusing mostly on locomotor activity37, brain responses to light and hydrostatic pressure38, feeding behaviour39, coping with unpredictability40, learning and nicotine41, gene expression and neuroendocrine signalling42,43,44,45 and learning performance in a cooperative context46. Only some of these studies employed drugs targeting the dopamine D1 and D2 receptors; these drugs, originally developed for mammals, were successfully used in fish to test for putative effects on behaviour or gene expression37,46. For example, in cichlids, the effects on locomotor activity of a non-selective DA agonist that activates both D1 and D2 receptors were blocked by the D1 antagonist (SCH-23390) but not by the D2 antagonist (metoclopramide). Also, several D1- and D2-related drugs produced distinct neuroendocrine and brain expression responses42,43,44,45. Using cleaners, Messias and colleagues46 showed that D1 receptor pathways are directly involved in their natural ability to learn. As in these previous studies, we exogenously administered a D1 receptor agonist (D1a - SKF38393), a D1 receptor antagonist (D1an - SCH-23390), a D2 receptor agonist (D2a - quinpirole) and a D2 receptor antagonist (D2an - metoclopramide), as well as a control (saline), to female cleaner wrasses in situ. As this mutualistic system occurs in a biological market27,47, efficient dopaminergic transmission could play a role in modulating cleaners’ willingness to negotiate with clients over the occurrence and duration of interactions, as well as cleaners’ willingness to cooperate rather than cheat27.
Large increases in DA transmission via administration of agonists are connected with pathological gambling48 and excessive risk-taking49. Hence, we predict that D1a and D2a will decrease cleaners’ cooperative investment levels and increase their cheating frequencies. Since D2 receptors can also be found pre-synaptically (i.e. as auto-receptors) in some areas of the brain, it is also possible that D2 stimulation leads to risk-avoidance behaviour by overstimulating the pre-synaptic receptors50. Similarly, DA receptor blockade induces risk-avoidance behaviour through an increase in sensitivity to negative stimuli49,51,52. We thus also predict that DA antagonists will cause cleaners to seek clients to clean more often and to provide more tactile stimulation to entice clients to stay longer, with the possibility that blocking the D2 auto-receptors might lead to abnormal DA transmission. Regarding cheating by cleaners, a perceived reduction in the probability of expected outcomes would mean a reduced ability to maintain the interaction with clients and a lower likelihood of obtaining food. Such a perception may lead either to an extension of negotiation, where high rates of tactile stimulation lead to a reduction of cheating, or to the abandonment of negotiation, with cleaners foraging as much and as quickly as possible, which would mean immediate cheating by feeding on clients’ mucus53.