A total of 34 great apes participated in this study: 23 chimpanzees (mean age = 20 years, 3 months; range = 5 years, 6 months to 48 years, 0 months; 16 females, 7 males), five bonobos (mean age = 18 years, 11 months; range = 9 years, 0 months to 31 years, 2 months; 3 females, 2 males), and six orangutans (mean age = 19 years, 3 months; range = 4 years, 8 months to 33 years, 5 months; 5 females, 1 male). Twenty of these apes also participated in the Krupenye et al. [18] study, but the present study was conducted before that one. The apes were housed in social groups at the Leipzig Zoo, Germany. Six additional chimpanzees (five females, one male) and two bonobos (both females) participated in the initial training but had to be dropped from the study because they failed the training criterion (see description below). One additional bonobo (male) was trained but had to be dropped because he could not be separated from the group for testing.

The apes were housed socially, separated by species, in groups of at least six individuals at the Wolfgang Köhler Primate Research Center (WKPRC) in the Leipzig Zoo, Germany. Each group has access to an indoor area (between 230 and 430 m²) and an outdoor area (between 1680 and 4000 m²) furnished with various climbing structures, shelters, and natural vegetation. At night, the apes sleep in several series of cages (between 40 and 50 m²). In addition to participating in experiments, the apes are provided with a special enrichment program, including various kinds of tools and foraging containers. Several times per day, they are fed a diet consisting primarily of vegetables, fruits, and cereals, with regular additions of eggs and meat (depending on the species). Test sessions took place in the participants’ sleeping enclosures (see above for sizes). The participants were used to being separated from their group members in adjacent enclosures for testing. They were not food deprived for testing, and water was available throughout all test periods. They were not distressed and were free to stop participating at any time. No animals were sacrificed, and German law on animal rights and the ASAB/ABS guidelines [20] were followed throughout. An internal WKPRC ethics committee, consisting of Zoo Leipzig personnel and MPI-EVA academic research staff, approved the studies. IRB approval was not necessary because Germany requires no special permission for the use of animals in purely behavioral or observational studies (TierSchGes §7 and §8).

Right before the response phase, the experimenter tries to open the box that he believes contains his object (false-belief condition) or that he knows is empty (true-belief condition). In both conditions, this is the same empty box; only the experimenter’s belief about its contents differs.

Materials for this study were two plastic boxes (30 × 22.5 × 20 cm; one yellow, one blue; otherwise identical), each with a hinged lid, a handle on top, and a plexiglass front. The lids could be locked and unlocked by moving a bolt horizontally on the front (see S1 Fig). When the lid was open, the locking mechanism was covered and therefore not visible from the top, so the experimenter could not see it during the test. The two boxes were placed on sliding tables opposite each other, attached to the outside of the participants’ enclosure, so that they could be moved into and out of the participants’ reach (see Fig 1). Participants could see through the plexiglass front when they moved to either side of the testing booth during the response period (but not while they were centered immediately before the test, as in Fig 1), whereas the assistant and the experimenter could see only the top and back of the boxes and thus not their contents. In one type of training trial (Training on empty boxes), the boxes were empty. In the other type (Training on boxes containing an object), a bunch of four keys was put into the boxes. At test, a novel object (a small orange box, 10 × 7 × 7 cm, filled with stones that rattled when shaken) was used instead of the keys.

Design and procedure.

Apes participated in three rounds of testing, each of which consisted of training on empty boxes, training on boxes containing an object, and two test trials. The delay between the training on empty boxes and the training on boxes containing an object was one to four weeks in each round. The two test trials were run on two consecutive days immediately after the training on boxes containing an object. Participants were randomly assigned to one of the two conditions (see below) for all test trials (between-subjects design, n = 17 for each condition). The physical set-up was identical in all training and test trials (except for the object that was placed in the boxes). The training trials were identical for all participants independent of condition.
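The text does not specify how the random assignment was implemented (for example, whether it was stratified by species, sex, or prior participation in [18]). Purely as an illustration, an unstratified shuffle-and-split of 34 participants into two groups of 17 could be sketched as follows. The helper `assign_conditions` and the placeholder IDs are hypothetical and are not the authors’ code.

```python
import random

def assign_conditions(participant_ids, seed=None):
    """Randomly split participants into two equal-sized condition groups.

    Hypothetical helper (not the authors' code): shuffles a copy of the IDs
    and assigns the first half to true-belief and the second half to
    false-belief. No stratification (species, sex, prior participation)
    is applied.
    """
    if len(participant_ids) % 2 != 0:
        raise ValueError("an even number of participants is needed for equal groups")
    rng = random.Random(seed)  # seeded generator so the split is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "true_belief": sorted(shuffled[:half]),
        "false_belief": sorted(shuffled[half:]),
    }

# 34 placeholder IDs standing in for the apes
ids = [f"ape_{i:02d}" for i in range(1, 35)]
groups = assign_conditions(ids, seed=1)
print({cond: len(members) for cond, members in groups.items()})
# {'true_belief': 17, 'false_belief': 17}
```

A stratified variant would apply the same split separately within each species, which would keep the species composition of the two conditions more comparable.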

Training on empty boxes. These training trials were conducted to make sure that the participants were able and motivated to unlock empty boxes. At the beginning both boxes were empty and unlocked. The assistant sat on a stool centered behind the two boxes and called participants by name. While they watched, she first opened the lid of the box on the left. She then closed the lid and locked it by moving the bolt horizontally. To demonstrate that the box was locked she pulled on the handle on top of the lid (three short pulls) and subsequently pushed the box into participants’ reach. Participants could unlock the box by moving the bolt in the other direction. When participants unlocked the box, the assistant pulled back the box, opened the lid, and made a pleasant vocalisation (“Mm hm”). Participants were given a grape from the assistant’s chest pocket and the trial was over. This procedure was repeated four times for a total of five training trials on this box. The assistant then repeated the same procedure with the box on the right.

In the rare case that participants did not unlock the box within 10 seconds, the assistant pulled the box out of participants’ reach, pulled the handle again (three short pulls), and pushed the box back into participants’ reach. If necessary, this was repeated two more times, each after 10 seconds. If participants did not unlock the box in any of these four attempts the assistant pulled back the box, unlocked it, and opened the lid with a neutral face, and this training trial ended. The assistant then proceeded with the next training trial. In order to pass this training session, participants needed to unlock a box at least once in the ten training trials; otherwise they were excluded from the study.
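The four-attempt rule and the inclusion criterion can be expressed as a small check. This sketch illustrates the rule and is not the authors’ scoring procedure. It assumes that each training trial is recorded as the attempt (1 to 4) on which the box was unlocked, or `None` if the box stayed locked. The function names are hypothetical.

```python
from typing import Optional, Sequence

MAX_ATTEMPTS = 4         # initial push plus up to three re-pushes, each after 10 s
TRIALS_PER_SESSION = 10  # five trials on each of the two boxes

def trial_succeeded(unlock_attempt: Optional[int]) -> bool:
    """True if the box was unlocked on one of the allowed attempts.

    unlock_attempt: 1-based attempt on which the participant unlocked the box,
    or None if the box was still locked after the last attempt.
    """
    return unlock_attempt is not None and 1 <= unlock_attempt <= MAX_ATTEMPTS

def passes_empty_box_training(trials: Sequence[Optional[int]]) -> bool:
    """Inclusion criterion: at least one successful unlock across the ten trials."""
    if len(trials) != TRIALS_PER_SESSION:
        raise ValueError(f"expected {TRIALS_PER_SESSION} training trials")
    return any(trial_succeeded(t) for t in trials)

print(passes_empty_box_training([None] * 9 + [4]))  # True
print(passes_empty_box_training([None] * 10))       # False
```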

Training on boxes containing an object. The procedure of these training trials was similar to that of the training on empty boxes, except that the assistant now put a bunch of keys into the box each time before she closed the lid. After participants unlocked the box, she opened the lid, took the keys out, made a pleasant vocalisation (“Mm hm”), gave participants a grape, and proceeded with the next trial. Participants received three consecutive trials on each side. After this training, each participant had unlocked at most ten empty boxes and at most six boxes containing an object, and at least one of each.

Test. The basic procedure of the test followed that of Buttelmann, Carpenter, and Tomasello [15] with human infants. First, participants received two more training trials with boxes containing an object, one on each side, as in the Buttelmann et al. study; this also gave them an equal number of training trials with and without an object in the box. After this, the assistant ensured that both boxes were unlocked and left the scene (i.e., went back behind the stool and turned her back). The experimenter (E) entered the room, sat down centered behind the two boxes (with the side of the yellow box counterbalanced across participants), and showed the novel object (the small orange box) to participants. E played with it for a little while so that the object was equally salient for all participants. Then E opened the lid of one box (counterbalanced), put the object into it, said goodbye and waved to the participants and the assistant, and left the room. Throughout these actions, E commented to himself about what he was doing, to keep the procedure similar to the original test with human infants. What followed differed between conditions:

In the true-belief condition the assistant returned and stood centered behind the two boxes. E also returned and stood behind the assistant so that he could witness what happened next. The assistant called both participants and E by their names and commented on all of her subsequent actions. She opened the lid of the box that contained the object, took it out of the box, closed the lid, and locked the box. Then she opened the lid of the other box, put the object in, closed the lid, and locked that box. E expressed that he was paying attention to the whole process by saying “Aha” whenever a lid was lifted and the object was taken out of or put into a box. He always turned his back when the assistant locked a box so that later, at test, it would be plausible that he did not know how a locked box could be unlocked. After that the assistant centered participants with food (e.g., raisins, grapes, or food pellets) and left the scene.

The procedure of the false-belief condition was similar to that of the true-belief condition with the exception that E stayed outside until the assistant had transferred the object to the other box and, therefore, did not witness the switch. In this condition, the assistant behaved “sneakily,” and looked furtively at the door to the room whenever a lid was opened and the object was taken out of or put into a box to ensure that E was not watching. Then the assistant centered participants, E re-entered the room, and the assistant left the scene. In both conditions the assistant ensured that participants witnessed her actions throughout the procedure and paused if participants became distracted.

What followed was identical in both conditions: E stood behind the empty box (the one he had originally put the object in) and unsuccessfully tried to open it by pulling the handle (three short pulls), showing effort. He then made a helpless gesture (i.e., raised his shoulders, held up both hands with palms up, made an uncertain facial expression, and said uncertainly, “Hm”), still leaning towards the empty box. He then sat down on the stool, called participants by name (“[Name], look!”), pushed both boxes into participants’ reach simultaneously, and bent his head down, centered between the two boxes, so that he could not inadvertently provide any gaze or facial cues. As soon as the boxes were within reach, it was the participants’ turn to unlock a box. If participants unlocked a box, E pulled back both boxes, opened the lid of the unlocked box, and looked into it, saying something like “That is how the box opens” or “Oh, the object is in here now.” He then left the room without ever touching the object in the box. Finally, regardless of which box participants had unlocked, the assistant returned and rewarded them in the center with a grape from her pocket. On the following day, participants received another two training trials and another test trial. Because the whole training and testing procedure was repeated twice more, participants received three rounds of testing, each consisting of the training sessions and two days of testing (each day with two training trials and one test trial), for a total of six test trials.
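The trial counts per round can be tallied to check two features of the design: equal numbers of training trials with and without an object, and six test trials in total. This sketch reads the two pre-test training trials on each test day as one object trial per side, which is the reading that makes the two training counts equal. The left-before-right order of the object and pre-test trials is an assumption, because the text fixes the order only for the empty-box training. `round_schedule` is a hypothetical helper.

```python
from collections import Counter

def round_schedule():
    """List (session, trial_type, side) tuples for one round of testing.

    Left-before-right ordering within object-training and pre-test trials is
    an assumption; the text fixes the order only for empty-box training.
    """
    schedule = []
    # Training on empty boxes: five trials on the left box, then five on the right.
    for side in ("left", "right"):
        schedule += [("empty_training", "empty", side)] * 5
    # Training on boxes containing keys: three consecutive trials per side.
    for side in ("left", "right"):
        schedule += [("object_training", "object", side)] * 3
    # Two consecutive test days: one object trial per side, then one test trial.
    for day in ("test_day_1", "test_day_2"):
        schedule += [(day, "object", "left"), (day, "object", "right")]
        schedule.append((day, "test", None))
    return schedule

ROUNDS = 3
all_trials = [trial for _ in range(ROUNDS) for trial in round_schedule()]
counts = Counter(trial_type for _, trial_type, _ in all_trials)
print(counts["empty"], counts["object"], counts["test"])  # 30 30 6
```

Each round yields ten empty-box trials, ten object trials (6 + 2 + 2), and two test trials.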

During test trials, to ensure that participants (1) watched E’s attempt to open a box and (2) made their choice of which box to unlock from a centered position, we required them to be centered whenever E was pulling on the handle of the empty box and when he pushed both boxes into their reach. If participants left the center area after the pulling, E stopped and stepped back; the assistant re-centered them, and E pulled the handle again. If participants did not choose a box within ten seconds, E pulled back the boxes, looked at the participants, called them by name, showed the helpless gesture, and pushed both boxes back into their reach. This was repeated each time ten seconds passed without a choice, up to a maximum of seven such ten-second periods. In the rare cases in which participants did not choose at all, the trial was counted as “no choice” (blank; see S1 Table). If participants chose a box (i.e., touched its bolt) but did not successfully unlock the lid, E pulled both boxes back and tried to open the chosen box. Because the box was still locked (e.g., the participant had not moved the bolt far enough), E showed the helpless gesture again while looking at the participants and pushed the boxes back into their reach to give them another chance to succeed. Importantly, after pulling the handle of the empty box, E did not look at either box until the trial was over. The only exception was when participants chose a box without successfully unlocking it and E subsequently tried to open it (see above). Note that at this point in the response period, participants had already chosen a box by touching the locking mechanism, which was the main measure (as in Buttelmann et al.’s [15] study).
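The re-prompting rule can be summarized as a mapping from the time of the first bolt touch to the ten-second period in which that touch fell. This is a simplified sketch under stated assumptions. It counts only the time during which the boxes were within reach, and it ignores pauses for re-centering and the overall 90-second response window. The function name and the returned fields are hypothetical.

```python
import math
from typing import Optional

PERIOD_S = 10     # length of each waiting period before E re-prompts
MAX_PERIODS = 7   # maximum number of ten-second periods

def classify_choice(choice_time_s: Optional[float]) -> dict:
    """Map the time of the first bolt touch to its ten-second period.

    choice_time_s: seconds the boxes had been within reach before the first
    touch on a bolt (None if no bolt was touched). Simplification: time spent
    pulling the boxes back, gesturing, or re-centering is not counted.
    """
    if choice_time_s is not None and choice_time_s < 0:
        raise ValueError("choice time cannot be negative")
    if choice_time_s is None or choice_time_s >= PERIOD_S * MAX_PERIODS:
        return {"outcome": "no choice", "period": None, "re_prompts": None}
    period = math.floor(choice_time_s / PERIOD_S) + 1
    # E re-prompted (helpless gesture, boxes pushed back) after each elapsed period
    return {"outcome": "choice", "period": period, "re_prompts": period - 1}

print(classify_choice(25))    # {'outcome': 'choice', 'period': 3, 're_prompts': 2}
print(classify_choice(None))  # {'outcome': 'no choice', 'period': None, 're_prompts': None}
```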

The response period, starting from when E first pulled on the box, lasted 90 seconds (as in [15]). Participants chose a box within this period in 88.2% of the trials. If participants could not be centered at all, so that E could not pull on the empty box, the trial was stopped after 3.5 minutes and re-conducted either later the same day or on the next day. If centering was also impossible during the second attempt, or if a second attempt was not possible because of the testing schedule (i.e., no more testing time was available for that round), the trial was counted as “no choice” (blank).
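The re-run rule for trials in which centering failed amounts to a short decision procedure. In this illustrative sketch, each attempt is summarized by the time until the participant was centered, or by `None` if centering never succeeded. The sketch also assumes that centering at exactly 3.5 minutes still counts as in time. The function names are hypothetical.

```python
from typing import Optional

ABORT_AFTER_S = 3.5 * 60  # an attempt was stopped after 3.5 minutes (210 s)

def centered_in_time(seconds_until_centered: Optional[float]) -> bool:
    """True if the participant was centered (so E could pull on the empty box) in time."""
    return seconds_until_centered is not None and seconds_until_centered <= ABORT_AFTER_S

def resolve_trial(first_attempt_s: Optional[float],
                  second_attempt_s: Optional[float] = None) -> str:
    """Apply the re-run rule to one test trial.

    Each argument is the time until centering succeeded on that attempt, or
    None if it never did; second_attempt_s is also None when no re-run fit
    into the testing schedule for that round.
    """
    if centered_in_time(first_attempt_s):
        return "conducted (first attempt)"
    if centered_in_time(second_attempt_s):
        return "conducted (re-run)"
    return "no choice"

print(resolve_trial(None, 40))  # conducted (re-run)
```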

We coded which box participants touched first. The touch had to be directed at the locking mechanism; participants did not need to unlock it, although they did so in most cases. Only this first response in each trial was analyzed. To assess inter-rater reliability, a naïve coder, blind to condition, independently coded 100% of trials. Agreement was perfect (Cohen’s κ = 1.00). All p values reported are two-tailed.
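Cohen’s kappa corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). Here p_o is the proportion of trials coded identically, and p_e is the agreement expected from the two coders’ marginal category frequencies. The minimal from-scratch sketch below shows the computation. The category labels are hypothetical, and scikit-learn’s `sklearn.metrics.cohen_kappa_score` offers an existing implementation.

```python
from collections import Counter
from typing import Sequence

def cohens_kappa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Cohen's kappa for two coders who each assigned one category per trial."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must code the same, non-empty set of trials")
    n = len(coder_a)
    # Observed agreement: proportion of trials coded identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the two coders' marginal proportions per category.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    categories = freq_a.keys() | freq_b.keys()
    p_e = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        # Both coders used a single, identical category throughout: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical first-touch codes for four trials
primary = ["object_box", "empty_box", "object_box", "no_choice"]
print(cohens_kappa(primary, list(primary)))  # 1.0
```

With identical codes from both coders, p_o = 1 and κ = 1.00 whenever more than one category occurs. This is the case of perfect agreement reported above.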