Some readers might believe that we are overstating the case; after all, is there not a near consensus among scientists that apes have difficulty following human cues, such as gaze and pointing, to the locations of hidden food? Table 1 lists a number of representative studies that directly compared apes with human children; all reported an advantage to humans in the cognitive capabilities allegedly under test. This tabulation reveals that testing environments, pre-experimental task-relevant preparation, sampling protocols, testing protocols, and age at testing were all systematically confounded with species classification, exactly paralleling the pervasive deficiencies of the intelligence testing protocols of a century ago identified by Gould (1981). In all of the studies in Table 1, the apes were tested in cages, whereas the humans were not. There were, thus, systematic differences in testing environments, and none of these studies made any apparent attempt to match testing environments across the groups. To accept the reported findings at face value, a reader must assume that engaging participants through cage bars or cage mesh has no effect on performance, an assumption that is unwarranted (see Kirchhofer et al. 2012, for evidence of the suppressive effect of physical barriers on performance in dogs, Canis familiaris).

Table 1 Representative claims of evolutionarily based human uniqueness in social cognition based on direct ape–human comparisons are confounded with systematic group differences in testing environment, task preparation, sampling protocols, testing procedures, and/or age of subjects at testing

In all of the studies listed in Table 1, institutionalized apes were compared with non-institutionalized human children. The vast majority of the apes involved had been isolated from early, intensive exposure to human nonverbal conventions of give-and-take and from daily exposure to nonverbal reference to entities, whereas none of the human children had been so isolated. Hence, the human children had had extensive task-relevant preparation when challenged to use human nonverbal cues, such as ostensive gaze or pointing gestures, to find hidden objects. Yet the researchers cited in Table 1 universally concluded that the humans' superior performances were attributable to their evolutionary, and not their developmental, histories. Even when the apes outperformed humans, as in Povinelli et al. (1999), the scientists interpreted their chimpanzees' (Pan troglodytes) superior performance as evidence of the animals' inferior understanding of visual attention. Some of these studies included a few individual apes who had been enculturated from an early age. For example, in Tomasello et al. (1997), two of the apes, Chantek, an orangutan (Pongo pygmaeus), and Erika, a chimpanzee, had been raised in human cultural environments, whereas the remaining apes in that study had been institutionalized from birth. When comparing the average performance of the apes with that of the human children (two-and-a-half to three years old), the authors found a statistically significant difference between humans and apes in the use of a pointing gesture to find hidden objects, favoring the humans. But when Leavens (2014) compared the children with only the two apes who had had commensurate task-relevant preparation, he found that these apes performed comparably to the children.
As Leavens (2014) noted, then, there is a systematic and methodologically problematic tendency to artificially depress the apparent performance of non-humans by averaging performance data across (a) apes that have had significant task-relevant preparation (enculturated apes) and (b) apes that have been denied this preparation (institutionalized or sanctuary-housed apes). To accept reports that apes have difficulty understanding or producing deictic gestures, therefore, a reader must assume that experience with the daily use of deictic gestures is irrelevant to performance in understanding or producing them, an unwarranted assumption (Leavens 2006; Leavens et al. 2008; Lyn 2010; Lyn et al. 2010).

A related and systematic confound with species classification is sampling protocol. It is elementary that a failure to match sampling protocols introduces a confound into any group comparison. For example, Kirchhofer et al. (2012) compared pet dogs with chimpanzees in their understanding of experimenters' pointing gestures; the subjects' task was to fetch the objects to which the experimenters pointed. As noted by Hopkins et al. (2013), the dogs were recruited through advertisements, introducing self-selection by the dog owners, whereas the apes were opportunistically sampled from a zoo and from a sanctuary. Although Kirchhofer et al. interpreted the superior performance of the dogs as evidence for the effects of artificial selection (breeding), it is in fact ambiguous whether the selective histories or the different sampling protocols account for the observed group differences. As another example, Liszkowski et al. (2009) selected only human infants who had demonstrated prior use of pointing, but did not apply this selection criterion to their ape subjects. Similarly, in the ape–human comparisons listed in Table 1, the apes were always opportunistically sampled from captive populations, whereas the children were recruited primarily through advertisements. It is categorically ambiguous, therefore, whether the group differences reported in the papers listed in Table 1 are attributable to systematic differences in evolutionary histories or to differences in sampling protocols.

With respect to test procedures, none of the studies listed in Table 1 administered the same procedures to the apes and to the humans. For example, Povinelli and Eddy (1996) were unable to teach two- to seven-year-old human children to point to experimenters (p 109, fn 6), so instead of requiring the same gestural response from the children and the apes compared in that study, the authors required the children to indicate their choice of experimenter by placing their hands on a handprint provided for them. Here, the experimenters were unable to elicit pointing gestures from human children, yet claimed cognitive superiority for these same children over apes who pointed to an experimenter through a hole in a Plexiglas barrier. In that study, it is ambiguous whether the children outperformed the apes because of their alleged cognitive superiority (as Povinelli and Eddy claimed) or because the experimenters administered an easier task to the children. As another example, van der Goot et al. (2014) measured whether apes and human children locomoted to the closest possible proximity to unreachable toys (human children) or food (mostly adult apes), but the humans were tested at distances of less than 2 m from the unreachable toys, whereas the apes were (inexplicably) presented with unreachable food at a distance of 6 m. They found that roughly half of the children stayed in place and pointed to the toys without approaching them, whereas none of the 10 apes gestured to the food without first traversing the 6 m to the closest proximity with it.
van der Goot and her colleagues interpreted this group difference as evidence for a uniquely human capacity to discern a state of psychological common ground, but because two different procedures were administered to the two groups (apes and human children), it is unclear whether the apes were more likely than the children to move close to the unreachable entities because of their evolutionary histories or because of some or all of the many systematic procedural differences. In an observational study, Leavens et al. (2015) presented unreachable food to 166 chimpanzees at distances approximating those used with the human infants in van der Goot et al. and found that, like the children in van der Goot et al., approximately half of the apes communicated from a distance and half moved close to the food before signaling about it. (Note that approximately matching just one procedural feature, distance, yielded statistically indistinguishable response profiles for humans and apes, despite many other procedural differences between Leavens et al. 2015 and van der Goot et al. 2014.) Thus, all direct ape–human comparisons that have reported human superiority in cognitive function have failed to match the groups on testing environment, test preparation, sampling protocols, and test procedures, including those that tested subjects' comprehension and production of communicative gestures (Table 1), although we discuss only a few examples here.

Moreover, as repeatedly noted by Bard and her colleagues (e.g., Bard and Leavens 2014; Bard et al. 2014a), none of these studies matched the apes with the humans on age at testing (Table 2); indeed, only in the study by Povinelli and Eddy (1996) was there any overlap in age between the apes and the humans. For example, Liszkowski et al. (2009) compared 12-month-old human children with apes that were, on average, 19 years old, reporting that humans, but not apes, communicated about absent entities. van der Goot et al. (2014) compared 12-month-old human children with apes that were, on average, nearly 18 years old, concluding that humans, but not apes, communicate with gestures from a distance. Again, it is ambiguous whether the group differences reported by the authors of the studies cited in Table 1 are attributable to differences in evolutionary histories, as the authors claimed, or to systematic differences in the life history stage at which the subjects were tested. Not one of these studies validated its protocols on humans who were age-matched to the apes, again with the possible exception of Povinelli and Eddy (1996).

Table 2 Age differences at time of testing are confounded with species classifications in direct ape–human comparisons: representative studies

These studies (Table 1) failed to control for systematic group differences in environment, task preparation, sampling protocols, testing procedures, and age. Yet by asserting that species classification (i.e., evolutionary history) was the only relevant factor, these researchers not only concluded, often implicitly, that these confounds were irrelevant; they also convinced a number of reviewers and editors that the confounds had no bearing on their findings of group differences. The journals in which the papers listed in Table 1 appeared were not, by and large, obscure: they included Science, Psychological Science, and Child Development, prestigious journals with large international readerships. Manifestly, then, reviewers and editors at some of the most influential scientific journals believed that these researchers had identified an influence of evolutionary history on the cognitive underpinnings of, among other things, apes' and humans' understanding and production of communicative gestures. Yet even a cursory examination of some of the uncontrolled variables in these studies reveals that not one of these papers has isolated evolutionary history as the sole factor in the group performance differences it reports. This pervasive collapse in experimental and interpretive rigor is not unprecedented, as Gould (1981) so elegantly noted in relation to the virtually unquestioned assumption, 100 years ago, that northern Europeans had the highest average intellectual capacity in our species. This tacit assumption seems to have "blinded" researchers to the multitudinous confounds that existed alongside their racial group classifications. Our reading of the contemporary literature on comparative social cognition leads us to assert that there are similarly numerous and universal confounds of method with species classification.
It is our contention, here, that because of the systematic confounds listed in Table 1, there is no methodologically sound report of an essential difference between apes and humans in their abilities to use or comprehend simple gestural cues. This is not to claim that such a demonstration could not be made in the future, but it seems clear from the many uncritical citations of alleged ape–human differences in the ability to use and comprehend simple deictic gestures, like overt gaze and pointing, that many contemporary researchers have abandoned any critical evaluation of these empirically unfounded claims. In view of the chasm between evidence and belief that we document here, there may be a deep, yet unwarranted, commitment to the ideas (a) that comprehension and production of pointing, understanding of visual attention, understanding of common ground, or discrimination of false belief require sophisticated reasoning abilities and (b) that humans uniquely possess these hypothetical reasoning abilities.