We collected data during focal follows (Altmann 1974) of orang-utan mothers and their dependent offspring within the Sabangau peat-swamp forest in Borneo, Indonesia, at the Borneo Nature Foundation (BNF) research site, in conjunction with the Centre for International Co-operation in Management of Tropical Peatland (CIMTROP). The study site is a 500-km² protected area known as the Natural Laboratory of Peat Swamp Forest (NLPSF), which is managed by the University of Palangkaraya for the purposes of scientific research. A base camp is located at 2°19′S and 114°00′E, 20 km SW of Palangkaraya. Unlike most forests in Kalimantan, the Sabangau forest is not impacted by high levels of fragmentation, making it one of the largest continuous areas of peat-swamp forest left on the island (Morrogh-Bernard et al. 2003). This forest supports the largest population of orang-utans in the world, at a density of two or three individuals per km² (Husson et al. 2009; Morrogh-Bernard et al. 2003; Singleton et al. 2004; Wich et al. 2008).

Subjects

We followed 16 orang-utans over the course of the study; these included 7 mother–dependent offspring pairs (Table I) and 2 older semi-independent offspring (Georgia, the maternal sibling of Gretel, and Isabella, the maternal sibling of Indy and Ima) that were occasionally encountered when interacting with the focal pairs. Following Rijksen (1978) and Morrogh-Bernard et al. (2002), we define age/sex groups as follows: infants (0–3 yr), juveniles (3–6.5 yr), adolescents (6.5–10 yr), adult females (females with young), unflanged males (adult), and flanged males (adult). Using these categories we included 7 adult females, 2 adolescents (both female), 6 juveniles (4 males and 2 females), and 1 female infant. Six of the adult females had home ranges within the study site at the NLPSF. We first encountered the seventh female and her offspring in spring of 2016, and believe that she was new to the area, potentially displaced by the nearby forest fires of September and October 2015.

Table I Focal orang-utan mother–offspring pairs followed

Data Collection

Standardized behavior and video data collection, based on the field data collection procedures developed at the Leakey Foundation Orang-utans Compared Workshop in San Anselmo, CA (Morrogh-Bernard et al. 2002), started in May 2014. We collected data from May 2014 to July 2016, yielding a total of 681 h of recorded footage. We followed orang-utan pairs from nest to nest with a team of two or three observers for a minimum of 5 days per month, provided they were encountered and not lost. One or two people recorded primary behavioral, proximity, and vocalization data on both mother and offspring, while the remaining observer took video recordings. The majority of video data were recorded from orang-utans in an arboreal context. All ground observations were made at a minimum distance of 10 m; the minimum distance for arboreal observations was 5 m, but 10–20 m was more typical. Where any orang-utan behavior appeared to be directed toward observers on approach, or while moving to find an observation location, we stopped and/or increased our observation distance. Variation in observation distance and conditions affects our ability to observe signals; for example, quieter vocalizations or subtle movements may be missed at greater distances. As a result, we employed ad libitum rather than continuous sampling of signals (Altmann 1974).

We recorded video and audio data using a Canon PowerShot SX50 HS or Panasonic DMC FZ-1000 video camera and a Velbon UP-400 monopod. The built-in microphone on the video camera has significant limitations for the accurate collection of vocal signals, particularly within a noisy arboreal rainforest environment. As a result, our vocal data are biased toward calls that were either louder or more acoustically distinct than the surrounding environment. Despite the limitations on the number and type of calls that could be coded, they still represented almost double the number of gestural signals recorded. (The coding of gestural signals typically excludes 20–40% of potential cases that do not meet the criteria for intentional use; see, e.g., Genty et al. 2009; Kersken et al. 2018.) We therefore felt it was important to include the vocal data, to highlight the importance of vocal signals in orang-utan communication, but we are cautious in our analysis and interpretation of them.

Video Analysis

Following Genty et al. (2009), we scanned video clips for “potentially communicative” episodes before coding. Essentially, we isolated any circumstance in which at least two individuals were present and at least one individual was not occupied in a solitary activity such as self-grooming or sleep, resulting in 52 h of footage (hours of footage per individual: range = 0.33–15.94, mean = 5.2 ± SD 4.5; see Electronic Supplementary Material [ESM] Table SI). We coded all vocal and gestural signals used to initiate social interaction. Facial gestures were included here; however, facial expressions could not be coded consistently given limited visibility in the arboreal habitat. We coded facial expressions ad libitum where possible, but they were not included in subsequent analyses. Vocal signals originate from the mouth or throat, and can be altered by the use of hands or foreign objects such as leaves (Hardus et al. 2009; van Schaik 2003). As the specific calls used by orang-utans vary by location, a phenomenon known as “call cultures” (Wich et al. 2012), we classified calls using a condensed compiled ethogram adapted from BNF protocols and previous studies of captive and wild orang-utans (Table II). We defined gestural signals as discrete, mechanically ineffective physical movements of the body observed during periods of intentional communication (Cartmill and Byrne 2010; Hobaiter and Byrne 2011a). Discrete movements have a clear start and end point, typically distinguished by a pause or change in speed or direction of movement (Kita et al. 1997). We initially classified gestures following the repertoire described in Byrne et al. (2017; updated in Hobaiter and Byrne 2017), which included gestures previously seen in all four great apes, both in captivity and in the wild.
Example videos are available at http://www.greatapedictionary.com. Given the high level of facial muscle control found in orang-utans (Caeiro et al. 2013), we then extended the gesture list to include previously described orang-utan facial displays, distinguished from facial expressions by the evidence for their intentional use (Cartmill 2008).

Table II Orang-utan vocalizations recorded

The exploration of intentional communication in either human or nonhuman primates is challenging, as it requires decoding a signaler’s intention, an invisible cognitive state, from the signaler’s observable behavior. The criteria for doing so were adapted from early explorations of language development in young children. Bates and colleagues (Bates et al. 1975) distinguished illocutionary acts, in which an infant employed a conventionalized signal toward a recognizable goal, from perlocutionary acts, in which a signal changed a recipient’s behavior but without any evidence that this effect was intended by the signaler. Tomasello and colleagues (Tomasello et al. 1985) adapted Bates’ criteria for use with nonhuman apes, and subsequent studies of intentional communication in nonhuman animals have employed similar criteria.

We define intentional communication as meeting at least one of three criteria: 1) the signaler orients its body and gaze toward the recipient (Call and Tomasello 2007; Cartmill and Byrne 2010); 2) the signaler waits for a response from the recipient, repeating the gesture if the desired response is not obtained (Call and Tomasello 2007; Leavens et al. 2005a; Tomasello et al. 1994); and 3) in the absence of a response that in other cases is satisfactory, the signaler persists toward a goal, for example by modifying the gesture depending on the recipient’s response, or lack thereof, or by using the gesture in conjunction with other gestures or communicative behavior (Cartmill and Byrne 2007; Leavens et al. 2005; Tomasello et al. 1994). We required each case of potential gesture use to meet at least one of these criteria to be considered a case of intentional gesture.

Coding of gaze direction is challenging in a natural setting, particularly from arboreal subjects. Following Hobaiter and Byrne (2011a, 2011b), we included an individual as directing its gaze toward a recipient where gaze was visible, or where head movements indicated that it was tracking recipient movements (in the way, for example, that gaze direction can be inferred while standing behind someone watching a tennis match, where the person’s head movements track the ball’s). Further details of the coding are provided in the text that follows.

Gestural signals are typically categorized by modality into three groups: silent-visual, audible, or contact (e.g., Hobaiter and Byrne 2011a). All gestures include a visual component; audible gestures additionally produce sound as a result of the action (cf. silent-visual gestures that occasionally make “accidental” contact with a surface, such as an arm “Swing” gesture that contacts leaves); and contact gestures always make physical contact with the recipient and may also include an audible component. After video coding, but before analysis, we collapsed the modality categories into visual and tactile. Visual included both silent and audible visible gestures. We combined these because reliably discriminating audible from silent-visual gestures was challenging in the arboreal habitat, where leaf and branch noises accompanied most movements. Although some gestures did appear to employ sound purposefully, such as “Stomp” and “Shake object,” we saw these at very low frequencies and therefore combined them with all other visual gestures.
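The collapse from three coded modality categories to the two analysis categories amounts to a simple mapping; this is an illustrative sketch (category labels follow the text above, not an actual coding file):

```python
# Hypothetical mapping used to collapse coded modalities for analysis.
# "audible" is folded into "visual" because audible and silent-visual
# gestures were hard to discriminate reliably in the arboreal habitat.
MODALITY_COLLAPSE = {
    "silent-visual": "visual",
    "audible": "visual",
    "contact": "tactile",
}

def collapse_modality(coded_modality):
    """Map a coded three-way modality onto the two-way analysis category."""
    return MODALITY_COLLAPSE[coded_modality]
```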

For both the signaler and recipient we coded individual identity and age, the behavioral context immediately before and after signaling (Affiliating, Agonistic, Display, Feeding–individual, Feeding–food sharing, Grooming, Nesting, Nursing, Play–social, Play–solitary, Resting, Sex, Moving, Traveling, Other, Unknown; see Table SII for definitions), and the estimated distance between the signaler and recipient (<1 m, 1–2 m, 2–3 m, 3–5 m, 5–10 m, >10 m; distances estimated using body size as a point of reference; Cant 1992; Oishi et al. 2009). We recorded the state of the recipient’s visual attention at the time the signal was initiated as: attending (recipient had eye contact with the signaler or showed tracking of the signaler’s behavior through head or body movements); head in direction (recipient located in front of the signaler with the head in an arc of up to 45° to either side of the direction the signaler is facing); partial view (recipient in the signaler’s peripheral view, with the head at 45–90° to either side); out of sight (recipient not in a position to see any physical movement made by the signaler); or out of sight but in body contact (as out of sight, but recipient in physical contact with the signaler). Signal combinations may be produced as a planned combination of signals, or through the addition of another signal after the failure of an earlier one (Genty and Byrne 2010; Hobaiter and Byrne 2011b; Liebal et al. 2004b). We followed Hobaiter and Byrne (2011b) in distinguishing these two types of signal combination: sequences are two or more signals that overlap or are separated by <1 s; bouts are two or more individual signals or sequences of signals produced with ≥1 s of response waiting between them.
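The sequence/bout distinction can be sketched as a simple segmentation over signal onset times: gaps of <1 s keep signals in the same rapid sequence, while gaps of ≥1 s (response waiting) start a new unit within a bout. This is an illustrative Python sketch, not the coding software used in the study.

```python
def segment_signals(onsets_s, gap_s=1.0):
    """Group signal onset times (in seconds) into rapid sequences.

    Signals separated by < gap_s belong to the same sequence; a gap of
    >= gap_s (response waiting) closes the current sequence. The units
    returned together make up one bout.
    """
    if not onsets_s:
        return []
    sequences = []
    current = [onsets_s[0]]
    for t in onsets_s[1:]:
        if t - current[-1] < gap_s:
            current.append(t)          # same rapid sequence
        else:
            sequences.append(current)  # >= 1 s pause: new unit in the bout
            current = [t]
    sequences.append(current)
    return sequences
```

For example, onsets at 0.0, 0.4, and 0.9 s form one sequence, and a further pair at 3.0 and 3.5 s forms a second unit of the same bout.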

Interobserver Reliability

We code video data across ape gestural studies using the same methodology and coding protocol, independently of the hypotheses tested or study population. AK and EH were trained by experienced gesture coder CH, and each then coded 55% of all video footage. We assessed reliability both between and within coders. We used an overlapping 10% of coded footage to assess interobserver reliability. We evaluated intraobserver reliability by coding a separate 7.5% of the total video footage twice, ≥72 h apart. We selected videos for reliability testing using a random number generator, and measured the degree of concordance between ratings for specific coding categories using both percentage agreement and Cohen’s κ (Altman 1991). The inter- and intraobserver reliability testing showed 76–95% overlap and “moderate” to “very good” agreement for 10 of the 11 variables (Table III), suggesting that coefficients exceed chance for coded behavior (Bakeman and Gottman 1997; McHugh 2012). Interrater agreement on signaler persistence was 76%, but achieved only a “weak” κ (0.49).
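For reference, percentage agreement and Cohen’s κ can be computed from paired ratings as follows; the rating labels below are invented for illustration, and the study’s own computations may have used statistical software rather than this sketch.

```python
def percent_agreement(a, b):
    """Percentage of items on which two raters gave the same code."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    categories = set(a) | set(b)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each rater's marginal code frequencies
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (p_obs - p_exp) / (1 - p_exp)
```

Note that two raters can show high raw agreement yet a low κ when one code dominates the data, which is why both measures are reported.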

Table III Results of interobserver and intraobserver reliability testing, by coding category

Analysis

In describing the repertoire of wild chimpanzees, Hobaiter and Byrne (2011a) required at least two instances of gesture use by an individual to include a gesture in that individual’s repertoire, and use by at least two individuals to include it in the possible species repertoire. However, as our dataset was relatively small, and research has indicated that repertoire size is closely correlated with the quantity of data recorded in smaller datasets (Hobaiter and Byrne 2011a), we describe all potential gesture types used and provide the number of instances of gesture use. We calculated the number of gesture types identified relative to the number of gesture instances (an individual example of gesture use) coded for the total dataset, and separately for the adult and offspring datasets. We graphed these for visual inspection to assess whether the repertoires reached asymptote. To address any effect of pseudoreplication from the use of ad libitum sampling, we converted data to means for each individual before analyses.
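The asymptote check amounts to plotting a type-accumulation curve: the cumulative number of distinct gesture types after each coded instance. A flattening curve suggests the sampled repertoire is approaching asymptote. A minimal sketch (gesture names are illustrative):

```python
def accumulation_curve(instances):
    """Cumulative count of distinct gesture types after each coded instance."""
    seen, curve = set(), []
    for gesture in instances:
        seen.add(gesture)
        curve.append(len(seen))
    return curve
```

Plotting this curve against instance number for the total, adult, and offspring datasets allows the visual inspection described above.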

In analyses of signal choice we excluded any signals for which the recipient’s attention state was unclear, as well as any signals apparently directed toward observers, in order to restrict our analyses to signal use between orang-utans. We conducted analyses of signal choice with recipient attention state by fitting generalized linear mixed models (GLMMs) with a binomial error distribution and logit link function in RStudio 1.0.136 running R version 3.3.1 (2016-06-21). We fitted models using the lme4 package for R. We included only single signals, or the first signal in a rapid sequence (signals separated by 1 s or less; following Hobaiter and Byrne 2011b), in analyses of attention state, and excluded signals from individuals with fewer than five communicative interactions (communications). The GLMM included only intentional gestures, with gesture modality as the response variable and recipient attention state as the test predictor. In addition, we included the social relationship (mother–infant, other; typically mother–infant), signaler age class (adult, immature), and signaler location (ground, tree) in the model as control effects. The GLMM included signaler identity (N = 12), recipient identity (N = 13), signaler context before communication (N = 16 levels; see Table SII), and recipient context before communication (N = 16 levels; see Table SII) as random effects. These factors have the potential to influence the choice of intentional signal; however, we had insufficient data to explore them fully in this analysis, and the observations recorded represent a small and random sample of the possible levels in each factor. We therefore included them as random effects to account for their impact. We report the influence of recipient attentional state and the three control factors (Bolker et al. 2008).
We applied likelihood ratio χ² tests of independence to assess the potential association between the attention state of the recipient and the intentionality and modality of the subsequent communicative signal.
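The statistic underlying such a test of independence is computed from a contingency table of counts, e.g., recipient attention state × signal modality. The sketch below shows the Pearson form for simplicity (the likelihood-ratio variant differs only in how the statistic is accumulated); the counts are invented, and the actual analyses were run in R.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an
    r x c contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns
            exp = row_totals[i] * col_totals[j] / n
            chi2 += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df
```

The resulting statistic is then compared against the χ² distribution with the computed degrees of freedom to obtain a p-value.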

To further quantify the use of gesture modality with recipient attention, we calculated the variation in usage (following Hobaiter and Byrne 2011a). First, we calculated the proportion of signal usage by modality across the complete corpus for each individual. Next, we calculated the percentage deviation from this baseline use for each state of recipient attention with the following formula: Deviation = (β / α − 1) × 100, where β = the proportion of signals within each attention state and α = the proportion of signals in the overall corpus. We then analyzed the resulting deviations, indicative of adjustments made by the signaler based on recipient attention state, using planned t-tests, and report them as mean ± standard deviation.
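The deviation formula can be expressed directly; the values below are illustrative. For example, if tactile gestures make up 40% of an individual’s overall corpus (α = 0.4) but 60% of its signals toward out-of-sight recipients (β = 0.6), the deviation is +50%.

```python
def percentage_deviation(beta, alpha):
    """Deviation = (beta / alpha - 1) * 100.

    beta:  proportion of signals of a given modality within one
           recipient-attention state
    alpha: proportion of signals of that modality in the overall corpus
    """
    return (beta / alpha - 1.0) * 100.0
```

Positive values indicate a modality is over-used in that attention state relative to baseline; negative values indicate under-use.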

We did not have sufficient cases of successful gesture use per individual to explore whether specific goals were associated with each gesture type. However, as research has shown that individual signaler identity did not impact signal meaning (Graham et al. 2018; Hobaiter and Byrne 2014), we present a preliminary investigation here in which gesture use was combined across signalers. After a gesture was employed, the reaction that caused the signaler to stop signaling was deemed to be the apparently satisfactory outcome, or goal, of the gesture.

Data Availability

The datasets analyzed during the current study are available in the figshare repository: https://figshare.com/articles/DATA_Orang-utan_Signalling/8132159.