Study site, subjects and housing

From 17 September to 28 November 2012, and from 24 August to 28 October 2013, we non-selectively trapped NC crows with meat-baited whoosh nets at our farmland study site in Gouaro-Déva, on the central west coast of New Caledonia, South Pacific. Birds were sexed based on morphology (males are larger than females [61]) and aged based on gape colouration (as in other corvids, gape colour in NC crows changes over time from pink, through mottled grey, to black [49, 62]). Two of the subjects (CS9 and ER4) were trapped in two consecutive years (2012 and 2013) and showed a similar level of gape colouration in both years, which indicates some uncertainty in the ageing method (possibly due to social dominance effects; see [62]). That said, gape colouration provides a useful proxy for the general developmental stage of subjects, and enables identification of the youngest and oldest birds in a sample [63]. Crows were housed individually in field aviaries (3 × 3 × 2.5 m), except for adults that had been trapped with dependent young; these were always kept together with their young.

The tool behaviour of 34 crows was assessed in pre-testing sessions (for details, see [43]), and only birds that were confirmed to manufacture and use hooked stick tools progressed to the main experiments [43, 63]. Twenty-nine crows were tested in Experiment 1 (18 in 2012, 14 in 2013; three crows participated in both years), and seven of them also participated in Experiment 2 (all in 2012). Subjects were tested individually in an experimental chamber connected to the housing aviary; its opaque side walls ensured that subjects could neither see, nor be seen by, any other crows during formal trials. To increase motivation, food bowls were removed from the housing aviary ca. 1–1.5 hours before trials of Experiment 2 and before some trials of Experiment 1. During experimental trials, birds had ad libitum access to water, but no access to food other than the bait provided in extraction tasks. Observers filmed crow behaviour for subsequent analyses with a Panasonic HD camcorder from a hide outside the experimental chamber (Fig. 1a).

Experiment 1

Experimental procedures

The basic experimental set-up is schematically illustrated in Fig. 1a. We presented raw materials for tool manufacture on one or two ‘material logs’, by firmly wedging stems of Desmanthus virgatus (3–12 stems, each usually containing multiple forks) into drilled holes so that they stood upright, as crows would encounter them in the wild [42]. Up to two ‘food logs’ were provided, each containing between 6 and 18 drilled holes (diameter ca. 9 mm or 12 mm; depth 70 mm) that were baited with a peanut-sized piece of pork or beef heart, or a dead spider. In some trials, a ‘manufacture log’ (one part of a split wooden log) was presented between the material and food logs, to provide additional surfaces on which birds could craft their tools (Fig. 1a, c). Trials lasted 90 minutes, but ended earlier if the subject had extracted all bait.

Data analyses

For each tool-manufacture sequence, we scored the behavioural actions of three stages (Fig. 2a): the release of the ‘basic tool’ (Fig. 1e); the processing of the basic tool; and the deployment of the ‘tool’ (once the basic tool was inserted into a hole, it was considered a tool). We scored behaviours until the subject extracted bait from a hole, abandoned the tool (i.e., did not touch it with its bill or feet for more than two minutes), or five minutes had elapsed since the first insertion of the tool into a hole. Videos were scored with JWatcher (www.jwatcher.ucla.edu) and Solomon Coder (www.solomoncoder.com) software, using the definitions provided in Additional file 2: Table S1. Unlike an earlier study [4], we did not score the removal of leaves or small side branches: these actions can only be expressed when plant stems possess such structures, so scoring them would have led to unreliable estimates of behavioural variability. Inter-observer agreement was assessed as described in Footnote 1.
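The stopping rule for scoring combines three conditions, and whichever occurs first ends the scored sequence. The Python sketch below makes that explicit. The event-list input, the function name `scoring_end` and the choice to treat the last touch before a gap of more than two minutes as the moment of abandonment are assumptions for illustration, not the coding scheme actually used with JWatcher or Solomon Coder.

```python
from typing import List, Optional

# Thresholds taken from the text; the event-list format below is hypothetical.
ABANDON_GAP_S = 120.0         # tool untouched (bill or feet) for > 2 min = abandoned
POST_INSERTION_CAP_S = 300.0  # stop 5 min after the first insertion into a hole


def scoring_end(touch_times: List[float],
                first_insertion: Optional[float],
                extraction: Optional[float],
                trial_end: float) -> float:
    """Return the time (s) at which scoring of one manufacture sequence stops.

    touch_times: times at which the bird touched the tool with bill or feet.
    first_insertion / extraction: event times, or None if the event never happened.
    trial_end: end of the recording, used as a fallback stop time.
    """
    candidates = [trial_end]
    if extraction is not None:
        candidates.append(extraction)
    if first_insertion is not None:
        candidates.append(first_insertion + POST_INSERTION_CAP_S)
    # Abandonment: the first gap between touches (or before trial end) longer
    # than 2 min; scoring is assumed to stop at the last touch before that gap.
    times = sorted(touch_times) + [trial_end]
    for earlier, later in zip(times, times[1:]):
        if later - earlier > ABANDON_GAP_S:
            candidates.append(earlier)
            break
    # Whichever stop condition occurs first ends scoring.
    return min(candidates)


# Hypothetical sequence: no extraction, tool left alone after t = 30 s.
print(scoring_end([10, 20, 30, 200], first_insertion=25, extraction=None, trial_end=5400))  # prints 30
```

Taking the minimum of the candidate stop times means only the earliest of the three events ends scoring, so later events (here, the touch at 200 s) are ignored.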

To enable meaningful comparisons within and between birds, manufacture sequences were excluded if: any shaft of the provided plant material was shorter than ca. 20 mm; more than two branches grew out of, or within ca. 20 mm of, the chosen joint; the subject abandoned the plant material or basic tool before inserting it; it pulled an entire plant stem out of the material log; it made a tool from plant debris (left from previous manufactures); or it made a tool by cutting the tool shaft (which results in a non-hooked stick tool; see Fig. 1fiii). Of the 29 crows that participated in Experiment 1, 18 produced at least one manufacture sequence that met these criteria, yielding a total sample of 85 valid sequences.
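The exclusion criteria amount to a single yes/no filter applied to each sequence. A minimal Python sketch, assuming a hypothetical per-sequence record whose field names are illustrative only:

```python
from dataclasses import dataclass

MIN_SHAFT_MM = 20.0   # "ca. 20 mm" in the text
MAX_BRANCHES = 2      # more than two branches at or near the chosen joint -> exclude


@dataclass
class ManufactureSequence:
    # Hypothetical record layout; field names are illustrative only.
    shortest_shaft_mm: float          # shortest shaft of the provided plant material
    branches_near_joint: int          # branches out of, or within ca. 20 mm of, the joint
    abandoned_before_insertion: bool  # material or basic tool abandoned before insertion
    pulled_entire_stem: bool          # whole stem pulled out of the material log
    made_from_debris: bool            # tool made from debris of earlier manufactures
    cut_tool_shaft: bool              # cut shaft -> non-hooked stick tool


def is_valid(seq: ManufactureSequence) -> bool:
    """True if a sequence passes every Experiment 1 exclusion criterion."""
    return (seq.shortest_shaft_mm >= MIN_SHAFT_MM
            and seq.branches_near_joint <= MAX_BRANCHES
            and not seq.abandoned_before_insertion
            and not seq.pulled_entire_stem
            and not seq.made_from_debris
            and not seq.cut_tool_shaft)
```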

Using non-parametric Kruskal–Wallis tests, we analysed between-bird variation in (Fig. 2b–d): (i) the number of different action types; (ii) the time spent processing the hook; and (iii) the time spent bending the tool shaft (for definitions, see Additional file 2: Table S1). Subjects that had produced fewer than three valid sequences were excluded from these analyses, leaving a subsample of 74 sequences from 10 subjects. To assess the effect of gape colouration on (i), we ran generalised linear mixed models (GLMMs) using the ‘lme4’ package [64] in R [65], with a Poisson error structure and log link function, and with bird ID fitted as a random effect to account for non-independence of data. For all GLMMs, generalised linear models (GLMs) and linear mixed models (LMMs), the significance of main effects was assessed with likelihood-ratio tests (best model against null model, at α = 0.05). Since metrics (ii) and (iii) were not normally distributed and included zero values, we analysed these data in two steps. We first used a GLMM with a binomial error structure and logit link function to test whether gape colouration was related to the expression of the behaviour of interest (yes/no score). For the subset of sequences that included the behaviour, we then fitted a second model with a gamma error structure and inverse link function, to test the influence of crow ‘age’ (gape colouration) on the time spent performing the behaviour.
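A Kruskal–Wallis test ranks all observations together and asks whether mean ranks differ among groups (here, birds). The sketch below computes the tie-corrected H statistic in plain Python; it illustrates the textbook formula rather than the software used in the study, and it omits the P value, which is usually obtained from a chi-square distribution with k − 1 degrees of freedom for k groups. Function names are hypothetical.

```python
from itertools import chain
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Dict[str, Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H across groups (e.g. one group per bird)."""
    labels = list(groups)
    pooled = list(chain.from_iterable(groups[g] for g in labels))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in labels:
        size = len(groups[g])
        rank_sum = sum(ranks[start:start + size])  # ranks of this group's values
        total += rank_sum ** 2 / size
        start += size
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over tie groups.
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1.0 - ties / (n ** 3 - n))


print(round(kruskal_wallis_h({"A": [1, 2, 3], "B": [4, 5, 6]}), 3))  # 3.857
```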

We additionally examined the similarity between manufacture sequences using Needleman–Wunsch distance (NW distance), a measure commonly used in genetic analyses [66]. This method first aligns two sequences (in our case, sequences of behavioural actions during the ‘processing’ stage of tool manufacture, where path dependence was assumed to be negligible; see Fig. 2a) so as to minimise the number of differences between them, and then counts these differences; where two paired sequences differ in length, actions missing from one of them are treated as ‘deletions’ in the other. To avoid overrepresentation of subjects with more sequences, we picked the first three sequences from each subject and calculated the NW distance for each pair (30 sequences for 10 subjects) using the ‘Biostrings’ package [67] in R. We tested whether NW distance varied with individual identity (permutational multivariate analysis of variance) or was correlated with gape colouration (Mantel test); both tests were run with the ‘vegan’ package [68] in R.
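With unit costs for a mismatch and for a gap, the alignment described above reduces to the classic dynamic-programming recursion sketched below in Python. The unit costs are an assumption (the scoring scheme used with ‘Biostrings’ is not stated here), as are the single-letter action codes, which are hypothetical stand-ins for the categories in Additional file 2: Table S1. The hypothetical helper `pairwise_nw` mirrors the choice of the first three sequences per subject.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def nw_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Differences left after globally aligning a and b (unit mismatch and gap costs)."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = minimum number of differences between a[:i] and b[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # a[:i] aligned against nothing: i deletions
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            mismatch = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + mismatch,  # aligned pair (match or mismatch)
                          d[i - 1][j] + 1,             # a[i-1] treated as a deletion in b
                          d[i][j - 1] + 1)             # b[j-1] treated as a deletion in a
    return d[rows - 1][cols - 1]


def pairwise_nw(seqs_by_bird: Dict[str, List[Sequence[str]]],
                n_per_bird: int = 3) -> Dict[Tuple[int, int], int]:
    """NW distance for every pair among the first n_per_bird sequences of each bird."""
    picked = [s for seqs in seqs_by_bird.values() for s in seqs[:n_per_bird]]
    return {(i, j): nw_distance(picked[i], picked[j])
            for i, j in combinations(range(len(picked)), 2)}


# Hypothetical action codes: "B" bend, "P" probe, "T" trim (illustrative only).
print(nw_distance("BPTB", "BPB"))  # 1: the "T" is treated as a deletion
```

With 10 subjects and three sequences each, `pairwise_nw` returns 30 × 29 / 2 = 435 pairwise distances.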

Experiment 2

Experimental procedures

To investigate the effects of raw-material properties on tool-manufacture behaviour, we presented subjects with plant stems that encompassed the full natural range of plant properties. In this experiment, we ensured that each stem contained only a single fork suitable for hooked stick tool manufacture, removing side branches where necessary (Fig. 1b). When preparing trials, three people (always including BK) who had previously observed hooked stick tool manufacture by captive crows independently used their best judgement to assign ‘material scores’ to eight stems. Specifically, they were briefed to assess how difficult they believed it would be for a crow to sever each stem immediately below the joint on the root shaft. This method exploits the fact that, in addition to assessing basic dimensional properties (such as thickness), human observers can evaluate biomechanical characteristics (such as rigidity) that would be difficult or impossible to measure non-destructively. Material scores ranged from 1 (green and flexible) to 8 (woody and rigid). The median of the three independent scores for each stem was used as its final score (inter-observer agreement was excellent; see Footnote 1). Stems were selected so that the two in the middle of the range (material scores 4 and 5) resembled those preferred by crows in the wild [42] and matched each other as closely as possible (for formal analyses, both were given the same material score). One of these matched stems was randomly selected to serve as a ‘control’, allowing us to test for demotivation of birds at the end of a trial (see below). The other seven stems were presented simultaneously, arranged side-by-side in random order, on a material log (Fig. 1b).
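Using the median of three scores means that a single observer who disagrees strongly with the other two cannot shift a stem's final score. A minimal Python sketch (the function name is hypothetical):

```python
from statistics import median
from typing import Sequence


def final_material_score(scores: Sequence[int]) -> int:
    """Median of three independent observer scores (1 = green and flexible, 8 = woody and rigid)."""
    if len(scores) != 3 or not all(1 <= s <= 8 for s in scores):
        raise ValueError("expected three scores between 1 and 8")
    return median(scores)  # with three scores this is simply the middle value


print(final_material_score([6, 7, 2]))  # 6: the outlying score of 2 does not shift the result
```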

We provided a food log with a single drilled hole (diameter 16 mm; depth 70 mm), baited with a peanut-sized piece of pork or beef heart, and, in some trials, a manufacture log (see above). Before the bird entered the experimental chamber, a tiny piece of meat was placed on the food log next to the drilled hole to attract the subject’s attention. After each tool manufacture with successful bait extraction, the observer called an assistant by radio. The assistant removed the tool and any plant debris (but left the remaining plant stems in place for subsequent choices), and re-baited the food log, in full view of the subject. If a bird manufactured a tool but did not extract any bait with it within 15 minutes, the tool and plant debris were removed, the remaining plant stems were left on the material log, and the food log was re-baited. Once all choices had been made, or 15 minutes had passed without tool manufacture, any remaining stems were removed, the control stem was presented on the material log in the position previously occupied by its matched stem, and the food log was re-baited. The trial finished when the subject had used this control stem to manufacture a tool and extract bait.

Data analyses

Videos were scored as described above for Experiment 1, and results of inter-observer agreement evaluations are reported in Footnote 1. In Experiment 2, we used more relaxed exclusion criteria for manufacture sequences than in Experiment 1, since the provided materials were more rigorously controlled, and the experiment was specifically designed to tempt crows to use non-preferred plant materials. No stems were excluded from our analyses of basic manufacture decisions (56 stems; Fig. 3a) or of the order in which stems were chosen (56 stems; Fig. 3b). Only sequences in which the subject either manufactured a non-hooked stick tool (five cases) or pulled an entire plant stem out of the material log (four cases) were excluded from further analyses, as these actions inevitably restricted the range of subsequent behavioural options. The final dataset comprised 33 manufacture sequences (see Additional file 6: Figure S1).

The choice of a stem was scored when the basic tool was released (Fig. 1e), even if a bird had previously interacted with one or more other stems without releasing a basic tool. Subjects’ preferences for stems of particular material scores were analysed using a custom-written permutation test. Taking into account the actual number of choices made by each subject during experimental trials, we performed 10,000 permutations to calculate the mean choice order of each candidate stem under the assumption of random choice (null hypothesis), and then compared the observed choice order with this null distribution. We also scored how birds released basic tools from stems (Fig. 1g), as we expected material properties to be particularly important at this early stage of the manufacture process. Given the range of possible pathways that can lead to the release of a basic tool (Fig. 4), and modest replication for some of these behavioural sequences, we used the following rule to group cases for analyses: if the first and second actions were both cuts, the release method was scored as ‘cut’, whereas if any action was a pull, it was scored as ‘pull’ (for definitions, see Fig. 1g). Using frame-by-frame analysis, we measured the time required to release the basic tool to the nearest 0.2 seconds.
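The permutation test on choice order can be sketched in Python as follows, assuming each bird's data are stored as the material scores presented and the scores in the order they were chosen. The text does not state how stems that were never chosen enter the mean choice order; this sketch assigns them rank n + 1 (n = number of choices the bird made), which is an assumption, as are the function names and the two-sided comparison.

```python
import random
from statistics import mean
from typing import Dict, List, Tuple


def choice_rank(order: List[int], target: int, n_choices: int) -> int:
    """1-based position of `target` among the first n_choices stems taken.

    Assumption: a stem that was never chosen gets rank n_choices + 1.
    """
    chosen = order[:n_choices]
    return chosen.index(target) + 1 if target in chosen else n_choices + 1


def choice_order_test(trials: Dict[str, Tuple[List[int], List[int]]],
                      target: int, n_perm: int = 10_000, seed: int = 1):
    """Permutation test of the mean choice order of the stem scored `target`.

    trials maps bird ID -> (material scores presented, scores in the order chosen).
    Under the null hypothesis each bird takes stems in random order, but makes
    the same number of choices as it actually did.
    """
    rng = random.Random(seed)
    observed = mean(choice_rank(chosen, target, len(chosen))
                    for _, chosen in trials.values())
    null = []
    for _ in range(n_perm):
        ranks = []
        for presented, chosen in trials.values():
            shuffled = list(presented)
            rng.shuffle(shuffled)  # random choice order for this bird
            ranks.append(choice_rank(shuffled, target, len(chosen)))
        null.append(mean(ranks))
    null_mean = mean(null)
    # Two-sided: proportion of permuted means at least as far from the null mean.
    extreme = sum(abs(x - null_mean) >= abs(observed - null_mean) for x in null)
    return observed, null_mean, extreme / n_perm


# Hypothetical data: seven stems per bird, five choices made, score-4 stem taken first.
demo = {f"bird{k}": ([1, 2, 3, 4, 6, 7, 8], [4, 1, 2, 3, 6]) for k in range(7)}
obs, expected, p = choice_order_test(demo, target=4, n_perm=2000)
```

Keeping each bird's number of choices fixed in every permutation means only the order of stems is randomised, which is how the null distribution can reflect the actual number of choices made.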

We used (G)LMMs, with bird ID fitted as a random effect (see above), to analyse the effect of material score on: (i) whether or not a hooked stick tool was manufactured; (ii) the time taken to release the basic tool (log-transformed to normalise errors); (iii) the number of behavioural actions in each sequence; and (iv) the method chosen for releasing the basic tool. For (i) and (ii), we fitted a quadratic term in our models, as birds seemed to struggle with stems at either end of the range, and models including this term had lower AIC values than those without (for [i]: 56.20 vs. 67.66, P < 0.001; for [ii]: 90.88 vs. 92.13, P = 0.07). We also examined whether crows performed the same first action on the matched and the control stem; this was possible for only five subjects, since one bird did not manufacture a tool from the matched stem and one bird pulled the entire control stem out of the material log.
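The P values reported next to the AIC values can be recovered from the AIC difference alone if, as assumed here, both models were fitted by maximum likelihood to the same data, differ by one parameter (the quadratic term), and were compared with a likelihood-ratio test. Because AIC = 2k − 2 ln L, the likelihood-ratio statistic equals (AIC of the simpler model − AIC of the quadratic model) + 2, referred to a chi-square distribution with one degree of freedom. A small Python sketch (function name hypothetical):

```python
import math


def lrt_from_aic(aic_reduced: float, aic_full: float, extra_params: int = 1):
    """Likelihood-ratio statistic and P value for nested ML fits, recovered from AIC.

    AIC = 2k - 2 lnL, so LR = 2 (lnL_full - lnL_reduced)
                            = (AIC_reduced - AIC_full) + 2 * extra_params.
    For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)), so only one extra parameter is handled.
    """
    if extra_params != 1:
        raise NotImplementedError("sketch only handles one extra parameter")
    lr = (aic_reduced - aic_full) + 2 * extra_params
    p = math.erfc(math.sqrt(lr / 2)) if lr > 0 else 1.0
    return lr, p


# Values for (i) and (ii) in the text: (AIC without quadratic term, AIC with it).
for label, reduced, full in [("(i)", 67.66, 56.20), ("(ii)", 92.13, 90.88)]:
    lr, p = lrt_from_aic(reduced, full)
    print(label, round(lr, 2), round(p, 4))
```

Applied to the values above, this gives P < 0.001 for (i) and P ≈ 0.07 for (ii), in line with the reported results.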

Complementary analyses

Our results suggested that some of the variation observed in crows’ tool-manufacture behaviour (Experiment 1) could be explained by the properties of raw materials (Experiment 2). Although we had attempted to collect stems with fairly standardised properties for Experiment 1, we noticed seasonal changes both in the properties of stems we could find at our study site and in the crows’ manufacture behaviour (Fig. 5a). We had not recorded the diameter of stems before providing them to crows in Experiment 1, but we were able to gauge seasonal patterns by measuring the diameter of manufactured tools (Fig. 5b), which were routinely photographed on grid paper after trials. Using ImageJ software, one of us (SS) measured the diameter (ca. 1 cm above the joint on the tool shaft) of a subsample of 103 hooked stick tools manufactured by 16 subjects across 19 trials of Experiment 1 (deployment could not be confirmed for all of these tools). Trials were selected pseudorandomly for analysis, so as to achieve even coverage of our study period at approximately one-week intervals.
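One way to achieve even coverage at roughly one-week intervals is to group trials by calendar week and draw one trial at random from each week. The Python sketch below does exactly that; the ISO-week grouping and the function name are hypothetical, since the text does not describe the exact selection procedure.

```python
import random
from datetime import date
from typing import Dict, List, Tuple


def one_trial_per_week(trials: List[Tuple[str, date]], seed: int = 0) -> List[str]:
    """Pick one trial at random from each ISO calendar week, in date order.

    Hypothetical selection scheme: grouping by ISO week is an assumption.
    """
    rng = random.Random(seed)
    by_week: Dict[Tuple[int, int], List[str]] = {}
    for trial_id, day in trials:
        iso_year, iso_week, _ = day.isocalendar()
        by_week.setdefault((iso_year, iso_week), []).append(trial_id)
    # Sorting on (year, week) keys keeps the picks in chronological order.
    return [rng.choice(sorted(ids)) for _, ids in sorted(by_week.items())]
```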

We used (G)LMMs, with bird ID and year fitted as random effects, to analyse seasonal changes (trial date) in: (i) stem diameter; and (ii) time to release the basic tool (both log-transformed to normalise errors). Given the quadratic relationship between plant properties and the time taken to release the basic tool observed in Experiment 2 (see above), we also fitted models with a quadratic date term, which yielded lower AIC values than models without this term (for [i]: −38.2 vs. −34.5, P = 0.02; for [ii]: 188.2 vs. 199.9, P = 0.01). As some birds were tested sequentially, we removed bird ID as a random effect when analysing temporal changes in the number of actions in sequences; for this test, we constructed GLMs with a Poisson error structure and log link function, using only the first three sequences from each subject to avoid overrepresentation of subjects with more sequences. Significance of the main effect was tested using the ‘lmtest’ package in R [69]. We also examined whether behavioural sequences changed over time (Experiment 1) and varied across material scores (Experiment 2), by testing for correlations between NW distance and, respectively, the time interval between trial dates or the difference in material scores. NW distance was calculated as described above, except that, for these analyses, sequences included incomplete actions (see Fig. 5a).
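These correlations, like the Mantel test used for gape colouration in Experiment 1, compare two distance matrices (for example, NW distance and the time interval between trial dates). The text does not say which test was used for these complementary analyses; the plain-Python Mantel test below is one standard option and only a sketch, not the ‘vegan’ implementation. Rows and columns of one matrix are permuted together, and the one-sided P value counts permutations with a correlation at least as large as the observed one. Function names are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt
from typing import List, Sequence


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(dist_a: List[List[float]], dist_b: List[List[float]],
           n_perm: int = 999, seed: int = 1):
    """Pearson Mantel test between two symmetric distance matrices.

    Returns (r, one-sided P for a positive association).
    """
    n = len(dist_a)
    pairs = list(combinations(range(n), 2))  # upper triangle, diagonal excluded
    x = [dist_a[i][j] for i, j in pairs]
    r_obs = _pearson(x, [dist_b[i][j] for i, j in pairs])
    rng = random.Random(seed)
    idx = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # relabel objects: permute rows and columns together
        r = _pearson(x, [dist_b[idx[i]][idx[j]] for i, j in pairs])
        hits += r >= r_obs
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical trial days for 8 sequences; dist_b[i][j] = days between trials.
days = [0, 1, 3, 7, 12, 20, 31, 45]
d_time = [[abs(a - b) for b in days] for a in days]
```

For time intervals, `dist_b[i][j]` would be the absolute number of days between the trial dates of sequences i and j; for material scores, the absolute difference in score.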