Literature search and selection criteria

We first compiled all studies that cited or were cited by two key reviews28,31. We searched for studies published before 31 July 2014 on Web of Knowledge, Scopus and Google Scholar databases with the key words ‘flight initiation distance’, ‘flight distance’, ‘escape distance’, ‘approach distance’, ‘flushing distance’ and ‘response distance’. The references cited in these studies were also examined. Our criteria to include a study were that FID of a given species (measured sensu27) had to be collected in areas of low and high human presence (=human disturbance). We followed the criteria used by authors to categorize each area as a function of its degree of disturbance. Hunted populations were not included in our data set. A PRISMA diagram describing our literature search and the detailed reasons for exclusion of studies are available in Supplementary Fig. 8 and Supplementary Table 3, respectively. Our final data set consisted of 504 effect sizes from 75 studies across 212 species distributed in three major taxa: birds, mammals and lizards (data available in Supplementary Data 1).

Estimating effect sizes

To quantify species response to disturbance, we used the effect size metric Hedges’ g11, a bias-corrected measure of standardized mean differences that does not overestimate the magnitude of the effect when sample size is small11. For each species, we compared the mean FID of populations in areas of higher human disturbance with those in areas of lower human disturbance. These FID comparisons were restricted to populations of species studied in the same study. In our data set (Supplementary Data 1), positive effect sizes indicate sensitization, whereas negative values indicate tolerance of human disturbance. When the mean, variance and sample size of FIDs were not provided in a paper, we estimated Hedges’ g from the statistical results (t, F, χ², Z and P)53. We directly contacted several authors for missing data (see Acknowledgements for details). Importantly, recent studies have shown that the starting distance of an approaching person (that is, the animal–human distance when the approach begins) can affect FID27. However, the potential effects of starting distance on FID were controlled in most effect sizes in one of three ways: studies used a fixed starting distance among experiments (for example, starting distance fixed at 30 m); starting distance was controlled analytically by using it as a covariate in statistical models; or we used the marginal means for those studies that provided them or for which we were able to obtain the raw data (see Supplementary Data 1 for details).
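
As an illustration of the effect size calculation, the following Python sketch computes Hedges’ g and its sampling variance from group means, standard deviations and sample sizes, and converts a two-sample t statistic to g. It uses the standard textbook formulas and is not the code used for the analyses reported here. The function and argument names are hypothetical, and the sign convention (higher-disturbance mean minus lower-disturbance mean, so that negative values indicate shorter FID under disturbance) is an assumption chosen to match the interpretation given above.

```python
import math


def small_sample_correction(df):
    """Approximate bias-correction factor J for Hedges' g."""
    return 1 - 3 / (4 * df - 1)


def hedges_g(mean_high, mean_low, sd_high, sd_low, n_high, n_low):
    """Return (g, var_g) for the high- versus low-disturbance contrast.

    Assumed sign convention: g < 0 when FID is shorter under high disturbance.
    """
    df = n_high + n_low - 2
    # Pooled standard deviation of the two populations
    sd_pooled = math.sqrt(
        ((n_high - 1) * sd_high ** 2 + (n_low - 1) * sd_low ** 2) / df
    )
    d = (mean_high - mean_low) / sd_pooled
    j = small_sample_correction(df)
    # Sampling variance of d, then rescaled by J squared for g
    var_d = (n_high + n_low) / (n_high * n_low) + d ** 2 / (2 * (n_high + n_low))
    return j * d, j ** 2 * var_d


def hedges_g_from_t(t, n_high, n_low):
    """Convert an independent two-sample t statistic to Hedges' g."""
    df = n_high + n_low - 2
    d = t * math.sqrt((n_high + n_low) / (n_high * n_low))
    return small_sample_correction(df) * d


if __name__ == "__main__":
    # Hypothetical FIDs (m): 10 in the disturbed area, 20 in the undisturbed area
    g, var_g = hedges_g(10.0, 20.0, 5.0, 5.0, 10, 10)
    print(round(g, 3), round(var_g, 3))
```

The variance returned alongside g is what supplies the inverse-variance weights in the meta-analysis.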

Meta-analysis

We used multi-level mixed-effects meta-analysis17 to test for both mean effect sizes and the importance of our predictors. We controlled for phylogenetic and study non-independence by including phylogeny (Supplementary Fig. 9) and study identity as random factors in our models17. Although we also have multiple estimates per species in our data set (Supplementary Data 1), a model selection approach showed that the inclusion of ‘species identity’ as an additional random effect did not improve our models (Supplementary Table 4). The phylogenies of birds, mammals and lizards, and how they were combined to test for differences among these taxa, are described in Supplementary Methods 1. The mean effect sizes (that is, the mean of the effect sizes weighted by the inverse of their variance) were considered significant if their 95% confidence intervals did not include zero53. We used the between-groups heterogeneity statistic (Qb) to test for significant differences between mean effect sizes53,54.
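
The weighting and significance rule described above can be sketched in Python as follows. This is a deliberately simplified fixed-effect version: it omits the phylogenetic and study-level random effects of the actual models, so it illustrates only the inverse-variance weighting, the 95% confidence interval criterion and the partition of heterogeneity that yields Qb. All function names and the example values are hypothetical.

```python
import math


def weighted_mean(effects, variances):
    """Inverse-variance weighted mean with a normal-theory 95% CI."""
    weights = [1 / v for v in variances]
    total = sum(weights)
    mean = sum(w * g for w, g in zip(weights, effects)) / total
    se = math.sqrt(1 / total)
    return mean, mean - 1.96 * se, mean + 1.96 * se


def excludes_zero(lower, upper):
    """Significance rule: the 95% CI does not include zero."""
    return lower > 0 or upper < 0


def q_statistic(effects, variances):
    """Weighted sum of squared deviations from the weighted mean."""
    mean, _, _ = weighted_mean(effects, variances)
    return sum((g - mean) ** 2 / v for g, v in zip(effects, variances))


def q_between(groups):
    """Between-groups heterogeneity: Q_total minus the summed within-group Q.

    `groups` maps a group label to a pair (effects, variances).
    """
    all_effects = [g for effects, _ in groups.values() for g in effects]
    all_vars = [v for _, variances in groups.values() for v in variances]
    q_within = sum(q_statistic(e, v) for e, v in groups.values())
    return q_statistic(all_effects, all_vars) - q_within


if __name__ == "__main__":
    # Hypothetical effect sizes for two groups
    groups = {
        "group_a": ([-1.2, -0.8], [0.05, 0.04]),
        "group_b": ([-0.1, 0.2], [0.06, 0.05]),
    }
    print(q_between(groups))
```

Under the fixed-effect assumption, Qb is compared against a chi-square distribution with degrees of freedom equal to the number of groups minus one.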

We used the I² index as a measure of heterogeneity in the effect sizes; its value represents the proportion of total variation in the data that is not due to sampling error (0%: all sampling error; 100%: no sampling error)15. We used an extended version of I² that partitions the total heterogeneity among different sources: variation explained by study identity, by phylogenetic effect and by the residual variation (that is, that remaining to be explained by the predictor variables17). We calculated the degree of phylogenetic signal in our effect size estimates using the phylogenetic heritability index17, H², which is the variance attributable to phylogeny in relation to the total variance expected in the data. When the unit of analysis is species, H² is equivalent to Pagel’s λ (ref. 55), in which higher values are associated with stronger phylogenetic signals. Primary studies can suffer from publication bias, where studies with low sample size are more prone to be rejected because of their higher probability of not finding significant effects16,53. We checked for publication bias using Egger’s regression16, in which intercepts significantly different from zero suggest potential publication bias. To overcome the non-independent nature of our data, we applied Egger’s regression test to the meta-analytic residuals17. Analyses were conducted using the metafor54 R package v.1.9-4.
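
To make the partitioning of heterogeneity concrete, the sketch below expresses each variance component as a percentage of the total variance, where the total is the sum of the components plus a ‘typical’ sampling variance. The typical sampling variance formula used here is the commonly cited one for I² and is an assumption about how the extended index is computed; it is not taken from the analyses reported above. The component names and values are hypothetical placeholders for estimates that a fitted multi-level model would supply.

```python
def typical_sampling_variance(variances):
    """Typical within-study sampling variance (assumed formula)."""
    k = len(variances)
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return (k - 1) * sum_w / (sum_w ** 2 - sum_w2)


def partition_i2(components, variances):
    """Percentage of total variance attributed to each component.

    `components` maps a source label (for example 'study', 'phylogeny',
    'residual') to its estimated variance.
    """
    s2 = typical_sampling_variance(variances)
    between = sum(components.values())
    total = between + s2
    shares = {name: 100 * value / total for name, value in components.items()}
    # Overall I2: everything that is not sampling error
    shares["total"] = 100 * between / total
    return shares


if __name__ == "__main__":
    # Hypothetical variance components and sampling variances
    components = {"study": 0.5, "phylogeny": 0.25, "residual": 0.75}
    print(partition_i2(components, [0.5, 0.5, 0.5, 0.5]))
```

When all sampling variances are equal, the typical sampling variance reduces to that common value, which provides a quick check on the formula.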

Covariates

The large number of observations in birds permitted us to investigate whether certain variables were potentially important predictors of tolerance of human disturbance. Based on previous findings in the literature and our own hypotheses, we collected information on eight variables. Seven were associated with a species’ morphology, life-history and natural history traits. These data were obtained from del Hoyo et al.56 (see Supplementary Data 1 for details). The eighth variable, the habitat contrast, describes the type of habitats contrasted in populations under low and high human disturbance. These data were extracted from each surveyed paper. Importantly, for these covariates, multi-collinearity was not an issue (variance inflation factor<1.50, below the suggested threshold57 of 3; see also correlation matrix in Supplementary Table 5). Below we justify the use of each variable and our predictions concerning their effect on bird response.

Body mass. A substantial amount of empirical evidence shows that large animals are less tolerant of human approaches: they generally flush at a greater distance from humans23,24,25,26,27. There are two main, non-exclusive hypotheses to explain this response: (i) large animals are generally less maneuverable29,30 or (ii) large animals suffer a greater opportunity cost of not foraging because of their greater absolute metabolic needs27,32. Thus, we hypothesized that large animals would tolerate less human disturbance by showing larger positive effect sizes than small animals. Body mass was measured in grams and log10-transformed before analysis.

Group size. Three models of predation risk assessment predict a declining risk of predation as group size increases58,59,60. If predation risk decreases with group size, individuals in larger groups might thus tolerate closer approach. Moreover, if tolerance of non-lethal human disturbance is a socially transmitted behaviour, one could expect the effect of social transmission to be enhanced in larger groups61. Therefore, we expected tolerance of human disturbance to increase with group size. Following Burish et al.62, we coded species into three categories: alone or in pairs, in groups of 5–50 individuals, in groups of >100 individuals.

Habitat openness. Animals in open habitats can detect predators at a greater distance but might also have to travel a longer distance to reach protective cover. A recent study showed that birds originally from open habitats tend to delay flight when compared with birds from closed habitats34. Moreover, highly altered habitats, such as urban and suburban places, usually have reduced vegetative cover. We thus hypothesized that species naturally living in open habitats would better tolerate humans because they would suffer less as vegetation cover was reduced. Either way, habitat openness might influence FID and must be accounted for if we are to isolate human disturbance effects. We categorized species as being originally from open habitats (for example, uplands and grassland) or closed habitats (for example, dense forests and woodlands).

Foraging habit. Prior work has shown that in areas of human disturbance, birds may place their nests higher in trees63,64, suggesting that being in trees affords enhanced safety. However, a previous study of the effects of height in a tree on FID found either no effect or that birds higher in trees initiated flight at greater distances65. We predicted that species that typically forage on the ground would be under greater pressure to tolerate humans to minimize the opportunity cost of resuming foraging after a potentially unwarranted escape. We thus classified species as typically foraging on the ground versus typically foraging above ground level (for example, in trees or catching aerial insects while flying).

Diet. Birds that eat live prey, particularly carnivorous raptors, have especially good visual acuity and motion sensitivity66. Given this sensitivity, birds that eat live prey are disturbed at greater distances23. This increased sensitivity to disturbance could select for tolerance if these birds are better able to learn that humans are not a threat. Alternatively, it could also reflect the possibility that foraging efficiency is reduced because their prey are similarly disturbed by humans. In the latter case, we might expect that these species would be less likely to tolerate the on-going disturbance. We categorized species as carnivorous, herbivorous or omnivorous.

Migration. Birds that migrate are exposed to a greater variety of habitats and might be selected to rapidly learn to assess predation risk in new habitats. Under this scenario, we expected that these species would be more likely to tolerate increased disturbance. We coded species as resident or migratory.

Clutch size. Like body size, clutch size is a life history trait that reflects energetic investment, and therefore need37. In general, we might expect that birds that produce fewer eggs per reproductive period might be more energetically stressed than those that produce more eggs because they have larger parental investment per offspring36. In this case, small clutch-sized species would tolerate closer approach because of the greater opportunity costs associated with flight27,28. Alternatively, enhanced energetic needs might select for small clutch-sized species to not tolerate the on-going monitoring costs associated with disturbance67 and may thus select them to move off and forage in areas without disturbance, resulting in a lower tolerance of these species. Either way, clutch size must be accounted for to isolate the effects of human disturbance on flight. We used the estimates of the number of eggs per reproductive period. Because there was a low correlation between clutch size and body mass (Supplementary Table 5), clutch size effects were not corrected by species body mass in our analyses.

Habitat contrast. The nine habitat contrasts were: (i) natural versus urban areas, (ii) rural versus suburban areas, (iii) rural versus urban areas, (iv) suburban versus urban areas, (v) inside versus outside reserve, (vi) low versus high human disturbance in urbanized areas, (vii) low versus high human disturbance in recreational nature (for example, beaches, ski areas and other tourist locations), (viii) low versus high human disturbance in islands and (ix) low versus high human disturbance in reserve. We used these habitat contrasts either because they differ in the degree of human disturbance or because a particular characteristic of the habitat is expected to influence animals’ tolerance of humans. Specifically, contrasts between natural, rural, suburban and urban populations were tested because they represent increasing human presence, and thus differential tolerance of humans is expected as a function of frequency of exposure to humans20,21,22. Previous studies have shown that even subtle temporal or spatial changes in human disturbance within a given habitat type may trigger changes in animals’ tolerance of humans27,41,42,43,44,45, justifying our exploration of the contrasts between low versus high human disturbance within the same habitat type (levels vi–ix). Overall, we expected a smaller FID difference (that is, tolerance) in comparisons within than between habitat types. Recreational areas, islands and protected areas (reserves) were tested separately because of their marked differences in the pattern of human disturbance. Because island populations often have reduced predation risk compared with mainland populations68, we expected either no difference or a small difference in FIDs of different populations found on islands. Animals living in natural areas with tourism may be more responsive to humans because they commonly experience seasonality in human disturbance (for example, visitation only in summer or winter).
Beyond temporal variation, populations inside and outside protected areas may experience marked spatial variation in human disturbance, as well as in the potential lethality associated with human presence (for example, individuals in protected areas may feel safer). This spatial variation in human tolerance may also occur in populations ‘within’ protected areas if the frequency of human visitation varies within the protected area. Importantly, our habitat contrasts were restricted to comparisons between populations of species tested in the same study. As explained in the Meta-analysis section, the variation among studies was controlled for by using study identity as a random factor in our models.

Multi-model inference

We used a multi-model inference approach based on Akaike’s information criterion corrected for small sample size (AICc) to estimate the relative importance of the predictor variables12. To calculate the importance of each predictor, we first assessed the relative strength of each candidate model by calculating its Akaike weight, which is analogous to the probability that the model is the best model. A constant term (intercept) was included in all models. Next, we estimated the importance of a predictor by summing the Akaike weights of all models in which that predictor appeared. This sum can be interpreted as the probability that a particular predictor is a component of the best model, and it allowed us to rank predictors in order of importance12. We used a model averaging approach to estimate model parameters12. Multi-model analyses were conducted using the MuMIn69 R package v. 1.14.0.
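
The calculation of Akaike weights and predictor importance can be sketched in Python as follows. This is an illustrative re-implementation of the standard formulas, not the MuMIn code used for the analyses; the function names, predictor labels and AICc values are hypothetical.

```python
import math


def aicc(log_likelihood, n_params, n_obs):
    """AIC with the small-sample correction term."""
    aic = -2 * log_likelihood + 2 * n_params
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def akaike_weights(aicc_values):
    """Relative support for each candidate model; the weights sum to 1."""
    best = min(aicc_values)
    relative = [math.exp(-0.5 * (value - best)) for value in aicc_values]
    total = sum(relative)
    return [r / total for r in relative]


def predictor_importance(models):
    """Sum the Akaike weights of all models containing each predictor.

    `models` is a list of (predictors, aicc_value) pairs, where `predictors`
    is a set of labels; the intercept-only model is the empty set.
    """
    weights = akaike_weights([value for _, value in models])
    importance = {}
    for (predictors, _), weight in zip(models, weights):
        for name in predictors:
            importance[name] = importance.get(name, 0.0) + weight
    return importance


if __name__ == "__main__":
    # Hypothetical candidate set with two predictors
    candidates = [
        (set(), 210.4),
        ({"body_mass"}, 201.7),
        ({"diet"}, 209.9),
        ({"body_mass", "diet"}, 203.1),
    ]
    ranked = sorted(predictor_importance(candidates).items(),
                    key=lambda item: item[1], reverse=True)
    print(ranked)
```

Because every predictor appears in the same number of candidate models when all subsets are fitted, the summed weights are directly comparable across predictors.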