Empirical data and social levels definition

Sperm whale groups were tracked visually and acoustically, day and night, during 2- to 4-week research trips between 1985 and 2003 in the Eastern Pacific Ocean, mainly off the Galápagos Islands (summarized in ref. 39). Three nested social levels were evident within the Pacific sperm whale society: individuals, social units and vocal clans. Individuals were identified from photographic records by comparing the natural markings on the trailing edge of their tail flukes62. Social units were sets of about 12 individuals39 that live and move together for years, delineated using association indices on long-term photo-identification data45. Clans were sets of social units with high similarity in their coda repertoires, the stereotyped patterns of clicks used for communication40. Codas were recorded with hydrophones, and repertoires were assigned to the social units whose members were photo-identified within 2 h of the recording and that had at least 25 codas recorded30. The social units’ repertoires were compared to define the best partition of social units with distinct repertoires into clans30. Clans are estimated to contain many social units and several thousand members, based on the estimated abundance of sperm whales in the Pacific and the number of clans30. Social units are typically found in behaviourally coherent groups with other units from their own clan, but never with units from different clans, even though sperm whales from several clans may use the same waters30.

Agent-based modelling

We simulated the interactions of multiple individual whales using an agent-based modelling (ABM) framework to test whether the clan structure observed in the sperm whale society could arise from evolving vocal behaviour. The ABMs were built in R63 based on empirical parameters (Supplementary Methods), and are described according to the Overview, Design concepts and Details protocol64 as follows:

(a) Purpose. The models test which transmission mechanisms for acoustic behaviour, if any, can give rise to clans of social units of sperm whales with distinct acoustic repertoires, and so explain the multilevel social structure observed empirically.

(b) Entities, state variables and scales. The models have one kind of agent that behaves under realistic life-history parameters (empirical support for model parameters is available in the Supplementary Methods): female whales that learn coda types at 0 to 2 years of age. Because male sperm whales lead quasi-solitary lives and rarely produce codas62,65, they were not represented in the models. The agents are characterized by their age (years), their coda repertoire (a vector of frequencies of use of different coda types), and the social unit and vocal clan to which they belong (nested categorical variables). The models were explicitly temporally structured but only implicitly spatially structured: we accounted for different levels of population mixing (and so implicitly for individual movements), with coda transmission operating among individuals of three different social levels (see ‘d’). Simulations lasted for 700 time steps (years).

(c) Process overview and scheduling. During each time step, biological processes occurred in the following order: birth, coda repertoire composition and changes (before the age of 3 years), social unit membership change (or not) and death. Calves have a high probability of staying with their natal group, and migration of individuals among social units is rare45; thus nearly permanent and nearly matrilineal social units are an emergent property. In these respects, the models mimic several transmission processes characteristic of some socially complex species. We started with two null agent-based models without social learning: in one, the agents only learn their codas individually; in the other, they receive their mothers’ coda repertoire, representative of genetic inheritance (as well as stable vertical–cultural transmission). We then simulated complementary scenarios, for a total of 20 models, with combinations of oblique social learning of coda types and transmission biases operating at the three different social levels (see ‘g’).

(d) Design concepts. There are two emergent properties of the interactions among agents: social units (sets of females and their offspring who stay together during the simulated time) and vocal clans (sets of social units with highly similar vocal repertoires). Social units emerge in all models, whereas vocal clans can be predefined (see ‘g’) or may emerge. All demographic processes were modelled with demographic stochasticity and parameterized from empirical studies (see Supplementary Methods). Birth rates were age-specific; mortality rates were density- and age-dependent (calf agents had a higher probability of dying than adult agents66); and migration rates of individuals between units were low and decreased with age.

The main process of interest is the change in individual coda repertoires, that is, in the frequencies of use of coda types. Each agent has a repertoire represented by a vector with 62 elements denoting continuous absolute frequencies of different coda types, from 0 (absent) to 100 (always performed coda type) (Fig. 2a; details in Supplementary Methods). Calves compose repertoires at early ages (although the precise age is inherently difficult to estimate empirically; Supplementary Methods). Repertoire composition was represented by calf agents replacing some coda types and frequencies once a year at ages 0, 1 and 2. At age 3, all agents’ repertoires were fixed.

Depending on the submodel (see ‘g’), the repertoire change occurred according to one of three main transmission processes (see Fig. 2a): (i) individual learning: calf agents compose their own coda repertoires, that is, they are assigned random coda types and frequencies drawn from a uniform distribution on [0, 100]; (ii) genetic inheritance (which also represents vertical social learning): calf agents receive their mothers’ coda repertoires; and (iii) oblique social learning: calf agents copy coda types and frequencies from adult agents of different generations, kin-related or not. For the models with oblique social learning, the following three effects were included: (iv) homophily: calf agents preferentially copy codas from adult agents of the social units whose repertoires are most similar to the repertoire of the calf’s own social unit (the homophily effect posits that behaviourally similar individuals tend to interact more often17; since social learning occurs during social interaction, the homophily effect on learning can be represented as individuals with similar behaviour learning preferentially from each other); (v) conformism: calf agents disproportionately copy the most common coda types; and (vi) symbolic marking: all agents of a given social unit are assigned a random sequence of six coda types with frequency of usage 100 (a ‘symbol’) at time t=1 to mark the identity of their unit, and all calf agents from t=1 onwards deliberately copy the ‘symbol’ of the unit to which they belong.

To account for different degrees of population mixing, we replicated the models with oblique social learning and additional effects (iii–vi) across the three levels of the sperm whale society, that is, social unit, predefined clan and population (Fig. 2b): (vii) social units: calf agents randomly copy codas from agents of their own social unit; (viii) predefined clan: agents were arbitrarily assigned to three clans and calf agents learned only from adult agents of their own clan (since clan partition could be driven by non-learning mechanisms, we simulated the pre-existence of clans, representing geographically segregated clans such as those that seem to occur in the Atlantic, where acoustic variation is driven by spatial isolation44; we refer to these as ‘predefined clans’, as opposed to the ‘emergent clans’ that may arise in the simulations due to acoustic similarity); and (ix) population: calf agents learn from any agent in the population. We combined transmission mechanisms, effects and social levels in a total set of 20 ABMs (see ‘g’, Fig. 3a). At the end of each simulation, we observed the number and size of social units and vocal clans and how similar their coda repertoires were (methods below).

(e) Initialization. Simulations were initialized with the following parameters based on empirical data (details, justification and references in Supplementary Methods). At the first time step, year t=1, all simulations started with a population of $N_0$ = 1,000 agents, to which ages were randomly assigned from a negative exponential distribution (so the initial population was mostly young, with ages typically varying from 0 to about 70 years), and social unit membership labels were assigned with equal probabilities. Each agent received an empty vector of 62 elements (that is, coda types) representing its coda repertoire. For each agent, half of the elements in its coda repertoire vector were randomly selected to receive an absolute frequency of usage from a uniform distribution on [0, 100] (absent coda type=0; always performed=100). Agents are considered calves when they are 0, 1 or 2 years old, the ages at which changes in the coda repertoire occurred. Adult female agents became sexually mature after 9 years of age, stopped reproducing after 41 years of age, and lived 70 years on average.

The population was modelled as density dependent, with age-dependent reproduction, mortality and migration rates, such that the population fluctuated around the carrying capacity ($N_0$) over time. The initial number of social units was based on the initial population size ($N_0$) and the empirical average unit size in the Pacific (about 12 members). Social units split in half when they reached double the maximum initial unit size. Calf agents remained in their mother’s social unit because they depend heavily on their mothers, and adult agents had a low probability of randomly migrating to other social units during their lives (c=0.05). Repertoire changes were represented by replacement of frequencies of coda types and occurred three times for each agent (repertoires were fixed at the age of 3 years). Newborn agents started the simulation with empty coda repertoires; each simulated year, all calf agents changed their coda repertoires under one of the three main mechanisms, with or without additional effects, operating at one of three social levels (see ‘g’, Fig. 2). In all models, calf agents also had a low individual learning rate (ilearn=0.02), that is, each year they replaced the frequency of one coda type (62 codas × 0.02≈1) chosen at random by a frequency drawn from a uniform distribution on [0, 100], which accounted for random learning errors or deliberate innovations60. Supplementary Fig. 3 illustrates the population output measures of a typical simulation.

(f) Input. The models have no external input data, but initial parameters differed among submodels.

(g) Submodels. We created a total of 20 submodels (Fig. 3a), all of which have the same structure but differ in the way calves compose their coda repertoires (Fig. 2). In the first null agent-based model (ABM 1), calf agents learn their coda repertoire only through individual learning. In the second null agent-based model (ABM 2), calves receive the exact repertoire of their mothers, mimicking genetic or vertical–cultural transmission of coda repertoires. In all the following models (ABMs 3–20), calves change repertoires through oblique social learning, some with combinations of the three transmission biases: homophily (ABMs 6–8 and 15–17); conformism (ABMs 9–11 and 15–17); and symbolic marking (ABMs 12–14 and 18–20). Oblique social learning and its biases occurred within social units (ABMs 4, 7, 10, 13, 16 and 19), across social units of the same predefined clans (ABMs 5, 8, 11, 14, 17 and 20) and in the entire population (ABMs 3, 6, 9, 12, 15 and 18).
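The three main transmission processes and the low individual learning rate can be illustrated with a minimal sketch. It is written in Python for illustration only (the ABMs themselves were built in R), and the function names are hypothetical. The sketch assumes that oblique learning draws one adult model independently for each coda type, which is a simplification not stated in the text, and it omits the homophily, conformism and symbolic marking effects.

```python
import random

N_CODA_TYPES = 62   # repertoire length given in the text
ILEARN = 0.02       # individual learning rate given in the text


def individual_learning(rng):
    """Compose a repertoire from scratch: each frequency drawn from U[0, 100]."""
    return [rng.uniform(0.0, 100.0) for _ in range(N_CODA_TYPES)]


def vertical_transmission(mother_repertoire):
    """Receive an exact copy of the mother's repertoire."""
    return list(mother_repertoire)


def oblique_learning(adult_repertoires, rng):
    """Copy each coda type's frequency from a randomly chosen adult.

    Assumption: one adult model is drawn independently per coda type.
    """
    return [rng.choice(adult_repertoires)[k] for k in range(N_CODA_TYPES)]


def innovate(repertoire, rng):
    """Replace about one frequency per year (62 * 0.02, rounded to 1)."""
    updated = list(repertoire)
    n_changes = round(N_CODA_TYPES * ILEARN)
    for k in rng.sample(range(N_CODA_TYPES), n_changes):
        updated[k] = rng.uniform(0.0, 100.0)
    return updated
```

In a yearly loop, a calf aged 0, 1 or 2 would call one of the first three functions, depending on the submodel, and then `innovate`; from age 3 onwards the repertoire would no longer be modified.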

Coda repertoire similarity

The empirical repertoires of the social units were compared based on the inter-click intervals of each coda using an averaged multivariate similarity metric30. Because in the ABMs we simulated frequencies of usage of coda types (and not the inter-click intervals of each coda), we compared the repertoires of each pair of simulated social units with the weighted Bray–Curtis index between the average frequencies of usage of codas across all agents of these units. We adjusted the index to represent similarity, which ranged from 0 (completely different) to 1 (exactly the same repertoire). We detail the differences between empirical and simulated codas and repertoire comparison in the Supplementary Methods.
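As an illustration of the comparison between simulated units, the sketch below averages agent repertoires within a unit and converts the Bray–Curtis dissimilarity into a similarity. It uses the standard, unweighted Bray–Curtis form; the specific weighting applied in the study is described in the Supplementary Methods and is not reproduced here. The function names are hypothetical, and treating two all-zero repertoires as identical is an assumption.

```python
def mean_repertoire(agent_repertoires):
    """Average frequency of usage of each coda type over the agents of a unit."""
    n_agents = len(agent_repertoires)
    return [sum(column) / n_agents for column in zip(*agent_repertoires)]


def bray_curtis_similarity(x, y):
    """Return 1 - Bray-Curtis dissimilarity: 0 (different) to 1 (identical)."""
    total = sum(a + b for a, b in zip(x, y))
    if total == 0:
        # Assumption: two empty repertoires are treated as identical.
        return 1.0
    difference = sum(abs(a - b) for a, b in zip(x, y))
    return 1.0 - difference / total
```

For example, two units whose average repertoires are `[50, 50]` and `[100, 0]` have an absolute difference of 100 over a total of 200, giving a similarity of 0.5.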

Clan partition in empirical and simulated data

Clan partitioning in the simulated data was adapted from the original methods for vocal clan definition: the social units’ coda repertoires were compared and the best partition into clans was based on the repertoire similarity30. While the original approach included hierarchical clustering, we used the network formalism to depict social units (nodes) connected by similarity of coda repertoires (links) and modularity to define the emergence of clans (see below). To allow for direct comparisons, we reanalysed the empirical social and acoustic data30,45 with the same network framework. First, we built a social network of photo-identified individuals (nodes) connected by the strength of their social relationships (links), that is, the proportion of time individuals were seen together45, estimated by the half-weight association index. We then overlaid the empirical acoustic network, in which the social units (nodes) were connected by the similarity in their averaged coda repertoires (links).
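For reference, the half-weight association index mentioned above is commonly written as x / (x + y_ab + ½(y_a + y_b)). A minimal sketch follows, assuming sightings have already been tallied per sampling period; the argument names are illustrative, and returning 0 for a pair that was never sighted is an assumption.

```python
def half_weight_index(x, y_ab, y_a, y_b):
    """Half-weight association index for a pair of individuals a and b.

    x    : sampling periods in which a and b were seen together
    y_ab : periods in which both were seen, but not together
    y_a  : periods in which only a was seen
    y_b  : periods in which only b was seen
    """
    denominator = x + y_ab + 0.5 * (y_a + y_b)
    if denominator == 0:
        # Assumption: a pair with no sightings gets an index of 0.
        return 0.0
    return x / denominator
```

The index ranges from 0 (never seen together) to 1 (always seen together), so it can be used directly as a link weight in the social network.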

For both empirical and simulated data, vocal clans were defined by modules in the acoustic networks, that is, subsets of nodes (social units) that are densely and strongly linked to each other (by acoustic similarity) and weakly linked with the rest of the network. We searched for the best module partition using the Walktrap algorithm67, which is based on the assumption that random walks in a network will tend to get ‘trapped’ inside strongly connected modules. More specifically, this algorithm uses an agglomerative approach to form modules, using a distance metric based on the probability of a random walk from node i to node j. Hence, nodes belonging to a given module will share similar probabilities of going to nodes outside their module. From the resulting hierarchy of modules, the best partition is inferred from the largest increase ratio of the total distance. Subsequently, we assigned a value to this partition using the weighted version of the modularity metric Q68:

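$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(g_i, g_j)$$

where $A$ is a weighted adjacency matrix, with elements $A_{ij}$ representing the acoustic similarity between social units $i$ and $j$; $m = \frac{1}{2}\sum_{i,j} A_{ij}$ is the weighted number of links; $k_i = \sum_j A_{ij}$ is the weighted degree of node $i$; $g_i$ gives the label of the module (herein clan) to which node (herein social unit) $i$ belongs; and $\delta(g_i, g_j)$ equals 1 when $g_i = g_j$ and 0 otherwise.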
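The modularity of a given partition can be computed directly from the similarity matrix. The sketch below assumes the standard weighted modularity, in which each within-module pair contributes its link weight minus the product of the two weighted degrees divided by twice the total link weight. It also assumes a symmetric matrix with a zero diagonal, stored as a list of lists; the function name is hypothetical.

```python
def weighted_modularity(A, labels):
    """Weighted modularity Q of a partition.

    A      : symmetric weighted adjacency matrix (list of lists, zero diagonal)
    labels : labels[i] is the module (clan) of node (social unit) i
    """
    n = len(A)
    degree = [sum(row) for row in A]   # weighted degree of each node
    two_m = sum(degree)                # twice the weighted number of links
    if two_m == 0:
        return 0.0                     # assumption: empty network has Q = 0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += A[i][j] - degree[i] * degree[j] / two_m
    return q / two_m
```

As a check, a network of four units forming two disconnected pairs with unit link weights gives Q = 0.5 when each pair is its own clan, and Q = 0 when all four units are placed in a single clan.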

The significance of clan emergence, in both empirical and simulated data sets, was assessed by comparing the modularity Q-values to a benchmark distribution generated from 1,000 theoretical networks. We created theoretical networks with the same size (number of nodes, that is, social units), the same link weight distribution (that is, acoustic similarity) and the same connectance (proportion of realized links) using a model that randomizes the link weights among nodes69. Clan emergence was considered significant whenever the modularity Q-values of the observed acoustic networks were outside the 95% confidence intervals of the benchmark distribution.
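One way to sketch this benchmark is to shuffle the observed link weights among node pairs, which keeps the network size, the weight distribution and the connectance (absent links are shuffled as zeros). The code below is an illustration under that assumption and is not the randomization model of ref. 69. It also assumes that the 95% interval is the central 95% of the benchmark values, and that the modularity of each randomized network is obtained separately by re-running the module search. Function names are hypothetical.

```python
import math
import random


def shuffle_link_weights(A, rng):
    """Return a copy of symmetric matrix A with its link weights permuted."""
    n = len(A)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    weights = [A[i][j] for i, j in pairs]   # includes zeros (absent links)
    rng.shuffle(weights)
    B = [[0.0] * n for _ in range(n)]
    for (i, j), w in zip(pairs, weights):
        B[i][j] = B[j][i] = w
    return B


def outside_central_interval(observed, null_values, level=0.95):
    """True if observed lies outside the central `level` range of null_values."""
    ordered = sorted(null_values)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    lower = ordered[math.floor(tail * last)]
    upper = ordered[math.ceil((1.0 - tail) * last)]
    return observed < lower or observed > upper
```

In use, `shuffle_link_weights` would be called 1,000 times, a modularity value computed for each randomized network, and the observed Q passed to `outside_central_interval` together with the 1,000 benchmark values.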

Sensitivity analysis and robustness of clan emergence

The parameters and initial conditions of the ABMs were grounded in empirical evidence (Supplementary Methods) and fixed across scenarios to allow for direct comparison of learning strategies without any confounding influence of other changing parameter values. To evaluate whether the observed partition of social units into clans was robust to varying the initial conditions in the models, we performed a sensitivity analysis of the 6 initial demographic and 2 learning parameters that were common to all 20 ABMs (population size and carrying capacity, reproductive age, migration rate, mortality rates, age distribution, initial average social unit size, individual learning rate and coda repertoire size; full description in Supplementary Methods). We ran each ABM changing a single parameter value at a time to two extreme parameter estimates of a biologically meaningful range (Supplementary Table 2) and calculated modularity and 95% confidence intervals with the theoretical model described above. Specifically, we tested whether changing the ABMs’ initial setup would still yield emergence of clans in the scenarios with biased social learning (ABMs 15–17); and, conversely, whether clans would emerge in the rest of the scenarios, in which they had not originally emerged (ABMs 1–14 and 18–20; see Figs 3c and 4).

In addition, we evaluated the robustness of the metric for clan partitioning (modularity) by bootstrapping the links of the 20 simulated acoustic networks (Supplementary Methods). The simulation of coda repertoires by the ABMs represented a complete sampling, in the sense that all codas of all agents of all social units were recorded and compared. This is clearly not the case for the empirical data, in which field logistics inherently yield incomplete sampling of the social units’ coda repertoires. To make empirical and simulated data more comparable, and to assess whether the modularity patterns were consistent in subsets of the simulated data, we resampled the weighted links of the acoustic networks (that is, coda repertoire similarity between social units) with replacement (bootstrap, 1,000 iterations) and calculated the weighted modularity with increasing sampling, from 5 to 100% of the links in increments of 5%.
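The resampling scheme can be sketched as follows. The code assumes that links are stored as a list of (unit, unit, similarity) tuples and that the statistic of interest (here, the weighted modularity of the network rebuilt from the sampled links) is supplied by the caller. How duplicated links are handled when rebuilding the network is left to that function, since the text does not specify it. All names are hypothetical.

```python
import random


def resample_links(links, percent, rng):
    """Draw, with replacement, `percent` % of the links (at least one)."""
    n_draws = max(1, round(len(links) * percent / 100))
    return rng.choices(links, k=n_draws)


def bootstrap_curve(links, statistic, rng, n_iter=1000):
    """Bootstrap a statistic at 5, 10, ..., 100 % of the links.

    Returns a dict mapping each sampling percentage to a list of
    n_iter values of `statistic` computed on resampled link lists.
    """
    curve = {}
    for percent in range(5, 101, 5):
        curve[percent] = [
            statistic(resample_links(links, percent, rng))
            for _ in range(n_iter)
        ]
    return curve
```

Plotting the mean and spread of each list against the sampling percentage would show how quickly the modularity estimate stabilizes as more links are included.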