Ethics statement

The data collection protocol was approved by the Emory University Institutional Animal Care and Use Committee and all data were collected in accordance with its guidelines for the ethical treatment of nonhuman study subjects.

Study system

The data were collected by JCF in 1998 from a large group of captive pigtailed macaques (Macaca nemestrina) socially housed at the Yerkes National Primate Center in Lawrenceville, Georgia. Pigtailed macaques are indigenous to south East Asia and live in multi-male, multi-female societies characterized by female matrilines and male group transfer upon onset of puberty35. Pigtailed macaques breed all year. Females develop swellings when in Œstrus. Macaque societies more generally are characterized by social learning at the individual level, social structures that arise from nonlinear processes and feed back to influence individual behaviour, frequent non-kin interactions and multiplayer conflict interactions, the cost and benefits of which can be quantified at the individual and social network levels25,26,31,36,37,38.

The study group contained n=48 socially mature individuals (we exclude non-mature individuals because their behavioural strategies are still developing and so are non-stationary over short timescales) and 84 individuals in total. Socially mature males were at least 48 months and socially mature females were at least 36 months by study start. These thresholds correspond to approximate onset of social maturity in pigtailed macaques. The study group had a demographic structure approximating wild populations and subadult and adult males were regularly removed to mimic emigration occurring in wild populations. All individuals, except 8 (4 males, 4 females), were either natal to the group or had been in the group since formation. The group was housed in an indoor–outdoor facility, the outdoor compound of which was 125 × 65 ft.

Pigtailed macaques have frequent conflict and employ targeted intervention and repair strategies for managing conflict25. Data on social dynamics and conflict were collected from this group over a stable, four month period. Operational definitions are provided below.

Operational definitions

Fight

It includes any interaction in which one individual threatens or aggresses a second individual. A conflict was considered terminated if no aggression or withdrawal response (fleeing, crouching, screaming, running away and submission signals) was exhibited by any of the conflict participants for 2 min from the last such event. A fight can involve multiple individuals. Third parties can become involved in pair-wise conflict through intervention or redirection, or when a family member of a conflict participant attacks a fourth-party. Fights in this data set ranged in size from 2 to 31 individuals, counting only the socially mature animals. Fights can be represented as small networks that grow and shrink as pair-wise and triadic interactions become active or terminate until there are no more individuals fighting under the above described two minute criterion. In addition to aggressors, a conflict can include individuals who show no aggression or submission (for example, third-parties who simply approach the conflict or show affiliative/submissive behaviour upon approaching, and recipients of aggression who show no response to aggression (typically, threats) by another individual). Because conflicts involve multiple actors, two or more individuals can participate in the same conflict but not interact directly.

In this study only information about fight composition (which individuals were involved) is used. Only fights that included two or more socially mature individuals were used in the analysis; the data includes N=994 such fights. We do not consider internal aspects of the fight, such as who does what to whom, except for the order of each individual's first involvement in the fight (used to estimate time-ordered conditional probabilities for use in the dynamical branching process model). Time data were collected on fight onset and termination but are not used in these analyses.

Power

The degree of consensus among individuals in the group about whether an individual is capable of using force successfully26,27. In previous works, we showed that consensus can be quantified by taking into account the total number of subordination signals an individual receives and multiplying this quantity by a measure of the diversity of signals received from its population of signalers (quantified by computing the Shannon entropy of the vector of signals received by individual i)26. In pigtailed macaque societies, the subordination signal is the silent bared teeth display36 emitted outside the conflict context during pass-byes and affiliative interactions. The distribution of power in our study group is heavy tailed, such that a few individuals are disproportionately powerful.

Policing

A policing intervention is an impartial intervention performed by a third party into an ongoing conflict25. Three males and one female perform the majority of effective policing interventions but only the three males (Eo, Qs, Fo) specialize on policing12. These four individuals occupy the top four spots in the power distribution25,27.

Redirection

A redirection occurs when an aggressor, recipient, or intervener directs aggression at a third (or fourth) individual who was not its original target or attacker. The target of the redirection may not have been involved in the fight until the redirection, or may have been involved in the fight but interacting with individuals other than the redirecting individual.

Data collection protocol

The data were collected by a trained observer (J.C. Flack). The observer spent roughly 100 h before data collection learning to recognize individuals and accurately code their behaviour from the observation tower above the monkey compound. Accuracy was validated by a second trained observer (F.B.M. de Waal). J.C.F. also evaluated coding accuracy using video. Coding accuracy was nearly 100%.

During observations all individuals were confined to the outdoor portion of the compound and were visible to the observer. The ≈150 h of observations occurred for up to 8 h daily between 1,100 and 2,000 h over a 20-week period from June through October 1998, and were evenly distributed over the day. Conflict and signalling data were collected using all-occurrence sampling in which the entire conflict event is followed from start to finish and all participants and their behaviour are recorded.

Provisioning occurred before observations and once during observations at approximately the same time each day. The group was stable during the data collection period (defined as no reversals in status signalling interactions resulting in a change to an individual's power score; see ref. 26).

Model descriptions and justification

We first evaluate which of three basic, empirically grounded fight-joining models explain our macroscopic observable, the distribution of fight sizes. We only accept a model if it is both consistent with the measured, microscopic data and can recover in simulation the observed, measured macroscopic output. Hence all of our models are closely tethered to the measured data and biological details of our model system.

It is important to realize that the parameters in the maximum entropy and branching models come from the microscopic data. The models do not assume prescribed values for parameters but are perhaps better viewed as hypotheses about the ways in which the measured, microscopic detail is connected to observed macroscopic patterns.

All models assume that events external to the system do not create correlations in behaviour. The data were collected in a controlled, captive setting designed to minimize the influence of external events, and we have no evidence for important, consistent external forcers of conflict.

The independent model assumes individuals do not respond to each other but instead join fights without regard to who else is fighting. In this case, perturbations to individual conflict behaviour would have no additional effect on group behaviour.

The independent model fails to recover both the observed distribution of fight sizes (Fig. 1) and the observed significant pairwise correlations (Supplementary Fig. 7).

Individuals sometimes randomly join fights and sometimes the decision to fight reflects strategic interactions at a pairwise level13. We capture this interpretation of the microscopic data using a maximum entropy approach, which corresponds to the spin-glass model of magnetic systems in physics. Because the model is parameterized by the data, it is empirically grounded and serves as a valid biological hypothesis. However we note that it is less mechanistically specific than the branching process described below: spin-glass interactions are not directional or time-ordered, but rather operate symmetrically and over the timescale of an entire fight bout.

The pairwise maximum entropy model, with parameters fit from the microscopic data, recovers well the distribution of fight sizes (Fig. 1). The good performance of the model leads to the prediction that the sensitivity χ is about twice that of a system with the same conflict frequencies but no strategic interactions.

Another reasonable interpretation of the microscopic data is that the random component of fight-joining decisions is very small and the decision to fight reflects strategic interactions at a pairwise level. This leads to a branching process model that, with parameters fit from the microscopic data, also recovers the observed distribution of fight sizes. The collective behaviour produced by this model can be simply understood in terms of a single parameter, the branching ratio. Additionally, branching process models like this one have been used in previous work to explore other aspects of conflict dynamics in this system, including the role of policing and other forms of third-party intervention in the infectivity of aggression12.

Independent model inference

The independent model consists of individuals participating in conflict randomly, with the frequencies of individual appearance equal to their empirically measured, heterogeneous values f i =〈x i 〉. Naively, this can be written as a relative negative log likelihood , and is equivalent to a maximum entropy model that matches only the frequencies f i . (The relative negative log likelihood L(x) of state x is related to the likelihood p(x) by p(x)=exp(−L(x))/Z, where Z is a normalization constant set by the constraint that the sum of likelihoods over all states is one. In statistical physics, Z is the partition function and L(x) is proportional to the free energy of state x.)

However, as detailed in ‘Operational definitions’ in the Methods section, a fight was operationalized for these analyses as involving two or more socially mature individuals. (Observed fights that involved only juveniles and 0 or 1 mature individuals are therefore excluded.) Correspondingly, we forbid our models from producing fights of size smaller than two. We treat this as an additional constraint on the model. The resulting maximum entropy model is then the one in which the likelihood of states with fewer than two individuals present is taken to zero. This corresponds to a relative negative log likelihood

where Θ(z) is 0 when z≤0 and 1 when z>0.

In the unconstrained case (α=0), we can easily solve for h i :

We must now solve numerically for h i to match the empirically measured f i =〈x i 〉=〈x i 〉 α→∞ . To accomplish this, note that the unconstrained model will have modified statistics:

where f 0 is the frequency of size zero fights, f i1 is the frequency of size one fights consisting solely of individual i, and is the overall frequency of size one fights (all measured in the unconstrained model). In terms of unconstrained individual frequencies, these are

We use an iterative procedure to solve equation (7) for 〈x i 〉 α = 0 , which are then used in equation (6) to find the fields. Finally, samples from the independent model (equation (4)) are produced by sampling using equation (5) and simply discarding samples in which fewer than two individuals appear. For our data, this results in discarding about 17% of samples produced with equation (5).

Pairwise maximum entropy model inference

We next constrain our equilibrium maximum entropy model to match the frequencies of appearance of both individuals and pairs of individuals. This model is known to be the spin-glass Ising model39, with relative negative log likelihood

As in the independent model, the likelihood of fights with fewer than two individuals is taken to zero:

The statistics we fit in the equilibrium model are the individual and pairwise frequencies of appearance:

with N(x i ) and N(x i , x j ) representing, respectively, the observed number of appearances in unique fights of the individual i and the pair i, j.

We would like to fit the individual and pairwise frequencies of appearance to the precision that the data supports. As a measure of the goodness of fit, we use the average of normalized residuals

where

is the expected variance of each residual due to finite N, and repeated indices are understood as individual frequencies: f ii =f i . A good fit is expected to have 〈χ2〉≈1.

To perform fitting, we use a simplified method that starts with the mean-field solution and varies a single parameter corresponding to weighting a non-interacting prior. Specifically, we make use of the L 2 -regularized mean-field entropy of ref. 40. The regularization consists of a Gaussian prior with a form designed to make the mean-field case easily solvable, corresponding to an additional term in the relative negative log-likelihood

where γ is the strength of the prior, which favors interactions J ij that are smaller in magnitude. The mean-field solution under this regularization is ref. 40

where J′ is the matrix that has the same eigenvectors v q as the correlation matrix

and eigenvalues , where are regularized versions of C's eigenvalues c q :

This regularization is typically used in a Bayesian sense to avoid overfitting, where a typical value of the regularization strength is γ=1/(10N f2(1–f)2), with N the number of samples and f=n−1∑ i f i the average individual frequency40. Avoiding overfitting, however, still typically requires either enormous N (for example, ref. 39) or restricting the effective number of fit parameters via an expansion (for example, ref. 40). In our case, N is fundamentally limited in that we are describing a stable social epoch of finite duration. In addition, typical high-temperature expansions cannot easily incorporate the restriction that fights have a minimum number of participants (the α term in equation (11)).

Alternatively one can treat γ as a fitting parameter that interpolates between the mean-field solution (which we find overestimates the strength of interactions) and the case of independent individuals. Although it is not a priori obvious that varying this single parameter will be enough to fit the observed statistics within expected statistical fluctuations, we find that this is true for our data. Numerically sampling from the distribution defined by equation (11) with the regularized mean-field J from equation (17), we minimize 〈χ2〉 from equation (14) as a function of γ. Sampling is performed using a standard Metropolis Markov Chain Monte Carlo approach. In evaluating the fit, we choose the number of samples to scale with the number of data samples, using N samples =20N.

As a simple check that this inference approach is not biased to find more sensitive systems, we infer a pairwise model using data produced by the independent model and compare its susceptibility to the known exact value (equation (11) in Supplementary Note 3) in Supplementary Fig. 1. The resulting inferred model has susceptibility that stays close to the true value for small h ext and remains smaller than the true value for larger h ext .

The variation in inferred J ij parameters over individuals can be seen in the lefthand plot of Supplementary Fig. 3. Interaction terms are slightly more often positive than negative, corresponding to excitatory interactions.

Branching process inference

In the branching process model, we use time-ordered appearance data, fitting the time-ordered conditional appearance probabilities

where counts the number of times individual j appears in the same fight bout as individual i, but at a later time T>t. Note that this is different than previous work in inductive game theory13: there (time directed) correlations were measured between individual appearances in separate fight bouts, whereas here we measure correlations within fight bouts.

The parameters we vary in the heterogeneous branching process model are the single-step conditional probabilities p ij , which measure the probability that individual j appears in step t+1 of the branching process given that individual i appeared in step t. Being probabilities, p ij are constrained to values 0≤p ij ≤1. (Note that the branching model is thus limited in the extent to which it can represent inhibitory interactions, for example, if i's involvement in the fight deters j from joining.) We modify this constrained optimization problem into an unconstrained one by defining and performing the optimization over the (unconstrained) parameters.

In the branching process simulation, the first individual to join each fight bout is chosen randomly with probability proportional to the frequency with which each individual appears at the beginning of fights in the data. At each subsequent time step in the branching process, each individual j who has not yet been activated in the current fight has a probability of joining equal to the sum of p ij for all i active in the previous time step. The fight bout concludes when no individuals are active in a given time step. As discussed above, fight bouts that do not grow beyond a single individual are discarded.

Analogously to the case of the equilibrium maximum entropy model, the branching process parameters are fit by minimizing

where

We use a standard Levenberg–Marquardt algorithm (scipy.optimize.leastsq) that uses individual residuals and a lowest-order approximation for the Jacobian with respect to parameters. We find that restarting the Levenberg–Marquardt routine every 10 steps (effectively resetting its damping parameter to avoid unnecessarily small steps arising from its assumption of a non-stochastic objective function) produces faster convergence with fewer samples required for each estimate of the residuals and Jacobian. Minimization is stopped once 〈χ2〉 defined in equation (21) falls below 1. In addition, we find that switching between simple gradient descent steps and Levenberg–Marquardt minimizations allows for more efficient fitting of larger P ij .

The resulting inferred redirection probabilities p ij are visualized in Supplementary Fig. 2 and the righthand plot of Supplementary Fig. 3.

Data availability

The data that support the findings of this study are available from J.C.F. upon reasonable request.