In the next section we draw on the existing literature to sharpen the question under consideration; we also present the data and discuss the overall analysis framework. The following section presents the relevant statistical methods in more detail. Next, we present our main results: we first perform a homogeneity test and, as this indicates non-homogeneity, proceed with change-point methodology, crucially also presenting the degree of change. We then investigate the effect of democracy. In the final section, we discuss our findings: we examine the robustness of our approach to various choices and its relationship to previous works, and consider potential theoretical mechanisms.

We see our contribution as two-fold. First, we introduce a set of statistical methods to the peace research community, some of them new. We have attempted to make the presentation of the methods accessible to most peace researchers. Technical details, of separate interest also to specialists in statistics, are placed in the Online appendix. Second, we present new results and conclusions that partly challenge previous works and may generate hypotheses that can form the basis of future investigations. We find evidence that a sequence of war sizes from the last two centuries is not entirely homogeneous. In this sequence, the point of maximal change is found in 1950, corresponding to the Korean war. The upper quartile of the battle-deaths distribution decreases substantially, from 63,545 before the Korean war to 14,943 after. Note that there is considerable uncertainty around these estimates and that the conclusion is open to interpretation. Our change-point analysis gives a very wide 95% confidence interval for the point of change, but it also places considerable confidence on only a small handful of wars, including the Korean war, which is the maximum likelihood estimate. The uncertainty is discussed in detail below. We differ from parts of the literature by not focusing exclusively on WWII as the potential point of change, but by applying change-point methodology to investigate distributional changes in a time series of wars. We also investigate the role of covariates, in particular democracy.

While the empirical pattern constituting the long peace is not in itself disputed, some recent investigations have questioned whether the pattern can be said to constitute a statistically established trend (see e.g. Cirillo & Taleb, 2016; Clauset, 2017, 2018; Braumoeller, 2019). Could this long period of relative peace simply be a random occurrence in an otherwise homogeneous war-generating process, or does it represent a significant change, a trend towards peace? Cirillo & Taleb (2016), Clauset (2017, 2018) and Braumoeller (2019) answer the latter question in the negative: they find that the long peace is not a sufficiently unusual pattern when considering the variability inherent in long-term datasets of historical wars. The question investigated by these authors is essentially statistical in nature, and we follow in the same vein, approaching a similar question, with similar data, but with somewhat different statistical tools.

Is the world becoming more peaceful? The question is both deceptively simple and quite controversial. Authors such as Gat (2006), Goldstein (2011) and Pinker (2011) have argued that the world is becoming steadily more peaceful, and a multidimensional quilt of research has contributed pieces with similar stories and conclusions.1 Parts of these arguments concern wars and armed conflicts, and there the concept of 'the long peace' (Gaddis, 1989) has gained the weight of repeated respectful use, signalling the relatively few large interstate wars in the period after World War II (WWII).

Homogeneity tests are a general class of methods which aim to test a null hypothesis of stationarity: whether the observed sequence is consistent with a single, stationary statistical model, or whether there is sufficient deviation from that model to indicate that a change has taken place. Most of the results in Clauset (2017, 2018) are based on tests of homogeneity, where Clauset does not find sufficient evidence to reject the null hypothesis of no change. Tests of homogeneity seem attractive because they can potentially discover many types of deviations from the stationary model. However, for partly the same reason, they often have low power in detecting actual changes. There are many homogeneity tests to choose between, which differ in, for example, the assumptions made, the choice of test statistic and the choice of alternative hypothesis; see Hjort & Koning (2002) and Cunen, Hermansen & Hjort (2018) for partial reviews and methods. We present a general homogeneity test in the methods section.

Finally, there are several different statistical frameworks for assessing whether a certain sequence of observations, war sizes in our case, supports a trend or not. The possible options include regression models with respect to time, homogeneity tests and change-point analyses. We have not investigated regression models, as these would impose too strong a constraint on the type of change present (a quick look at Figure 1 also makes clear that there is no simple linear time trend).

We have used the Correlates of War (CoW) interstate conflict dataset (Sarkees & Wayman, 2010). This dataset contains onset dates $x_i$ and the number of battle-deaths $z_i$ for all interstate wars with more than 1,000 battle-deaths in the period 1816 to 2007, comprising a total of 95 wars. The dates $x_i$ range from 1823.27 (the Franco-Spanish war) to 2003.22 (the invasion of Iraq). Figure 1 displays these data, with $z_i$ on the log10 scale. The choice of the CoW dataset is motivated by its widespread use (Clauset, 2017, 2018; Fagan et al., 2018; Spagat & van Weezel, 2018), which enables comparisons with other approaches. The CoW dataset is also considered to be of good quality, despite the measurement issues discussed below.

There is also a choice between different datasets. Naturally, we would prefer a dataset stretching as far back in time as possible, with measurements of high quality and constructed with careful and precise definitions. The study by Cederman, Warren & Sornette (2011) combines data from Levy (1983), the CoW project (Singer & Small, 1994) and the PRIO/UCDP Armed Conflict Database (ACD) (Gleditsch et al., 2002). That dataset has a long time span, but is unfortunately limited to wars involving 'major powers'. The quality of the reported battle-death numbers can also be an issue: even for recent wars involving developed countries, estimates of the number of battle-deaths can be contested. The Falklands war, for instance, is included in the CoW interstate wars dataset with 1,001 battle-deaths, even though the actual number is most likely closer to 900 (Reiter, Stam & Horowitz, 2016).

We have now decided on a quantity of interest, war sizes; a class of appropriate statistical distributions for modelling this quantity is discussed below. Still, there is a major question to resolve: should we normalize the war sizes by population size, or should we consider the absolute number of fatalities? Here, normalization refers to dividing the number of fatalities by the population size, typically the world population. Pinker (2011) builds most of his arguments around relative quantities, such as deaths per 100,000. Clauset (2017, 2018) discusses the choice of normalization at some length, and decides to analyse the absolute numbers. The choice of normalization in fact translates into different questions: are we interested in making claims about the absolute sizes of wars, or about the risk of dying in war? And in the latter case, with respect to which segment of the population should this risk be defined? All these questions are valid and interesting, but naturally the answers to one of them will not be directly relevant for the others. We have chosen to consider the absolute numbers. For proponents of the long peace theory this is a conservative choice, since normalizing by world population inflates the size of older wars compared to more recent ones.

Richardson's insights concerning power laws are discussed by Pinker (2011) in his international best-seller The Better Angels of Our Nature. There, he argues that violence in a wide sense, including crime, torture, animal cruelty and war, has declined. Power laws also form the basis of empirical investigations that challenge Pinker's conclusions about the decline of war and the long peace. In Cederman, Warren & Sornette (2011), a sequence of 118 war sizes from 1495 to 1997 is modelled with power law distributions; the authors find a shift in the power law parameter in 1789, indicating larger wars after that year than before. Cirillo & Taleb (2016) build their own database of war deaths from year 1 to the present; they use statistical models with power law tails and find that their dataset is well enough described by a single, stationary model. Clauset (2017, 2018) examines the CoW data discussed above, models the size of interstate wars with power laws, and finds that he cannot reject the null hypothesis of no change. Indeed, he argues that the current trend would have to persist for 150 years before we could statistically claim that the world has become more peaceful.

The power law distribution specifies

$$P(Z > z) = c\,z^{-\theta}, \qquad (2)$$

with a normalizing constant $c$ and a positive parameter $\theta$. This means that the probability of observing an event, in our case a war, of size larger than $z$ is inversely proportional to $z$ raised to $\theta$. If $\theta$ is large this probability decreases quickly with $z$, but if $\theta$ is smaller, $P(Z > z)$ can stay considerable even for large $z$. This last characteristic is sometimes referred to as the 'fat-tailed' property and entails a non-negligible probability of observing truly enormous events. Often the power law distribution is only appropriate for observations larger than some threshold $z_0$, a point we will return to in the methods section.
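As a small numerical illustration of the fat-tailed property, the sketch below evaluates the tail probability for a few values of $\theta$; the threshold of 1,000 battle-deaths and the $\theta$ values are illustrative choices, not estimates from our analysis.

```python
import numpy as np

def powerlaw_tail(z, z0, theta):
    """P(Z > z) = (z / z0)**(-theta) for z >= z0, a pure power law tail."""
    z = np.asarray(z, dtype=float)
    return np.where(z < z0, 1.0, (z / z0) ** (-theta))

z0 = 1000  # illustrative threshold on the battle-death scale
for theta in (0.7, 1.0, 2.0):
    # probability of a war exceeding one million battle-deaths:
    print(theta, powerlaw_tail(1e6, z0, theta))
```

Even modest changes in $\theta$ shift the probability of enormous wars by orders of magnitude, which is why the tail parameter is the natural focus of the analyses below.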

Both the time between wars and the size of each war are relevant for investigating whether the world has become more peaceful. A peaceful world could be characterized by fewer wars (i.e. longer time between wars), smaller wars, or both. Note a potential caveat concerning the assumed connection between a decline in war sizes and arguments about whether the world is becoming more peaceful. Fazal (2014) argues that the risk of dying in war has declined because of the revolution in military medicine: war may see just as many casualties as before but fewer deaths, since modern medicine is able to save more lives. We will not explore this hypothesis in our article.

Efforts to uncover trends in armed conflict have a long history, dating back at least to the seminal contributions of Lewis Fry Richardson (1948, 1960). Richardson assembled datasets of historical wars and sought to uncover long-term patterns by statistical modelling of various quantities, for example the time between wars and the number of fatalities in each war. We will consider the Correlates of War (CoW) interstate conflict dataset (Sarkees & Wayman, 2010), see Figure 1, discussed in more detail above. For now, consider a general war dataset consisting of pairs $(x_i, z_i)$ for $i = 1, \ldots, n$, with $x_i$ the onset date of war $i$ and $z_i$ its number of battle-deaths.
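For concreteness, a war dataset of this form can be represented as follows; the `War` container and helper function are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class War:
    onset: float        # onset date x_i as a decimal year, e.g. 1950.483
    battle_deaths: int  # size z_i; at least 1,001 in the CoW interstate data

def sort_by_onset(wars: List[War]) -> List[War]:
    """Change-point analyses assume the wars are ordered by onset date."""
    return sorted(wars, key=lambda w: w.onset)
```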

Our change-point method is sufficiently general to support the inclusion of covariates influencing the model parameters, for example democracy scores, as we will see. For simplicity of presentation, we describe the inclusion of a single covariate in the inverse Burr model; the Online appendix gives a more general treatment.

Assume that we have covariate information $w_i$ for each war. In this illustration, the covariate is the mean democracy score of the countries involved in each war, measured the year before the war started. To measure democracy, we use the Polity index from the Polity IV dataset (Marshall & Jaggers, 2003). The Polity index scores regimes on a scale from −10 to 10, where −10 denotes the most autocratic regimes and 10 the most democratic. The covariate will thus be negative when a war involves mostly autocratic regimes, and large and positive if a war involves only democracies. Here, we let the covariate influence the scale parameter $\mu$ of the inverse Burr, through baseline parameters $\mu_{L,0}, \mu_{R,0}$ and regression coefficients $\beta_L, \beta_R$ on each side of the change-point.

Note that some of the wars have missing democracy scores. We remove these observations and end up with 90 wars for this analysis. The full model has now become moderately complex, with parameters $\theta_L, \mu_{L,0}, \beta_L$ to the left, $\theta_R, \mu_{R,0}, \beta_R$ to the right, and a common $\alpha$, in addition to the change-point $\tau$.

When introducing covariates in this change-point model, there are some issues to consider. First, one can either assume that the covariate effect has changed across the change-point, or that it has remained constant (so $\beta_L = \beta_R$). This choice might depend on prior knowledge, or be decided by some model selection criterion. Second, one must be aware that the inclusion of covariates might alter the change-point inference (compared to a model without covariates).
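The exact link function between $w_i$ and $\mu$ is not reproduced here; as an illustration only, the following sketch assumes a log-linear link, $\mu_i = \mu_0 \exp(\beta w_i)$, which matches the qualitative behaviour described above (a negative $\beta$ shrinks the scale, and hence the fitted war-size quantiles, as democracy increases). The baseline value is borrowed from the no-covariate fit purely as a placeholder.

```python
import numpy as np

def scale_with_covariate(mu0: float, beta: float, w: np.ndarray) -> np.ndarray:
    """Assumed log-linear link: mu_i = mu0 * exp(beta * w_i).

    mu0 > 0 is a baseline scale parameter; a negative beta makes the scale
    (and hence the war-size quantiles) decrease with democracy score w_i.
    """
    return mu0 * np.exp(beta * np.asarray(w, dtype=float))

# Illustration with the reported right-side coefficient beta_R = -0.163 and,
# purely as a placeholder baseline, the no-covariate estimate mu_R = 10940:
print(scale_with_covariate(10940, -0.163, np.array([-10.0, 0.0, 10.0])))
```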

In our analysis, we use the change-point method briefly discussed here along with the inverse Burr model described below. In addition to the choice of distribution, the modeller needs to decide which parameters of the distribution should be allowed to be (potentially) influenced by the change-point. For the model in Equation (4), we allow $\theta$ and $\mu$ to change, but assume the same $\alpha$ across the change-point. We then end up with a total of six parameters to estimate: the change-point $\tau$, along with $(\alpha, \mu_L, \theta_L, \mu_R, \theta_R)$.
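A minimal sketch of this estimation strategy: for each candidate $\tau$, maximize the likelihood with a common $\alpha$ and separate $(\mu, \theta)$ on each side, then pick the $\tau$ with the highest profiled value. We use scipy's Burr Type III as a stand-in for the inverse Burr of Equation (4), mapping $\theta$ and $\alpha$ to its two shape parameters and shifting the support to the battle-death scale; this mapping is our reading of the model, not code from the article.

```python
import numpy as np
from scipy import stats, optimize

# Inverse Burr log-density via scipy's Burr Type III (shape order: theta, alpha),
# with loc = 1001 putting the support on the battle-death scale; this mapping
# is an assumption on our part. A war at exactly 1,001 battle-deaths sits on
# the support boundary and may need a slightly lower loc in practice.
def loglik_side(z, alpha, theta, mu):
    return stats.burr.logpdf(z, theta, alpha, loc=1001, scale=mu).sum()

def profile_loglik(z, t):
    """Maximized log-likelihood when the change-point is placed after index t."""
    def negll(p):
        alpha, th_l, mu_l, th_r, mu_r = np.exp(p)  # log-parametrized, all > 0
        return -(loglik_side(z[:t], alpha, th_l, mu_l) +
                 loglik_side(z[t:], alpha, th_r, mu_r))
    x0 = np.log([0.5, 1.0, np.median(z[:t]), 1.0, np.median(z[t:])])
    res = optimize.minimize(negll, x0, method="Nelder-Mead",
                            options={"maxiter": 5000})
    return -res.fun

def estimate_changepoint(z, min_seg=10):
    """Maximum likelihood change-point: the split maximizing the profile log-likelihood."""
    taus = np.arange(min_seg, len(z) - min_seg)
    profs = np.array([profile_loglik(z, t) for t in taus])
    return taus[np.argmax(profs)], taus, profs
```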

When faced with a sequence of observations, change-point methodology is used to search for the point of maximal distributional change. More formally, we have observations $z_1, \ldots, z_n$ from some parametric model, say $f(z, \gamma)$, where $\gamma$ has dimension $p$. Assume that there is a change-point $\tau$ in the sequence, where the model parameter changes from $\gamma_L$ for $i \le \tau$ to $\gamma_R$ for $i \ge \tau + 1$. The aim of a change-point analysis is to estimate $\tau$ and, importantly, to assess the uncertainty around this estimate. Subsequently, one should also assess the degree of change associated with the change-point, in order to investigate the magnitude and direction of the change, and thereby whether the change is large enough to have any practical importance.

There are many ways to search for a change-point in a sequence of data; see Frigessi & Hjort (2002) for a broad introduction to a special journal issue on discontinuities. Here we employ the change-point machinery developed in Cunen, Hermansen & Hjort (2018), both for spotting a potential change-point and, crucially, for assessing its uncertainty. To assess uncertainty and present our results, we use confidence curves (see Schweder & Hjort, 2016). Confidence curves can be understood as graphical generalizations of confidence intervals: they present the uncertainty at all levels of confidence, instead of a single confidence interval at some arbitrary level (typically 95%). See the results section for more on the interpretation of confidence curves.

In the Online appendix we provide a short technical overview of the change-point method we have used. The version of the method used here only allows for a single change-point in the sequence of data. Importantly, the method involves maximum likelihood estimators of the model parameters, $\hat\gamma_L$ to the left and $\hat\gamma_R$ to the right, and of the change-point parameter $\hat\tau$. The confidence curve $\mathrm{cc}(\tau)$ is based on the deviance function, and its construction requires computer simulations. Ideally, the results presented here should not be too sensitive to the choice among various change-point methods. The chosen method is easy to use and highly flexible, and relies on a natural extension of general likelihood theory to change-point parameters. It can be used in connection with any parametric model for the data and allows for changes in one, some, or all of the $p$ model parameters inside $\gamma_L$ and $\gamma_R$. It thus allows the user to discover more complex changes than simple jumps in the mean level (a restriction in parts of the change-point literature).

The change-point method of Cunen, Hermansen & Hjort (2018) also allows us to construct confidence curves for the degree of change associated with the change-point. The degree of change is a one-dimensional parameter, called $\rho$, defined as a function of the model parameters on both sides of $\tau$ and meant to capture the size and direction of the change. Usually it takes the form of a ratio or a difference; here we study the ratio between quantiles of war sizes on each side of $\tau$. Confidence curves for the degree of change, $\mathrm{cc}(\rho)$, are displayed in the results section. Importantly, $\mathrm{cc}(\rho)$ takes into account the uncertainty in the change-point position, and the confidence curves for the degree of change can therefore be considered an implicit homogeneity test. The change-point method described here always gives a point estimate of the change-point position, but if the degree of change analysis indicates that the magnitude of the change is very small, or highly uncertain, there is no reason to argue that there really has been a shift in the distribution. Conversely, if the degree of change analysis indicates a change of large and significant magnitude, one may put faith in the existence of a change.
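In outline, the confidence curve construction compares the observed deviance drop at each candidate position with its simulated null distribution. The sketch below shows the bookkeeping only, under the assumption that profile log-likelihoods and simulated deviances (from datasets generated under the fitted model) are supplied by the user; consult the Online appendix and Cunen, Hermansen & Hjort (2018) for the actual recipe.

```python
import numpy as np

def deviance_curve(profile_logliks):
    """D(tau) = 2 * {max over tau' of ell_prof(tau') - ell_prof(tau)}."""
    profs = np.asarray(profile_logliks, dtype=float)
    return 2.0 * (profs.max() - profs)

def confidence_curve(dev_obs, dev_sim):
    """Approximate cc(tau) by simulation.

    dev_sim[b, j] is assumed to hold the deviance at candidate position j,
    computed from the b-th dataset simulated with the change-point at j;
    cc(tau_j) is then the fraction of simulated deviances below the observed
    one. A sketch of the idea, not the authors' implementation.
    """
    dev_obs = np.asarray(dev_obs, dtype=float)
    return (np.asarray(dev_sim) < dev_obs[None, :]).mean(axis=0)
```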

In order to use the change-point method from Cunen, Hermansen & Hjort (2018) we need a parametric model for the war sizes $z_i$. As discussed above, we want a model with power law behaviour. One general option is to use the power law distribution directly; see Equation (2). For most datasets, the power law distribution will not fit the entire dataset well, but only the observations larger than a certain threshold; that is, for $z_i \ge z_0$ the density is proportional to $z_i^{-(\theta+1)}$. One then needs to estimate both the parameter $\theta$ and the tail-index threshold $z_0$. We investigate this approach in the Online appendix; related approaches are used in Clauset (2017, 2018). This model is simple to use, but does not directly utilize the observations below the threshold $z_0$ and may therefore entail some loss of information compared to the next option. In the following, we refer to this model as the 'simple power law' model.

Another option is to model the entire dataset, which in our case only has wars of sizes 1,001 and more (see Online appendix Section D), with a distribution that fulfils the power law requirement in the tails. Generally speaking, the distribution function $F(z)$ for the $z_i$ is said to have power law tails, with power index $b$, if $z^b\{1 - F(z)\}$ tends to a positive constant as $z$ increases. One such model is the inverse Burr distribution, taking

$$F(z) = \bigl\{1 + ((z - 1001)/\mu)^{-\theta}\bigr\}^{-\alpha} \quad \text{for } z \ge 1001, \qquad (4)$$

with scale parameter $\mu$ and shape parameters $\alpha$ and $\theta$; since $1 - F(z)$ behaves like $\alpha\{(z - 1001)/\mu\}^{-\theta}$ for large $z$, the power index is $\theta$.

There are other distributions with power law tails, and the choice between these models should ideally not influence the reported results to a great extent, as long as the chosen model fits the data reasonably well. In the Online appendix, we examine goodness of fit, perform model selection with the focused information criterion, and report results using other parametric models.
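For the simple power law, the maximum likelihood estimate of $\theta$ above a known threshold has a closed form (the classical Hill-type estimator), sketched below; choosing $z_0$ itself is the harder problem and is treated in the Online appendix.

```python
import numpy as np

def powerlaw_mle(z, z0):
    """MLE of theta for the simple power law above a known threshold z0.

    For z >= z0 the density is proportional to z**-(theta + 1); maximizing
    the log-likelihood gives the closed-form (Hill-type) estimate below.
    """
    tail = np.asarray(z, dtype=float)
    tail = tail[tail >= z0]
    m = len(tail)
    theta_hat = m / np.log(tail / z0).sum()
    return theta_hat, theta_hat / np.sqrt(m)  # estimate, approximate std. error
```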

Suppose a sequence of observations $y_1, \ldots, y_n$ is registered over time, and that one wishes to query the null hypothesis $H_0$ that the distribution generating the sequence has remained constant, against the alternative that somewhere a change has taken place. Assume $\mu$ is a parameter of particular interest, like the median or standard deviation, with $\hat\mu_{a,b}$ the estimate of this quantity based on the stretch of data $y_a, \ldots, y_b$. For each candidate position $\tau$, inside a relevant pre-defined interval of time $[c, d]$, consider the standardized difference in estimated $\mu$, to the left and to the right, via

$$H_n(\tau) = \frac{\hat\mu_{1,\tau} - \hat\mu_{\tau+1,n}}{\{\hat\kappa_L^2/\tau + \hat\kappa_R^2/(n-\tau)\}^{1/2}}. \qquad (3)$$

Here $\hat\mu_L = \hat\mu_{1,\tau}$ and $\hat\mu_R = \hat\mu_{\tau+1,n}$, with $\hat\kappa_L$ and $\hat\kappa_R$ being estimates of the relevant standard deviations, to the left and to the right, in the usual setup where $\hat\mu_{a,b}$ is approximately normal with variance of the form $\kappa^2/(b - a + 1)$. The function $H_n(\tau)$ can be plotted for all potential $\tau$ values, and also provides natural test statistics for $H_0$, for example $H_{n,\max} = \max_{c \le \tau \le d} |H_n(\tau)|$, along with one-sided versions. The null hypothesis of homogeneity is rejected if $H_n(\tau)$ takes values sufficiently far from zero. In addition, the plot of $H_n(\tau)$ will indicate the position $\hat\tau$ at which the plot is farthest from zero, which may serve as an estimate of the change-point (but from an entirely different perspective than the likelihood-based change-point method).

Importantly, the $H_n$ plot may be utilized in the one-sided case where a change is assumed, on a priori grounds, to have a given direction, yielding greater detection power than the two-sided version. Also, the method works for non-parametrically defined $\mu$. In order to find the p-value for the test, one needs to work out the distribution of the $H_n$ process; we present these derivations in the Online appendix, where we also investigate a different homogeneity test based on a weighted Kolmogorov-Smirnov statistic.
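A minimal sketch of how an $H_n$ plot can be computed, with a sample quantile as the focus parameter and the $\kappa$'s estimated by a simple bootstrap on each side of the split; this follows the definitions above and is not the authors' implementation.

```python
import numpy as np

def hn_curve(y, quantile=0.75, c=10, n_boot=500, seed=0):
    """H_n(tau) of Equation (3) with mu taken as a sample quantile of the y's.

    For each candidate split, mu_hat is the empirical quantile on each side,
    and its variance is estimated by bootstrapping within each segment.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)

    def quantile_var(segment):
        boot = rng.choice(segment, size=(n_boot, len(segment)), replace=True)
        return np.quantile(boot, quantile, axis=1).var()

    taus = np.arange(c, n - c)
    hn = np.empty(len(taus))
    for j, t in enumerate(taus):
        left, right = y[:t], y[t:]
        diff = np.quantile(left, quantile) - np.quantile(right, quantile)
        hn[j] = diff / np.sqrt(quantile_var(left) + quantile_var(right))
    return taus, hn
```

The resulting curve can be plotted against onset time, and its maximum compared with the distributional theory in the Online appendix.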

In the first subsection, we construct a non-parametric homogeneity test. Since this test indicates non-homogeneity (see the results section), we proceed with our change-point framework. First, we present parametric models for the war sizes, before presenting our change-point method. In the last subsection, we explain the inclusion of covariates.

Results

For the sequence of log-battle-deaths $y_i = \log z_i$ for $i = 1, \ldots, n = 95$, we may compute, display and analyse $H_n$ plots of Equation (3) for any relevant choice of focus parameter $\mu$. Figure 2 displays $H_n$ plots for the median $F^{-1}(0.50)$ and the upper quartile $F^{-1}(0.75)$, with maxima 1.436 and 2.746, respectively. At the median level we cannot reject the null hypothesis of homogeneity at any ordinary level. For the upper quartile, however, the maximum of 2.746 is clearly significant. Computing an associated p-value uses the distributional theory for the $H_n$ process (see the Online appendix), with a one-sided version of the test statistic, since we judge it a priori clear that the battle-death distribution has not gone up after WWII. The exact p-value depends on the time range $[c, d]$ used. We take $d = 1987$, to allow ten wars to the right, in order for the statistical approximation theory to work well. With $c = 1909$ the resulting p-value is below 0.05, and the null hypothesis of homogeneity is rejected at the 5% level for the upper quartile.

The p-values for monitoring the no-change hypothesis become even smaller for quantiles higher than 0.75. Thus the battle-death distribution has not remained constant over time. More specifically, plots such as those in Figure 2 reveal that the changes are clearer in the upper parts of the distribution than in the lower parts.

Our change-point method provides the maximum likelihood estimate of the change-point at $\hat\tau = 1950.483$. Thus, the point of maximal change in the parameters of the inverse Burr model falls between the 60 wars up to and including the Korean war on the one side and the 35 wars following the Korean war on the other.

The full uncertainty around the point estimate is given by the confidence curve in Figure 3. The potential change-point values are on the horizontal axis, while the degree of confidence is on the vertical axis. The confidence curve hits zero at the point estimate (1950), and we can read off confidence intervals at all levels; note that these intervals can consist of disjoint parts. Clearly there is uncertainty in the change-point position: the 95% confidence interval, indicated by the red horizontal line in the figure, encompasses the whole range of possible change-point values. The 80% interval, however, encompasses only 30 war onset times, most of them from 1939 to 1992, but with 'gaps'. Note that the analysis places considerable confidence on three war onset times in addition to the point estimate, namely 1965.103 (the Vietnam war), 1939.669 (WWII) and 1982.236 (the Falklands war).

For the inverse Burr model in Equation (4), the estimated parameters are $\hat\alpha = 0.499$, $\hat\mu_L = 43887$, $\hat\theta_L = 0.702$, $\hat\mu_R = 10940$, $\hat\theta_R = 1.022$. We assess the direction and magnitude of the potential change by computing confidence curves for the degree of change. We examine the ratio between certain quantiles before and after the estimated change-point, $\rho_1 = \phi_{0.50,L}/\phi_{0.50,R}$ and $\rho_2 = \phi_{0.75,L}/\phi_{0.75,R}$, with $L$ and $R$ again referring to the parameters to the left and to the right of the change-point. When the bigger wars are of primary interest, the ratio $\rho_2$ of the upper quartiles is more relevant than the ratio $\rho_1$ of the medians. With the inverse Burr we have the following expression for the $100q\%$ quantile:

$$\phi_q = 1001 + \mu\,(q^{-1/\alpha} - 1)^{-1/\theta}.$$

Here we use $q = 0.50$ and $q = 0.75$ to obtain the medians and the upper quartiles, respectively. Note that the number 1,001 simply serves to bring the quantiles back to the battle-death scale. The point estimates via the inverse Burr are $\hat\rho_1 = 2.15$ and $\hat\rho_2 = 4.25$: the fitted median decreases from 10,129 battle-deaths before 1950 to 4,721 after the change-point, and the upper quartile decreases from 63,545 to 14,943 battle-deaths.

Figure 4A gives the confidence curves for the two degree of change parameters described above, computed with the simulation-based method described in Section C of the Online appendix. The confidence curves reveal that the ratio between upper quartiles is significantly larger than 1 at the 95% level, whereas the ratio of medians is larger than 1 only at somewhat lower confidence levels. Thus, the upper quartiles on each side of the potential change-point are significantly different at the 5% level. This analysis is not conditional on a given change-point value, but takes into account the uncertainty in the change-point position.

Figure 4B gives a different way to visualize the change in distribution at the estimated change-point. The red dots are wars taking place before the Korean war, and the black dots are the wars after. The lines are the fitted complementary cumulative distribution functions (that is, 1 minus the fitted CDFs), on the log-log scale, for the inverse Burr distribution on each side of the estimated change-point. The vertical dashed lines indicate the fitted medians and upper quartiles, and again we observe that the difference between the two distributions is larger for the higher quantiles. We also see that the very largest wars deviate somewhat from the fitted lines, a point we return to in the discussion of robustness.

We then include the democracy covariate and allow the effect of democracy to change across the change-point. The inclusion of the covariate changes the point estimate of the change-point somewhat, from 1950.483 to 1967.431 (the Six Day war). The Korean war is still given high confidence, and we have therefore performed a follow-up analysis taking the 1950.483 change-point as given. The estimates of $\theta_L, \mu_{L,0}, \theta_R, \mu_{R,0}$ and $\alpha$, and their precision, correspond roughly to those found for the change-point analysis without covariates. The most interesting parameters in this context are $\beta_L$ (estimate −0.007, 90% interval) and $\beta_R$ (estimate −0.163, 90% interval). The estimated $\beta_L$ is close to zero and its confidence interval covers zero, while the interval for $\beta_R$ indicates that the scale parameter decreases as the mean democracy score increases. The changing effect of democracy shows in the fitted median as a function of mean democracy on each side of the change-point: before 1950 the median number of battle-deaths is almost constant across democracy scores, while after 1950 the median number of battle-deaths decreases sharply with increasing democracy.
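Under our reading of Equation (4), that is, scipy's Burr Type III with the two shape parameters mapped to $\theta$ and $\alpha$ and the support shifted by 1,001, the reported quantiles and ratios can be computed directly; the sketch below is an illustration of this computation, not the authors' code.

```python
from scipy import stats

# Assumed mapping of the inverse Burr in Equation (4) onto scipy's Burr
# Type III: shape c = theta, shape d = alpha, loc = 1001, scale = mu.
def quantile(q, alpha, theta, mu):
    """100q% quantile phi_q = 1001 + mu * (q**(-1/alpha) - 1)**(-1/theta)."""
    return stats.burr.ppf(q, theta, alpha, loc=1001, scale=mu)

# Reported estimates: common alpha = 0.499; (mu, theta) = (43887, 0.702)
# before the change-point and (10940, 1.022) after.
for q, name in [(0.50, "median"), (0.75, "upper quartile")]:
    left = quantile(q, 0.499, 0.702, 43887)
    right = quantile(q, 0.499, 1.022, 10940)
    print(f"{name}: {left:.0f} / {right:.0f} = {left / right:.2f}")
# Under this parametrization the output matches the reported medians,
# upper quartiles and ratios (2.15 and 4.25) to within rounding.
```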

Discussion

Recent contributions, reviewed above, argue that there is no clear evidence of change in the sizes of, or the times between, interstate wars since 1816. In contrast, we find evidence that a change in the distribution of war sizes has taken place, and that it may have happened in the years after WWII, rather than in 1945, the assumed change-point in the current literature. We stress that the results from the change-point analysis are open to interpretation. On the one hand, there is considerable uncertainty in the change-point position: the 95% interval for $\tau$ covers the entire range of possible change-point positions. Some readers will thus interpret Figure 3 as favouring the 'no-change' hypothesis. On the other hand, the figure also indicates that all the most likely candidates for the change-point position are found either at or after WWII. Moreover, the degree of change analysis shows a significant decrease in battle-deaths after the change, at least when considering the upper quartiles. The change in the parameters of the distribution of battle-deaths thus manifests itself in smaller wars in the period after the change-point. On the whole, we interpret our analyses as supporting a decrease in battle-deaths at some point in the time span we are considering. The exact position of the shift remains somewhat uncertain, but the most likely candidate is the Korean war.

Our claim rests upon two distinct analyses. First, we presented a non-parametric test of homogeneity. The test suggests that the sequence of war sizes has not been homogeneous when considering the higher quantiles of the war size distribution; see the results for the upper quartiles in Figure 2. With this test the null hypothesis of no change is rejected at the 5% level. Second, we have conducted a change-point analysis. Here, we needed a parametric model for the data, and we found suitable models among the class of models with power law tails.

We have also introduced the use of covariates – pointing towards further modelling efforts including mechanisms and explanations. In addition to enriching the long peace debate by generating hypotheses concerning the long-term characteristics of interstate wars, we have also introduced models and methods to the peace research literature. In the rest of this section, we will discuss our findings on various levels. First, we will take a critical look at our approach and report on some robustness checks we have conducted. Then we will explore connections between our contribution and related articles, both in terms of methods and results. Finally, we will discuss our findings in light of the general peace research literature, and in particular consider some theoretical explanations.

Robustness of our approach

Statistical analyses require a series of assumptions and some level of abstraction to get from a real-world question to a statistical question. Here, we return to some of the choices we discussed in the beginning and attempt to assess their influence on our results. For our statistical modelling we have been guided by previous works using power law distributions. There have been a few attempts to give a theoretical justification for the power law behaviour of war sizes (see e.g. Cederman, 2003), but for most authors, including Richardson, power law models have been used as essentially descriptive models, that is, as 'lower-dimensional representations' allowing us to assess potential regularities given the inherent variation in the data. In that case, it is particularly important that the model fits the data well – that the distribution of war sizes according to the model is close to the actually observed war-size distribution.

We have therefore conducted various goodness-of-fit evaluations, for example the log-log plot in Figure 4. We see that the inverse Burr models on each side of the change-point in general fit the data well. The clearest deviation from the model is found for the very largest wars, especially among those taking place after 1950: the three largest wars in this period have more battle-deaths than expected under the model. This particular aspect of the data was not successfully accounted for by any of the models we considered (see the corresponding figures in the Online appendix) and would necessitate a more complex model than those considered so far. We have also conducted goodness-of-fit tests. On both sides of 1950, the observed data were consistent with having been generated by the fitted inverse Burr distributions ($p_L = 0.64$ and $p_R = 0.23$; see details in the Online appendix).

Several models within the class of distributions with power law tails provide an adequate fit to the data. In order to investigate the sensitivity of our results to the modelling assumptions, we present results in the Online appendix for similar change-point analyses under two different models for the data: the simple power law distribution and the inverse Pareto distribution. The inverse Pareto, like the inverse Burr, models the full sequence of 95 war sizes, and we obtained very similar results to those presented in Figures 3 and 4: the same point estimate for the change-point, $\hat\tau = 1950.483$, and similar-looking confidence curves for both $\tau$ and the parameters representing the degree of change. This is not surprising, since the inverse Pareto distribution is a simplification of the inverse Burr. With the simple power law model the results were somewhat different. Here, we needed to set the tail-index threshold $z_0$, and we used $z_0 = 7061$; see details in the Online appendix. The subsequent change-point analysis then makes use of only the 51 wars larger than $z_0$. Using this model we found $\hat\tau = 1965.103$ as the point estimate for the change, corresponding to the Vietnam war. We provide the full confidence curve in the Online appendix; it displays more uncertainty than we saw with the two other models (i.e. wider confidence intervals). In particular, the degree of change analysis indicates that the change was non-significant, in contrast with the analyses using the inverse Burr and inverse Pareto models. The increased uncertainty is related to the reduced sample size.
The different estimated change-points for the full battle-death distribution and for the large wars only (the simple power law analysis) underscore an important aspect inherent to any change-point exercise: what constitutes a change-point when analysing some aspects of the available data will not necessarily be recognized as a change-point when examining other relevant data. Thus it should not be seen as a paradox that the Vietnam war in 1965 can be a change-point for the extreme tail of the battle-death distribution, whereas the Korean war in 1950 is perhaps more of a change-point when examining more complex models involving the full battle-death distribution.

Some readers might question our choice of using a change-point framework at all. As mentioned in the beginning, change-point methods assume a very particular form of change, an abrupt shift in the distribution generating the data. In the case of our change-point method, we have in addition assumed that only a single such shift takes place. Is it realistic to assume that the long peace emerged in that way? Hardly, but a single change-point model can be considered a reasonable approximation to various other patterns, for example to more gradual changes. We are inclined to interpret the change-points identified here as the culmination of a process that unfolded over some time. This could apply to several of the mechanisms discussed below.
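As an illustration of the kind of goodness-of-fit test reported above, the following sketch computes a parametric-bootstrap Kolmogorov-Smirnov p-value against a fitted inverse Burr, again via scipy's Burr Type III as an assumed stand-in; as noted in the comments, a full version would re-estimate the parameters for each simulated sample.

```python
import numpy as np
from scipy import stats

def gof_pvalue(z, alpha, theta, mu, n_sim=1000, seed=0):
    """Parametric-bootstrap Kolmogorov-Smirnov goodness-of-fit p-value.

    Compares the KS distance between the data z and the fitted inverse Burr
    stand-in (Burr Type III, shifted to the battle-death scale) with the KS
    distances of samples simulated from that same fit. For simplicity the
    parameters are held fixed across simulations; a full test would re-fit
    them to each simulated sample.
    """
    rng = np.random.default_rng(seed)
    dist = stats.burr(theta, alpha, loc=1001, scale=mu)
    d_obs = stats.kstest(z, dist.cdf).statistic
    d_sim = np.array([
        stats.kstest(dist.rvs(size=len(z), random_state=rng), dist.cdf).statistic
        for _ in range(n_sim)
    ])
    return (d_sim >= d_obs).mean()
```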