I implement random sampling from the extremes/order statistics of the Gompertz survival distribution, used to model human life expectancies, with the beta transformation trick and flexsurv /root-finding inversion. I then discuss the unusually robust lifespan record of Jeanne Calment, and show that records like hers (which surpass the runner-up’s lifespan by such a degree) are not usually produced by a Gompertz distribution, supporting the claim that her lifespan was indeed unusual even for the record holder.

The Gompertz distribution is a distribution often used to model survival curves where mortality increases over time, particularly human life expectancies. The order statistics of a Gompertz are of interest in considering extreme cases such as centenarians.

The usual family of random/density/inverse CDF (quantile) functions ( rgompertz / dgompertz / pgompertz / qgompertz ) are provided by several R libraries, such as flexsurv , but none of them appear to provide implementations of order statistics.

Using rgompertz for the Monte Carlo approximation works, but like the normal distribution case, breaks down as one examines larger cases (eg considering order statistics out of one billion takes >20s & runs out of RAM). The beta transformation used for the normal distribution works for the Gompertz as well, as it merely requires an ability to invert the CDF, which is provided by qgompertz , and if that is not available (perhaps we are working with some other distribution besides the Gompertz where a q* function is not available), one can approximate it by a short root-finding optimization.

So, given the beta trick where the order statistics of any distribution is equivalent to U(k)∼Beta(k,n+1−k), we can plug in the desired k-out-of-n parameters and generate a random sample efficiently via rbeta , getting a value on the 0–1 range (eg 0.998879842 for max-of-1000) and then either use qgompertz or optimization to transform to the final Gompertz-distributed values (see also inverse transform sampling).

As mentioned, other distributions work just as well. If we wanted a normal or log-normal distribution instead, then we simply use qnorm / qlnorm :

An example where simulating from the tails of the Gompertz distribution is useful would be considering the case of super-centenarian Jeanne Calment, who has held the world record for longevity for over 22 years now: Calment lived for 122 years & 164 days (122.45 years) and was the world’s oldest person for almost 13× longer than usual, while the global runner-up, Sarah Knauss lived only 119 years & 97 days (119.27 years), a difference of 3.18 years. This has raised questions about what is the cause of what appears to be an unexpectedly large difference.

Empirically, using the Gerontology Research Group data, the gaps between verified oldest people ever are much smaller than >3 years; for example, among the women, Knapp was 119, and #3–9 were all 117 year old (with #10 just 18 days shy of 117). The oldest men follow a similar pattern: #1, Jiroemon Kimura, is 116.15 vs #2’s 115.69, a gap of only 1.8 years, and then #3–4 are all 115, and #5–7 are 114, and so on.

In order statistics, the expected gap between successive k-of-n samples typically shrinks the larger k/n becomes (diminishing returns), and the Gompertz curve is used to model an acceleration in mortality that makes annual mortality rates skyrocket and is why no one ever lives to 130 or 140 as they run into a ‘mortality spike’. The other women and the men make Calment’s record look extraordinary, but order statistics and the Gompertz curve are counterintuitive, and one might wonder if Gompertz curves might naturally occasionally produce Calment-like gaps regardless of the expected gaps or mortality acceleration.

To get an idea of what the Gompertz distribution would produce, we can consider a scenario like sampling from the top 10 out of 10 billion (about the right order of magnitude for the total elderly population of the Earth which has credible documentation over the past ~50 years), and, using some survival curve parameters from a Dutch population paper I have laying around, see what sort of sets of top-10 ages we would expect and in particular, how often we’d see large gaps between #1 and #2:

simulateSample <- function (total_n, top_k) { sort ( sapply (top_k : 0 , function (k) { rKNGompertz ( 1 , total_n, total_n - k, 1.1124 , 0.000016443 ) } )) } round ( digits= 2 , simulateSample ( 1e+10 , 10 )) # [1] 110.70 110.84 110.89 110.99 111.06 111.08 111.25 111.26 111.49 111.70 112.74 simulateSamples <- function (total_n, top_k, iters= 10000 ) { t ( replicate (iters, simulateSample (total_n, top_k))) } small <- as.data.frame ( simulateSamples ( 10000000000 , 10 , iters= 100 )) large <- as.data.frame ( simulateSamples ( 10000000000 , 10 , iters= 100000 )) summary ( round (small $ V11 - small $ V10, digits= 2 )) # Min. 1st Qu. Median Mean 3rd Qu. Max. # 0.0000 0.0975 0.2600 0.3656 0.5450 1.5700 summary ( round (large $ V11 - large $ V10, digits= 2 )) # Min. 1st Qu. Median Mean 3rd Qu. Max. # 0.00000 0.15000 0.35000 0.46367 0.66000 3.99000 mean ((large $ V11 - large $ V10) >= 3.18 ) * 100 # [1] 0.019 library (ggplot2) library (reshape) colnames (small) <- as.character ( seq ( - 10 , 0 , by= 1 )) small $ V0 <- row.names (small) small_melted <- melt (small, id.vars= 'V0' ) colnames (small_melted) <- c ( "V0" , "K" , "Years" ) ggplot (small_melted, aes ( x = K, y = Years)) + geom_path ( aes ( color = V0, group = V0)) + geom_point () + coord_cartesian ( ylim = c ( 110 , 115 )) + theme ( legend.position = "none" )

100 simulated samples of top-10-oldest-people out of 10 billion, showing what are reasonable gaps between individuals

With these specific set of parameters, we see median gaps somewhat similar to the empirical data, but we hardly ever (~0.02% of the time) see #1–#2 gaps quite as big as Calment’s.

The graph also seems to suggest that there is in fact typically a ‘jump’ between #1 & #2 compared to #2 & #3 and so on, which I was not expecting; thinking about it, I suspect there is some sort of selection effect here: if a random sample from ‘#1’ falls below the random sample of ‘#2’, then they will simply swap places (because when sorted, #2 will be bigger than #1), so an ‘unlucky’ #1 disappears, creating a ratchet effect ensuring the final ‘#1’ will be larger than expected. Any k could exceed expectations, but #1 is the last possible ranking, so it can become more extreme. If I remove the sort call which ensures monotonicity with rank, the graph looks considerably more irregular but still has a jump, so this selection effect may not be the entire story:

100 simulated samples of top-10-oldest-people out of 10 billion; simulated by rank, unsorted (independent samples)

So, overall, a standard Gompertz model does not easily produce a Calment.

This doesn’t prove that the Calment age is wrong, though. It could just be that Calment proves centenarian aging is not well modeled by a Gompertz, or my specific parameter values are wrong (the gaps are similar but the ages are overall off by ~5 years, and perhaps that makes a difference). To begin with, it is unlikely that the Gompertz curve is exactly correct a model of aging; in particular, gerontologists note an apparent ceiling of the annual mortality rate at ~50%, which is inconsistent with the Gompertz (which would continue increasing arbitrarily, quickly hitting >99% annual mortality rates). And maybe Calment really is just an outlier due to sampling error (0.02% ≠ 0%), or she is indeed out of the ordinary human life expectancy distribution but there is a more benign explanation for it like a unique mutation or genetic configuration. But it does seem like Calment’s record is weird in some way.