We project the future accumulation of profiles belonging to deceased Facebook users. Our analysis suggests that a minimum of 1.4 billion users will pass away before 2100 if Facebook ceases to attract new users as of 2018. If the network continues expanding at current rates, however, this number will exceed 4.9 billion. In both cases, a majority of the profiles will belong to non-Western users. In discussing our findings, we draw on the emerging scholarship on digital preservation and stress the challenges arising from curating the profiles of the deceased. We argue that an exclusively commercial approach to data preservation poses important ethical and political risks that demand urgent consideration. We call for a scalable, sustainable, and dignified curation model that incorporates the interests of multiple stakeholders.

We, the Party, control all records, and we control all memories. Then we control the past, do we not? (Orwell, 1949: 313)

Data Three types of data were used to carry out the analysis: projected mortality over the 21st century, distributed by age and nationality; projected population data over the 21st century, also distributed by age and nationality; and current Facebook user totals for each age group and country. Mortality rates were calculated from UN data, which provide the expected number of deaths and total populations for every country in the world (United Nations, Department of Economic and Social Affairs, 2017). Numbers are available for each age group (0 to 100, divided into five-year intervals) and for all years from 2000 to 2100, likewise divided into five-year intervals. The estimates are based on official data from each country’s government and, in some cases, external sources (esa.un.org/unpd/wpp/DataSources/). It is unclear from the data how precision varies by country and year. All projections are reported as point estimates, with no standard errors or confidence intervals. For a more detailed account of the UN data, see esa.un.org/unpd/wpp/. Facebook data were scraped from the company’s Audience Insights page (facebook.com/ads/audience-insights/) using a custom Python script that extracts Facebook’s monthly active users by country and age. These estimates are based on the self-reported age of users. Facebook provides lower and upper bounds for user totals across all ages and nationalities. For example, there are between 15 and 20 million 25-year-old Indians on the network.2 The width of these ranges increases with user counts, and both bounds are reported in round numbers divisible by 5 or 10, suggesting that the ranges are not meant as serious estimates of standard errors or confidence intervals. We take the midpoint of each country-age window for our analysis. Facebook’s Audience Insights API provides by far the most comprehensive publicly available estimate of the network’s size and distribution. Nevertheless, we wish to draw attention to several limitations of this dataset.
First, there are reasonable doubts about the accuracy of Facebook’s reported monthly active users. The site has recently been sued for allegedly inflating these numbers with the intent of overcharging advertisers (Todd, 2018), and Facebook explicitly notes that their estimates are not meant to be matched with population data. In addition to these concerns about false positives, we also expect false negatives due to users visiting the site less than once a month. Unfortunately, it is impossible to say exactly how this affects results without more fine-grained detail on the distribution of errors. Second, the data exclude users under 18, preventing us from evaluating network activity among 13–17 year olds (Facebook requires all users to be at least 13). Due to the relatively low (although varied) mortality rate of this age group, the missing data should not have much impact on the projection until relatively late in the century. Third, users aged 65+ are all put into the same age category. This gives us less detailed data on penetration rates among the elderly. But as we show in the following section, this problem can be mitigated by extrapolating from a smooth curve fit to data from younger users. Finally, we wish to emphasize that our model is devoted to the future development of death on Facebook, and therefore leaves out users who have already died and left profiles behind. Estimating the current number of dead profiles would require historical data on the age distribution of Facebook users in various countries, which are currently inaccessible through the site’s API. Furthermore, the aim of the study is to depict a larger, long-term trend, in which the current numbers play only an illustrative role.
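As a small illustration of the midpoint rule applied to Facebook's reported ranges in this section, the following Python sketch converts lower and upper bounds into a single user count per country-age cell. The function names and the second example cell are hypothetical; only the India range (15 to 20 million 25-year-olds) comes from the text, and this is not the scraping script used in the study.

```python
def midpoint(lower, upper):
    """Return the midpoint of a reported [lower, upper] user range."""
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    return (lower + upper) / 2


def midpoints_by_cell(bounds):
    """Map each (country, age) cell to the midpoint of its reported range."""
    return {cell: midpoint(lo, hi) for cell, (lo, hi) in bounds.items()}


# Hypothetical input: only the India/25 range is taken from the text;
# the second cell is invented purely for illustration.
example_bounds = {
    ("India", 25): (15_000_000, 20_000_000),
    ("Exampleland", 40): (100_000, 150_000),
}
example_users = midpoints_by_cell(example_bounds)
```

Under this rule the India example enters the analysis as 17.5 million users, halfway between the two reported bounds.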

Methodology Our methodological approach can be summarized by the following procedure for each country: (1) estimate a function f mapping age and year to expected mortality rates (see Figure 1(a)); (2) estimate a function g mapping age to expected monthly active Facebook users (see Figure 1(b)); (3) extend g across time under two alternative scenarios (details below); (4) multiply the outputs of f and g to estimate the number of Facebook profiles belonging to dead users of a given age in a given year (see Figure 1(c)); and (5) integrate this product across all age groups to estimate the number of dead profiles in a given year. This pipeline is repeated for each country to get a global estimate. Projections are integrated over several years to get national or global estimates over time.
It should be noted that this approach makes a substantive and potentially problematic assumption, namely that each country’s Facebook users constitute a representative sample of the population, at least with respect to mortality rates. It is well established that internet usage, especially in developing economies, is strongly correlated with education and income (PEW Research Centre, 2018: 15). These two variables are in turn correlated with life expectancy, which means there is reason to believe that current Facebook users will live slightly longer than non-users on average. Our model does not account for this potential bias, which may result in an overestimation of dead users in developing countries. However, a recent PEW research report (2018: 15) indicates that the divide is rapidly shrinking. Between 2015 and 2017, social media penetration in countries such as Lebanon, Jordan and the Philippines rose by more than 20 percentage points, suggesting that connectivity is rapidly becoming more accessible. This trend is expected to continue throughout the 21st century, mitigating any potential confounding effects on projections years or decades out.
Furthermore, the closer we get to full market saturation, the smaller the bias becomes since people with high and low life expectancies are both joining the network in large numbers. In the face of this, it is important to stress that the value of the present study lies in the larger trends it identifies, not in the details of the immediate future development. This should be kept in mind when assessing very short-term scenarios.
The model described in step (2) was trained on 2018 data. We vary projections for future Facebook growth according to two scenarios: (A) Shrinking. No new users join the network. All current users remain until their death. (B) Growing. The network grows at 13% per year across all markets until usership reaches 100%. To help extrapolate beyond the age of 64, the final age for which Facebook provides monthly active user totals, we anchored all regressions with an extra data point of zero users aged 100. This is almost certainly true in all markets, at least to a first approximation. Alternative anchor points may be justified, but do not have a major impact on results.
All statistical analysis was conducted in R, version 3.5.1 (R Core Team, 2018). Predictive functions were estimated using generalized additive models (GAMs), which provide a remarkably flexible framework for learning nonlinear smooths under a wide range of settings (Hastie and Tibshirani, 1990). Regressions were implemented using the mgcv package (Wood, 2017). A supplemental methods section, including data and code for reproducing all figures and results, can be found online at: https://github.com/dswatson/digital_graveyard. We fit three separate models for each country:
$$\mathrm{Mortality\_Rate} = f_C(\mathrm{Time}, \mathrm{Age})$$
$$\mathrm{FB\_Users\_2018} = g_C(\mathrm{Time} = 2018, \mathrm{Age})$$
$$\mathrm{Population} = h_C(\mathrm{Time}, \mathrm{Age})$$
The subscript C indicates that each model is country-specific. We omit the subscript for notational convenience moving forward.
The mortality and population models provide nonlinear interpolations so that we can make predictions for any age-year in the data without the limitations imposed by the UN’s binning strategy. Under Scenario A, we extrapolate model g beyond 2018 by assuming that no new users join Facebook and current users leave the network if and only if they die. This means we see zero 18-year-olds on the network in 2019, zero 18- or 19-year-olds in 2020, and so on. Attrition from current users can be calculated recursively. For each year t and age a:
Scenario A
$$\mathrm{FB\_Users} = g(\mathrm{Time} = t, \mathrm{Age} = a) = g(\mathrm{Time} = t - 1, \mathrm{Age} = a - 1) \times \bigl(1 - f(\mathrm{Time} = t - 1, \mathrm{Age} = a - 1)\bigr)$$
In Scenario B, we extrapolate beyond g by assuming that Facebook will see constant growth of 13% per year in all markets until reaching a cap of 100% penetration. For each year t and age a:
Scenario B
$$\mathrm{upper\_bound} = h(\mathrm{Time} = t, \mathrm{Age} = a)$$
$$\mathrm{FB\_proj} = g(\mathrm{Time} = t - 1, \mathrm{Age} = a - 1) \times 1.13^{\,t - 2018}$$
$$\mathrm{FB\_Users} = g(\mathrm{Time} = t, \mathrm{Age} = a) = \min(\mathrm{upper\_bound}, \mathrm{FB\_proj})$$
In both cases, our true target is
$$y = \int_{13}^{100} \int_{2018}^{2100} f(\mathrm{Age}, \mathrm{Time})\, g(\mathrm{Age}, \mathrm{Time})\, d(\mathrm{Age})\, d(\mathrm{Time})$$
For the mortality rate model f, we used beta regression with a logit link function, a common choice for rate data. For the Facebook model g, we used negative binomial regression with a log link function, which is well suited for over-dispersed counts such as those observed in this dataset. We experimented with several alternatives for the population model h, ultimately getting the best results using Gaussian regression with a log link function. Parametric specifications for each model were evaluated using the Akaike information criterion (Akaike, 1974), a penalized likelihood measure. Age and time were incorporated as both main effects and interacting variables in models f and h, which were fit with tensor product interactions in a functional ANOVA structure (Wood, 2006).
We use cubic regression splines for all smooths, with a maximum basis dimension of 10. Parameters were estimated using generalized cross-validation.
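The Scenario A recursion and the accumulation of dead profiles can be illustrated with a minimal Python sketch. It is not the authors' R implementation (see the repository cited in this section for that). It makes three simplifying assumptions: the fitted mortality model f is replaced by a hypothetical toy function, the 2018 user counts are invented, and the double integral is approximated by annual sums over whole ages. All function names are illustrative.

```python
import math


def toy_mortality(year, age):
    """Hypothetical stand-in for f: a Gompertz-like rate, capped at 1."""
    return min(1.0, 0.0005 * math.exp(0.085 * (age - 18)))


def project_scenario_a(users_start, mortality, start_year=2018,
                       end_year=2100, max_age=100):
    """Apply g(t, a) = g(t-1, a-1) * (1 - f(t-1, a-1)) year by year.

    users_start maps age -> users in start_year. No new users join, so
    the youngest age present on the network rises by one each year.
    """
    users = {start_year: dict(users_start)}
    for t in range(start_year + 1, end_year + 1):
        users[t] = {
            a + 1: n * (1.0 - mortality(t - 1, a))
            for a, n in users[t - 1].items()
            if a + 1 <= max_age
        }
    return users


def dead_profiles_by_year(users, mortality):
    """Sum f(t, a) * g(t, a) over ages, a discrete stand-in for the age integral."""
    return {
        t: sum(mortality(t, a) * n for a, n in by_age.items())
        for t, by_age in users.items()
    }


if __name__ == "__main__":
    # Invented 2018 counts: one million users at every age from 18 to 64.
    users_2018 = {age: 1_000_000.0 for age in range(18, 65)}
    projected = project_scenario_a(users_2018, toy_mortality)
    deaths = dead_profiles_by_year(projected, toy_mortality)
    print("Cumulative dead profiles, 2018-2100:", round(sum(deaths.values())))
```

Summing the yearly totals plays the role of the outer integral over time. A projection for Scenario B would additionally need the population model h as a cap and is not attempted here.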

Uncertainty While there remains no good way to evaluate the precision of the underlying data – as noted above, neither the UN nor Facebook provides confidence intervals – we may quantify the uncertainty of the model using nonparametric techniques. GAMs provide straightforward standard errors for their predictions, but under both scenarios our true target y is a double integral of a product of two vectors. Unfortunately, there is no analytic method for calculating y’s variance as a function of those variables without making strong assumptions that almost certainly fail in this case. For that reason, we measure uncertainty using a Bayesian bootstrap (Rubin, 1981). To implement this algorithm, we sample n weights from a flat Dirichlet prior and fit the models using these random weights. We repeat this procedure 500 times for each country and scenario, providing an approximate posterior distribution for all predictions, from which we compute standard errors. These numbers are reported in parentheses next to point estimates in the text, and in their own column in all table summaries.
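The resampling scheme can be sketched in Python as follows. This is a minimal illustration of a Bayesian bootstrap, not the authors' code: the GAM refit is replaced by a weighted mean as a stand-in statistic, the data are invented, and all function names are hypothetical. Flat Dirichlet weights are drawn by normalizing independent Exponential(1) variates.

```python
import random
import statistics


def dirichlet_weights(n, rng):
    """Draw n weights from a flat Dirichlet(1, ..., 1) distribution."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def weighted_mean(data, weights):
    """Stand-in for a weighted model fit; the weights already sum to 1."""
    return sum(w * x for x, w in zip(data, weights))


def bayesian_bootstrap(data, statistic, n_draws=500, seed=0):
    """Recompute the statistic under n_draws sets of random weights."""
    rng = random.Random(seed)
    return [
        statistic(data, dirichlet_weights(len(data), rng))
        for _ in range(n_draws)
    ]


def standard_error(draws):
    """Standard deviation of the approximate posterior draws."""
    return statistics.stdev(draws)


if __name__ == "__main__":
    toy_data = [3.0, 5.0, 4.0, 9.0, 7.0, 6.0]  # invented observations
    posterior = bayesian_bootstrap(toy_data, weighted_mean)
    print("estimate:", statistics.mean(posterior))
    print("standard error:", standard_error(posterior))
```

In the setting described above, the statistic would instead refit the weighted models and return the projected number of dead profiles, so that each of the 500 draws yields one value of y.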

Conclusion This study has provided the first rigorous projection of the accumulation of Facebook profiles belonging to the deceased. Will the dead, then, ‘take over’ Facebook? We have concluded that hundreds of millions of dead profiles will be added to the network in the next few decades alone, and that the dead may well outnumber the living before the end of the century, depending on how global user penetration rates evolve. Irrespective of how the network grows in the years to come, the vast majority of dead profiles will belong to users from non-Western countries. Considering its global reach, we have argued that the totality of deceased user profiles amounts to something beyond the sum of its parts. These profiles are becoming part of our collective record as a species, and may prove invaluable to future generations. We believe that a multi-stakeholder approach is the best way to curate such a vast archive. We have also stressed that in crafting a future curation model, qualitative understanding of how different cultures make sense of death and the digital will be key. Likewise, the development poses difficult ethical problems that require careful consideration. The onus is now on policymakers and industry to rise to these challenges. We look forward to taking part in the debates to come.

Acknowledgements We wish to express our sincere gratitude to the four referees who reviewed this study. Their insights and input have substantially improved the final result. We would also like to express our thanks to Patrick Gildersleve for helping us with the Python script that scraped data from the Facebook API.

Declaration of conflicting interests The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes

1. Users are also encouraged to select a ‘legacy contact’ that will steward the account upon their death.

2. For the record, it should be noted that while Facebook is popular in most countries, it faces considerable competition from for example VKontakte in Russia, and is almost completely absent in other places (China, North Korea, etc.). This, however, is not our main concern.