The standard approach for calculating years of life lost is to apply the distribution of ages among those who died from a specific cause to a standard life-table. For the purposes of international comparison, we opted to use the WHO 2010 Global Burden of Diseases table as the reference 13 , which presents YLL by age, but not by sex or extent of multimorbidity. This method involves summing the expected years of life remaining from the table according to the number (or for the mean YLL the proportion) of people dying within each age-band. We applied the age distribution of COVID-19 deaths in Italy from published data to estimate the YLL.

The remainder of the methods describes our approach to estimating YLL accounting for number and type of underlying LTC, along with age and sex. Our modelling comprised three main components: (i) estimating the prevalence of, and correlations between, LTCs among people dying with COVID-19; (ii) modelling UK life expectancy based on age, sex, and each combination of these LTCs separately; and (iii) combining these models to calculate the estimated YLL per death with COVID-19. These are summarised by age-group, sex, and multimorbidity counts (that take into account different combinations of LTCs).

Rapid review

To inform our estimates of number and type of LTCs, we performed a rapid review to identify data on underlying conditions for people dying with COVID-19. We searched the WHO repository of COVID-19 studies on 24th March 2020. To identify studies reporting data on LTCs among people who had died from Covid-19, we screened titles and abstracts of all epidemiological, clinical, case-series and review articles (n=1685). We identified and screened 77 potentially relevant full-text articles, of which four reported aggregate data on LTCs among people who had died of COVID-19. Three were small studies (32, 44, and 54 deaths, respectively) based in Wuhan, China5–7. However, the fourth was a comprehensive report from the Istituto Superiore di Sanità (ISS) (published each Tuesday and Wednesday) including data on 11 common LTCs (ischaemic heart disease, atrial fibrillation, heart failure, stroke, hypertension, diabetes, dementia, chronic obstructive pulmonary disease, active cancer in the past 5 years, chronic liver disease and chronic renal failure), as well as the number of patients who had 0, 1, 2 or ≥3 LTCs for 701 of the 6801 people who died with COVID-19 in Italy14. In view of the smaller sizes of the Chinese studies, and the greater dissimilarity of these populations with the UK relative to the Italian data, we opted not to include these in the analysis. These data were used to construct a plausible scenario for the prevalence of combinations of LTCs among people who died from COVID-19 for the modelling presented here.

Long-term condition prevalence and correlation models. This first stage of our modelling aimed to estimate the prevalence and correlation between specific LTCs among people dying with COVID-19.

We utilised aggregate data on COVID-19 deaths from the Istituto Superiore di Sanità in Italy. Since we were unable to obtain IPD for the Italian case-series of deaths from COVID-19, we had to infer the joint prevalence of LTCs from the summarised information available, i.e. the marginal distribution of multimorbidity counts (the row sums, or total number of diseases for each patient, wherein counts of ≥3 LTCs were collapsed into the single category of 3+) and the marginal distributions of LTC frequency (the columns sums, or the total number of patients with each LTC). To that end, we developed a Bayesian latent process model of disease prevalence and correlation and fitted it using Markov chain Monte Carlo (MCMC) to both elements in the published data. This analysis was applied jointly to the small number of deaths that had occurred in Scotland, primarily to aid convergence in Bayesian model fitting by providing some information about the correlation between LTCs15. The Scottish subset of the data contained a partial record of known LTCs for individual patients, but the multimorbidity count per patient, as well as the marginal frequency of each LTC, were missing (hence, modelled as latent). Bayesian priors for the correlations between diseases were specified with a tendency to zero (shrinkage). Numerical investigations indicated little sensitivity of convergence to the strength of shrinkage, so we opted for weak shrinkage as a precautionary approach. This model gave us the full matrix of correlations between every combination of LTCs at the level of individuals, therefore providing us with a complete dependence structure of LTCs presented within the sample of COVID-19 mortalities. In order to propagate uncertainty through the analysis, from this fitted model (effective sample size of MCMC 410) we simulated 10,000 notionally “typical” patients, with plausible combinations of LTCs (under the combined Italian and Scottish data).

To test the sensitivity of our findings to the estimated correlations, we also estimated the YLL under two opposite extremes (i) that LTCs were independent and (ii) that LTCs were highly correlated. Unlike the Bayesian LTC mode, these sensitivity analyses did not use the information on the multimorbidity counts from the ISS report, but only the proportion of patients with each of the eleven comorbidities. For the “independent” scenario we created 11 vectors comprising 1s and 0s (respectively with and without the long term condition) corresponding in length to the number of patients. We then sampled from these vectors with replacement to obtain 10,000 simulated patients. For the “highly correlated” scenario we first sorted each vector, then combined them to form a 710x11 matrix, then sampled each row with replacement to obtain 10,000 simulated patients.

Age models. Next, we modelled the relationship between age and multimorbidity counts among people dying with COVID-19. We were unable to obtain direct estimates of the association between age and extent of multimorbidity among patients who had died from COVID-19. Therefore, we modelled two scenarios: independence between age and multimorbidity count (i.e. no correlation between age and multimorbidity count among people dying of COVID-19), and a positive association between age and multimorbidity count. To inform the latter, we examined data within SAIL for 145 patients who had influenza recorded as the cause of death in their death certificate in 2011. We found that for men, age increased by 4.7 years per unit increase in the number of LTCs until the count reached 6 after which there was no evidence of further increase. For women, the figure was 2.6. Therefore, we performed the modelling assuming that for COVID-19 the mean age increased by 5 years per unit increase in multimorbidity count across the range from 0 to 6 LTCs in men. To allow for some degree of uncertainty around this estimate by sampling from a normal distribution. We arbitrarily chose a standard deviation of 0.5. We estimated this similarly for women, but using a mean increase of age of 3 years per increase in multimorbidity count. We incorporated this information in a model fitted to the summary age data provided in the Italian case report. We obtained 10,000 samples from the posterior distribution for inclusion in the YLL calculations. SAIL analyses were approved by SAIL Information Governance Review Panel (Project 0830). Approval for the use of individual patient data in the analysis was given by the NHS Public Health Scotland Caldicott officer.

Survival models. For patients aged 50 years or older at death, we estimated mortality according to age, sex and combinations of each LTC using the Secure Anonymised Information Linkage (SAIL) databank. SAIL is a repository of routinely collected healthcare data (including primary care, hospital episodes, and mortality data) from a representative sample covering approximately 70% of the population of Wales. From these data, we identified all participants aged over 49 years who were registered with a participating practice for the duration of 2011 (approximately 0.85 million people). This period was selected as electronic coding of diagnoses was well established, and it allowed >6 years of follow-up. Age and sex were extracted from primary care records. We also identified all LTCs for which we had information of COVID-19 deaths from Italy. LTCs were identified using a combination of primary care data (using Read diagnostic codes) and hospital episodes (using ICD-10 codes). Individuals were considered to have a LTC if they had a relevant diagnostic code entered prior to 31st December 2011. Relevant codes were identified from the Charlson comorbidity index and the Elixhauser comorbidity index16,17, which had established algorithms for identification from ICD-10 codes18, and have been adapted for using Read codes in primary care19. Code lists are available in the supplementary material15.

All-cause mortality was assessed by linkage to national mortality registers from 1st January 2012 until August 2018 (last available data). Participants were censored if they de-registered from a participating SAIL practice. We used the flexsurv package in R (version 1.1.1) to fit a Gompertz model treating age as the timescale20. We assessed the fit of this distribution graphically (supplementary material)15. In models stratified by sex we included all the LTCs as main effects as well as age-LTC interactions that improved the model fit in terms of the Akaike information criterion. In sensitivity analyses we also included two-way (comorbidity-comorbidity) and three-way (comorbidity-comorbidity-age) interaction terms for the four comorbidities with the largest effect measure estimates (COPD, heart failure, liver failure and dementia) requiring 12 additional parameters. To propagate uncertainty from the survival models we obtained 10,000 samples of the coefficient estimates by sampling from a multivariate normal distribution corresponding to the coefficients and variance-covariance matrix from the regression models.

Combination of comorbidity and mortality models. In the final analysis, we combined 10,000 samples from all three sources: LTC combination models, age models and survival models. We used the rate and shape parameters with the cumulative distribution function implemented in the flexsurv package to calculate the survival probabilities at 3-month intervals from aged 50 to 120 (to allow all curves to descend to zero). From these times and survival probabilities we estimated the mean survival, or life expectancy.

Bayesian models were written in the JAGS language21 and implemented using runjags for R (version 2.0.4)22, survival models were fit using the flexsurv package in R (version 1.1.1)20, and for the final analysis the model-outputs were also combined in R (version 3.6.1). The 95% uncertainty intervals were obtained using empirical bootstrapping, with the number of samples in the mean equal to the effective sample size from the LTC correlation model. All code, data (except individual-level data for Scotland), intermediate outputs and diagnostic plots are provided on GitHub (https://github.com/dmcalli2/covid19_yll_final)15.