Conclusions Major surgery is associated with a small, long term change in the average cognitive trajectory that is less profound than for major medical admissions. The odds of substantial cognitive decline after surgery were about doubled, though lower than for medical admissions. During informed consent, this information should be weighed against the potential health benefits of surgery.

Results After accounting for the age related cognitive trajectory, major surgery was associated with a small additional cognitive decline, equivalent on average to less than five months of aging (95% credible interval 0.01 to 0.73 years). In comparison, admissions for medical conditions and stroke were associated with 1.4 (1.0 to 1.8) and 13 (9.6 to 16) years of aging, respectively. Substantial cognitive decline occurred in 2.5% of participants with no admissions, 5.5% of surgical admissions, and 12.7% of medical admissions. Compared with participants with no major hospital admissions, those with surgical or medical events were more likely to have substantial decline from their predicted trajectory (surgical admissions odds ratio 2.3, 95% credible interval 1.4 to 3.9; medical admissions 6.2, 3.4 to 11.0).

Main outcome measures The primary outcome was the global cognitive score established from a battery of cognitive tests encompassing reasoning, memory, and phonemic and semantic fluency. Bayesian linear mixed effects models were used to calculate the change in the age related cognitive trajectory after hospital admission. The odds of substantial cognitive decline induced by surgery, defined as a decline of more than 1.96 standard deviations from a predicted trajectory (based on the first three cognitive waves of data), were also calculated.

Participants 7532 adults with as many as five cognitive assessments between 1997 and 2016 in the Whitehall II study, with linkage to hospital episode statistics. Exposures of interest included any major hospital admission, defined as requiring more than one overnight stay during follow-up.

We addressed these concerns using cognitive data from 7532 adults, investigating whether incident major surgical admissions are related to long term changes in the cognitive trajectory, using five waves of cognitive assessments spanning approximately 20 years, with adjustment for major medical admissions. To facilitate interpretation of results, we translate effect estimates to equivalent years of cognitive aging and relate changes to the effect of stroke, an event with an established impact on cognition. Reference points such as these enable discussions of informed consent with patients, allowing them to weigh the risks of cognitive injury more easily. We primarily aimed to establish the mean population effect of major surgery on cognitive decline. As a secondary outcome we developed a binary outcome of substantial cognitive decline, more analogous methodologically to prior studies of postoperative cognitive decline 10 and consistent with clinically important deviations from the age related cognitive trajectory. 20 It allows some further correspondence to the prior literature and emphasizes cognitive changes that could impact quality of life.

Research on postoperative cognitive decline has several limitations. First, most studies have a single preoperative assessment of cognitive function 8 17 and not a person specific cognitive trajectory before the surgical event. Consequently, any decline detected postoperatively could be falsely attributed to the surgery rather than to the individual’s preoperative cognitive trajectory. Second, studies also typically fail to consider the impact of medical events 5 18 such as stroke, 5 which likely have a large cognitive impact and cluster with major surgical events. Third, most studies are small, with a limited set of confounding factors. Fourth, studies often do not include “positive controls” to aid in the interpretation of any null or borderline finding. Finally, despite notable exceptions, 13 14 19 the duration of cognitive follow-up is typically less than one year, limiting inference on the long term impact of surgery on cognition.

Cognitive decline and dementia are major healthcare concerns at older ages owing to considerable personal and societal burdens. Cognitive decline starts before conventional definitions of old age 1 (often 65 years) and accelerates with aging and accumulation of comorbidities. 2 3 4 Certain health events, such as stroke, can lead to profound changes in the cognitive trajectory such that there is a permanent “step change” in cognitive function. 5 For 60 years a major concern has been that surgery might also drive long term changes in cognition 6 ; our recent survey suggested that 65% of the public are concerned about postoperative cognitive deficits, 7 perhaps leading to refusal of surgery that might otherwise have health benefits. 8 9 Yet studies investigating associations between surgery and long term cognitive outcomes have produced inconsistent results, with reports of cognitive harm, 10 11 no effect, 12 13 14 and cognitive improvement. 15 Despite inconclusive evidence, considerable concern remains about the potential for surgery to induce cognitive impairment. 7 16 Longer life expectancy implies an increasing number of surgical operations in older adults, hence a better understanding of the extent of any change in cognition after surgery is urgently required.

Methods

Study design and participants The Whitehall II study is a prospective cohort study of employees from the British civil service in London based offices. A total of 10 308 people (6895 men and 3413 women, aged 35-55 at enrollment) were recruited between 1985 and 1988. In 1997, when participants were 45-69 years old, a cognitive test battery was introduced to the study and subsequently repeated four times. Ages across follow-up spanned 44-86 years (median 64); participants had a mean of 3.8 assessments, with a maximum follow-up of 19.4 years (mean 12.9 years).

Exposures The events (major surgical or admissions for medical conditions) were defined as hospital admissions requiring at least two overnight stays (excluding ambulatory or outpatient events) as identified in the hospital episode statistics database of National Health Service hospitals, which covers admissions in England, Scotland, and Wales. High quality data have been available since 1997, with audits of discharge reports indicating a 96% accuracy over our study period.21 Surgical admissions in hospital episode statistics were defined by Office of Population Censuses and Surveys (OPCS) codes (see appendix 1). Our primary definition of major surgery required a hospital admission of at least two nights (this being consistent with definitions currently used in major perioperative clinical trials22 23 24) linked to an OPCS code. Emergency admissions were identified by specific OPCS codes designated as an emergency procedure. Minor surgery was an OPCS coded admission that did not incur a minimum stay of two nights. Medical admissions were identified by ICD-10 codes (international classification of diseases, 10th revision) and similarly required a hospital admission of at least two nights. To limit effects of transfers within hospitals, we linked any admissions within 14 days. If OPCS codes were identified during this time, we treated the entire admission as a surgical admission. This design was used to ensure that complications of surgical admissions were grouped with the operations but might, if anything, weight the analysis toward finding an exaggerated relation with surgery. We retained those admitted during the study period but without cognitive follow-up for use in adjusting baseline cognitive scores (such that surgical admissions throughout the study period were treated the same when modeling cognitive scores, even if cognitive data were missing later). 
We also conducted a sensitivity analysis rating procedures for surgical risk based on BUPA (a private health insurance scheme) scores as used in the surgical risk scale.25 Procedures were rated by two authors (RDS and HJM). The surgical risk scale combines several parameters to estimate risk, including patient comorbidities and the planned procedure, the latter rated for severity on the BUPA scale. We have previously shown that higher BUPA rating of procedure severity (rated by RDS and HJM) is associated with higher risk of 30 day postoperative mortality.26 In our sensitivity analysis, the definition of major surgery required both a BUPA definition of a major procedure and a hospital admission of at least two nights. This analysis was designated as BUPA major. Out of 43 692 entries in hospital episode statistics during the study period (for 7532 participants), 35 099 remained after linkage. We compared major surgical events for the entire group with those for whom cognitive follow-up was available, organized according to higher risk surgical categories that have plausible associations with cognitive outcomes (cardiac, thoracic, vascular, and intracranial operations).10 11 The proportions were similar, suggesting that cognitive follow-up was available for a representative cross section of operations in the study population.
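The 14 day linkage rule and the two night threshold described above can be illustrated with a minimal Python sketch. It is not the study code: the `Episode` record, the function names, and the choice to measure the 14 day gap from the previous discharge to the next admission are assumptions made for illustration, and any non-OPCS admission is simply treated as a medical admission here.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Episode:
    """Hypothetical record for one hospital episode of one participant."""
    admitted: date
    discharged: date
    has_opcs_code: bool  # True if any OPCS procedure code is recorded


def link_episodes(episodes, max_gap_days=14):
    """Merge episodes whose admission falls within max_gap_days of the
    previous discharge (assumed reading of "linked within 14 days")."""
    linked = []
    for ep in sorted(episodes, key=lambda e: e.admitted):
        if linked and ep.admitted - linked[-1].discharged <= timedelta(days=max_gap_days):
            prev = linked[-1]
            # A linked admission counts as surgical if any part has an OPCS code.
            linked[-1] = Episode(prev.admitted,
                                 max(prev.discharged, ep.discharged),
                                 prev.has_opcs_code or ep.has_opcs_code)
        else:
            linked.append(Episode(ep.admitted, ep.discharged, ep.has_opcs_code))
    return linked


def classify(ep, min_nights=2):
    """Label a linked admission using the two night threshold."""
    nights = (ep.discharged - ep.admitted).days
    if ep.has_opcs_code:
        return "major surgery" if nights >= min_nights else "minor surgery"
    return "major medical" if nights >= min_nights else "not major"


# Illustrative data only: a medical stay followed 8 days later by an operation,
# then an unrelated day case procedure the following year.
episodes = [
    Episode(date(2005, 3, 1), date(2005, 3, 2), False),
    Episode(date(2005, 3, 10), date(2005, 3, 14), True),
    Episode(date(2006, 1, 5), date(2006, 1, 5), True),
]
linked = link_episodes(episodes)
labels = [classify(ep) for ep in linked]
```

In this example the first two episodes merge into one surgical admission, which shows how the rule groups perioperative complications with the operation and so could, if anything, overstate the association with surgery.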

Outcomes The primary outcome was the global cognitive score, calculated from the cognitive domains tested (memory, executive function, and verbal fluency), as in previous analyses of these data.1 27 28 The cognitive test battery was administered in 1997-99 (age range 44-68 years), 2002-04, 2007-09, 2012-13, and 2015-16 (age range 62-86 years). In 1997-99, 556 participants underwent retesting within three months of their initial assessment, with good test-retest reliability (range 0.6-0.9). Memory was tested using a 20 word free recall test where one or two syllable words were presented at two second intervals and participants were asked to write down as many of the words as they could recall in two minutes. Executive function was assessed using the Alice Heim 4-I test, which includes 65 verbal and mathematical reasoning items with increasing difficulty. This test measures a participant’s ability to identify patterns and infer principles or rules over the 10 minute test. Verbal fluency was assessed using measures of phonemic and semantic fluency, with participants asked to write as many words as possible beginning with “s” (phonemic) or animal names (semantic) in one minute. The primary outcome of the global cognitive score was calculated by first standardizing the raw scores for each cognitive domain to z scores using mean and standard deviation from the first wave of cognitive data collection (1997-99). Then we summed the z scores across cognitive domains and standardized them to yield the global score. This approach minimizes potential measurement error in any individual test.
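The two step standardization described above can be sketched in Python as follows. This is an illustration only: the function name and the example numbers are hypothetical, and the sketch assumes that the summed score is re-standardized against the first wave distribution of summed scores, which the text does not state explicitly.

```python
from statistics import mean, stdev


def global_scores(raw, baseline):
    """Return global cognitive scores for the assessments in `raw`.

    raw, baseline: dicts mapping a domain name to a list of raw scores,
    one entry per assessment. `baseline` holds first wave scores and
    serves as the reference distribution (an assumption for step 2).
    """
    def summed_z(data):
        n = len(next(iter(data.values())))
        # Step 1: z score each domain against the first wave mean and SD.
        z = {d: [(x - mean(baseline[d])) / stdev(baseline[d]) for x in data[d]]
             for d in data}
        # Sum the domain z scores for each assessment.
        return [sum(z[d][k] for d in data) for k in range(n)]

    # Step 2: standardize the summed score against the first wave sums.
    reference = summed_z(baseline)
    ref_mean, ref_sd = mean(reference), stdev(reference)
    return [(s - ref_mean) / ref_sd for s in summed_z(raw)]


# Hypothetical raw scores for four participants at the first wave.
wave1 = {
    "memory": [5, 7, 9, 6],
    "executive": [40, 50, 45, 55],
    "fluency": [14, 16, 20, 15],
}
wave1_global = global_scores(wave1, wave1)

# A later assessment scored on the same first wave scale.
later = {"memory": [6], "executive": [44], "fluency": [15]}
later_global = global_scores(later, wave1)
```

By construction the first wave global scores have mean 0 and standard deviation 1, so later scores are read as deviations from the first wave average in standard deviation units.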

Covariates Most covariates were drawn from the 1997-99 assessment, though we coded covariates such as diabetes as occurring “ever” based on all assessments. These comprised sex, ethnicity, education level, maximum occupational position, diabetes mellitus, and smoking status. Additionally, measures of married or cohabitating status and Framingham cardiovascular disease risk score29 were updated alongside the cognitive assessment. We also included the number of cognitive assessments for each participant as a covariate.

Statistical models We estimated the offset in age related cognitive trajectory associated with cumulative major surgical, non-surgical, and stroke related hospital admissions. We report the 95% credible intervals on these estimates, which are close to confidence intervals derived using maximum likelihood or restricted maximum likelihood estimation. We used Markov chain Monte Carlo simulations to fit linear mixed effects models with random intercepts for participant and random slopes for age, accounting for variation between participants in baseline cognitive performance and in cognitive trajectory. Fixed effects represented the age related cognitive decline and the shift in cognitive performance according to the cumulative incidence of surgical and medical hospital admissions before the time of cognitive assessment. We separated medical admissions for stroke events because of the expected substantial cognitive impact after stroke.5 A quadratic term for age was included to account for an accelerated rate of decline with increasing age.28 Because our focus was on changes after surgery, we also included baseline adjustments for numbers of surgery, medical, and stroke admissions (including events that occurred after cognitive follow-up but during the range of years analyzed), and for two way interactions between them to ensure that the association observed is not attributable to differences by subgroups in preoperative cognitive function. The form of the linear model is:
$$\text{Cognition}_{ij} = \beta \times [1 + \text{Age}_{ij} + \text{Age}_{ij}^{2} + (\text{EverSurgery}_{i} + \text{EverMedical}_{i} + \text{EverStroke}_{i})^{2} + \text{CognitiveAssessments}_{i} + (\text{Surgery}_{ij} + \text{Medical}_{ij} + \text{Stroke}_{ij})^{2} + \text{Covariates}_{i} + \text{Covariates}_{i} \times \text{Age}_{ij} + \text{CovariatesTD}_{ij}] + \gamma_{i} \times [1 + \text{Age}_{ij}] + \varepsilon_{ij}$$
Grouped squared terms indicate two way interactions. Subscript i indicates predictors that vary across participants and subscript j indicates predictors that vary across cognitive assessments within one participant. 
“Ever” surgical admissions, medical events, and strokes represent occurrence of at least one of that event any time during the study period; therefore, these represent constant adjustments to the participant’s baseline. Covariates i indicates adjustment for covariates that were measured at baseline, and CovariatesTD ij indicates time dependent covariates. Models were also adjusted for the total number of cognitive assessments to correct for increased dropout of participants who started with lower cognitive scores, and include random effects for participant and a random slope with age, presuming that participants start at different baselines and vary in rate of cognitive decline. The primary coefficients of interest are those representing the number of surgery, medical, and stroke admissions before a given cognitive assessment. Those coefficients represent a cognitive step change occurring at the time of admission and persisting. Overall, our approach presumes that participants differ in both their baseline cognitive abilities and comorbidities and their rate of decline with age. In the analysis we attempt to identify any additional cumulative change after surgery and hospital admission. We compared this approach to an alternative that treats hospital admissions as overall markers of ill health, resulting in differences in the overall cognitive trajectory without any particular impact at the time of the admission. In this model, the only variable that changes with time for a given subject is age:
$$\text{Cognition}_{ij} = \beta \times [1 + \text{Age}_{ij} + \text{Age}_{ij}^{2} + \text{Age}_{ij} \times (\text{TotalSurgery}_{i} + \text{TotalMedical}_{i} + \text{TotalStroke}_{i})^{2} + \text{CognitiveAssessments}_{i} + \text{Covariates}_{i} + \text{Covariates}_{i} \times \text{Age}_{ij} + \text{CovariatesTD}_{ij}] + \gamma_{i} \times [1 + \text{Age}_{ij}] + \varepsilon_{ij}$$
Because this model was inferior based on the deviance information criterion, our analyses focused on the step change model. All models were fit using the R package MCMCglmm30 and custom code written in R. 
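To make the step change predictors concrete, the following Python sketch derives the time varying count of admissions before each cognitive assessment and the constant "ever" indicator for one participant. The function and variable names and the dates are hypothetical, and the sketch is independent of the R code used for the analysis.

```python
from bisect import bisect_left
from datetime import date


def admissions_before(event_dates, assessment_dates):
    """Count admissions strictly before each assessment date.

    The counts correspond to a time varying predictor (one value per
    assessment j), so each admission shifts every later assessment.
    """
    events = sorted(event_dates)
    # bisect_left returns how many sorted events fall before the assessment.
    return [bisect_left(events, a) for a in assessment_dates]


# Illustrative dates for a single participant.
assessments = [date(1998, 1, 1), date(2003, 1, 1), date(2008, 1, 1), date(2013, 1, 1)]
surgery_dates = [date(2003, 5, 1), date(2010, 8, 15)]

surgery_ij = admissions_before(surgery_dates, assessments)  # varies by assessment
ever_surgery_i = int(len(surgery_dates) > 0)                # constant per participant
```

Here the participant contributes counts of 0, 0, 1, and 2 across the four assessments, while the "ever" indicator is 1 throughout. Separating the two lets the baseline adjustment absorb pre-existing differences between participants who do and do not go on to have surgery, leaving the count to capture change after each admission.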
For final model fits we used 3000 burn-in iterations followed by 100 000 iterations, thinned to every 10th iteration. Autocorrelation plots and the Gelman-Rubin diagnostic31 32 were used to confirm model convergence with the help of the R package coda.33
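As a rough illustration of these settings, the Python sketch below computes the number of retained posterior draws and a basic form of the Gelman-Rubin statistic on simulated chains. It assumes the 100 000 iterations follow the burn-in, uses simulated rather than study data, and omits the further corrections that software implementations may apply, so it should not be read as a reproduction of the coda output.

```python
import random
from statistics import mean, variance

# Assumed reading: 100 000 sampling iterations after burn-in, every 10th kept.
BURN_IN, SAMPLING_ITERATIONS, THIN = 3000, 100_000, 10
retained_draws = SAMPLING_ITERATIONS // THIN


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return (pooled / within) ** 0.5


# Simulated chains: four that sample the same distribution, and a pair in
# which one chain is shifted to mimic a failure to converge.
rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
stuck = [mixed[0], [x + 5.0 for x in mixed[1]]]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Values close to 1 indicate that the chains agree with one another; values well above 1, as for the shifted pair, indicate that between-chain differences dominate and the model has not converged.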

Sensitivity analyses In parallel to the models fit to the entire study sample, we analyzed a subset of 4916 participants with at least four cognitive assessments. In this population, we can better follow the cognitive trajectory in individual participants and our estimates could be less susceptible to confounding factors that cause participants to drop out of the study. In further models, we tested the impact of surgery not requiring a hospital stay of two nights, focusing on only BUPA major operations, excluding participants in high risk surgical categories (cardiac, thoracic, vascular, and intracranial neurosurgery), emergency surgery, participants with surgery before age 65, and those who had surgery after the beginning of the study period but before they completed their first cognitive assessment, and incorporating covariates.

Interpretation of model parameter estimates We report posterior means and bayesian 95% credible intervals. Credible intervals indicate the range of parameter estimates that are likely given the data. We chose a bayesian approach because we believe that an effect or no effect judgment based on a null hypothesis test is not the most useful or informative statistic in the context we study, and related P values are often misinterpreted.34 Rather, the statistic that is most relevant and most easily interpreted by clinicians and patients is the range of expected average outcomes.35 For example, a small but statistically significant clinical risk could be irrelevant to patients’ decisions, whereas a large but statistically inconclusive clinical risk is important. Interpreting model coefficients in terms of a range of plausible outcomes, especially estimates of the upper bound to risk, is most important to patients and clinicians.

Missing data To account for occasional missing demographic data, we generated 100 imputed datasets using the R package MICE,36 fit Markov chain Monte Carlo models to each, and computed credible intervals for each fixed effect across the imputed models.37

Identifying substantial decline As an alternative approach, we tested for the most severe (rather than average) cognitive decline outcomes. We predicted the composite cognitive scores for the final cognitive assessment (either the fourth or the fifth study wave) based on extrapolation from the first three assessments in each participant. Those with medical or surgical admissions before the third cognitive assessment, as well as those with stroke at any point during the study, were excluded, leaving 3633 participants for this approach. We fit a linear mixed effects model with age and age squared as fixed effects, and participant as a random effect with random slopes for age. Using this model, we predicted the cognitive score for each participant at their final cognitive assessment for the study, including their random intercept and slope. We subtracted these predicted cognitive scores from the actual cognitive scores at that final assessment, and then z scored these residuals across participants. In accordance with prior studies, we defined participants with z scored residuals of less than −1.96 as those experiencing “substantial decline” relative to prediction.10 38 39 We then fit a logistic regression for substantial decline as a function of having at least one surgery, at least one medical admission, or both before the final cognitive assessment, and adjusted for age at the final cognitive assessment.
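The flagging step can be sketched in Python as below, taking the predicted scores as given rather than refitting the mixed model. The function names and the example scores are hypothetical. The final lines compute a crude, unadjusted odds ratio from the proportions reported in the abstract (5.5% after surgical admission versus 2.5% with no admission) purely as an arithmetic check; the study's estimate comes from a logistic regression adjusted for age.

```python
from statistics import mean, stdev


def flag_substantial_decline(observed, predicted, threshold=-1.96):
    """Flag participants whose z scored residual falls below the threshold."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    m, s = mean(residuals), stdev(residuals)
    return [(r - m) / s < threshold for r in residuals]


def crude_odds_ratio(p_exposed, p_unexposed):
    """Unadjusted odds ratio from two outcome proportions."""
    return (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))


# Hypothetical final assessment scores for 20 participants: most sit close
# to their predicted score, and the last falls far below it.
predicted = [0.0] * 20
observed = [0.2, -0.2] * 9 + [0.0, -3.0]
flags = flag_substantial_decline(observed, predicted)

# Arithmetic check using the proportions quoted in the abstract.
surgery_vs_none = crude_odds_ratio(0.055, 0.025)
```

Only the last participant is flagged in this example. The crude ratio comes to about 2.27, in line with the adjusted odds ratio of 2.3 reported for surgical admissions.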