Study population and study design

This study used data from the Whitehall II prospective occupational cohort study. All civil servants aged 35–55 working in the London offices of 20 Whitehall departments in 1985–1988 were invited to participate. The response rate was 73%, and 6895 men and 3413 women were recruited (phase 1). These civil servants held a wide variety of roles, from clerical grades through to senior administrative grades, spanning a range of employment grades and salaries. Follow-up surveys were conducted every 2–3 years. All participants provided written consent, and the University College London ethics committee approved this study.

The data for the present analyses were drawn from phases 5 (1997–1999), 7 (2002–2004), 9 (2007–2009), and 11 (2012–2013) of the Whitehall II Study, when cognitive tests were administered during the clinical examinations. Phase 3 (1991–1994) was not used because cognitive testing was introduced midway through that phase, so only half of the respondents completed the cognitive tests. For the current study, participants were eligible for inclusion if they had data on cognitive function at least once before and at least once after retirement. We excluded participants who were no longer working at phase 5, those who did not retire during follow-up, and those who returned to work after retirement. There were 3691 eligible participants who moved from work to retirement, but 258 of these were excluded because of missing cognition outcomes (i.e. they lacked a cognition measure at least once before and at least once after retirement). The final sample comprised 3433 participants (11,858 observations). The sample selection process is shown in Fig. 1. Participants’ average age at cognitive testing was 54.0 years (range 45–68) at phase 5, 59.5 years (range 51–74) at phase 7, 64.3 years (range 56–79) at phase 9, and 68.2 years (range 60–83) at phase 11.
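The cognition-based eligibility criterion can be illustrated with a short check. This is a minimal sketch, not the study's code: the `Participant` record and the function name are hypothetical, and treating a test taken in exactly the retirement year as neither "before" nor "after" is an assumption, since the text does not say how such tests were handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Participant:
    # Hypothetical record layout: calendar years of cognitive assessments and
    # the (exact or mid-point) retirement year; None means no retirement observed.
    cognition_years: List[float]
    retirement_year: Optional[float]


def has_pre_and_post_cognition(p: Participant) -> bool:
    """True if at least one cognition measure precedes and one follows retirement.

    Assumption: a test taken in exactly the retirement year counts as neither.
    """
    if p.retirement_year is None:
        return False  # did not retire during follow-up
    before = any(y < p.retirement_year for y in p.cognition_years)
    after = any(y > p.retirement_year for y in p.cognition_years)
    return before and after
```

Under this check, a participant tested in 1998, 2003, and 2008 who retired in 2000 is retained, whereas one tested only in 1998 and 1999 falls into the group excluded for missing cognition outcomes.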

Fig. 1 Flowchart of sample selection process

Measures

Cognitive function

The cognitive test battery, comprising verbal memory, abstract reasoning, phonemic verbal fluency, and semantic verbal fluency, was introduced to the Whitehall II cohort study in phase 5 and was repeated with the same tests at all subsequent assessments (phases 7, 9, and 11). The tests have good test–retest reliability (range 0.6–0.9), assessed in 556 participants who were invited back to the clinic within 3 months of taking the tests at phase 5 [41]. Verbal memory was assessed with a 20-word free recall test: participants were presented with a list of 20 one- or two-syllable words at two-second intervals and then had 2 min to recall in writing as many words as possible (maximum score = 20) [42]. Abstract reasoning was assessed with the Alice Heim 4 Part 1 test (AH4), which measures the ability to identify patterns and to infer principles and rules. It consists of 65 questions (32 verbal and 33 mathematical) of increasing difficulty (maximum score = 65), and participants had 10 min to complete it [43]. Phonemic verbal fluency was assessed by asking participants to write as many words beginning with the letter ‘S’ as they could (maximum score = 35), and semantic verbal fluency by asking them to recall as many animal names as possible (maximum score = 35). One minute was allowed for each verbal fluency test [44].

Retirement and year of retirement

Employment status was self-reported at each phase. Participants were considered to be employed if they were still working in the civil service or were in paid employment elsewhere (full or part time). Participants were classified as retired if they moved from work directly to retirement, or from work to unemployed/other and then to retirement.
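The classification rule, together with the exclusions for not working at phase 5 and for returning to work, can be sketched as a function over a participant's phase-by-phase statuses. The three status labels and the returned strings are hypothetical codings chosen for illustration; the study's actual response categories may differ.

```python
from typing import List

# Hypothetical status labels, one per phase in chronological order.
VALID = {"working", "unemployed/other", "retired"}


def classify_transition(statuses: List[str]) -> str:
    if any(s not in VALID for s in statuses):
        raise ValueError("unknown employment status")
    if not statuses or statuses[0] != "working":
        return "excluded: not working at first phase"
    if "retired" not in statuses:
        return "excluded: did not retire"
    first_ret = statuses.index("retired")
    if "working" in statuses[first_ret:]:
        return "excluded: returned to work"
    # Last phase in work before the first retirement report; statuses[0] is
    # "working", so this range is never empty.
    last_work = max(i for i in range(first_ret) if statuses[i] == "working")
    # Phases strictly between last_work and first_ret are neither "working"
    # nor "retired", so they can only be "unemployed/other".
    gap = statuses[last_work + 1:first_ret]
    return "retired (via unemployed/other)" if gap else "retired (direct)"
```

For example, `["working", "unemployed/other", "retired"]` is classified as retirement via unemployed/other, while `["working", "retired", "working"]` is excluded as a return to work.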

All respondents who retired from the civil service reported their exact year of exit from the civil service, but those who retired from employment outside the civil service were not asked for their exact year of exit. For these 1632 individuals (46% of the selected sample), we used the mid-point between the last phase at which they were still in paid work and the subsequent phase at which they were no longer working. The year of retirement served as the centre point (year 0) for calculating cognitive trajectories before and after retirement.
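The mid-point rule and the centring on retirement can be written out explicitly. This sketch assumes that each phase is represented by the participant's own interview year (phases span several calendar years); the function names are hypothetical.

```python
from typing import Optional


def retirement_year(exact_exit_year: Optional[int],
                    last_working_year: float,
                    first_not_working_year: float) -> float:
    """Exact exit year if reported, otherwise the mid-point between phases.

    Assumption: phase timing is the participant's interview year at that phase.
    """
    if exact_exit_year is not None:
        return float(exact_exit_year)
    return (last_working_year + first_not_working_year) / 2.0


def years_from_retirement(test_year: float, ret_year: float) -> float:
    # Negative before retirement, positive after; year 0 is the retirement year.
    return test_year - ret_year
```

A participant last seen in work in 2003 and next seen not working in 2008 would be assigned a retirement year of 2005.5, so a cognitive test in 2012 would fall 6.5 years after retirement.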

Health-related retirement

At each phase, participants who were not working could indicate whether this was because of long-term sickness, and participants who retired from the civil service reported whether they had retired on health grounds. Retirement was classified as health-related if participants reported being ‘long-term sick’ or reported that their route out of the civil service was ‘retirement on health grounds’.

Covariates

We included retirement age as a covariate. Because all analyses in this paper were centred at the year of retirement (see Statistical methods), including retirement age as a covariate effectively adjusts for age. We adjusted for birth year to account for possible period effects. Gender and self-reported highest educational qualification were also included as covariates. Educational qualification was grouped as O-level or lower (‘low’), A-level or equivalent (‘middle’), and degree level or higher (‘high’). To account for practice effects (i.e. gains in test scores when a person is retested on the same or similar instruments), we controlled for the number of cognitive tests each participant had completed in previous phases. Although phase 3 cognitive test scores were not used in the analysis, tests completed at phase 3 were counted towards this practice-effect covariate.
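The practice-effect covariate amounts to counting, for each analysed phase, how many earlier phases the participant completed tests at, with phase 3 counted but not analysed. The sketch below assumes phase numbers are stored as integers; the function name is hypothetical.

```python
from typing import Dict, Iterable

# Phases whose cognitive scores enter the analysis; phase 3 is counted only
# towards the number of prior tests.
ANALYSED_PHASES = {5, 7, 9, 11}


def practice_covariate(tested_phases: Iterable[int]) -> Dict[int, int]:
    """Map each analysed phase to the number of earlier phases with tests."""
    ordered = sorted(set(tested_phases))
    return {phase: n_prior
            for n_prior, phase in enumerate(ordered)
            if phase in ANALYSED_PHASES}
```

A participant tested at phases 3, 5, 9, and 11 would receive values of 1, 2, and 3 at phases 5, 9, and 11, whereas a participant first tested at phase 5 would start at 0.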

Time-fixed covariates, based on the last interview before retirement, were employment grade, whether still working in the civil service, psychosocial job demands, job decision latitude, and spouse’s or partner’s employment status. Employment grade was classified, in order of increasing salary, as clerical/support (‘low’), professional/executive (‘middle’), or administrative (‘high’) [37]. For those who had left the civil service, the last employment grade before leaving was used. Job demands were measured with four items (e.g. ‘Do you have to work very fast?’) and decision latitude with nine items (e.g. ‘Do you have a choice in deciding how to do your work?’) [45]. Respondents rated whether each item was ‘often’, ‘sometimes’, ‘seldom’ or ‘never/almost never’ the case. Item scores (0–3) were summed so that a higher score reflected greater job demands or higher decision latitude, and the continuous scores were divided into tertiles [46]. Spouse’s employment status was measured by asking whether the participant’s spouse was currently doing any paid work; participants who reported not being married or cohabiting were coded as ‘no spouse’.
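The scale scoring and tertile split can be sketched as follows. The item-to-score mapping ('often' = 3 down to 'never/almost never' = 0) is an assumption consistent with higher scores reflecting more of the construct; the actual coding, including any reverse-scored items, is not given in the text. The tertile cut points here are sample quantiles from `numpy.quantile`; with many tied scores the resulting groups will not be exactly equal in size.

```python
import numpy as np

# Assumed scoring direction: higher score = more frequent.
RESPONSE_SCORES = {"never/almost never": 0, "seldom": 1, "sometimes": 2, "often": 3}


def scale_score(responses):
    """Sum of item scores (0-3 each), e.g. 0-12 for the four job-demand items."""
    return sum(RESPONSE_SCORES[r] for r in responses)


def tertile(scores):
    """Return 0 (low), 1 (middle) or 2 (high) for each continuous score."""
    scores = np.asarray(scores, dtype=float)
    cuts = np.quantile(scores, [1 / 3, 2 / 3])
    # right=True puts scores equal to a cut point in the lower group.
    return np.digitize(scores, cuts, right=True)
```

For nine scores 0 to 8, the cut points fall at about 2.67 and 5.33, giving tertiles `[0, 0, 0, 1, 1, 1, 2, 2, 2]`.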

Time-varying covariates (phases 5, 7, 9, and 11) included smoking status, alcohol consumption, depressive symptoms, systolic blood pressure (SBP), diastolic blood pressure (DBP), body mass index (BMI), total blood cholesterol, coronary heart disease (CHD), stroke, all malignant cancers, and diabetes/intermediate hyperglycaemia. Treating these variables as time-varying accounts for reported changes in health conditions and health behaviours over time. Smoking status (current, never, ex-smoker) and alcohol consumption in the past week (0, 1–10, more than 10 units) were self-reported. Depressive symptoms were measured with the depression subscale of the General Health Questionnaire (GHQ), and a score of 4 or higher (out of a maximum of 12) was used to identify depression cases [47]. Blood pressure (mm Hg), BMI (kg/m²), and total blood cholesterol (mmol/l) were objectively measured during the clinical examinations and were included as continuous covariates in the model. CHD (yes/no) included diagnosed non-fatal myocardial infarction (MI) and ‘definite’ angina. Non-fatal MI was defined following MONICA criteria [48] based on study electrocardiograms (ECGs), hospital acute ECGs, and cardiac enzymes, and was validated using discharge diagnoses from National Health Service (NHS) Hospital Episode Statistics (HES) data or General Practitioner (GP) confirmation up to the end of phase 11; self-reports of non-fatal MI were not used [49]. ‘Definite’ angina included self-reported angina only if it was subsequently validated by these other sources. Self-reported stroke events (yes/no) were collected throughout follow-up and were validated by HES data linkage, GP confirmation, or retrieval of hospital medical records up to phase 9 [49, 50]. Cancer incidence data (yes/no) for the period 1971–2015 were obtained from the NHS Central Register for nearly all participants. Diabetes/intermediate hyperglycaemia (yes/no) was defined using the WHO oral glucose tolerance test criteria and a self-reported diagnosis of diabetes [51].

Statistical methods

To test for a change in the slope of the response (Y) with respect to an independent variable (X), we used piecewise linear regression with two segments joined at a ‘knot’ [52, 53]. We used the year of retirement as the knot (year 0) and generated two independent variables: ‘years before retirement’ (−14 to −1) and ‘years after retirement’ (1–14). Participants who had been retired for less than a year were assigned a value of 1. Linear mixed models including these two variables were fitted separately for each cognition outcome. The coefficient for ‘years before retirement’ (slope before) represents the average annual change in cognition before retirement, and the coefficient for ‘years after retirement’ (slope after) represents the average change in cognition for each additional year after retirement. If retirement did not affect cognition, the trajectories of cognitive function would be expected to be similar before and after retirement. Therefore, to test whether retirement influenced cognitive decline independently of age-related change, we examined the difference between the slopes before and after retirement. The ‘slope change’ was defined as the slope after retirement minus the slope before retirement; it was also expressed as a percentage change, calculated as the slope change divided by the slope before retirement, multiplied by 100. We also examined whether a nonlinear piecewise model fitted better than the linear model by adding quadratic terms for ‘years before retirement’ and ‘years after retirement’ to each model. To account for clustering of the data, we fitted two-level mixed models, with repeated measures nested within individuals. The models included random intercepts for each individual and random coefficients for ‘years before’ and ‘years after’ retirement. All analyses were carried out in Stata 14.
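The piecewise coding and the slope-change calculation can be illustrated in a few lines. This is a simplified sketch, not the study's Stata model: it fits the two slopes by pooled ordinary least squares on one synthetic trajectory and omits the random intercepts, random slopes, and covariates. Assigning a time of exactly 0 to neither segment, and leaving pre-retirement times unrounded, are assumptions; the text states only the rule that post-retirement times under one year count as 1.

```python
import numpy as np


def piecewise_terms(t):
    """Split time relative to retirement (years) into the two segment variables."""
    t = np.asarray(t, dtype=float)
    years_before = np.where(t < 0, t, 0.0)
    # Post-retirement times under one year are counted as 1 year.
    years_after = np.where(t > 0, np.maximum(t, 1.0), 0.0)
    return years_before, years_after


def fit_piecewise_ols(t, y):
    """Pooled OLS: returns [intercept, slope_before, slope_after]."""
    yb, ya = piecewise_terms(t)
    X = np.column_stack([np.ones_like(yb), yb, ya])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


def slope_change(slope_before, slope_after):
    """Absolute slope change and percentage change relative to the slope before."""
    change = slope_after - slope_before
    return change, 100.0 * change / slope_before


# Synthetic, noise-free trajectory: decline of 0.1 points/year before
# retirement and 0.2 points/year after.
t = np.r_[np.arange(-10, 0), np.arange(1, 11)]
yb, ya = piecewise_terms(t)
y = 50.0 - 0.1 * yb - 0.2 * ya
coef = fit_piecewise_ols(t, y)
```

Here the recovered slopes are about −0.1 and −0.2, so the slope change is −0.1 and the percentage change is 100% (the annual decline doubles after retirement). To add the random intercepts and random coefficients described above in Python, statsmodels' `mixedlm` with a `re_formula` covering both segment variables would be one option.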

To assess whether the effect of retirement differed across cognitive domains, we conducted a test of heterogeneity on the effect of retirement using multivariate multilevel models that included all cognition outcomes in a single model.

To visualise the results from these regressions, we show predicted trajectories of each cognitive outcome before and after retirement. These predicted trajectories were calculated from the adjusted models at the sample mean of each covariate. In addition to the piecewise linear trajectories (in which ‘years before retirement’ and ‘years after retirement’ were treated as continuous), predicted adjusted means at each time point (in which these variables were treated as categorical) are shown as dots in the figures.
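Predicting at the covariate means amounts to folding all covariate terms into a constant and letting only the two segment variables vary. The coefficient values below are purely illustrative, not estimates from this study; for dummy-coded covariates the "mean" is simply the sample proportion in that category. Post-retirement times under one year are counted as 1, as in the model specification.

```python
import numpy as np


def predicted_trajectory(t_grid, intercept, slope_before, slope_after,
                         cov_coefs, cov_means):
    """Predicted score along t_grid with every covariate held at its mean."""
    t = np.asarray(t_grid, dtype=float)
    yb = np.where(t < 0, t, 0.0)
    ya = np.where(t > 0, np.maximum(t, 1.0), 0.0)
    # All covariate contributions collapse to a single constant shift.
    baseline = intercept + float(np.dot(cov_coefs, cov_means))
    return baseline + slope_before * yb + slope_after * ya


# Illustrative values only: two covariates with coefficients 2.0 and 0.5
# evaluated at sample means 1.0 and 4.0.
pred = predicted_trajectory([-2, 0, 2], 10.0, -0.5, -1.0, [2.0, 0.5], [1.0, 4.0])
```

With these inputs the baseline is 14, giving predictions of 15, 14, and 12 at two years before, at, and two years after retirement.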

We tested whether employment grade (based on the last response before retirement) and sex moderated the association between retirement and cognitive outcomes by adding interaction terms (‘years before retirement × employment grade’ and ‘years after retirement × employment grade’; ‘years before retirement × sex’ and ‘years after retirement × sex’) to the model for each cognition outcome.
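The interaction terms can be constructed by multiplying each segment variable by dummy indicators for the non-reference categories of the moderator. The choice of reference category is not stated in the text; the example below uses ‘low’ grade purely as an assumption, and the column names are hypothetical.

```python
from typing import Dict, Sequence

import numpy as np


def add_interactions(years_before: Sequence[float], years_after: Sequence[float],
                     moderator: Sequence[str], levels: Sequence[str],
                     reference: str) -> Dict[str, np.ndarray]:
    """Segment variables, moderator dummies and their products."""
    yb = np.asarray(years_before, dtype=float)
    ya = np.asarray(years_after, dtype=float)
    m = np.asarray(moderator)
    cols = {"years_before": yb, "years_after": ya}
    for level in levels:
        if level == reference:
            continue  # the reference category gets no dummy or interaction
        dummy = (m == level).astype(float)
        cols[level] = dummy
        cols[f"years_before x {level}"] = yb * dummy
        cols[f"years_after x {level}"] = ya * dummy
    return cols


cols = add_interactions([-2, 0], [0, 3], ["low", "high"],
                        ["low", "middle", "high"], reference="low")
```

In a model using these columns, the coefficient on `years_after x high` would estimate how much the post-retirement slope in the high grade differs from that in the reference grade; the same construction applies to sex.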

Missing data

For time-fixed covariates (employment grade, still working in the civil service, job demands, job decision latitude, and partner’s employment status), missing values at the last interview before retirement were first replaced with responses from earlier phases. The remaining missing values of time-fixed covariates, and missing values of other covariates for the eligible participants, were imputed in Stata using multivariate imputation by chained equations, generating 30 imputed datasets. All variables used in the analyses (i.e. independent variables, outcome variables, covariates, and moderators) were included in the imputation model. After imputation, imputed outcome values were deleted before running the regressions. The percentage of missing data is shown in Table 1.
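The first step for time-fixed covariates, filling a missing value at the last pre-retirement interview from earlier phases, can be sketched as a backward search. Using the most recent earlier non-missing response is an assumption (the text says only "prior responses"); values still missing after this step would be passed to multiple imputation.

```python
from typing import Dict, Optional


def last_before_retirement(responses_by_phase: Dict[int, Optional[str]],
                           last_phase: int) -> Optional[str]:
    """Response at last_phase, else the most recent earlier non-missing response.

    Returns None if nothing is available, leaving the value for imputation.
    """
    for phase in sorted((p for p in responses_by_phase if p <= last_phase),
                        reverse=True):
        value = responses_by_phase[phase]
        if value is not None:
            return value
    return None
```

For instance, a participant whose employment grade was ‘high’ at phase 5 but missing at phase 7, the last interview before retirement, would be assigned ‘high’.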

Table 1 Descriptive statistics for the study sample (n = 3433)a

Sensitivity analyses

We conducted three sensitivity analyses to assess the robustness of our results and conclusions. Sensitivity analysis 1 assessed potential bias due to reverse causality. For this analysis, we excluded from the analytic sample 500 participants who retired for health reasons or had a GHQ depression score of 4 or higher at the last interview before retirement. We also excluded 172 participants whose cognition was measured only twice (once before and once after retirement), as well as participants who moved from work to retirement via ‘unemployed/other’ (n = 278), since they were likely to have higher levels of stress, which may influence cognitive function. Because some participants met more than one exclusion criterion, a total of 911 participants (rather than 500 + 172 + 278 = 950) were excluded in this sensitivity analysis.
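Counting participants across overlapping exclusion criteria is a set union rather than a sum. The sketch below uses toy participant IDs (not study data) to show how the number excluded and the number of redundant flags are obtained.

```python
from typing import Dict, Set, Tuple


def combine_exclusions(criteria: Dict[str, Set[int]]) -> Tuple[int, int]:
    """Return (participants excluded, redundant flags from overlapping criteria)."""
    excluded = set().union(*criteria.values())
    total_flags = sum(len(ids) for ids in criteria.values())
    return len(excluded), total_flags - len(excluded)


# Toy IDs for illustration only.
n_excluded, n_redundant = combine_exclusions({
    "health_or_depression": {1, 2, 3},
    "only_two_measures": {3, 4},
    "via_unemployed_other": {4, 5},
})
```

Applied to the reported counts, the 950 criterion flags (500 + 172 + 278) correspond to 911 distinct participants, implying 39 redundant flags.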

We compared the characteristics of ‘eligible participants with missing cognition data’ (n = 258) and ‘the analytic sample’ (n = 3433) and found that they differed on several demographic characteristics (Online Resource, Table 1S). Cognitive function in the analytic sample may therefore differ from that of participants with missing cognitive data. Sensitivity analysis 2 assessed the impact of missing cognitive data on the results by including these 258 participants and multiply imputing their missing cognitive measures.

Sensitivity analysis 3 assessed whether our results were influenced by physical activity level, although a recently published Whitehall II study found no association between physical activity and cognitive decline [54]. We used total physical activity (< 8, 8–12, ≥ 12 h/week) reported at the last interview before retirement.