Participants & procedure

Participants were drawn from Waves 6 and 7 (2014–2015 and 2015–2016) of the Monitor of Engagement with the Natural Environment (MENE) survey, the only waves in which our key outcomes were consistently measured. The survey, which is part of the UK government’s National Statistics, is repeat cross-sectional (different people take part in each wave) and is conducted across the whole of England throughout the year (approx. 800 people per week) to reduce potential geographical and seasonal biases49. As part of the UK’s official statistics, sampling protocols are extensive and designed to ensure as representative a sample of the adult English population as possible. Full details can be found in the annual MENE Technical Reports49; key features include: (a) “a computerised sampling system which integrates the Post Office Address file with the 2001 Census small area data at output area level. This enables replicated waves of multi-stage stratified samples”; (b) “the areas within each Standard Region are stratified into population density bands and within band, in descending order by percentage of the population in socio-economic Grade I and II”; (c) “[in order to] maximise the statistical accuracy of the sampling, sequential waves of fieldwork are allocated systematically across the sampling frame to ensure maximum geographical dispersion”; (d) “to ensure a balanced sample of adults within the effective contacted addresses, a quota is set by sex (male, female housewife, female non-housewife); within the female housewife quota, presence of children and working status and within the male quota, working status”; and (e) “the survey data is weighted to ensure that the sample is representative of the UK population in terms of the standard demographic characteristics” (ref.49, p.5). Data were collected using in-home, face-to-face interviews, with responses recorded using Computer Assisted Personal Interviewing (CAPI) software.

Although the total sample for these years was n = 91,190, the health and well-being questions were asked only in every fourth sampling week (i.e. monthly rather than weekly), resulting in a reduced sample of n = 20,264. To account for any residual sampling biases at this monthly level, special ‘month’ survey weights are included in the data set. These were applied in the current analysis to ensure that results remained generalisable to the entire adult population of England. All data were anonymised by Natural England and are publicly accessible at: http://publications.naturalengland.org.uk/publication/2248731?category=47018. Ethical approval was not required for this secondary analysis of publicly available National Statistics.

Outcomes: Self-reported health & subjective well-being

Self-reported health (henceforth: health) was assessed using the single item: ‘How is your health in general?’ (sometimes referred to as ‘SF1’). Response options were: ‘Very bad’, ‘Bad’, ‘Fair’, ‘Good’ and ‘Very good’. Responses are robustly associated with use of medical services50 and mortality51 and, crucially for current purposes, with neighbourhood greenspace13. Following earlier work, we dichotomised responses into ‘Good’ (‘Good/very good’, weighted = 76.5%) and ‘Not good’ (‘Fair/bad/very bad’, 23.5%)52. Subjective well-being (henceforth: well-being) was assessed using the ‘Life Satisfaction’ measure, one of the UK’s national well-being measures53: ‘Overall how satisfied are you with life nowadays?’, with responses ranging from 0 ‘Not at all’ to 10 ‘Completely’. Again following earlier studies, we dichotomised responses into ‘High’ (8–10, 60.2%) and ‘Low’ (0–7, 39.8%) well-being54. Histograms of the (non-normal) distributions of both outcome variables are presented in Appendix A. Of note, although the dichotomisation points were based on prior research, they are consistent with the current data: the median response fell in the ‘Good’ category for health and at ‘8’ for well-being. Sensitivity analyses treating the outcomes as ordinal (health and well-being) or linear (well-being only) are presented in Appendix E.
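The dichotomisation rules above, together with a survey-weighted proportion of the kind reported (e.g. the weighted share with ‘Good’ health), can be sketched in Python. This is a minimal illustration rather than the authors’ code: the function names and the plain-list data layout are hypothetical, and the weights stand in for the MENE ‘month’ survey weights.

```python
# Minimal sketch (not the authors' code): function names and data layout are hypothetical.
HEALTH_LEVELS = ("Very bad", "Bad", "Fair", "Good", "Very good")
GOOD_HEALTH = {"Good", "Very good"}


def dichotomise_health(response: str) -> int:
    """1 = 'Good' (Good/Very good); 0 = 'Not good' (Fair/Bad/Very bad)."""
    if response not in HEALTH_LEVELS:
        raise ValueError(f"unexpected health response: {response!r}")
    return int(response in GOOD_HEALTH)


def dichotomise_life_satisfaction(score: int) -> int:
    """1 = 'High' well-being (8-10); 0 = 'Low' well-being (0-7)."""
    if not 0 <= score <= 10:
        raise ValueError(f"life satisfaction must be 0-10, got {score}")
    return int(score >= 8)


def weighted_share(flags, weights) -> float:
    """Survey-weighted proportion of respondents whose flag is 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w for f, w in zip(flags, weights) if f == 1) / total


# Toy example with made-up weights (not MENE data).
health = [dichotomise_health(r) for r in ["Good", "Fair", "Very good", "Bad"]]
print(weighted_share(health, [1.0, 0.5, 2.0, 0.5]))  # 0.75
```

Because the weights enter only through the weighted share, an unweighted proportion is recovered by passing equal weights.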

Exposure: Recreational nature contact in last 7 days

Recreational nature contact, or time spent in natural environments in the last week, was derived by multiplying the number of reported recreational visits in the last week by the duration of a randomly selected visit. Participants were introduced to the survey as follows: “I am going to ask you about occasions in the last week when you spent your time out of doors. By out of doors we mean open spaces in and around towns and cities, including parks, canals and nature areas; the coast and beaches; and the countryside including farmland, woodland, hills and rivers. This could be anything from a few minutes to all day. It may include time spent close to your home or workplace, further afield or while on holiday in England. However this does not include: routine shopping trips or; time spent in your own garden.” They were then asked “how many times, if at all, did you make this type of visit yesterday/on <DAY>” for each of the previous seven days. Ninety-eight percent of respondents reported ≤7 visits in the last week; the remaining 2% were capped at 7 visits to prevent extreme values from dramatically skewing weekly duration estimates.

After basic details of each visit (up to 3 per day) were recorded, the CAPI software selected a single visit at random, about which the interviewer asked further questions, including: “How long did this visit last altogether?” (Hours & Minutes). Although the selected visit may not have been representative for any given individual, random selection should reduce potential bias at the population level at which our analyses were conducted. Weekly duration estimates were thus derived by multiplying the duration of this randomly selected visit by the number of stated visits in the last seven days (capped at 7). Following the approach of earlier exposure-response studies in the field (e.g. Shanahan et al., 2016), duration was grouped into seven categories: 0 mins (n = 11,668); 1–59 mins (n = 355); 60–119 mins (n = 1,113); 120–179 mins (n = 1,290); 180–239 mins (n = 1,014); 240–299 mins (n = 882); ≥300 mins (n = 3,484). An alternative banding at 30-minute intervals was problematic because of very low ns in some bands (e.g. 1–29 mins, n = 85), reflecting the fact that weekly duration estimates clustered around the hour marks; e.g. 78% of the unweighted observations within the 120–179 mins band were exactly 120 mins (see Appendix A, Figure C, for the duration histogram). The highest band was left open-ended (≥300 mins) because of the large positive skew of the data.
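A minimal Python sketch of the weekly-duration derivation and banding described above, assuming one record per respondent holding the stated number of visits in the last seven days and the duration (in minutes) of the randomly selected visit. The function names are hypothetical, and respondents reporting no visits are assumed to have no selected visit and are assigned 0 mins.

```python
from typing import Optional

VISIT_CAP = 7  # respondents reporting more than 7 visits were capped at 7

# (exclusive upper bound in minutes, label) for the non-zero bands below 300 mins
_BANDS = [
    (60, "1-59 mins"),
    (120, "60-119 mins"),
    (180, "120-179 mins"),
    (240, "180-239 mins"),
    (300, "240-299 mins"),
]


def weekly_minutes(visits_last_week: int, sampled_visit_minutes: Optional[float]) -> float:
    """Capped visit count multiplied by the duration of the randomly selected visit."""
    if visits_last_week < 0:
        raise ValueError("visit count must be non-negative")
    if visits_last_week == 0:
        return 0.0  # no visits, so no visit was selected for follow-up questions
    if sampled_visit_minutes is None or sampled_visit_minutes < 0:
        raise ValueError("a non-negative duration is needed when visits > 0")
    return min(visits_last_week, VISIT_CAP) * sampled_visit_minutes


def duration_band(minutes: float) -> str:
    """Assign a weekly duration to one of the seven exposure categories."""
    if minutes == 0:
        return "0 mins"
    for upper, label in _BANDS:
        if minutes < upper:
            return label
    return ">=300 mins"


# 10 reported visits (capped to 7) x a 30-minute sampled visit = 210 mins
print(duration_band(weekly_minutes(10, 30)))  # 180-239 mins
```

The cap is applied to the visit count before multiplication, so a single long visit can still place a respondent in the open-ended top band.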

Control variables

Health and well-being are associated with socio-demographic and environmental characteristics at both neighbourhood (e.g. area deprivation) and individual (e.g. relationship status) levels55. As many of these variables may also be related to nature exposure, they were controlled for in the adjusted analyses.

Area level control variables

Area-level covariate data were assigned at the spatial level of the Census 2001 Lower-layer Super Output Area (LSOA) in which each individual lived. There were 32,482 LSOAs in England, each containing approximately 1,500 people within a mean area of 4 km².

Neighbourhood greenspace

To quantify how much greenspace is in an individual’s neighbourhood, we derived an area density metric using the Generalised Land Use Database (GLUD)56. The GLUD provides, for each LSOA in England, the area covered by greenspace and by domestic gardens. These were summed and divided by the total LSOA area to give the greenspace density metric, which was allocated to each individual in the sample based on their LSOA of residence. Following previous literature, individuals were assigned to one of five greenspace quintiles (ranging from least green to most green)33. Rather than derive quintiles from the current sample (i.e. divide the current sample into five equal parts based on the percentage of greenspace in their LSOA), we assigned individuals to one of five pre-determined quintiles based on the distribution of greenspace across all 32,482 LSOAs in England. Although this meant that the current sample was not split into exactly equal 20% shares across greenspace quintiles (though, owing to the sampling protocol, it was still very close to this; see Appendix B), this approach allowed inferences to be made across the entire country rather than only to the current sample. In exploratory sensitivity analyses, we defined greenspace using the GLUD category ‘greenspace’ only, excluding the GLUD category ‘gardens’. This produced very similar results, so we focused on the more inclusive definition including both. In further exploratory sensitivity analyses, we assigned individuals to five greenspace categories defined by equal ranges of greenspace coverage (e.g. 0–20%, 21–40%, 41–60%, etc.) rather than quintiles based on percentages of the population. This also produced very similar results, so again we retained the more common quintile approach. In subsequent analyses, the least green quintile acted as the reference category.
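The greenspace density metric, and the use of national (all-LSOA) rather than sample-based quintile cut points, can be sketched as follows. This assumes GLUD areas are available per LSOA in consistent units; the helper names are hypothetical, and using `statistics.quantiles(..., method="inclusive")` for the cut points is an assumption, since the exact quantile definition is not reported.

```python
from bisect import bisect_right
from statistics import quantiles


def greenspace_density(greenspace_area: float, garden_area: float, total_area: float) -> float:
    """Share of the LSOA covered by greenspace plus domestic gardens (GLUD categories)."""
    if total_area <= 0:
        raise ValueError("total LSOA area must be positive")
    return (greenspace_area + garden_area) / total_area


def national_cut_points(all_lsoa_values, n_groups: int = 5):
    """Cut points from the distribution across ALL LSOAs in England, not the survey sample.

    The 'inclusive' quantile method is an assumption; the exact definition is not reported.
    """
    return quantiles(all_lsoa_values, n=n_groups, method="inclusive")


def assign_group(value: float, cut_points) -> int:
    """1 = lowest group (least green) ... len(cut_points) + 1 = highest (most green).

    A value exactly on a cut point is placed in the higher group.
    """
    return bisect_right(cut_points, value) + 1


# Toy 'national' distribution of ten LSOA densities (illustrative only).
national = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
cuts = national_cut_points(national)
respondent_density = greenspace_density(greenspace_area=3.0, garden_area=1.0, total_area=10.0)
print(assign_group(respondent_density, cuts))  # 3
```

Because the cut points are fixed from the national distribution, the sample shares per quintile need not be exactly 20%, which mirrors the trade-off described above.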

Area deprivation

Each LSOA in England is assessed on several domains of deprivation: income, employment, education, health, crime, barriers to housing and services, and the living environment. A total Index of Multiple Deprivation (IMD) score is derived from these domains57. Following previous studies52, we assigned individuals to deprivation quintiles based on the LSOA in which they lived. As with greenspace, the cut points for area deprivation quintiles were based on all LSOAs in England, rather than those in the current sample, to allow inference to the population as a whole (most deprived quintile = ref).

Air pollution

An indicative measure of air pollution was operationalised as background PM10 concentration in each LSOA, assigned to tertiles based on all LSOAs in England (lowest particulate concentration = ref). PM10 concentrations, based on Pollution Climate Mapping (PCM) model simulations58, were averaged over the period 2002–2012 and aggregated from 1 km × 1 km grid resolution to LSOAs.
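The air-pollution covariate combines three steps: averaging annual PM10 over 2002–2012, aggregating 1 km grid cells to LSOAs, and assigning national tertiles. The aggregation rule is not reported, so the sketch below assumes an area-weighted mean of the grid cells overlapping each LSOA; the function names, inputs and the `method="inclusive"` quantile choice are all hypothetical.

```python
from bisect import bisect_right
from statistics import fmean, quantiles

PERIOD = range(2002, 2013)  # 2002-2012 inclusive


def period_mean(annual_pm10: dict) -> float:
    """Mean annual background PM10 for one grid cell over 2002-2012."""
    return fmean(annual_pm10[year] for year in PERIOD)


def lsoa_pm10(cell_means, overlap_areas) -> float:
    """ASSUMED aggregation: mean of 1 km cells weighted by their overlap area with the LSOA."""
    total = sum(overlap_areas)
    if total <= 0:
        raise ValueError("overlap areas must sum to a positive value")
    return sum(v * a for v, a in zip(cell_means, overlap_areas)) / total


def pm10_tertile(value: float, national_lsoa_values) -> int:
    """1 = lowest national tertile (reference) ... 3 = highest."""
    cuts = quantiles(national_lsoa_values, n=3, method="inclusive")
    return bisect_right(cuts, value) + 1


# Toy values (not PCM output): an LSOA covered 75% by one cell and 25% by another.
cell_a = period_mean({year: 20.0 for year in PERIOD})
cell_b = period_mean({year: 10.0 for year in PERIOD})
lsoa_value = lsoa_pm10([cell_a, cell_b], [0.75, 0.25])
national = [float(v) for v in range(10, 19)]  # illustrative national LSOA values
print(lsoa_value, pm10_tertile(lsoa_value, national))  # 17.5 3
```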

Individual level controls

Individual-level controls comparable to earlier studies in this area6,7,12,13,15 included: sex (male = ref); age (16–64 = ref; 65+); occupational social grade, as a proxy for individual socio-economic status (SES): AB (highest, e.g. managerial), C1, C2 and DE (lowest, e.g. unskilled labour) = ref; employment status (full-time, part-time, in education, retired, not working/unemployed = ref); relationship status (married/cohabiting; single/separated/divorced/widowed = ref); ethnicity (White British; other = ref); number of children in the household (≥1 vs. 0 = ref); and dog ownership (Yes; No = ref).
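The reference categories listed above correspond to standard treatment (dummy) coding, in which each non-reference level of a categorical covariate receives its own 0/1 indicator. A hypothetical Python sketch follows; the level labels are paraphrased from the text, and the first level of each tuple is taken as the reference.

```python
# Hypothetical level labels paraphrased from the text; the FIRST level is the reference.
LEVELS = {
    "sex": ("male", "female"),
    "age": ("16-64", "65+"),
    "social_grade": ("DE", "C2", "C1", "AB"),
    "employment": ("not working/unemployed", "full-time", "part-time",
                   "in education", "retired"),
    "relationship": ("single/separated/divorced/widowed", "married/cohabiting"),
    "ethnicity": ("other", "White British"),
    "children": ("0", "1+"),
    "dog": ("no", "yes"),
}


def dummy_code(record: dict) -> dict:
    """Treatment-code one respondent: a 0/1 column for every non-reference level."""
    row = {}
    for var, levels in LEVELS.items():
        value = record[var]
        if value not in levels:
            raise ValueError(f"{var}: unexpected level {value!r}")
        for level in levels[1:]:  # skip levels[0], the reference category
            row[f"{var}[{level}]"] = int(value == level)
    return row


respondent = {"sex": "female", "age": "65+", "social_grade": "AB",
              "employment": "retired", "relationship": "married/cohabiting",
              "ethnicity": "White British", "children": "0", "dog": "yes"}
coded = dummy_code(respondent)
print(len(coded))  # 13
```

Each coefficient on an indicator is then interpreted relative to the reference level of the same variable (e.g. social grade AB vs. DE).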

Two further control variables were particularly important. First, the survey asked: ‘Do you have any long standing illness, health problem or disability that limits your daily activities or the kind of work you can do?’ (‘Restricted functioning’: Yes; No = ref). Including this variable partly controls for reverse causality, i.e. the possibility that better health leads to more time in nature rather than vice versa. If similar associations between nature exposure and health and well-being are found both for those with and for those without restricted functioning, this would support the notion that the associations are not merely due to healthier, more mobile people visiting nature more often.

Second, we controlled for the number of days per week on which people reported engaging in >30 mins of physical activity, dichotomised in the current analysis as meeting or not meeting the guideline of 150 mins per week (i.e. at least 5 days in the week with >30 mins of physical activity). Some people achieve this guideline through physical activity in natural settings35, so any association between time spent in nature and health may simply reflect the physical activity undertaken in these settings. We believe this is not the case in the current context, because the rank-order (Spearman) correlation between weekly nature contact and the number of days a week on which an individual engaged in >30 mins of physical activity was only rs = 0.27. Nevertheless, controlling for weekly activity levels reduces bias from this source in the modelled relationships between time in nature and health, and therefore improves estimates of the association with nature exposure per se.
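The rank-order correlation and the guideline dichotomy can be illustrated with a small, dependency-free Spearman implementation that assigns average ranks to ties. Tie handling matters here because both variables are heavily tied (many zero durations and whole-hour values). This is an unweighted sketch with hypothetical names; whether the reported coefficient used survey weights is not stated.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean 1-based position of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rs = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def meets_pa_guideline(active_days: int) -> int:
    """1 if at least 5 days with >30 mins of activity (the 150 mins/week guideline)."""
    return int(active_days >= 5)


# Toy data (not MENE): weekly nature minutes and active days for six people.
nature_mins = [0, 0, 60, 120, 120, 300]
active_days = [1, 3, 2, 5, 4, 7]
print(f"rs = {spearman(nature_mins, active_days):.2f}")
```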

Temporal controls

Due to the multi-year pooled nature of the data, year/wave was also controlled for. Preliminary analysis found no effect of the season in which the data were collected so this was excluded from final analyses.

Analysis strategy

Survey-weighted binomial logistic regressions were used to estimate the relative odds that an individual would report ‘Good’ health or ‘High’ well-being as a function of weekly nature exposure, expressed as duration categories. Model fit was summarised using a pseudo-R², specifically the more conservative Cox and Snell estimate. The binary outcome variables were first regressed on the exposure duration categories alone to test direct relationships; adjusted models were then specified to include the individual- and area-level control variables. Because area-level data were missing for a small minority of participants (n = 456), the estimation sample for these adjusted models was n = 19,808. Preliminary analysis found that the weighted descriptive proportions in this reduced estimation sample differed only negligibly from those among all available observations in the wider MENE sample, suggesting that our complete-case approach did not distort the population representativeness of the estimation sample. The full n = 20,264 sample was retained for the unadjusted models to provide the most accurate weighted representation of the data; restricting the unadjusted models to n = 19,808 produced practically identical results. Although our main analyses used duration categories of weekly nature contact, an exploratory analysis used generalised additive models incorporating a penalised cubic regression spline of duration as a continuous variable (adjusting for the same set of covariates). This produced a ‘smoother’ plot of the data. Analyses and plotting were done in R version 3.4.1, using the packages mgcv and visreg59.
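For the unadjusted models, where the only predictor is the categorical exposure, the weighted logistic regression is saturated: its fitted probability for each duration band equals the weighted proportion with the outcome in that band, so odds ratios can be read directly from weighted counts. The sketch below uses this to illustrate the odds ratios and the Cox and Snell pseudo-R², R² = 1 − exp(2(ℓ₀ − ℓ_M)/n), where ℓ₀ and ℓ_M are the null and model log-likelihoods. It is not a substitute for the design-based estimation done in R: it computes no standard errors, the names are hypothetical, and it assumes weights normalised to a mean of 1 so that n is the number of respondents.

```python
from collections import defaultdict
from math import exp, log


def weighted_props(groups, outcomes, weights):
    """Weighted proportion with outcome == 1 in each group (duration band)."""
    pos, tot = defaultdict(float), defaultdict(float)
    for g, y, w in zip(groups, outcomes, weights):
        tot[g] += w
        pos[g] += w * y
    return {g: pos[g] / tot[g] for g in tot}


def odds_ratios(groups, outcomes, weights, reference):
    """Unadjusted ORs vs. the reference band; assumes 0 < p < 1 in every band."""
    p = weighted_props(groups, outcomes, weights)
    odds = {g: q / (1.0 - q) for g, q in p.items()}
    return {g: odds[g] / odds[reference] for g in odds if g != reference}


def weighted_loglik(probs, outcomes, weights):
    """Weighted Bernoulli log-likelihood (pseudo-likelihood under survey weights)."""
    return sum(w * (y * log(q) + (1 - y) * log(1.0 - q))
               for q, y, w in zip(probs, outcomes, weights))


def cox_snell_r2(groups, outcomes, weights):
    """1 - exp(2 (ll_null - ll_model) / n); ASSUMES weights normalised to mean 1."""
    n = len(outcomes)
    p_null = sum(w * y for y, w in zip(outcomes, weights)) / sum(weights)
    p_band = weighted_props(groups, outcomes, weights)
    ll_null = weighted_loglik([p_null] * n, outcomes, weights)
    ll_model = weighted_loglik([p_band[g] for g in groups], outcomes, weights)
    return 1.0 - exp(2.0 * (ll_null - ll_model) / n)


# Toy data (not MENE): 'Good' health by weekly duration band, equal weights.
bands = ["0"] * 4 + ["120-179"] * 4
good = [1, 0, 0, 1, 1, 1, 1, 0]
w = [1.0] * 8
print(odds_ratios(bands, good, w, reference="0"))  # {'120-179': 3.0}
print(round(cox_snell_r2(bands, good, w), 3))
```

Adjusted models need iterative fitting with the covariates included, which this shortcut does not cover.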

To explore the generalisability of any pattern across different socio-demographic groups, we also stratified the analyses a priori by several area and individual covariates (as defined above) that have been found to be important in previous studies: (a) Urbanicity; (b) Neighbourhood greenspace; (c) Area deprivation; (d) Sex; (e) Age; (f) Restricted functioning; (g) Individual socio-economic status (SES); (h) Ethnicity; and (i) Physical activity. For the three multi-category predictors (area greenspace, area deprivation and individual SES), binary classifications were derived for the stratified analyses to maintain robust sample sizes in each category. For LSOA greenspace and deprivation, binary splits were made at the median for all LSOAs in England; SES was dichotomised by collapsing the social grade categories in the standard way (A/B/C1 vs. C2/D/E).
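The binary splits used for the stratified analyses can be sketched as follows: social grade is collapsed to A/B/C1 vs. C2/D/E, area measures are split at the median of all LSOAs in England (not the sample median), and respondents are then grouped so that the same model can be refitted within each stratum. The names and record layout are hypothetical, and values exactly at the median are assumed to go to the upper group.

```python
from collections import defaultdict
from statistics import median

SES_COLLAPSE = {"AB": "A/B/C1", "C1": "A/B/C1", "C2": "C2/D/E", "DE": "C2/D/E"}


def collapse_ses(social_grade: str) -> str:
    """Standard two-way collapse of occupational social grade."""
    return SES_COLLAPSE[social_grade]


def national_median(all_lsoa_values) -> float:
    """Median over ALL LSOAs in England, not over the survey sample."""
    return median(all_lsoa_values)


def median_split(value: float, cut: float) -> str:
    """Binary split; values exactly at the cut are ASSUMED to go to the upper group."""
    return "upper" if value >= cut else "lower"


def stratify(records, key_fn):
    """Group respondents so the same model can be refitted within each stratum."""
    strata = defaultdict(list)
    for record in records:
        strata[key_fn(record)].append(record)
    return dict(strata)


# Toy records (not MENE): greenspace as % of LSOA area.
cut = national_median([10.0, 20.0, 30.0, 40.0])
respondents = [{"id": 1, "grade": "AB", "green": 12.0},
               {"id": 2, "grade": "DE", "green": 33.0},
               {"id": 3, "grade": "C1", "green": 27.0}]
by_ses = stratify(respondents, lambda r: collapse_ses(r["grade"]))
by_green = stratify(respondents, lambda r: median_split(r["green"], cut))
print(sorted(by_ses), sorted(by_green))  # ['A/B/C1', 'C2/D/E'] ['lower', 'upper']
```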