In this article, we use data from a mixed-methods independent evaluation of a nine-site UK patient safety improvement programme to report on clinical teams’ experiences of using locally selected measures. We aimed specifically to describe their experiences of planning and conducting measurement activities, collecting data and analysing and interpreting data for their improvement projects.

Some of the problems in measuring improvement are likely to be linked to ongoing controversies about the relevant dimensions of quality and safety and about how different types of measures should be prioritised, including, for example, the process versus outcome debate. 8 9 12 Other problems are likely to be more mundane, relating, for example, to difficulties in establishing data collection systems. 10 Importantly, the literature suggests that some problems may also be linked to ownership: studies of measurement have tended to focus on quality measures generated externally to clinical teams (eg, by regulators or payers). One suggestion is that clinicians may lack engagement with such measures because they perceive them as having little or no relevance to their everyday clinical work and as little more than an administrative burden. 8 Yet little evidence exists on what happens when clinical teams choose their own measures (rather than having to use those selected externally) and design and implement data collection systems that they see as fitted to their local circumstances. A well-characterised account of how teams respond to such an opportunity would therefore be valuable.

Failure to produce reliable data about improvement and to interpret them correctly is an important challenge for quality improvement: it limits the inferences that can be made about whether improvement interventions have succeeded and erodes confidence in the evidence base for improvement. 11 The problem affects not only summative evaluations but also programmes while they are running, when data have the potential to be used formatively to optimise the improvement effort.

Measurement is essential to improving quality and safety in healthcare processes and outcomes. 1 2 Yet the available evidence suggests that many quality improvement projects may fail to generate reliable or useful data because of challenges in measurement, data collection and interpretation. 3–6 Characteristic problems include missing data or insufficient data points; insufficient baseline periods; poorly chosen, unclear or changing sampling strategies; poorly annotated data; failure to verify data entry; and poorly chosen or executed analytic strategies. 7 8 Benn and colleagues 9 found many of these problems when teams sought to implement data collection and analysis systems in local settings as part of a large-scale quality improvement programme. Similarly, a study of a national system for surveillance of healthcare-associated infections 10 found variability in how well intensive care units designed their data collection systems and in how they interpreted data.

Analysis of the qualitative data was based on the constant comparative method, inductively generating thematic categories and using the literature on measurement and quality improvement as sensitising concepts. 15 We first analysed data site by site to ensure that they were understood in their relevant context. Then, for each site, we integrated qualitative data and findings from the expert review to produce a comprehensive, in-depth picture of the site's experience of measurement. Finally, we conducted a cross-site analysis to develop higher level concepts and broader learning on measurement.

Towards the end of each phase of the programme, TW reviewed and independently analysed the raw data from a selection of the participating sites (four sites in the main phase and four in the extension phase). We initially sought to select sites that would ensure the greatest diversity of projects. However, some sites did not produce data, or produced data unsuitable for analysis, and so could not be included. Table 1 shows the four sites in each phase whose raw data were subjected to independent SPC analysis.

All the measurement plans prepared by the participating teams were reviewed by one author (TW), a specialist in measurement for improvement and an expert in SPC. Published checklists aimed at improving the quality of measurement were used as review criteria. 3 14 The information a team provided on a given step of the measurement process was deemed unclear if it was insufficient for TW to know how to repeat the measurement process. The reviews were used both for evaluation purposes (eg, to assess the quality of the plans) and to provide formative feedback to the participating teams. This feedback was given to each team through ad hoc coaching sessions, led by TW and the programme support team in the main phase and by TW alone in the extension phase. Up to two such sessions, conducted by telephone, were offered to each team.

Observations and interviews were conducted by non-clinical researchers who were members of the evaluation team. Interviews were conducted in person or by telephone, digitally recorded and transcribed verbatim. All interviewees signed an informed consent form. Observations were unstructured and included routine clinical activities, team meetings and informal chats with relevant staff. Extensive field notes were taken during visits, and researchers were debriefed by other members of the evaluation team on their return.

We conducted semistructured interviews with members of the participating teams and unstructured ethnographic observations of teams’ activities related to programme participation. Particular effort went into capturing how teams undertook tasks relating to measurement (eg, identified and selected their quality measures and developed and implemented a measurement plan to assess the impact of their improvements). We were also interested in characterising the challenges and hurdles faced by the teams in doing so.

We undertook an independent evaluation of the Safer Clinical Systems programme using a mixed-methods design. We combined a qualitative study, which aimed to describe how participating teams experienced taking part in the programme, with expert review of measurement plans and analysis of data collected for the programme.

In the main programme phase, participating sites received training and guidance from a dedicated programme support team, which also monitored their progress. Support on measurement included approximately 1 day of training on the principles of measuring improvement, SPC and the use of software for capturing data and generating charts. In the extension phase, teams were expected to use the Safer Clinical Systems approach on their own, without the support and control that characterised the main phase.

Funded by an independent charity (The Health Foundation), the programme was run in a total of nine UK hospitals in two phases: the main phase, which ran 2011–2014 and included eight sites, and the extension phase, which ran 2014–2016 and included six sites (five of the original sites plus an additional one that had not taken part in the main phase). Each of the nine hospitals taking part in the programme used the Safer Clinical Systems approach to proactively assess risks and hazards in their clinical pathways and to develop effective risk-control interventions (table 1).

A distinctive feature of Safer Clinical Systems is that it does not try to impose predefined solutions but instead seeks to help organisations develop their own capacity to detect and address weaknesses in their systems and to measure and report their improvement outcomes. It does so by offering training on a range of improvement tools and techniques (including how to measure for improvement) and emphasising the need to engage local staff (clinical and managerial) in improvement attempts.

Control charts are the main analytical tool used in SPC. 18 A control chart plots the measure as a time series. The centre line represents typical performance of the process or outcome that the team is seeking to improve. Control limits (dotted lines parallel to the centre line) show the degree of variation to be expected if the process or outcome being measured has not changed. SPC provides sets of rules for assessing a time series for the presence of special cause variation, that is, evidence that performance has changed.
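To make the description of control limits and special cause rules concrete, the sketch below shows one common chart type, the individuals (XmR) chart. It is illustrative only and is not the software or rule set used in the programme: the scaling factor of 2.66 and the run length of eight points are conventions from standard SPC texts (some sources use runs of seven or nine), and the function names are hypothetical.

```python
from statistics import mean

# Conventional XmR scaling factor: 3 / d2, with d2 = 1.128 for moving ranges of two points.
XMR_FACTOR = 2.66


def xmr_limits(values):
    """Return (centre line, lower limit, upper limit) for an individuals (XmR) chart."""
    if len(values) < 2:
        raise ValueError("at least two points are needed to compute a moving range")
    centre = mean(values)
    # Average absolute difference between consecutive points estimates short-term variation.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = XMR_FACTOR * mean(moving_ranges)
    return centre, centre - spread, centre + spread


def special_cause_points(values, centre, lower, upper, run_length=8):
    """Return indices flagged by two common rules:
    (a) a point outside the control limits;
    (b) a run of `run_length` or more consecutive points on one side of the centre line.
    In this sketch, a point exactly on the centre line breaks a run."""
    flagged = {i for i, v in enumerate(values) if v < lower or v > upper}
    run_start, run_side = 0, 0
    for i, v in enumerate(values):
        side = (v > centre) - (v < centre)  # +1 above, -1 below, 0 on the line
        if side == 0 or side != run_side:
            run_start, run_side = i, side  # a new run starts at this point
        if side != 0 and i - run_start + 1 >= run_length:
            flagged.update(range(run_start, i + 1))
    return sorted(flagged)
```

For example, for eight weekly values `[10, 12, 11, 13, 12, 11, 10, 12]`, `xmr_limits` gives a centre line of 11.375 and limits of roughly 7.6 and 15.2; none of these points would be flagged. Which chart is appropriate (XmR, p-chart, u-chart and so on) depends on the type of measure, so this should be read as one example rather than a general recipe.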

SPC is an approach to understanding and acting on variation observed in measured properties of a system. In this approach, data are used to gain insight into how a healthcare system or process is performing, and how this performance is changing over time. These insights inform actions on the system, targeted at causes of poor performance. Continuing analysis is used to understand whether these actions have led to improvement.

Measurement is a key element of the Safer Clinical Systems approach, which emphasises local ownership and local selection of measures for the monitoring of improvement. The approach does not recommend or impose any external measures, though it does recommend that Statistical Process Control (SPC) be used to monitor and analyse data.

During the programme, participating teams were expected to: (1) develop a detailed measurement plan setting out outcome and/or process measures appropriate for collecting useful data; (2) establish data collection systems; and (3) analyse and interpret their data using Statistical Process Control (SPC) (Box 1).

The study we report is based on data from an independent evaluation of a patient safety improvement programme run in the UK, which used an approach known as Safer Clinical Systems. 7 Based on methods of risk management and improvement used in other hazardous industries, the Safer Clinical Systems approach seeks to enable organisations to make improvements to local clinical systems and pathways using a structured methodology for identifying risks and for modifying or re-engineering systems to control risk and enhance reliability. 7 13 It involves a series of steps in which teams define a clinical pathway and its context; do a detailed diagnostic assessment of the pathway to identify risks and hazards; assess and select options for change and develop an action plan to implement them; and undertake system improvement cycles involving implementation, evaluation of progress against a measurement plan and revision of interventions.

Results

Across the two phases of the study (main and extension), the qualitative evaluation study involved 862 hours of observation and 143 interviews (table 2) covering all aspects of the programme (not just measurement). The participating site teams specified, between them, a total of 67 measures that they planned to use to monitor their processes before and after introduction of their risk-control interventions. The data for 49 of these measures—which were sourced from four of the eight sites participating in the main phase and from four of the six sites participating in the extension phase—were independently analysed by the evaluation team (table 2).

Table 2 Data collected in the evaluation of each phase of the programme

The clinical teams participating in the programme typically comprised a clinical lead (often a senior physician), a project manager, others from clinical or managerial backgrounds and an executive sponsor (a senior individual who reported to the board but was not involved in the day-to-day running of the project). The participating sites varied in several respects: the extent of active support from executive or non-executive board members and from other clinicians; the interaction of the work with infrastructure such as large IT system projects; their pre-existing audit culture and organisational capability for managing complex data; and the resources available to the teams, including release of staff to undertake project work. In the account below, we analyse measurement-specific issues, focusing on teams' ability to: (1) manage the tasks associated with developing measurement plans; (2) establish and use reliable data collection systems; and (3) analyse and report data appropriately. Our analysis aims to draw out generalisable learning across the programme and does not seek to compare or contrast sites. Table 1 summarises each project's aim and measures and its key challenges and achievements in measurement.

Developing a detailed measurement plan

The measurement plans that the teams were asked to develop were intended to identify and define suitable measures before any improvement interventions were implemented and to specify a sampling and analytical strategy. In the main phase (in which participating teams received dedicated measurement support and guidance), all teams produced a measurement plan document; in the extension phase, two of the six teams did so. When no measurement plan was available, the evaluation team assessed any written material provided by the teams that included elements of measurement planning. Our review of the measurement plans (or related documents) indicated that most showed great enthusiasm but also multiple problems; here, we describe six.

The first problem was the overambitious nature of the plans. Several teams initially identified very many measures (up to 15 in some cases) that were highly diverse in character. Given the formative nature of the evaluation, these sites were asked to reduce the number of measures in their final measurement plans to five or six and to concentrate their efforts on those (table 1 reports the final number of measures used by each team after feedback).

Second, many plans did not demonstrate the level of specification, or the understanding of the underlying methodological principles, needed to gather good quality data. This was consistent with the varying confidence about measurement expressed in interviews:

I am very confident actually, very confident our data is accurate, given the sort of work we did around some of the reliability and the training. (Interview, main phase of the programme)

The measures are the bit that we're struggling with the most at the moment, using the BaseLine software I'm not finding easy at all, I'm struggling with it…. I don't feel that I've had enough training in it. (Interview, main phase of the programme)

Every measurement plan contained examples of operational definitions of measures that were imprecise, lacked important details or were difficult for those outside the project team to understand. For example, some sites used compliance with a care bundle (eg, medication reconciliation or review) as a measure but did not always specify the operational definition of the individual components of the bundle. One team used a measure labelled as ‘Number of patients […] who have their medicines 100% correct at 24 hours’ without specifying how staff should ascertain that medicines were correct. Similarly, terms such as ‘delay’, ‘error’, ‘time zero’ and ‘baseline’ were not fully defined, leaving room for different interpretations between observers and over time. When sites reported that some data were ‘not applicable’ to certain measures, they did not always give a reason. The evaluation team's review of one plan noted:

Different names are used to refer to the same measures in this document when compared with the others, and also in different parts of the same document. For example, the following two measure names seem to be used interchangeably: ‘% of patients on EAU [emergency assessment unit] who have all their medicines correct at 24 hours’ and ‘Accuracy of prescription at 24 hours on EAU’. (Evaluation team’s review of the measurement plan, extension phase)

The third problem was that some measures selected by the teams were insufficiently sensitive to capture the spectrum of improvements sought by the sites. For example, one site's definition of compliance with its medication reconciliation bundle required all 10 elements of care in the bundle to be in place: even if nine elements were in place and one was not, the patient's care was deemed non-compliant. Bundles should usually include fewer elements (three to five),16 17 which perhaps suggests suboptimal design of this bundle, and also indicates that full compliance was unrealistic and that this measure might fail to detect potential improvements.

Fourth, specification of sampling procedures was typically weak, and it was often unclear what procedure was to be used for random selection. Inclusion and exclusion criteria for determining who or what should be counted were often unclear. For instance, one site in the extension phase stated in its measurement plan that ‘each week a random sample of 5 patient case notes should be selected for admission, transfer and discharge’, without specifying how this random sampling should be done. If, for example, staff selected patients ‘randomly’ from physical stacks of notes, bias might be introduced if some patients' notes were unavailable.

The fifth problem was that some selected measures were not logically linked to the improvement actions the teams implemented. For example, one site opted to measure the average proportion of patients going to the operating theatre with a completed perioperative care plan, but then struggled to implement an intervention that would increase completion of the plans. At this site, because of uncertainty about the renewal of the hospital IT contract, it was difficult to make documents relevant to surgery available on the trust's IT system at the operation stage, and this improvement action was therefore abandoned. Thus, although the site recorded an improvement between the two measurement periods (from 65% to 78% of patients with a complete plan), it was difficult to attribute this improvement to its Safer Clinical Systems project.

Sixth, in general, the measurement plans produced by the teams did not look sufficiently far ahead. For example, the plans did not specify how the data would be analysed, which affected important decisions such as the appropriate length of the baseline and how much data, over what period, would be needed to establish whether an improvement had been made. Most plans did not address who was responsible for taking action for improvement based on the findings of the analysis, or how measurement would be embedded in routine care.
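Two of the specification problems described above, the all-or-none bundle measure and the unspecified random sampling, can be illustrated with a short sketch. It is hypothetical: the article does not report how any team operationalised its bundle or its sampling. Element-level compliance is shown as one commonly used companion to all-or-none compliance, and the seeding scheme (a label plus the week's start date) is an assumption chosen so that a weekly draw can be repeated and audited.

```python
import random
from datetime import date


def all_or_none_compliant(elements):
    """All-or-none bundle compliance: True only if every element was delivered."""
    return all(elements)


def element_compliance(elements):
    """Share of bundle elements delivered; credits partial improvement."""
    if not elements:
        raise ValueError("a bundle must have at least one element")
    return sum(bool(e) for e in elements) / len(elements)


def weekly_random_sample(eligible_ids, week_start: date, n=5, label="case-notes"):
    """Simple random sample of up to n records drawn from the complete list of
    eligible records for the week, rather than from whichever notes are to hand.

    Seeding with the label and week makes the draw repeatable and auditable.
    """
    pool = sorted(set(eligible_ids))  # fixed order, so the same seed gives the same draw
    rng = random.Random(f"{label}-{week_start.isoformat()}")
    return rng.sample(pool, min(n, len(pool)))
```

For a patient whose care met 9 of 10 bundle elements, `all_or_none_compliant` returns `False` while `element_compliance` returns 0.9, so a move from, say, six to nine elements delivered would be invisible to the first measure but not the second. The sampling function makes the inclusion list explicit (everything in `eligible_ids`) and gives the same five records whenever it is rerun for the same week.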

Collecting data

Interviews and observations showed that teams generally struggled to set up and run data collection systems, and that running the systems consumed a huge amount of time and resource at several sites. Some challenges were related to teams' decisions to use entirely new measures for the first time (including ‘home-grown’ measures). Some teams started with lengthy, unwieldy manual data collection forms that were sometimes amended or abandoned after a short time. In other cases, teams used routinely collected data, but these data were often not as clean or well set up as originally anticipated and required extensive effort to bring them up to a standard suitable for use.

It’s been a nightmare actually… We’ve been looking… at readmissions, and in retrospect I don’t think the organisation had a consistent metric for readmissions, in terms of what it meant and how they were collecting it. A lot of people that were being classified as readmissions weren’t being readmitted, a lot of people were double-counted or triple-counted or worse, and of course then we had really untidy data… (Interview, main phase)

I had to write a database with the coders, had to pay… the data people to give us the feed of the patients going to [operating rooms]. [This] took a huge amount of time and it meant that until February I was manually having to get that data from systems which was an absolute nightmare. But it's better now. (Interview, main phase)

Data collection often depended on voluntary, unpaid or extra activity that was unsustainable.

It certainly has been extra work for all of us, for example observing the handover is not something that would normally be part of my day-to-day job. But obviously it has been a time investment. We have used the Health Foundation money to pay for part of it, but there certainly has been extra goodwill from people who collected the data. (Interview, main phase)

Data collection systems were not always run exactly as designed, sometimes resulting in missing data. One team struggled to get reliable data collection at weekends. At one site, a special form that was supposed to be used for data collection was not consistently used; instead, data were collected in a notebook or on odd pieces of paper in non-standard formats. In another example, attempts to collect data from doctors at the end of night shifts met with difficulty, as the physicians were tired and wanted to finish their clinical tasks before going off duty.

A further challenge was that teams did not reliably collect baseline data before they introduced interventions aimed at improvement. In part, this was because once the participating sites became aware, through use of the Safer Clinical Systems diagnostic tools, of the many (and, in some cases, severe) hazards threatening patient safety in their clinical systems, they were understandably eager to address these hazards quickly. Accordingly, some sites implemented improvement actions before measurement had started. The resulting absence of a baseline period, while well justified in terms of addressing risk, made it difficult to demonstrate that any improvement was attributable to the programme, or indeed that the risk was now well controlled.
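Why a missing baseline matters can be shown with a p-chart, the usual SPC chart for proportion measures such as the share of patients with a complete perioperative care plan. In this hypothetical sketch, the centre line and three-sigma limits are calculated only from pre-intervention periods and then held fixed, so later periods are judged against them; with no baseline periods there is nothing to compute limits from. Freezing baseline limits and using three-sigma limits are standard SPC conventions, not details reported by the programme, and a full analysis would also apply run rules.

```python
from math import sqrt


def p_chart_baseline(baseline):
    """Centre line (p-bar) from pre-intervention periods.

    baseline: list of (patients meeting the measure, patients eligible) per period.
    """
    if not baseline:
        raise ValueError("no baseline periods: nothing to judge later data against")
    met = sum(k for k, _ in baseline)
    eligible = sum(n for _, n in baseline)
    return met / eligible


def p_chart_limits(p_bar, n):
    """Three-sigma limits for a period with n eligible patients, clipped to [0, 1]."""
    sigma = sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)


def judge_against_baseline(baseline, later):
    """Classify each later period as above, below or within the frozen baseline limits."""
    p_bar = p_chart_baseline(baseline)
    verdicts = []
    for met, eligible in later:
        lower, upper = p_chart_limits(p_bar, eligible)
        p = met / eligible
        verdicts.append("above" if p > upper else "below" if p < lower else "within")
    return p_bar, verdicts
```

As an illustration only (the article does not report sample sizes), with a baseline of 65% and an assumed 40 eligible patients per period, the upper limit is about 88%, so a single later period at about 78% would fall within the limits and would not by itself signal change, although a sustained run of such periods could under run rules.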