Our analysis of information documented in EMRs showed that there were immediate reductions in quality-of-care measures for all 12 indicators in the first year after the removal of financial incentives, with only small additional changes in the following 2 years. Reductions were generally largest for indicators related to documented provision of health advice (Table 1), with absolute reductions ranging from 46.1 to 71.6 percentage points 3 years after removal of financial incentives, and for the indicator related to documentation of seizure-free status in patients with epilepsy, with an absolute reduction of 53.6 percentage points (Table 2). These were indicators for which the physician was required to check boxes in the EMR to indicate that care had been delivered, and the large reductions observed could indicate either that care was no longer given or that it was no longer documented. Changes were smaller, although still substantial, for performance on clinical-process and intermediate-outcome indicators, for which data such as blood pressure and smoking status are routinely recorded in coded form and laboratory test values are automatically entered into the EMR. The smallest change in performance at 3 years was a reduction of 9.2 percentage points in thyroid-function testing in patients with hypothyroidism. There were no large changes in documented quality on indicators for which incentives were not removed. Removal of incentives was usually associated with increased variation in documented quality among practices, but socioeconomic disparities narrowed rather than widened after incentive removal.

Five previous studies of financial-incentive removal in health care had conflicting results.8-11,18 One study showed that incentive removal had little effect on documented quality for eight QOF indicators, but seven of them were process indicators (e.g., measurement of blood pressure) for which linked outcome indicators continued to incentivize measurement.9 Similarly, performance was sustained after incentive removal for six of seven measures of quality of care in 128 Veterans Health Administration (VA) hospitals in the United States10 and for nine primary care prescribing-safety indicators in the United Kingdom.18 However, in both cases, financial incentives were part of more comprehensive improvement interventions, including blends of goal setting, comparative feedback, facilitation, and education. In contrast, there were reductions in diabetic retinopathy screening (by approximately 8%) and cervical cancer screening (by approximately 4%) after incentive removal in 35 Kaiser Permanente facilities, although the reductions occurred gradually over a period of several years rather than abruptly.8 Similarly, quality declined after incentive removal in VA facilities participating in a randomized, controlled trial of incentives to improve hypertension control.11 In both cases, reductions after removal of the incentives were similar to gains associated with the introduction of the incentives.

Our study also showed moderate reductions in documented clinical-process and outcome quality, which were of a scale similar to that of observed increases in documented quality when the QOF was introduced.12,24 However, we observed much larger reductions in clinical-process documentation in patients with serious mental illness, in documentation of whether patients with epilepsy were seizure-free, and in documentation of health advice. Further research is needed to examine the reasons for these larger reductions. One possible explanation is that most practices do not offer nurse-led clinics for the management of serious mental illness or epilepsy, whereas such clinics are offered for more common chronic conditions such as diabetes, so that care processes for the less common chronic diseases may not be routinely embedded in clinical practice. The simultaneous removal from EMRs of pop-up reminders to opportunistically deliver care or document activity may also have contributed, particularly for aspects of incentivized care, such as giving health advice,25 which clinicians may value less than care for more established chronic diseases. The introduction of the QOF was associated with the narrowing of quality differences between practices serving more affluent populations and those serving less affluent populations,26 but previous studies of incentive removal did not examine such disparities. Despite reductions in documented quality for all indicators, it is reassuring that socioeconomic disparities more often narrowed than widened after incentive removal.

The strengths of this study include the use of interrupted time-series analysis to examine incentive removal in routine care in approximately one third of all English practices, with a sensitivity analysis that included all available data showing findings that were consistent with the primary analysis. A weakness is that the time series has relatively few data points, but this is inevitable with annual reporting. As with all observational studies, residual confounding cannot be excluded as an explanation for the observed changes in documented quality, although the stability of indicators for which incentives were maintained reduces the likelihood of residual confounding.

The key limitation of the study is that on the basis of the available data, we cannot distinguish between changes in clinical activity and changes in documentation of clinical activity in the EMR. The four serious mental illness indicators provide the most direct test of this. The two indicators for which incentives were maintained did not change (which is consistent with patients still being regularly reviewed by practices), but there were marked reductions in documentation of glycated hemoglobin measurement (automatically imported to the EMR from the laboratory) and body-mass index (automatically calculated if weight is entered, although measurement of weight and other vital signs is not routine when patients visit primary care practices in the United Kingdom). These findings are consistent with a true change in clinical practice, although proof of a true change would require a detailed record review to examine whether care was still being delivered but was documented in free text rather than in the coded fields used for measurement. Our overall interpretation is that observed reductions in quality for core clinical care (e.g., blood-pressure management, retinopathy screening, and laboratory measurements for processes and outcomes) do reflect changes in clinical practice but that the much larger reductions in documentation of clinical advice and seizure-free status in patients with epilepsy should be more cautiously interpreted. For example, family doctors in the United Kingdom may now be giving less advice about long-acting, reversible contraception, may simply not be documenting that advice, or both. We know that the introduction of incentives to give advice was associated with increases in the use of long-acting contraceptives.27 What we need to know now is whether that increased use persists once incentives to give advice are removed and, more broadly, whether any gains with the introduction of incentives are maintained or lost when those incentives are removed.

If pay for performance is to contribute widely to quality improvement, then it is inevitable that incentives will be removed from some indicators to allow resources for quality improvement to be redeployed.28 A key implication of this study is that although the effect of incentive removal probably depends on the context, reductions in quality are likely, and several studies show that what is gained on incentive introduction is essentially lost on incentive withdrawal. Therefore, at a minimum, payers planning to remove incentives should monitor the quality of care after removal. In doing so, they face the same conundrum involved in introducing incentives: the uncertainty about whether changes in documented quality represent true changes in patient care. Options include using chart review or examining data that directly measure clinical practice, such as laboratory claims or prescribing data. However, collecting such data will add to the cost of pay-for-performance programs and could threaten their viability. Better still, randomized removal of incentives or removal from some practices or individual physicians would provide a more robust means of evaluating the effects of incentive removal.29 More generally, such effects will be mediated by the wider context of quality improvement, including public reporting, the underlying quality-of-care infrastructure, and other interventions used alongside incentives to improve quality.8-11,18 Financial incentives that simply pay providers to deliver specified levels of quality therefore seem unlikely to deliver sustainable improvement unless they are aligned with more comprehensive interventions that change the organization of care and future clinical practice.8,10,11