Screening can result in both the benefit of a reduction in mortality and the harm of overdiagnosis. Our analysis suggests that whatever the mortality benefit, breast-cancer screening involved a substantial harm: the excess detection of early-stage cancers that was not matched by a reduction in late-stage cancers. This imbalance indicates a considerable amount of overdiagnosis, involving more than 1 million women in the past three decades and, according to our best-guess estimate, more than 70,000 women in 2008 (accounting for 31% of all breast cancers diagnosed in women 40 years of age or older).

Over the same period, the rate of death from breast cancer decreased considerably. Among women 40 years of age or older, deaths from breast cancer decreased from 71 to 51 deaths per 100,000 women — a 28% decrease.6 This reduction in mortality is probably due to some combination of the effects of screening mammography and better treatment. Seven separate modeling exercises by the Cancer Intervention and Surveillance Modeling Network investigators provided a wide range of estimates for the relative contribution of each effect: screening mammography might be responsible for as little as 28% or as much as 65% of the observed reduction in mortality (the remainder being the effect of better treatment).13
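The arithmetic behind these figures can be checked with a short sketch. The death rates (71 and 51 per 100,000 women) and the 28% to 65% range are taken from the paragraph above. Applying that range to the absolute decline to obtain a per-100,000 figure is an illustrative step, not a calculation reported in the text, and the function names are hypothetical.

```python
def relative_decrease(before: float, after: float) -> float:
    """Fractional decline from `before` to `after`."""
    if before <= 0:
        raise ValueError("`before` must be positive")
    return (before - after) / before


def attributable_range(total_reduction: float,
                       low_share: float,
                       high_share: float) -> tuple[float, float]:
    """Portion of a reduction attributed to one cause, given a range of shares."""
    return total_reduction * low_share, total_reduction * high_share


# Figures quoted in the text (deaths per 100,000 women aged 40 or older).
deaths_before = 71.0
deaths_after = 51.0

absolute_reduction = deaths_before - deaths_after
relative_reduction = relative_decrease(deaths_before, deaths_after)

# Illustrative assumption: apply the modeled 28%-65% screening share
# directly to the absolute decline.
screening_low, screening_high = attributable_range(absolute_reduction, 0.28, 0.65)

print(f"Absolute reduction: {absolute_reduction:.0f} deaths per 100,000")
print(f"Relative reduction: {relative_reduction:.1%}")
print(f"Attributable to screening: {screening_low:.1f} to {screening_high:.1f} per 100,000")
```

Under these inputs the remainder of the decline in each scenario would be attributed to better treatment, as the modeling exercises assume.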

Our data show that the true contribution of mammography to decreasing mortality must be at the low end of this range. They suggest that mammography has largely not met the first prerequisite for screening to reduce cancer-specific mortality — a reduction in the number of women who present with late-stage cancer. Because the absolute reduction in deaths (20 deaths per 100,000 women) is larger than the absolute reduction in the number of cases of late-stage cancer (8 cases per 100,000 women), the contribution of early detection to decreasing numbers of deaths must be small. Furthermore, as noted by others,14 the small reduction in cases of late-stage cancer that has occurred has been confined to regional (largely node-positive) disease — a stage that can now often be treated successfully, with an expected 5-year survival rate of 85% among women 40 years of age or older.15,16 Unfortunately, however, the number of women in the United States who present with distant disease, only 25% of whom survive for 5 years,15 appears not to have been affected by screening.
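A rough bound makes this comparison concrete. The sketch below uses the two absolute reductions quoted above and asks what share of the mortality decline could be traced to averted late-stage cases. The `fatality_if_late` parameter is a hypothetical input: setting it to 1.0 assumes that every averted late-stage case would otherwise have ended in death, which is deliberately generous to screening and is an assumption made here for illustration, not a figure from the analysis.

```python
def max_share_from_stage_shift(deaths_averted: float,
                               late_cases_averted: float,
                               fatality_if_late: float = 1.0) -> float:
    """Upper bound on the share of averted deaths explained by fewer late-stage cases.

    All rates are per 100,000 women. `fatality_if_late` is the assumed
    probability that a late-stage case would have been fatal (hypothetical).
    """
    if deaths_averted <= 0:
        raise ValueError("`deaths_averted` must be positive")
    if not 0.0 <= fatality_if_late <= 1.0:
        raise ValueError("`fatality_if_late` must be between 0 and 1")
    explained = late_cases_averted * fatality_if_late
    return min(1.0, explained / deaths_averted)


# Figures quoted in the text.
deaths_averted = 20.0      # absolute reduction in deaths per 100,000
late_cases_averted = 8.0   # absolute reduction in late-stage cases per 100,000

upper_bound = max_share_from_stage_shift(deaths_averted, late_cases_averted)
print(f"Share of mortality decline explained, at most: {upper_bound:.0%}")
```

Because most of the averted late-stage cases were regional disease with high expected survival, any realistic value of `fatality_if_late` would be well below 1.0, and the share would shrink accordingly.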

Whereas the decrease in the rate of death from breast cancer was 28% among women 40 years of age or older, the concurrent rate decrease was 42% among women younger than 40 years of age.6 In other words, there was a larger relative reduction in mortality among women who were not exposed to screening mammography than among those who were exposed. We are left to conclude, as others have,17,18 that the good news in breast cancer (decreasing mortality) must largely be the result of improved treatment, not screening. Ironically, improvements in treatment tend to erode the benefit of screening: as treatment of clinically detected disease (disease detected by means other than screening) improves, the benefit of screening diminishes. For example, since pneumonia can be treated successfully, no one would suggest that we screen for pneumonia.

Our finding of substantial overdiagnosis of breast cancer with the use of screening mammography in the United States replicates the findings of investigators in other countries (Table S5 in the Supplementary Appendix). Nevertheless, our analysis has several limitations. Overdiagnosis can never be directly observed and thus can only be inferred from that which is observed — reported incidence. Figure 1 and Figure 2 are based on unaltered, long-standing, carefully collected federal data that are generally considered to be incontrovertible. Table 1 and Table 2, however, are based on assumptions that warrant a more critical evaluation.

First, our results might be sensitive to the period (1976 through 1978) that we chose as the source of data for the baseline incidence of breast cancer (before mammography). If the period were expanded to begin with the first years of SEER data (i.e., 1973 through 1978), the baseline incidence of early-stage cancer would be slightly lower (by 0.9%) and the baseline incidence of late-stage cancer would be slightly higher (by 1.4%). These changes offset each other and would have a negligible effect on our estimates.

Second, our ability to remove the effect of hormone-replacement therapy (Fig. S1 in the Supplementary Appendix) is admittedly imprecise. Although there is general agreement that this effect had largely ceased by 2006, its onset is less clearly defined. We chose to cap the incidence of each disease stage as far back as 1990. However, the pattern of regional disease (Figure 2) suggests that the bulk of the effect of hormone-replacement therapy probably began later, in the mid-1990s, so our assumption probably overcorrects for that effect.

Third, we were forced to make some assumptions about the pattern of the underlying incidence — the incidence that would have been observed in the absence of screening. The simplest approach was to assume that the underlying incidence was constant (the base case). In our best-guess estimate, however, we posited that the underlying incidence was that observed in the population of women without exposure to mammography; this underlying incidence was increasing at a rate of 0.25% per year. Our assumption of an increase of 0.5% per year (in the extreme and very extreme estimates) was admittedly arbitrary. It was twice the rate of increase observed among women younger than 40 years of age and was outside the 95% confidence interval. Perspective on the uncertainty about the underlying incidence, however, is provided in Figure 2. The finding of a stable rate of distant disease argues against dramatic changes in the underlying incidence of breast cancer.
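To see how much the growth assumption matters, the sketch below compounds each annual rate over a 30-year span. The 0.25% and 0.5% rates come from the paragraph above. The 30-year horizon (roughly the three decades studied), the indexed baseline of 100, and the use of compound rather than linear growth are assumptions made for illustration, and `projected_incidence` is a hypothetical helper, not the method used to produce Table 2.

```python
def projected_incidence(baseline: float, annual_rate: float, years: int) -> float:
    """Underlying incidence after `years` of compound growth at `annual_rate`."""
    if years < 0:
        raise ValueError("`years` must be non-negative")
    return baseline * (1.0 + annual_rate) ** years


# Baseline is indexed to 100 so results read as percentages of the starting
# incidence; the 30-year horizon is an assumption for illustration.
BASELINE_INDEX = 100.0
YEARS = 30

scenarios = {
    "base case (constant)": 0.0,
    "best guess (0.25% per year)": 0.0025,
    "extreme (0.5% per year)": 0.005,
}

projections = {
    label: projected_incidence(BASELINE_INDEX, rate, YEARS)
    for label, rate in scenarios.items()
}

for label, value in projections.items():
    print(f"{label}: index {value:.1f} after {YEARS} years")
```

A higher assumed underlying incidence leaves less of the observed excess in early-stage cancers to be attributed to overdiagnosis, which is why the faster growth rate corresponds to the more conservative estimates.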

Fourth, our best-guess estimate of the frequency of overdiagnosis — 31% of all breast cancers — did not distinguish between DCIS and invasive breast cancer. Our method did not allow us to disentangle the two. We did, however, estimate the frequency of overdiagnosis of invasive breast cancer under the assumption that all cases of DCIS were overdiagnosed. This analysis suggested that invasive disease accounted for about half the overdiagnoses shown in Table 2 and that about 20% of all invasive breast cancers were overdiagnosed; these findings replicate those of other studies.19

Finally, some investigators might point out that our best-guess estimate of the frequency of overdiagnosis — 31% — was based on the wrong denominator. Our denominator was the number of all diagnosed breast cancers. Many investigators would argue that because overdiagnosis is the result of screening, the correct denominator is screening-detected breast cancers. Unfortunately, because the SEER program does not collect data on the method of detection, we were unable to distinguish screening-detected from clinically detected cancers. Self-reported data from the National Health Interview Survey, however, suggest that approximately 60% of all breast cancers were detected by means of screening in the period from 2001 through 2003.20
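The effect of switching denominators can be illustrated with simple arithmetic. The sketch below rescales the 31% estimate by the approximately 60% screening-detected share. Two assumptions are made here purely for illustration: that every overdiagnosed cancer was screening-detected, and that survey data from 2001 through 2003 can be combined with an estimate for 2008. The result is therefore a rough indication, not a reported finding, and the function name is hypothetical.

```python
def rescale_to_screen_detected(overdx_share_of_all: float,
                               screen_detected_share: float) -> float:
    """Re-express an overdiagnosis share using screening-detected cancers as denominator.

    Assumes all overdiagnosed cancers were found by screening (illustrative).
    """
    if not 0.0 < screen_detected_share <= 1.0:
        raise ValueError("`screen_detected_share` must be in (0, 1]")
    return overdx_share_of_all / screen_detected_share


overdx_share_of_all = 0.31     # best-guess estimate, all diagnosed breast cancers
screen_detected_share = 0.60   # approximate share detected by screening (survey data)

overdx_share_of_screen_detected = rescale_to_screen_detected(
    overdx_share_of_all, screen_detected_share
)
print(f"Overdiagnosis as a share of screening-detected cancers: "
      f"{overdx_share_of_screen_detected:.0%}")
```

The narrower denominator necessarily yields a larger percentage, which is why the choice of denominator matters when estimates from different studies are compared.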

Breast-cancer overdiagnosis is a complex and sometimes contentious issue. Ideally, reliable estimates about the magnitude of overdiagnosis would come from long-term follow-up after a randomized trial.21 Among the nine randomized trials of mammography, the lone example of this is the 15-year follow-up after the end of the Malmö Trial,22 which showed that about a quarter of mammographically detected cancers were overdiagnosed.23 Unfortunately, trials also provide a relatively narrow view involving one subgroup of patients, one research protocol, and one point in time. We are concerned that the trials — now generally three decades old — no longer provide relevant data on either the benefit with respect to reduced mortality (because treatment has improved) or the harm of overdiagnosis (because of enhancements in mammographic imaging and lower radiologic and pathological diagnostic thresholds).

Our investigation takes a different view, which might be considered the view from space. It does not involve a selected group of patients, a specific protocol, or a single point in time. Instead, it considers national data over a period of three decades and details what has actually happened since the introduction of screening mammography. There has been plenty of time for the surplus of diagnoses of early-stage cancer to translate into a reduction in diagnoses of late-stage cancer — thus eliminating concern about lead time.24 This broad view is the major strength of our study.

Our study raises serious questions about the value of screening mammography. It clarifies that the benefit of mortality reduction is probably smaller, and the harm of overdiagnosis probably larger, than has been previously recognized. And although no one can say with certainty which women have cancers that are overdiagnosed, there is certainty about what happens to them: they undergo surgery, radiation therapy, hormonal therapy for 5 years or more, chemotherapy, or (usually) a combination of these treatments for abnormalities that otherwise would not have caused illness. Proponents of screening should provide women with data from a randomized screening trial that reflects improvements in current therapy and includes strategies to mitigate overdiagnosis in the intervention group. Women should recognize that our study does not answer the question “Should I be screened for breast cancer?” However, they can rest assured that the question has more than one right answer.