Highlights:

In the history of U.S. social policy, the federal government has commissioned 13 large randomized controlled trials (RCTs) to evaluate the effectiveness of ongoing, Congressionally-authorized federal programs, such as Head Start, Job Corps, Abstinence Education, and Washington, D.C. school vouchers.

Eleven of the 13 RCTs found that the programs produced either no significant positive effects on the key targeted outcomes or small positive effects that dissipated shortly after participants completed the program. One RCT found sizable program effects and another found modest effects.

However, among the 11 disappointing findings, evidence suggests that a subset of activities funded by these programs were indeed effective. Thus, reforming the programs to incorporate evidence-based funding criteria could produce much better results.

In a world where most attempts to make progress fail and a few succeed, spending as usual without a clear focus on evidence about what works is unlikely to solve the nation’s problems.

In the history of U.S. social policy, there have been 13 instances in which the federal government commissioned large randomized controlled trials (RCTs) to evaluate the effectiveness of entire, Congressionally-authorized federal programs. To be specific, this group includes programs authorized by Congress in statute as ongoing spending initiatives (as opposed to demonstration or pilot projects). Some of the programs, such as Head Start and Job Corps, were initially championed primarily by Democrats; others, such as Abstinence Education and the D.C. Opportunity Scholarship Program (i.e., private school vouchers) were championed primarily by Republicans. Most of the programs have enjoyed longstanding bipartisan support.

At the end of this report, we list these 13 “whole program” RCTs and provide links to the final study reports.

So what did the studies find?

Eleven of the 13 RCTs found that the programs produced either no significant positive effects on the key targeted outcomes or small positive effects that dissipated shortly after program completion. In other words, individuals who participated in the program (i.e., the treatment group) did little or no better over time than those who did not (i.e., the control group).

The most recent whole program RCT to report final results—the evaluation of the U.S. Department of Education’s Teacher Incentive Fund (TIF)—is a fairly typical example. Between 2006 and 2012, TIF awarded $1.8 billion in grants to school districts to support the establishment of pay-for-performance bonus systems for teachers and principals in high-need schools. The RCT—a large, well-conducted study with a sample of 138 schools in 10 districts—found that TIF produced no statistically significant effects on student reading or math achievement at the final follow-up four years after random assignment (the nonsignificant effects were very small—gains of one to two percentile points on state tests, equating to 0.04 standard deviations). Our two-page summary of the study is linked here, and the full study report is linked here.

Among the 13 whole program RCTs, there was one clear finding of effectiveness—for the Defense Department’s National Guard Youth ChalleNGe, an intensive, residential youth development program for high school dropouts ages 16 to 18. At the three-year follow-up, the study found sizable effects on both educational and workforce outcomes (e.g., a 20 percent earnings increase in the final year of the study, or about $2,600 in 2018 dollars). In addition, there was one borderline case—the RCT of the Department of Labor’s Job Training Partnership Act (JTPA)-Adult Program, an employment training program for economically disadvantaged adults. The national RCT of JTPA found a sustained effect on adults’ earnings during the three to five years after random assignment, but it was quite modest in size (a 5 percent to 10 percent increase, or about $850 per year in 2018 dollars), and it was statistically significant in some years but not others.[i]

What lessons can we draw from these findings?

First, these RCT results are another example of the “800-pound gorilla” that we described in earlier Straight Talk reports as stymying efforts to make progress across many fields (e.g., social policy, medicine, business):

The bottom line is that it is harder to make progress than commonly appreciated…. The pattern of disappointing effects for most rigorously-evaluated programs—along with findings of important positive effects for a few—is compelling and transcends multiple fields. It needs to be taken seriously.

Second, the findings for these 13 federal programs provide clear evidence that most of them are not producing the hoped-for progress in addressing problems such as poverty, educational failure, and teen pregnancy. It is not a great leap of logic to infer that many other government programs would also be found not to produce their intended effects if they were rigorously evaluated.

Third, we believe it is important not to over-interpret the findings as showing that, for the 11 federal programs with disappointing findings, none of the projects or activities that they funded were effective. Most of these federal programs are actually broad funding streams that allow state and local funding recipients great flexibility in how to spend their funds, resulting in a heterogeneous set of program services. The federal Head Start program, for example, funds local preschool activities that vary in numerous dimensions, such as the setting in which they are delivered (e.g., classroom versus home), teacher hiring and training practices, curriculum used, degree of parental involvement, and duration of services (e.g., school year versus full year, half day versus full day). Although the national Head Start RCT found that the program’s average effect at the first- and third-grade follow-ups was not statistically significant, it is likely that a subset of Head Start-funded centers and activities were indeed effective, but that their impact was diluted by others that were ineffective or even harmful. Recent studies suggest that there is indeed substantial variation in effectiveness across Head Start sites and activities [1, 2].

Our previous Straight Talk report provided a concrete example of this phenomenon: RCT evidence shows that charter schools as a general educational strategy do not, on average, improve student achievement compared to the regular public schools in their jurisdiction; however, some specific charter school models, such as the Knowledge is Power Program (KIPP), have been shown in high-quality RCTs to produce much better achievement outcomes than their regular school counterparts.

The above findings and observations, we believe, underscore the imperative to reform federal programs such as Head Start to incorporate (i) rigorous evaluations aimed at identifying the subset of funded activities that are effective; and (ii) once such activities are identified, strong incentives or requirements for program grantees to adopt and faithfully implement them on a larger scale. (In a previous report, we outlined one possible strategy—“tiered evidence”—for achieving this goal.) In the world of the 800-pound gorilla—where most attempts to make progress fail and a few succeed—spending as usual without a clear focus on evidence about what works is unlikely to solve the nation’s problems.

List of Federally-commissioned RCTs of Whole Federal Programs

Notes:

U.S. Department of Health and Human Services = HHS

U.S. Department of Education = ED

U.S. Department of Labor = DOL

U.S. Department of Defense = DoD

Other whole program RCTs are underway and have reported only early results. These include, for example, evaluations of (i) DOL’s Workforce Investment Act (WIA) Adult and Dislocated Worker programs, (ii) DOL’s YouthBuild program, and (iii) ED’s D.C. Opportunity Scholarship Program (this is a second RCT of the program).

[i] In an earlier article, we had also characterized the findings for Early Head Start—a program for low-income families with infants or toddlers—as modestly positive based on the age-5 findings of significant positive effects on some child social-emotional and language outcomes. However, a subsequent follow-up found that, unfortunately, these early effects faded, and by age 10 there were no longer significant effects on any child outcome (academic or social-emotional) or family outcome (well-being, mental health, economic self-sufficiency).