One of our newly “standout” charities, Development Media International (DMI), is in the midst of a randomized controlled trial. So far, all we have from the trial is information about self-reported behavior change, and we’ve tried to use that information to estimate how many lives the program will likely save (for purposes of our cost-effectiveness analysis). We estimate that the measured behavior changes should equate to about a 3.5% reduction in child mortality. However, DMI is hoping for a 19% reduction, and by our estimate, if it falls short of 10-14%, it will likely fail to find a statistically significant impact. What should we put more credence in – GiveWell’s projection based on available data about behavior change, or DMI’s projection?

Ordinarily, I’d be happy to consider the GiveWell estimate a best guess. I’m used to charities’ estimates turning out to be optimistic, and DMI’s estimate is based on a general model rather than on the actual data we have about its impact on behavior.

However, I find myself very uncomfortable predicting a figure of 3.5% when the people carrying out a study – and paying the considerable expenses associated with it – are expecting 10-20%. I’m uncomfortable with this discrepancy for two reasons:

It’s a little hard to imagine that an organization would go to this level of expense – and reputational risk – if they weren’t fairly confident of achieving strong results. Most predictions and projections charities put out are, in a sense, “cheap talk,” by which I mean it costs a charity little to make strong claims. However, in this case DMI is conducting a study costing millions of dollars*, and by being public about the study, they face a significant public relations risk if the results are disappointing (as our projection implies they will be).

I also struggle to think of examples of studies like this one – large, expensive, publicized studies focused on developing-world health or economic empowerment – that have turned out to be “disappointing” from the perspective of people carrying out (and/or paying for) the study. Though I do know of a fair number of studies showing “no impact” for an intervention, I believe they’ve generally been academic studies looking at very common/popular interventions (e.g. improved cookstoves, microlending). These “no impact” results were noteworthy in themselves, and didn’t necessarily reflect poorly on the people conducting or paying for the studies. I have a much harder time thinking of cases in which a major developing-world study found results that I’d consider disappointing or embarrassing for those carrying out or funding the study. The only one that comes to mind is the DEVTA trial on vitamin A and deworming.

I haven’t taken the time to systematically examine the intuition that “developing-world studies rarely find results that are disappointing/embarrassing for those carrying out the study.” It’s possible that the intuition is false; it’s also possible that it’s an artifact of the sort of publication bias that won’t affect DMI’s study, since the DMI study’s existence and hypothesis are already public. Finally, it seems worth noting that I don’t have the same intuition about clinical trials: indeed, failed clinical trials are frequent (especially in the relatively expensive Phase II).

With that said, if my intuition is correct, there are a couple of distinct possible explanations:

1. Perhaps, in developing-world settings, it is often possible to have a good sense for whether an intervention will work before deciding to run a formal study on it. Accordingly, perhaps expensive studies rarely occur unless people have a fairly good sense for what they’re going to find.

2. Perhaps publication-bias-type issues remain important in developing-world randomized studies. In other fields, I’ve seen worrying suggestive evidence that researchers “find what they want to find” even in the presence of seemingly strong safeguards against publication bias. (Example.) Even with a study’s hypothesis publicly declared, we believe there will still be some flexibility in terms of precisely how the researchers define outcomes and conduct their analysis. This idea is something that continues to worry me when it comes to relying too heavily on randomized studies; I am not convinced that the ecosystem and anti-publication-bias measures around these studies are enough to make them truly reliable indicators of a program’s impact.

Even with #2 noted as a concern, the bottom line is that I see a strong probability that DMI’s results will be closer to what it is projecting than to what we are projecting. Conditional on this, I see a relatively strong probability that the result will reflect legitimate impact as opposed to publication bias. Overall, I’d estimate a 50% chance that DMI’s measured impact on mortality falls in the range of 10-20%. If I imagine a 50% chance of a 15% measured impact and a 50% chance of a 3.5% measured impact (the latter is what we are currently projecting), that comes out to about a 9% expected measured impact (0.5 × 15% + 0.5 × 3.5% = 9.25%), or ~2.5x what we’re currently projecting.
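For readers who want to check the arithmetic, here is a minimal sketch of the two-scenario expected value described above. It uses only the figures quoted in the paragraph; the variable names are my own, and treating the 10-20% range as a single 15% point is the same simplification the paragraph makes.

```python
# Two-scenario expected value, using only the figures quoted above.
# Variable names are illustrative, not taken from any GiveWell spreadsheet.
p_high = 0.50        # chance the measured impact lands in the 10-20% range
impact_high = 0.15   # single point used to stand in for that range
impact_low = 0.035   # current projection based on behavior-change data

# Probability-weighted average of the two scenarios.
expected_impact = p_high * impact_high + (1 - p_high) * impact_low

# How the expected figure compares with the current projection.
ratio_to_current = expected_impact / impact_low

print(f"expected measured impact: {expected_impact:.2%}")
print(f"multiple of current projection: {ratio_to_current:.2f}x")
```

The unrounded result is 9.25%, or roughly 2.6x the current projection, which the text rounds to "about 9%" and "~2.5x".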

In either case, I’ll want our cost-effectiveness estimate to include a “replicability adjustment” assigning only a 30-50% probability that the result would hold up upon further scrutiny and replication (this adjustment would account for my reservations about randomized studies in general, noted under #2 above). Our current cost-effectiveness estimate assigns a 50% probability. Overall, then, it could be argued that DMI’s estimated cost-effectiveness with the information we have today should – based on my expectations – be 1.5-2.5x what our review projects. That implies a “cost per life saved” of ~$2000-$3300, or about 1-1.7x as strong as what we estimate for AMF. It is important to note that this estimate would be introducing parameters with a particular sort of speculativeness and uncertainty, relative to most of the parameters in our cost-effectiveness calculations, so it’s highly debatable how this “cost per life saved” figure should be interpreted alongside our other published estimates.
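The 1.5-2.5x range can be reproduced, at least roughly, by combining the expected-impact ratio with the replicability adjustment. The sketch below is one way to read that calculation, not the official model: the function name is hypothetical, and it assumes estimated cost-effectiveness scales linearly with both measured impact and the replicability probability.

```python
# A rough reconstruction of the overall multiplier, assuming cost-effectiveness
# scales linearly with measured impact and with the replicability probability.
# The function name and structure are hypothetical, for illustration only.

def cost_effectiveness_multiplier(impact_ratio, replicability,
                                  current_replicability=0.50):
    """Multiplier on current cost-effectiveness after both adjustments."""
    return impact_ratio * replicability / current_replicability

# Expected measured impact (9.25%) relative to the current 3.5% projection.
impact_ratio = 0.0925 / 0.035

# Replicability adjustment of 30% (pessimistic end) and 50% (current value).
low = cost_effectiveness_multiplier(impact_ratio, 0.30)
high = cost_effectiveness_multiplier(impact_ratio, 0.50)

print(f"multiplier range: {low:.2f}x to {high:.2f}x")
```

Under these assumptions the range comes out to roughly 1.6x-2.6x, close to the rounded 1.5-2.5x used in the text.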

DMI has far less of a track record than our top charities this year. In my view, slightly better estimated cost-effectiveness – using extremely speculative reasoning (so much so that we decided not to include it in our official cost-effectiveness estimate for DMI) – is not enough to make up for that. Furthermore, we should know fairly soon (hopefully by late 2015) what the study’s actual results are; given that situation, I think it makes sense to wait rather than give now based on speculation about what the study will find. But I do have mixed feelings on the matter. People who are particularly intent on cost-effectiveness estimates, and agree with my basic reasoning about what we should expect from prominent randomized studies, should consider supporting DMI this year.

*The link provided discusses DMI’s overall expenses. Its main activity over the time period discussed at the link has been carrying out this study.