Although there are other examples of randomized studies in health policy, the vast majority of health policy studies have far less rigorous designs.

Some of them are sponsored by the Center for Medicare and Medicaid Innovation, created by the Affordable Care Act. It has spent about $1 billion a year on dozens of programs that pay for Medicare and Medicaid services in new ways intended to enhance quality and reduce spending. Most of the innovation center’s pilots lack randomized designs, for which it has been criticized.

Also potentially problematic: Most of its programs rely on voluntary participation by health care organizations. There might be crucial differences between those that opt in and those that don’t.

Mandatory participation poses its own set of challenges. “If you force a hospital to join a new program, but not its competitor down the street, you might put the hospital at an unfair financial disadvantage,” said Nicholas Bagley, a University of Michigan health law professor. Also, testing voluntary participation makes sense if the program is never intended to be mandatory in the first place.

In considering a mandatory program, you also have to be mindful of politics.

“There will always be winners and losers,” said Darshak Sanghavi, a former senior official for the Center for Medicare and Medicaid Innovation. “If losers are forced to remain in a program, that could cause a political backlash that might blow the whole thing up.”

Randomization poses challenges of its own: It can be complex and hard to maintain. “A program with desirable features for evaluation, like randomization, that falls apart could be less valuable than one that was designed more realistically from the start,” he said.

Problems can also plague rollouts that are voluntary and not randomized. Even programs that show promise can suffer from diminishing participation as health care organizations drop out. The innovation center’s pioneer accountable care organization program offered health care organizations the opportunity to earn bonuses in exchange for accepting some financial risk, provided they met a set of quality targets. It started with 32 participants in 2012. Although studies showed it reduced spending and at least maintained, if not improved, quality, only nine remained by 2016, when the program ended.