Dr. John Haygarth knew that there was something suspicious about Perkins’s Metallic Tractors. He’d heard all the theories about the newly patented medical device—about the way flesh reacted to metal, about noxious electrical fluids being expelled from the body. He’d heard that people plagued by rheumatism, pleurisy, and toothache swore the instrument offered them miraculous relief. Even George Washington was said to own a set. But Haygarth, a physician who had pioneered a method of preventing smallpox, sensed a sham. He set out to find the evidence.

The year was 1799, and the Perkins tractors were already an international phenomenon. The device consisted of a pair of metallic rods—rounded on one end and tapering, at the other, to a point. Its inventor, Elisha Perkins, insisted that gently stroking each tractor over the affected area in alternation would draw off the electricity and provide relief. Thousands of sets were sold, for twenty-five dollars each. People were even said to have auctioned off their horses just to get hold of a pair. And, in an era when your alternatives might be bloodletting, leeches, and purging, you could see the appeal.

Haygarth had a pair of dummy tractors created, carved carefully from wood and painted to resemble the originals. They were to be used on five unsuspecting patients at Bath General Hospital, in England, each suffering from chronic rheumatism. Using the lightest of touches, the fakes were drawn over the affected areas, with remarkable results. Four of the five patients declared that their pain was relieved. One reported a tingling sensation that lasted for two hours. Another regained the ability to walk.

The following day, Haygarth repeated his test using the true metallic tractors, with the same results. Other physicians soon followed his lead, using increasingly elaborate fakes of their own: nails, pencils, even old tobacco pipes in place of the tractors. Each brought the truth more clearly into focus: the tractors were no better than make-believe.

This humble experiment wasn’t the only one of its kind. By the start of the nineteenth century, experimentation had already driven two centuries of significant changes in science. The Royal Society of London, the scientific academy of which Haygarth was an elected fellow, began insisting that all claims needed to be verified and reproduced before they could be accepted as scientific fact. A shakeup was under way. Astronomy had split off from astrology. Chemistry had become disentangled from alchemy. The motto of the society neatly encapsulated the new spirit of inquiry: Nullius in Verba. Translation: “Take nobody’s word for it.”

Physics, chemistry, and medicine have had their revolution. But now, driven by experimentation, a further transformation is in the air. That’s the argument of “The Power of Experiments” (M.I.T.), by Michael Luca and Max H. Bazerman, both professors at the Harvard Business School. When it comes to driving our decisions in a world of data, they say, “the age of experiments is only beginning.”

In fact, if you’ve recently used Facebook, browsed Netflix, or run a Google search, you have almost certainly participated in an experiment of some kind. Google alone ran fifteen thousand of them in 2018, involving countless unsuspecting Internet users. “We don’t want high-level executives discussing whether a blue background or a yellow background will lead to more ad clicks,” Hal Varian, Google’s chief economist, tells the authors. “Why debate this point, since we can simply run an experiment to find out?”

Luca and Bazerman focus on a new breed of large-scale social experiments, the power of which has already been demonstrated in the public sector. As they note, governments have used experiments to find better ways to get their citizens to pay taxes on time, say, or to donate organs after death. N.G.O.s have successfully deployed experiments in developing countries to test the effects of everything from tampons to textbooks. The impact of a simple experiment can be dramatic, particularly in monetary terms.

A few years ago, if you searched for eBay on Google, the top two results would take you directly to the auction site’s home page. The second one was produced organically by the Google algorithm; the first was an advertisement, paid for by eBay and meant to pop up whenever its name appeared as a keyword in someone’s search.

Steve Tadelis, a professor of economics at the University of California, Berkeley, was spending a year at eBay at the time, and was suspicious about the value of placing such ads. Wouldn’t people get to eBay anyway if they were searching for it, without the sponsored results? But, as Luca and Bazerman recount, eBay’s marketing group defended the millions of dollars spent on the ads each year, noting that many people who clicked on them ended up buying things on eBay.

An experiment was in order. By turning Google ads on and off, Tadelis and his research team tracked the traffic coming to their site and discovered that—as Tadelis had suspected—much of the money eBay had been shelling out was wasted. The marketing team had an exaggerated notion of how valuable those ads were: without the sponsored result, searchers would simply click on the free organic links instead. The company could (and did) save itself millions.

There’s an important point in all of this: instead of going by our possibly unreliable intuition, we can, in a range of cases, know for sure whether an intervention has an effect by running a trial and collecting the evidence. It’s a step that Esther Duflo, who shared a Nobel Prize in Economics for her work using experiments to study how global poverty can be alleviated, makes a particularly strong case for. Without gathering and analyzing the evidence, she has said, “we are not any better than the medieval doctors and their leeches.”

The most reliable way to test an intervention is by using a type of experiment known as a “randomized controlled trial” (R.C.T.). You randomly assign subjects to groups and then treat each group differently. One group will receive the intervention, while another, the “control” group, will not. Control here is key. The aim is to make the groups as similar as possible, and to constrain as many variables as you can manage, because if the only thing allowed to change freely is the intervention itself, you can study its true effect. In the tech world, the “intervention” might simply be a different Web-page layout or a new pricing plan. Here, the usual term is “A/B testing,” but the objective is the same: to create some basis for comparison.
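The mechanics of such a test can be sketched in a few lines of Python. Everything below is invented for illustration: the user counts, the two “layouts,” and the conversion rates are hypothetical, not drawn from any real experiment.

```python
# A minimal sketch of an A/B test: randomly assign users to two groups,
# expose each group to a different version, and compare outcomes.
# All numbers are invented for illustration.
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def assign(users):
    """Randomly split users into group A (control) and group B (treatment)."""
    groups = {"A": [], "B": []}
    for user in users:
        groups[random.choice(["A", "B"])].append(user)
    return groups

def conversion_rate(outcomes):
    """Fraction of users who converted (clicked, bought, signed up...)."""
    return sum(outcomes) / len(outcomes)

users = list(range(10_000))
groups = assign(users)

# Simulated outcomes: suppose layout A converts ~5% of users, layout B ~6%.
outcomes_a = [1 if random.random() < 0.05 else 0 for _ in groups["A"]]
outcomes_b = [1 if random.random() < 0.06 else 0 for _ in groups["B"]]

lift = conversion_rate(outcomes_b) - conversion_rate(outcomes_a)
print(f"A: {conversion_rate(outcomes_a):.3f}  "
      f"B: {conversion_rate(outcomes_b):.3f}  lift: {lift:+.3f}")
```

Because assignment is random, any systematic difference between the two groups’ conversion rates can be attributed to the layout itself rather than to who happened to see it; in practice, a significance test would then be run on the observed lift.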

Such studies tell you whether something works, though not why. Haygarth’s experiment wasn’t a randomized trial by modern standards, but he nonetheless proved the power of experimenting: by directly comparing patients’ experiences on the day they were treated with the real tractors against their experiences on the day they were treated with the fakes, he could show that the tractors were duds. The second set of observations served as a kind of control group.

Without a properly randomized control group, there is no real way to measure whether something is working. Take the case of the Scared Straight program, developed in the United States to discourage at-risk kids from choosing a life of crime. The theory seemed sound. Young offenders taken on organized visits to prison, where they could meet murderers and armed robbers, would see the terrifying consequences of breaking the law, and be less likely to do so themselves in the future.