Everyone agrees health care in the United States is a colossal mess, and IBM is betting that artificially intelligent supercomputers are just what the doctor ordered. But some health professionals say robodoctors are just flashy toys.

Such are the deep questions raised by the medical incarnation of Watson, the language-processing, information-hunting AI that debuted in 2011 on the quiz show Jeopardy!, annihilating the best human player ever and inspiring geek dreams of where its awesome computational power might be focused next.

IBM has promised a Watson that will in microseconds trawl the world’s medical knowledge and advise doctors. It sounds great in principle, but the project hasn’t yet produced peer-reviewed clinical results, and the journey from laboratory to bedside is long. Still, some doctors say Watson will be fantastically useful.

“It’s not humanly possible to practice the best possible medicine. We need machines,” said Herbert Chase, a professor of clinical medicine at Columbia University and member of IBM’s Watson Healthcare Advisory Board. “A machine like that, with massively parallel processing, is like 500,000 of me sitting at Google and Pubmed, trying to find the right information.”

Others, including physician Mark Graber, a former chief of the Veterans Administration hospital in Northport, New York, are less enthused. “Doctors have enough knowledge,” said Graber, who now heads the Society to Improve Diagnosis in Medicine. “In medicine, that’s not the problem we face.”

Chase and Graber embody the essential tensions of applying Watson to healthcare, even if the machine is inarguably a wonder of artificial intelligence. Winning Jeopardy! might seem like a trivial, so to speak, accomplishment, but it was an enormous computational achievement.

Watson wasn’t programmed with the information it needed, but given the cognitive tools necessary to acquire the knowledge itself, teasing out answers to complicated questions from vast amounts of electronic information. And it did this not in response to computer-language queries posed through an arcane interface, but with everyday conversational English.

After all, doctors make mistakes. Lots of mistakes. Enough to kill about 200,000 Americans annually. Experts put misdiagnosis rates around 10 percent, a number that varies widely by condition but in some situations, such as complicated cancers, goes far higher. Watson’s programmers say the machine might prevent many of those mistakes. It would constantly be updated with the latest medical knowledge, bringing to every doctor insights that often take years to filter out of academia, and merging those insights with each patient’s own data.

“We have all these different dimensions of data about an individual. How do we match the different characteristics they have — personal, medical — with a set of knowledge, of information, that is going to define what the best thing for them to do is?” said Basit Chaudhry, lead research clinician for Watson, at the Wired Health Conference on Oct. 16.

IBM launched partnerships with insurance giant WellPoint and the Sloan-Kettering Cancer Center in New York and is expected to offer Watson commercially to hospitals within the next few years. Yet though Watson is clearly a powerful tool, doctors like Graber wonder if it’s the right tool. “Watson may solve the small fraction of cases where inadequate knowledge is the issue,” he said. “But medical school works. Doctors have enough knowledge. They struggle because they don’t have enough time, because they didn’t get a second opinion.”

Several years ago, Graber, who has written extensively on diagnostic error, led an Archives of Internal Medicine analysis of 100 cases of misdiagnosis. Only in a few cases did doctors err because they lacked necessary information. Most misdiagnoses arose from cognitive problems, such as overconfidence or inattention, or systemic problems of miscommunication, inefficiency and poor teamwork. Doctors had many problems, but being unable to find the latest journal article wasn’t one of them.

According to Chase, one of Watson’s great virtues could be in providing unbiased second opinions. “The machine says, you thought of 10 things. Here are the other five,” he said. “You’ve probably seen Jerome Groopman’s book, How Doctors Think, about the mistakes doctors make. A simple one is anchoring: You get stuck to some diagnosis. We’ve all had that experience. A machine can change its diagnostic profile on a dime based on new information. One of the things a machine is not is biased.”

Graber, who uses existing computerized decision-support services like Isabel and Dxplain in his clinical practice and teaching, agreed that Watson could help synthesize information that a doctor might know, “but for whatever reason didn’t come to mind.” He warned, however, that doctors will need to guard against a new source of bias: over-reliance on Watson. “When I use my GPS too much, I never really learn the layout of a new city,” Graber said. “Same story.”

He and Chase also disagree on the implications for health costs. Chase sees Watson helping doctors and patients reduce or eliminate unnecessary tests and treatments, which now cost $750 billion per year. Graber fears that Watson’s ability to identify many possible diagnoses will encourage patients to ask for even more tests and procedures, setting off a cost-inflating “diagnostic cascade.”

Susan Saleeb, a pediatric cardiologist at Children’s Hospital Boston, said the types of evidence used by physicians to make decisions might not be contained in databases Watson will scan. Research led by Children’s Hospital chief cardiologist James Lock found that nearly 80 percent of decisions made by children’s heart doctors weren’t grounded in already published data, and less than 3 percent referenced a specific study. The doctors were thinking on their feet.

“There is limited hard and fast data to use to plug into a computer to start the process,” said Saleeb, who has helped develop a patient management program called SCAMPS — short for Standardized Clinical Assessment and Management Plans — that does something similar to what’s predicted for Watson: Patient information is entered into a decision-guiding framework, and treatment recommended by algorithm.

Building SCAMPS, however, is essentially done by hand, involving collaboration between Children’s Hospital doctors, nurses and statisticians, who manually amass and analyze thousands of anecdotes and clinical records, generate hypothetical decision-trees, and compare them to real-world cases. Each SCAMP is labor-intensive and involves just one condition, such as their most recent plan for chest pain in children.

Watson might take advantage of a SCAMP once it’s built, said Saleeb, but medical experts need to make it. That said, she thinks an artificial intelligence could help during construction. “The Watson system seems like it would be ideal for data analysis and hypothesis testing,” she said. The program “could break free of human biases and limitations of data manipulation.”

“The real power, that IBM isn’t addressing now but will down the road, isn’t mastering the medical literature, but going straight to raw data and telling us what we don’t know,” said Steve Shapiro, who is also on the Watson Healthcare Advisory Board. “The beauty of Watson is that it doesn’t need the structured format.”

At his Wired Health Conference presentation, Chaudhry mentioned Watson’s ability to analyze informal information, such as nurses’ notes, that even doctors often overlook. “A lot of the care activity experience is captured in these kinds of notes,” Chaudhry said. “Being able to capture that kind of information, organize the knowledge involved with it, is going to be really important to what we can do.”

Another concern about Watson is one heard often as computers become as common as stethoscopes in doctors’ offices: the fear that patient-doctor interaction will become too electronically mediated, ultimately depriving patients of the medical benefits of personal attention. The average American already spends less than 30 minutes every year in direct personal contact with a doctor, much less than in other developed countries.

“The little things that doctors can see and hear from listening to you talk, the strange things that a doctor might ask because she has a hunch about something — computers aren’t very good at that,” said Evan Falchuk, vice chairman of Best Doctors, a company that offers second opinions. “The amount of judgment involved in making a decision on a patient goes beyond inputting a bunch of data points and running searches on the internet.”

“And, very importantly, what is it like to be a patient?” Falchuk continued. “When you go to your doctor, there’s a degree to which what you’re doing is sharing deeply personal stuff with someone you trust. That relationship between two human beings is something that’s very important and unique. And I haven’t seen anything that replicates that.”

Steve Shapiro thinks Watson could actually free doctors to concentrate on relationships. “Watson is dependent on the input of information. The physician has to ask the right questions,” he said. “We still need the physicians, the history-taking. That’s where the art of medicine is. Watson isn’t the answer, but it can be an aid.”

“There are some tasks that humans are better at than machines, and vice versa,” said Chase. “The challenge is to exploit the tasks that machines are better at, freeing physicians to do what they do best.”

Update 10/18: Michael Magill, chairman of the University of Utah School of Medicine’s family and preventive medicine department, sent a very thoughtful commentary by email. It arrived too late for inclusion in the main article, but is presented here in its entirety:

“I am very hopeful that IBM Watson will become a powerful new tool in support of improved health care, but I don’t think its best early uses will be for types of decision making most would initially consider (such as the traditional diagnostic process). Nor will it in any way reduce the importance of the doctor-patient relationship, clinical acumen, or human judgment required to develop evidence-based clinical guidelines.

“Rather, I suspect it will be most useful to help organize information in a way we have not previously been able to do, such as discovering patterns of illness based on analysis of large amounts of clinical data stored in electronic medical record repositories. In other words, its uses may be in research sooner than in support of direct clinical care. That said, it is still too early to know which of many potential applications will emerge first, and it may prove useful in selected clinical circumstances sooner than others.”