Take notes for the digital doctors’ aide Jim Varney/Getty

Blood count. Biopsy. Drug cocktails. Snippets like these tell the story of a person’s experience of cancer. Gather up the stories of hundreds of thousands of people and you could learn about the disease itself.

A team at Memorial Sloan Kettering Cancer Center in New York is training an artificial intelligence to find similarities between cases that human doctors might miss. The software combs through 100 million sentences taken from clinical notes about people with cancer.

“We’re looking into the exhaust of all that data to try to find something interesting,” says Gunnar Rätsch, who presented the work at the annual meeting of the American Association for the Advancement of Science in Washington DC last week.


His idea is to build computational models that capture how a person is doing, how they compare to others and how their disease is likely to progress in the future. “Once we have that, we can think about how to treat the patient best.”

Secret similarities

Rätsch’s team built a machine learning algorithm to crunch through anonymised clinical notes from 200,000 people with cancer. Their program sorted millions of sentences – including patients’ symptoms, medical histories and doctors’ observations – into 10,000 related clusters.
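To make the idea of sorting sentences into related clusters concrete, here is a minimal sketch. It is not the team's algorithm, which the article does not describe. It assumes a simple word-overlap (Jaccard) similarity and a greedy grouping rule, and the function names, threshold and sample notes are all invented for illustration.

```python
import re

def tokens(sentence):
    """Lower-case a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def jaccard(a, b):
    """Jaccard similarity of two token sets (0 = disjoint, 1 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def cluster_sentences(sentences, threshold=0.5):
    """Greedy single-pass grouping (an assumed, simplified technique):
    each sentence joins the first cluster whose seed sentence is similar
    enough, otherwise it starts a new cluster."""
    clusters = []  # list of (seed_tokens, member_sentences)
    for s in sentences:
        t = tokens(s)
        for seed, members in clusters:
            if jaccard(seed, t) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((t, [s]))
    return [members for _, members in clusters]

# Hypothetical sentences, not taken from real clinical notes.
notes = [
    "Patient reports severe fatigue",
    "Patient reports mild fatigue",
    "Recommend starting chemotherapy",
    "Recommend starting chemotherapy soon",
]
clusters = cluster_sentences(notes)
for group in clusters:
    print(group)
```

A real system working on 100 million sentences would need a far more scalable representation of meaning than word overlap, but the grouping principle is the same.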

Each cluster represented a common observation found across several medical records, such as a doctor’s note recommending a particular course of treatment or picking up on a noteworthy symptom. Connections between clusters were then mapped, showing the relationships between different comments or courses of treatment.
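One simple way to picture a map of connections between clusters is to count how often two clusters turn up in the same patient record. The sketch below assumes that approach, which the article does not confirm. The cluster IDs, patient labels and the `cluster_links` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cluster_links(records):
    """Count how often each pair of cluster IDs appears in the same record.

    `records` maps a record label to the cluster IDs found in its notes.
    """
    links = Counter()
    for cluster_ids in records.values():
        # Deduplicate and sort so each pair is counted once per record
        # and always stored in the same order.
        for pair in combinations(sorted(set(cluster_ids)), 2):
            links[pair] += 1
    return links

# Hypothetical data: three records, each tagged with cluster IDs.
records = {
    "patient_a": [1, 2, 3],
    "patient_b": [1, 2],
    "patient_c": [2, 3],
}
links = cluster_links(records)
for pair, count in links.most_common():
    print(pair, count)
```

Pairs with high counts are candidate connections. In practice the raw counts would need adjusting for how common each cluster is on its own.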

In a second study building on Rätsch’s work, the clusters are now being compared against the records of about 2000 people with different types of cancer. The researchers are looking for hidden associations between written notes and patients’ gene and blood sequencing. For example, patients with similar genetic results might have the same kind of note in their files. These connections can reveal similarities doctors might not have noticed before.
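As an illustration of how an association between a type of note and a genetic result might be checked, the sketch below runs a one-sided Fisher's exact test on a two-by-two table. This is a standard statistical test chosen for the example, not necessarily what the researchers use, and the patient counts are invented.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns the probability of seeing at least `a` in the top-left cell
    if the row and column totals are fixed and there is no association.
    """
    row1 = a + b          # e.g. patients with the genetic result
    col1 = a + c          # e.g. patients with the note cluster
    n = a + b + c + d
    total = comb(n, col1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        p += comb(row1, k) * comb(n - row1, col1 - k) / total
    return p

# Hypothetical counts:
#                        note present   note absent
# genetic result present       8              2
# genetic result absent        1              9
p_value = fisher_one_sided(8, 2, 1, 9)
print(f"p = {p_value:.4f}")
```

A small p-value only flags a pairing worth a closer look. With thousands of clusters compared against many genetic results, some pairings will look striking by chance, so any hit is a hypothesis to be tested rather than a finding.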

Digital diagnosis

The hope is that these associations will inspire ideas for research. “You can take the genetic information and make this connection in order to find new hypotheses, which can then be tested,” says Rätsch.

Machine learning is proving useful in a range of medical applications. For example, computers are being trained to diagnose problems using medical images such as X-rays and MRI scans. Another system, used at Chicago hospitals, learned to predict when people were likely to experience a heart attack in the near future.

“The human mind is limited, hence you need to use statistics and computer science,” says Rätsch.