Regina Barzilay is working with MIT students and medical doctors in an ambitious bid to revolutionize cancer care. She is relying on a tool largely unrecognized in the oncology world but deeply familiar to hers: machine learning.

Barzilay, the Delta Electronics Professor of Electrical Engineering and Computer Science, was diagnosed with breast cancer in 2014. She soon learned that good data about the disease is hard to find. “You are desperate for information — for data,” she says now. “Should I use this drug or that? Is that treatment best? What are the odds of recurrence? Without reliable empirical evidence, your treatment choices become your own best guesses.”

Across different areas of cancer care — be it diagnosis, treatment, or prevention — the data protocol is similar. Doctors start the process by mapping patient information into structured data by hand, and then run basic statistical analyses to identify correlations. The approach is primitive compared with what is possible in computer science today, Barzilay says.

These kinds of delays and lapses, which are not limited to cancer treatment, can seriously hamper scientific advances, Barzilay says. For example, 1.7 million people are diagnosed with cancer in the U.S. every year, but only about 3 percent enroll in clinical trials, according to the American Society of Clinical Oncology. Current research practice relies exclusively on data drawn from this tiny fraction of patients. “We need treatment insights from the other 97 percent receiving cancer care,” she says.

To be clear: Barzilay isn’t looking to upend the way current clinical research is conducted. She just believes that doctors, biologists, and patients could benefit if she and other data scientists lent them a helping hand. Innovation is needed, and the tools are there to be used.

Barzilay has struck up new research collaborations, drawn in MIT students, launched projects with doctors at Massachusetts General Hospital, and begun empowering cancer treatment with the machine learning insight that has already transformed so many areas of modern life.

Machine learning, real people

At the MIT Stata Center, Barzilay, a lively presence, interrupts herself mid-sentence, leaps up from her office couch, and runs off to check on her students.

She returns with a laugh. An undergraduate group is assisting Barzilay with a federal grant application, and they’re down to the wire on the submission deadline. The funds, she says, would enable her to pay the students for their time. Like Barzilay, they are doing much of this research for free, because they believe in its power to do good. “In all my years at MIT I have never seen students get so excited about the research and volunteer so much of their time,” Barzilay says.

At the center of Barzilay’s project is machine learning: algorithms that learn from data and find insights without being explicitly programmed where to look for them. This tool, just like the ones Amazon, Netflix, and other sites use to track and predict your preferences as a consumer, can quickly draw insight from massive quantities of data.

Applying it to patient data can offer tremendous assistance to people who, as Barzilay knows well, really need the help. Today, she says, a woman cannot retrieve answers to simple questions such as: What was the disease progression for women in my age range with the same tumor characteristics?
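Once patient information is held as structured records, a question like that becomes a simple cohort lookup. The sketch below is only an illustration of the idea: the records are made up, and the field names (`age`, `er_status`, `grade`, `outcome`), the function `similar_patient_outcomes`, and the five-year age window are assumptions, not details of Barzilay's database.

```python
from collections import Counter

# Hypothetical, made-up records purely for illustration; not real patient data.
records = [
    {"age": 43, "er_status": "positive", "grade": 2, "outcome": "no recurrence"},
    {"age": 47, "er_status": "positive", "grade": 2, "outcome": "recurrence"},
    {"age": 45, "er_status": "positive", "grade": 2, "outcome": "no recurrence"},
    {"age": 62, "er_status": "negative", "grade": 3, "outcome": "recurrence"},
]


def similar_patient_outcomes(records, age, er_status, grade, age_window=5):
    """Count outcomes among records in the age window that share tumor traits."""
    cohort = [
        r for r in records
        if abs(r["age"] - age) <= age_window
        and r["er_status"] == er_status
        and r["grade"] == grade
    ]
    return Counter(r["outcome"] for r in cohort)


# Outcomes for patients aged 40 to 50 with the same (assumed) tumor traits.
print(similar_patient_outcomes(records, age=45, er_status="positive", grade=2))
```

The hard part, as the article makes clear, is not the lookup itself but getting reliable structured records out of free-text clinical notes in the first place.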

What a machine can see

Working closely with collaborators Alphonse Taghian, chief of breast radiation oncology at Massachusetts General Hospital (MGH); Kevin Hughes, co-director of the Avon Comprehensive Breast Evaluation Center at MGH; and Constance Lehman, the chief of the breast imaging division at MGH, Barzilay intends to bring data science into clinical research nationwide. But first, she’s content with connecting her world with theirs.

Barzilay’s work in natural language processing (NLP) enables machines to search, summarize, and interpret textual documents, such as pathology reports on cancer patients. Using NLP tools, she and her students extracted clinical information from 108,000 reports provided by area hospitals. The database they’ve created has an accuracy rate of 98 percent. Next she wants to incorporate treatment outcomes into it.
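To give a rough sense of what turning a free-text report into structured data involves, here is a deliberately simple sketch. It uses hand-written patterns rather than the learned NLP models the article describes, and everything in it is an assumption for illustration: the field names, the patterns, the function `extract_fields`, and the sample sentence are all hypothetical.

```python
import re

# Hypothetical field patterns for illustration only. Real pathology reports
# vary widely, and the system described in the article is not built this way.
PATTERNS = {
    "er_status": re.compile(r"\bER\s*[:\-]?\s*(positive|negative)\b", re.IGNORECASE),
    "tumor_size_cm": re.compile(r"\b(\d+(?:\.\d+)?)\s*cm\b", re.IGNORECASE),
    "grade": re.compile(r"\bgrade\s*[:\-]?\s*([1-3])\b", re.IGNORECASE),
}


def extract_fields(report_text):
    """Map one free-text report to a dict of fields (None where not found)."""
    record = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(report_text)
        record[name] = match.group(1).lower() if match else None
    return record


# A made-up sentence in the style of a pathology report.
sample = "Invasive ductal carcinoma, grade 2. Tumor measures 1.8 cm. ER: Positive."
print(extract_fields(sample))
```

Fixed patterns like these break as soon as a report is phrased differently, which is one reason to prefer models that learn from annotated examples.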

For another study, Barzilay has developed a database that Hughes and his team can use to monitor the development of atypias, which help identify which patients are at risk of developing cancer later in life.

Machines are good at making predictions. “Why not throw all the information you have about a breast cancer patient into a model?” she says. But Barzilay is wary of having recommendations arrive as complex computational output with no explanation. Jointly with Tommi Jaakkola, a professor of electrical engineering and computer science at MIT, and graduate student Tao Lei, she is also developing interpretable neural models that can justify and explain the reasoning behind a machine’s predictions.

Barzilay is also looking at how new tools can help with preventive work. Mammograms contain lots of information that may be hard for a human eye to decipher. Machines can detect subtle changes and are better at picking up low-level patterns. Jointly with Lehman and graduate student Nicolas Locascio, Barzilay is applying deep learning to automate the analysis of mammogram data. As a first step, they are aiming to compute density and other scores currently derived by radiologists who manually analyze these images. Their ultimate goal is to identify patients who are likely to develop a tumor before it’s even visible on a mammogram, and also to predict which patients are heading toward recurrence after their initial treatment.

Ultimate success, Barzilay says, will involve drawing on computer science in unexpected ways, and pushing it in a variety of new health-related directions.

Outside her door, several of Barzilay’s students are talking ideas, hunching over laptops, and drinking coffee. An object set against the back wall resembles an odd coatrack. Guided by an idea from Taghian, six undergraduate students, led by graduate student Julian Straub, built a device that uses machine learning to detect lymphedema, a swelling of the extremities that can be caused by the removal of or damage to lymph nodes as part of cancer treatment. Lymphedema can be disabling and incurable unless detected early. Because of their high cost, machines that detect it, called lymphometers, are rare in the U.S.; very few hospitals have them.

Students have created an affordable version. And they hope to start testing this device at MGH in a couple of months. “These students are doing amazing work,” says Barzilay. “These innovations will make a really big difference. It is an entry point. There is so much to do. We are just getting started.”