Lung cancer is the second most common type of cancer and the leading cause of cancer death in both men and women, with non-small cell lung cancer (NSCLC) accounting for up to 90% of cases. Somatic mutations strongly affect how sensitive NSCLC patients are to various drug treatments, and are critical for choosing the most effective targeted therapies for this cancer. Most NSCLC patients develop resistance to their targeted therapies during the first year of treatment. The reason for this resistance is still unknown.

"Currently, there is no computational method to link information from medical records to somatic mutations and targeted therapy responses," says Saeed Hassanpour, PhD, a computer scientist at Dartmouth's Norris Cotton Cancer Center. Hassanpour has received a 4-year $1.5M grant from the National Cancer Institute to build and validate machine learning approaches that can reveal relationships between clinical and pathologic findings, patient genetic profiles and drug resistance. Linking these data could mean better, personalized treatment strategies for NSCLC patients.

"Our overall objective is to use pathology reports of NSCLC tumors and available data from electronic medical records to build computational models for identifying NSCLC patients with clinically-actionable somatic mutations and predicting their responses to targeted therapies. We think that pathological findings of NSCLC cells and tissues, in combination with relevant information in medical records, such as medical and family history, demographics and smoking status, will be reliable indicators to achieve this objective," says Hassanpour.

Hassanpour's team will test their hypothesis by building and validating novel information-extraction and machine learning approaches that leverage textual information to identify statistically significant connections. The results will then be used to identify NSCLC patients with clinically-actionable mutations and predict their resistance to targeted therapies.

Revealing these relationships using a well-designed bioinformatics approach will allow Hassanpour's clinical collaborators to better understand how NSCLC tumors develop and respond to treatment. "We expect our machine learning methods to identify NSCLC patients with clinically-actionable mutations based on tumor pathology reports and EMR data, and to provide an accurate, fast, and inexpensive pre-selection method that can be utilized before performing time-intensive and expensive DNA sequencing to find the same mutations," explains Hassanpour. "As a result, this prediction method is expected to prioritize DNA screening of NSCLC patients who are the most likely to have clinically-actionable mutations, thus reducing screening turnaround time and increasing the accuracy of treatment administration."

In addition, the team's pre-selection method will improve the finding and tracking of NSCLC patients with clinically-actionable mutations for translational research, help with recruitment of NSCLC patients for clinical trials, assist care providers in selecting the best treatment strategies, improve survival outcomes for NSCLC patients, and extend quality of life. "In precision cancer care, even identifying the high likelihood of resistance to a targeted therapy has important implications on the choice of the 'best' treatment strategy for NSCLC patients and their responsiveness," says Hassanpour. "Our application will have a meaningful, positive impact on public health and the promotion of precision medicine."