Samara Rosenfeld

Deep learning models may not perform as accurately as expected if AI in the medical space is not tested more carefully.

Artificial intelligence (AI) performed worse at detecting pneumonia in images from different health systems than in data from a single organization, suggesting that AI must be tested carefully for performance across a wide range of populations, according to a new study.

Researchers from the Icahn School of Medicine at Mount Sinai used convolutional neural networks (CNNs) to analyze chest X-ray images to help provide a pneumonia diagnosis. The researchers used CNNs across three hospital systems (the National Institutes of Health Clinical Center, Mount Sinai Hospital and the Indiana University Network for Patient Care) for a simulated pneumonia screening task.

In three of five comparisons, the researchers determined that the CNNs' performance in diagnosing disease on X-rays from hospitals outside their own network was significantly lower than on X-rays from the original health system. Yet the CNNs were highly accurate at detecting which hospital system an X-ray was acquired in.

According to the researchers, deep learning models rely on so many parameters that it is difficult to identify the specific variables driving their predictions, which complicates their effective use in healthcare.

"Our findings should give pause to those considering rapid deployment of artificial intelligence platforms without rigorously assessing their performance in real-world clinical setting reflective of where they are being deployed," senior author Eric Oermann, M.D., instructor in neurosurgery at the Icahn School of Medicine at Mount Sinai, said in a statement.

CNN systems used for medical diagnosis need to be tailored to the clinical question at hand, tested in real-world scenarios and assessed for how they affect accurate diagnosis, first author John Zech, a medical student at the Icahn School of Medicine at Mount Sinai, said in a statement.

The CNNs' performance in diagnosing disease on X-rays may reflect not only their ability to identify disease-specific imaging findings but also their exploitation of confounding information, such as cues that reveal where an image was acquired. As a result, reported performance may overstate how the models would perform in the real world.

A total of 158,323 chest radiographs were drawn from the three participating institutions. The researchers chose to study the diagnosis of pneumonia on chest X-rays because of its common occurrence, clinical significance and prevalence in the research community. Before computer-aided diagnostic tools can be used in real-world clinical settings, they must first be able to generalize across a variety of hospital systems.
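The confounding effect the researchers describe can be sketched with a toy simulation. Everything here is an illustrative assumption rather than the study's method: the "site marker" stands in for any hospital-specific artifact (a scanner watermark, a portable-film label), and the prevalence numbers are invented. A classifier that keys on the site marker looks accurate on pooled internal data but collapses at a new hospital.

```python
import random

random.seed(0)

def make_cases(n, pneumonia_rate, site_marker):
    """Simulate X-rays as (disease_signal, site_marker, label) tuples.
    site_marker is a hypothetical hospital-specific image artifact."""
    cases = []
    for _ in range(n):
        label = 1 if random.random() < pneumonia_rate else 0
        disease_signal = label + random.gauss(0, 1.0)  # noisy true finding
        cases.append((disease_signal, site_marker, label))
    return cases

# Hospital A has high pneumonia prevalence, hospital B low, so when the two
# are pooled for training, the site marker itself correlates with the label.
train = make_cases(5000, 0.70, site_marker=1.0) + \
        make_cases(5000, 0.10, site_marker=0.0)
external = make_cases(5000, 0.40, site_marker=0.0)  # new hospital

def predict_via_site(case):
    # A confounded model: flags pneumonia wherever the high-prevalence
    # hospital's marker appears, ignoring the actual disease signal.
    return 1 if case[1] > 0.5 else 0

def accuracy(model, cases):
    return sum(model(c) == c[2] for c in cases) / len(cases)

internal = accuracy(predict_via_site, train)     # inflated by the confound
ext = accuracy(predict_via_site, external)       # collapses without it
print(f"pooled internal accuracy: {internal:.2f}")
print(f"external accuracy:        {ext:.2f}")
```

On this synthetic data the site-based rule scores roughly 80 percent internally while doing little better than guessing externally, which is the shape of the gap the study reports: evaluation on data from the training institutions can reward shortcuts that vanish at a new site.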