PHILADELPHIA — The Penn Medicine Institute for Biomedical Informatics has launched a free, open-source automated machine learning system for data analysis that is designed for anyone to use, from a high school student looking to gain insight into their baseball team’s statistics, to trained researchers looking for associations between cancer and environmental factors. “Penn AI,” the first widely available tool of its kind, seeks to lower the barrier to entry into artificial intelligence, allowing users to bring in their own datasets or use the several hundred that are available for download within the tool. With a user-friendly dashboard that runs easily on a laptop, Penn AI is also designed to learn as it goes, ultimately making analysis suggestions based on the “experience” it gains through use.

“The problem with machine learning tools is that machine learning people build them, so they’re usually only usable by those with high levels of training,” said Jason Moore, PhD, director of the Institute for Biomedical Informatics and a professor of Biostatistics, Epidemiology and Informatics. “My team has taken three years to develop this system so that it can be approachable by anyone, regardless of their training or experience. Our goal has been to make a free and simple system that is still robust enough to transform the way we approach biomedical research—which I think we’ve accomplished.”

Penn AI is an automated machine learning system, which means that the artificial intelligence engine behind it can work out different analyses with different variables and methods on its own, without needing human input. Machine learning without automation requires someone to choose a specific method and manually adjust each parameter that the AI engine works on, often requiring more advanced knowledge of data in order to get meaningful results. Even for people with that know-how, there is still some guesswork involved. However, automation can eliminate much of that, and as Penn AI is used more and more, it will continually “learn” the best methods for analyzing data and will provide recommendations for its users based on what they are looking to glean.
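To make the idea of automation concrete, here is a minimal sketch in Python of a program trying several methods and parameter settings on its own and keeping the one that scores best. It is an illustration only, not Penn AI's actual algorithm: the toy dataset, the two simple methods, and every function name (`knn_predict`, `stump_predict`, `loo_accuracy`, `auto_search`) are assumptions made up for this example.

```python
from collections import Counter

# Toy dataset, invented for illustration: (feature vector, label) pairs.
DATA = [
    ((1.0, 1.2), 0), ((1.1, 0.9), 0), ((0.8, 1.0), 0), ((1.3, 1.1), 0),
    ((3.0, 3.2), 1), ((3.1, 2.9), 1), ((2.8, 3.0), 1), ((3.3, 3.1), 1),
]


def knn_predict(train, x, k):
    """Method 1: majority vote among the k nearest training points."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def stump_predict(train, x, feature):
    """Method 2: predict 1 if one feature exceeds its training mean."""
    threshold = sum(row[0][feature] for row in train) / len(train)
    return int(x[feature] > threshold)


def loo_accuracy(predict, **params):
    """Score a method by leave-one-out cross-validation on DATA."""
    correct = 0
    for i, (x, y) in enumerate(DATA):
        train = DATA[:i] + DATA[i + 1:]  # hold out one record at a time
        correct += predict(train, x, **params) == y
    return correct / len(DATA)


def auto_search():
    """Try every method/parameter combination and return the best one."""
    candidates = [(knn_predict, {"k": k}) for k in (1, 3, 5)]
    candidates += [(stump_predict, {"feature": f}) for f in (0, 1)]
    results = [
        (loo_accuracy(fn, **params), fn.__name__, params)
        for fn, params in candidates
    ]
    best = max(results, key=lambda r: r[0])
    return best, results


if __name__ == "__main__":
    best, results = auto_search()
    for score, name, params in results:
        print(f"{name} {params}: accuracy {score:.2f}")
    print("selected:", best[1], best[2])
```

The user supplies only the data. The loop in `auto_search` stands in for the choices a person would otherwise make by hand. A real system would search far more methods and settings than this sketch does.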

By removing some of the complexities and adding the element of automation, Moore and his team believe that they can also make machine learning much more common in clinical settings.

“I want this to be self-service, clinical AI,” Moore said. “I believe that this tool can make it so that it will soon be routine for a doctor to say, ‘I want to look at the associations between sex, age, smoking and different diseases,’ and then have this tool answer their questions.”

Additionally, because Penn AI is open source, doctors can see the mechanisms behind each analysis and how the tool got to each endpoint. Other programs available are expensive, don’t do all of the things that Penn AI can and, most importantly, don’t allow a look inside their code.

“If you’re going to use machine learning for patients, you want to trust it completely,” Moore said. “You want to be able to look under the hood. This allows for that, which builds some faith among clinicians, which is important for user buy-in.”

In Moore’s words, Penn Medicine is a “data rich environment,” but that may not be the case for every health system. As such, Moore hopes that Penn AI will provide an outlet for researchers in any location to dig deeper into the information they have collected and to share it in a de-identified way.

“I think this is really going to accelerate biomedical research,” Moore said. “We’ll be able to do almost instantly what it takes weeks and months—and thousands or millions of dollars—to do now.”

Moving forward, Moore envisions adding more complex features for more advanced users. For example, he’d like to add ensemble approaches, a technique in which multiple machine learning methods work on the same dataset and their results are combined to produce a more robust analysis.
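As a rough illustration of the ensemble idea, the Python sketch below combines three very simple models by majority vote. It is one common way to build an ensemble, not a description of how Penn AI would implement the feature. The three threshold rules, the 0.5 cutoff, and the names `model_a`, `model_b`, `model_c`, and `ensemble_predict` are hypothetical and carry no clinical meaning.

```python
from collections import Counter

# Three deliberately simple stand-in models. Each looks at a different
# feature of the record and votes 0 or 1. The 0.5 cutoff is arbitrary.
def model_a(record):
    return int(record[0] > 0.5)


def model_b(record):
    return int(record[1] > 0.5)


def model_c(record):
    return int(record[2] > 0.5)


MODELS = [model_a, model_b, model_c]


def ensemble_predict(models, record):
    """Return the label that most of the models voted for."""
    votes = [model(record) for model in models]
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    record = (0.9, 0.2, 0.8)  # made-up input; models vote 1, 0, 1
    print("votes:", [m(record) for m in MODELS])
    print("ensemble:", ensemble_predict(MODELS, record))
```

Because each model can be wrong in a different way, the combined vote is often steadier than any single model. Using an odd number of models avoids ties when there are two possible labels.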

The development of this tool was supported by National Institutes of Health research grants (R01 LM010098 and R01 AI116794) and National Institutes of Health infrastructure and support grants (UC4 DK112217, P30 ES013508, and UL1 TR001878).

Members of the development team included Heather Williams, Steve Vitale, Sharon Tartarone, Weixuan Fu, William La Cava, Josh Cohen, Randal Olson, Patryk Orzechowski, John Holmes, Moshe Sipper, and Ryan Urbanowicz.