In the age of big data we often seem to be drowning in a constant torrent of research and information. The massive challenge we now face is how to sort through all the work that has been produced. In an exciting collaboration between computer scientists and cancer researchers at the University of Cambridge, a novel AI system has been developed to help sort through millions of scientific studies and help researchers uncover previously missed connections.

Science, by its very nature, is a piecemeal process. Each tiny new discovery or development adds to our greater body of knowledge, but we are now reaching a point where there is such a giant volume of data available on every research topic that no single human mind can reasonably wade through it.

"As a cancer researcher, even if you knew what you were looking for, there are literally thousands of papers appearing every day," says Anna Korhonen, one of the developers of the new AI system.

Called LION LBD, the system is initially focusing on cancer research due to the large volume of research on the topic, which spans a number of different scientific fields. The system incorporates machine learning, natural language processing (NLP) and text mining methods modeled on a technique called literature-based discovery (LBD).

Originally developed in the 1980s by information scientist Don Swanson, the LBD technique was designed to try to help researchers home in on data in studies that could be useful but otherwise remained buried as secondary to the study's overall hypothesis. Swanson developed the technique after noticing how broad and fragmented scientific research had become.

"The fragmentation of science into specialities makes it likely that there exist innumerable pairs of logically related, mutually isolated literatures," Swanson wrote in a study demonstrating the potential of LBD back in 1988.

LBD originally arose as a painstaking manual process, but in recent years it has proven well suited to automation, with 21st century technology allowing machines to help find connections or patterns across different studies that humans would never have been able to detect.

"For example, you may know that a cancer drug affects the behaviour of a certain pathway, but with LION LBD, you may find that a drug developed for a totally different disease affects the same pathway," explains Korhonen, discussing the potential of the new AI system.

At this early stage, the LION LBD system is still relatively limited. It can only produce connections between two keywords or concepts, and has initially been built using just publicly available PubMed abstracts. However, these limitations are expected to ease swiftly, as the researchers behind it are making the entire system open source and freely accessible.
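
To make the idea of connecting two concepts through the literature more concrete, here is a minimal sketch of the general co-occurrence approach behind literature-based discovery. It is not LION LBD's actual implementation. The toy "abstracts", the concept names (`drug_x`, `drug_y`, `pathway_p` and so on) and the helper functions are all hypothetical, chosen to mirror the kind of drug and pathway link Korhonen describes.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy "abstracts", each reduced to the set of concepts it mentions.
abstracts = [
    {"drug_x", "pathway_p"},   # cancer drug literature
    {"drug_y", "pathway_p"},   # literature on an unrelated disease
    {"drug_y", "disease_d"},
    {"drug_x", "cancer"},
]

def build_cooccurrence(docs):
    """Map each concept to the set of concepts it shares an abstract with."""
    neighbours = defaultdict(set)
    for concepts in docs:
        for a, b in combinations(sorted(concepts), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours

def indirect_links(start, neighbours):
    """Return {candidate: bridging concepts} for concepts never seen with `start`."""
    found = defaultdict(set)
    for bridge in neighbours[start]:
        for candidate in neighbours[bridge]:
            # Keep only concepts that never co-occur directly with the start concept.
            if candidate != start and candidate not in neighbours[start]:
                found[candidate].add(bridge)
    return dict(found)

neighbours = build_cooccurrence(abstracts)
links = indirect_links("drug_x", neighbours)
print(links)  # {'drug_y': {'pathway_p'}}
```

In this toy data, `drug_x` and `drug_y` never appear in the same abstract, yet both are mentioned alongside `pathway_p`, so the sketch surfaces `drug_y` as a candidate connection. A real system would have to extract concepts from text and rank candidates, which this example leaves out.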

The LION LBD system is currently accessible to all through a web portal, and the entire software code and API are also free to developers keen to collaborate and improve it.

The system is described in a new paper published in the journal Bioinformatics.

Source: University of Cambridge