5th November 2015

First AI-based scientific search engine will accelerate research process

A new search engine – Semantic Scholar – uses artificial intelligence to transform the research process for computer scientists.

The Allen Institute for Artificial Intelligence (AI2) this week launches its free Semantic Scholar service, which allows scientific researchers to quickly sift through the millions of scientific papers published each year to find those most relevant to their work. Leveraging AI2's expertise in data mining, natural-language processing and computer vision, Semantic Scholar provides an AI-enhanced way to quickly search and discover information. At launch, the system searches over three million computer science papers, with further scientific fields to be added over time.

"No one can keep up with the explosive growth of scientific literature," said Dr. Oren Etzioni, CEO at AI2. "Which papers are most relevant? Which are considered the highest quality? Is anyone else working on this specific or related problem? Now, researchers can begin to answer these questions in seconds, speeding research and solving big problems faster."

With Semantic Scholar, computer scientists can:

• Home in quickly on what they are looking for, with advanced selection tools. Researchers can filter results by author, publication, topic, and date published. This surfaces the most relevant results quickly and reduces information overload.

• Instantly access a paper's figures and tables. Unique among scholarly search engines, this feature pulls out the graphic results, which are often what a researcher is really looking for.

• Jump to cited papers and references and see how many researchers have cited each paper, a good way to determine citation influence and usefulness.

• Be prompted with key phrases within each paper, to winnow the search further.

Using machine reading and vision methods, Semantic Scholar crawls the web to find all publicly available PDFs of papers on computer science topics, extracts both the text and the diagrams with their captions, and indexes everything for later contextual retrieval. Using natural language processing, the system identifies the top papers, extracts filtering information and topics, and sorts results by type of study and by how influential each paper's citations are. It provides the scientist with a simple user interface (optimised for mobile) that matches what academic researchers expect. Filters for topic, date of publication, author and place of publication are built in, along with smart, contextual recommendations for further keyword filtering. Together, these search and discovery tools give researchers a quick way to separate wheat from chaff, and to find relevant papers in areas and topics that previously might not have occurred to them.
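To make the filtering idea concrete, here is a minimal Python sketch of faceted search over an in-memory list of papers. It is an illustration only and not AI2's implementation: the `Paper` record, the `search` function, the sample titles, authors and venues, and the choice to rank by raw citation count are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    # Hypothetical record; the real system's schema is not described in the article.
    title: str
    authors: tuple
    venue: str
    year: int
    topics: frozenset
    cited_by: int = 0


def search(papers, author=None, venue=None, topic=None, year_from=None, year_to=None):
    """Return papers matching every supplied filter, most-cited first."""

    def keep(p):
        # A filter left as None is treated as "no restriction".
        return (
            (author is None or author in p.authors)
            and (venue is None or p.venue == venue)
            and (topic is None or topic in p.topics)
            and (year_from is None or p.year >= year_from)
            and (year_to is None or p.year <= year_to)
        )

    # Assumption: citation count stands in for "influence" in this sketch.
    return sorted((p for p in papers if keep(p)), key=lambda p: p.cited_by, reverse=True)


# Made-up sample data, for illustration only.
PAPERS = [
    Paper("Paper A", ("Ada Example",), "Venue X", 2012, frozenset({"parsing"}), 40),
    Paper("Paper B", ("Ada Example", "Bo Sample"), "Venue Y", 2014,
          frozenset({"parsing", "vision"}), 95),
    Paper("Paper C", ("Bo Sample",), "Venue X", 2015, frozenset({"vision"}), 10),
]

if __name__ == "__main__":
    for paper in search(PAPERS, topic="parsing", year_from=2010):
        print(paper.year, paper.cited_by, paper.title)
```

Each filter narrows the result set independently, so combining them (for example, an author plus a date range) corresponds to the step-by-step narrowing described above. A production system would use a proper search index rather than a linear scan.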

Only a small number of free academic search engines are currently in widespread use. Google Scholar is by far the largest, with 100 million documents. However, researchers have noted problems with the current generation of these search engines.

"A significant proportion of the documents are not scholarly by anyone's measure," says Péter Jacsó, an information scientist at the University of Hawaii who identified a series of basic errors in search results from Google Scholar. While some of the issues have recently been fixed, says Jacsó, "there are still millions and millions of errors."

"Google has access to a lot of data. But there's still a step forward that needs to be taken in understanding the content of the paper," says Jose Manuel Gomez-Perez, who works on search engines and is director of research and development in Madrid for the software company Expert System.

Semantic Scholar builds on the foundation of current research paper search engines, adding AI methods to overcome information overload and paving the way for even more advanced and intelligent algorithms in the future.

"What if a cure for an intractable cancer is hidden within the tedious reports on thousands of clinical studies? In 20 years' time, AI will be able to read – and more importantly, understand – scientific text," says Etzioni. "These AI readers will be able to connect the dots between disparate studies to identify novel hypotheses and to suggest experiments which would otherwise be missed. AI-based discovery engines will help find the answers to science's thorniest problems."
