Who’s the most influential biomedical scientist? Computer program guided by artificial intelligence says it knows

Eric Lander, president and founding director of the Broad Institute and a biologist at the Massachusetts Institute of Technology in Cambridge, is the most influential biomedical researcher of the modern era, according to a computer program. Lander, a geneticist and mathematician, ranks first on a new list of top biomedical researchers produced by the scientific literature search tool Semantic Scholar.

Semantic Scholar, launched in 2015, is an academic search engine that aims to tackle the problem of information overload. It uses artificial intelligence (AI) to help users sift through huge numbers of scientific papers and understand (to a limited extent) their content. The free tool was developed by the Allen Institute for Artificial Intelligence (AI2), a nonprofit based in Seattle, Washington, that was co-founded in 2014 by Microsoft co-founder Paul Allen.

Semantic Scholar’s archive of searchable literature initially focused on computer science, and last year expanded to include neuroscience. Today, it is expanding again, to include the millions of biomedical research papers indexed by PubMed and other sources; overall, Semantic Scholar’s archive is now approaching 40 million papers.

Last year, Semantic Scholar’s programmers also added functionality that allows it to measure the influence of researchers and organizations. The measure draws on what they call “highly influential citations,” which take into account the context around citations and exclude any self-citations, along with other information. In April 2016, the tool ranked computer scientists, and when its corpus was expanded to neuroscience in November 2016, it was also used to judge the most influential brain scientists. Now, Semantic Scholar is ranking biomedical researchers. Here’s the list of the top 10, provided to Science Insider:

1. Eric Lander, Massachusetts Institute of Technology (biology)
2. Karl Friston, University College London (neuroscience)
3. Raymond Dolan, University College London (neuroscience)
4. Shizuo Akira, Osaka University (immunology)
5. David Botstein, Calico (biology)
6. Dennis Smith, Pfizer (pharmacokinetics)
7. Eugene Koonin, National Center for Biotechnology Information (biology)
8. Walter Willett, Harvard School of Public Health (epidemiology)
9. Rudolf Jaenisch, Massachusetts Institute of Technology (genetics)
10. Bert Vogelstein, Johns Hopkins Medical School (oncology)

(Friston and Dolan, neuroscientists who hold the second and third spots on the list, respectively, also held the top two positions on Semantic Scholar’s list of most influential neuroscientists.)

The absence of women on the list has drawn attention on social media, with some researchers wondering whether the result reflects a bias in Semantic Scholar’s ranking algorithm or is another expression of long-documented differences in gender representation in the biomedical sciences and scientific publishing.

In a statement, AI2’s Marie Hagman, a senior product manager who oversees Semantic Scholar, said: “I think the fact that there are no women in the Top 10 authors by the highly influential citation analysis done by AI2 is spotlighting the well-reported problem of publication bias in science and in the context of the current global conversation on gender. It’s encouraging to see that people are paying more attention to this issue, as the all-male list last year didn’t receive this kind of buzz.”

Information overload

With scientific literature doubling roughly every 9 years, keeping up is becoming increasingly difficult, Hagman says. There’s “a ton of information trapped in these articles and we want to bring it to life,” she says. “We think there are potential cures or ways to improve or save human lives that may be buried away in a PDF somewhere.”

Semantic Scholar is used about a million times a month on average, Hagman says. Ultimately, she hopes that the tool can go even further in the content it extracts, perhaps even by suggesting hypotheses for researchers to test. She also envisions the tool pulling data and comparing similar experiments from different papers. “An automated meta-analysis is certainly something we believe is on the horizon,” Hagman says.

One limitation of the tool is that it can’t trawl paywalled papers. Hagman notes, however, that her group is negotiating with publishers for varying levels of access.

Many other academic search engines, such as Google Scholar and Microsoft Academic Search, already exist. And any of these search tools will do the job for those who are experts in a particular field and know what they are looking for, Hagman says. But for those exploring connections between different fields or looking into new areas, she believes no other tool provides the “discovery experience” offered by Semantic Scholar.

Randy Olson, an AI researcher at the University of Pennsylvania (UPenn), says Semantic Scholar is “far more useful” than Google Scholar. “Could Semantic Scholar’s AI piece together that a relatively unimportant discovery in one field is a groundbreaking solution to a major challenge in another field?” he asks. “Only time will tell, but I’m optimistic.”

But in the future, “general purpose search engines may become so advanced that there’s no need for academic engines,” notes Daniel Himmelstein, a data scientist at UPenn. “It’s going to be hard to beat search engines trained on decades of searches across the entire web at information retrieval.”

*Update, 19 October, 3:22 p.m.: This story has been updated to include a comment from AI2 on the lack of women in the top 10 list of influential biomedical researchers.

*Correction, 19 October, 3:47 p.m.: An earlier version of this story incorrectly stated that there was one woman on the top 10 list. There are none.