The Virtual Scientist

This idea has been inspired by writers such as H.G. Wells (with World Brain) and J.L. Borges (with The Library of Babel, for example), who envisioned a future where all published information would be readily available at our fingertips. Such access really changes what we are as a species. Suddenly, many applications open up for us, one of them being the creation of a Virtual Scientist.

Now, what exactly is a Virtual Scientist? A Virtual Scientist would be a digital entity (an AI) that would behave like a researcher and help scientists do their job faster. It would be a personal assistant that could scan all the scientific literature in an instant to find answers to your questions.

This essay focuses on three things: defining what a Virtual Scientist would be, how it would “behave” and what it would be able to accomplish; looking at current approaches toward building one; and finally, explaining how I would build a Virtual Scientist myself.

So, how would it work? A Virtual Scientist would be capable of forming hypotheses from a set of data and of answering any question in natural language about anything (anything fact-checkable, at least: don’t ask it the meaning of life). Thanks to its access to a Universal Library, the Virtual Scientist would be able to understand the written information contained in it. And, benefiting from its silicon power, it would mine facts to construct an answer at lightning speed. You could expect it to answer questions such as “what genes are likely responsible for breast cancer?” by looking at the medical papers about breast cancer and genetics. You could expect it to help you build a bibliography, for example by interpreting sentences in written works to find the best references on the very precise subject you are researching. You would ask a question and get your answer. It would be an all-knowing AI, a kind of Digital Aristotle that could help people learn and do research.
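To make “mining facts” a little more concrete, here is a minimal sketch of the very first step: finding which texts in a library are relevant to a question. It assumes a tiny in-memory library (a Python dict of made-up placeholder texts, not real papers) and uses plain TF-IDF keyword scoring, a classic information-retrieval technique swapped in here for illustration; `tokenize` and `rank_documents` are hypothetical helpers, not part of any existing system.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_documents(question, library):
    """Rank the documents in `library` (a dict: title -> text) against `question`.

    Each document gets a TF-IDF score summed over the question's terms;
    returns (title, score) pairs, best match first.
    """
    docs = {title: Counter(tokenize(text)) for title, text in library.items()}
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for counts in docs.values() for term in counts)
    query_terms = set(tokenize(question))
    scores = []
    for title, counts in docs.items():
        total = sum(counts.values()) or 1  # avoid dividing by zero for empty texts
        score = 0.0
        for term in query_terms:
            if term in counts:
                tf = counts[term] / total
                # Rare terms weigh more; +1 keeps terms found everywhere from scoring zero.
                idf = math.log(n_docs / df[term]) + 1.0
                score += tf * idf
        scores.append((title, score))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores


# Hypothetical placeholder "papers", invented for this illustration only.
library = {
    "Placeholder paper A": "Inherited gene mutations and breast cancer risk in families.",
    "Placeholder paper B": "Kiwi fruit varieties and how to grow them.",
    "Placeholder paper C": "Genetics of eye colour in humans.",
}

results = rank_documents("What genes are likely responsible for breast cancer?", library)
print(results[0][0])  # prints: Placeholder paper A
```

Note that the ranking works on surface words only: “genes” in the question does not match “gene” in the first text, and nothing here understands what the texts mean. Stemming would fix the first problem; the second is exactly what a Virtual Scientist would have to solve.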

The Virtual Scientist would allow research to be done faster. It would help science. With access to the Universal Library, you wouldn’t miss any paper relevant to your research.

You could also program the Virtual Scientist to generate new hypotheses when you give it a gigantic data set. Computers can already find solutions to mathematical problems that humans would be unable to come up with on their own. The Virtual Scientist could use its existing knowledge to propose new hypotheses and help you process big data.

Daniel Dewey summed this up quite nicely: “Think about how long it took humans to arrive at the idea of natural selection. The ancient Greeks had everything they needed to figure it out. They had heritability, limited resources, reproduction and death. But it took thousands of years for someone to put it together. If you had a machine that was designed specifically to make inferences about the world, instead of a machine like the human brain, you could make discoveries like that much faster.”

Today, two types of approaches are working toward this super AI: the computerized one and the biological one.

The computerized approach involves initiatives such as Project Aristo at Paul Allen’s Allen Institute for Artificial Intelligence, IBM’s Watson, and Google’s “Star Trek Computer”.

Paul Allen plans to create a Digital Aristotle with Project Aristo (a successor to his earlier Project Halo). “What if you could collect all the world’s information in a single computer mind, one capable of intelligent thought, and be able to communicate in simple human language?” asked an article about the project in The Verge earlier this year. For now, the project focuses on training an AI to pass a high school biology test by giving it the course material and then asking it questions about that material. The team estimates it will take five years to build an AI that can pass the biology test. That is a long time, but it would be a breakthrough: they would have coded a computer that can effectively learn from a textbook, one that can “understand”, so to speak, a biology course. The future of such a program could well be a Virtual Scientist. It would “just” have to take more courses and “read” more books. And when Paul Allen talks about a Digital Aristotle, he means a digital scholar that could teach, do research, and be all-knowing, as Aristotle was in his time. No human being today can be all-knowing: there is simply too much knowledge for one (wo)man to absorb. But in his time, Aristotle probably mastered most of what was known to humankind. Paul Allen could be the one to create the first software that comes close to this.

IBM’s Watson, on the other hand, is a bit more primitive and much more business-oriented (IBM already uses Watson to crunch big data for financial and medical institutions). Watson did, however, beat top players at Jeopardy. That wasn’t true intelligence, but that is not what IBM was looking for. Most of the time, Watson was able to interpret the meaning of the game’s clues, and it looked for answers in a database made mostly of web content (literally scanning millions of pages, ranging from Wikipedia to the Constitution, to find the most “probable answer”). Watson is all about probabilities. But as far as I know, it could not search through most of the literature: say, the millions of books Google has digitized with its Google Books project, or academic papers from Elsevier. Watson winning Jeopardy was an amazing achievement, and it proved that an AI can find answers to very complex questions if it is designed to do so. Watson still has some issues, but I think that with access to the Universal Library it could achieve much more (and I truly hope IBM is working with the major digital libraries around the world to make that happen).

IBM is encouraging people and developers to actively use Watson and to find new uses for it. This is an opportunity to make it even smarter by giving it access to a Universal Library and training it to understand scientific literature!

Google is on a quest to build a Star Trek Computer. It wants its search engine to answer any question that humankind has found an answer to. With its enormous array of services, Google Books among them, Google has what it takes to build it. When you are looking for an answer, books are the first place to turn after the internet, and Google does business in both. The Star Trek Computer isn’t far off: as Google fine-tunes its algorithms and its computers dig deeper into the Google Books texts, we can expect the answers from Google Search to become more and more precise. Google now has what it calls a Knowledge Graph, which allows its search engine to “understand” concepts such as “a lion is an animal”. The Knowledge Graph allows Google Search to give more direct answers: for example, when you type “CEO of Facebook”, the first result is a box with “Mark Zuckerberg” written in it, along with a picture. Google is also developing natural language processing for its search engine, so that it actually understands what you mean by “CEO”, “of”, and “Facebook”. Ray Kurzweil is the one working on this at Google.

Without the Knowledge Graph, Google would only be able to give you the best link to find your answer, such as a newspaper article or a Wikipedia article, but you wouldn’t get the answer directly.
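The idea behind a knowledge graph can be illustrated with a toy version that stores facts as (subject, relation, object) triples and answers a question like “CEO of Facebook” with a direct lookup instead of a link. This is only a sketch under simple assumptions: Google’s Knowledge Graph is far larger and its internals are not described here, and the `KnowledgeGraph` class, the `answer` function and the “<relation> of <subject>” question pattern are all hypothetical.

```python
import re
from collections import defaultdict


class KnowledgeGraph:
    """A toy knowledge graph that stores facts as (subject, relation, object) triples."""

    def __init__(self):
        # Index from (subject, relation) to the set of matching objects.
        self._index = defaultdict(set)

    def add(self, subject, relation, obj):
        self._index[(subject, relation)].add(obj)

    def query(self, subject, relation):
        """Return the objects linked to `subject` by `relation` (an empty set if none)."""
        return set(self._index.get((subject, relation), ()))

    def is_a(self, entity, category):
        """Follow "is_a" links transitively, e.g. lion -> mammal -> animal."""
        seen, frontier = {entity}, [entity]
        while frontier:
            current = frontier.pop()
            for parent in self._index.get((current, "is_a"), ()):
                if parent == category:
                    return True
                if parent not in seen:
                    seen.add(parent)
                    frontier.append(parent)
        return False


def answer(graph, question):
    """Answer questions of the hypothetical form "<relation> of <subject>"."""
    match = re.fullmatch(r"\s*(.+?)\s+of\s+(.+?)\s*\??\s*", question)
    if match is None:
        return []
    relation, subject = match.group(1), match.group(2)
    return sorted(graph.query(subject, relation))


graph = KnowledgeGraph()
graph.add("lion", "is_a", "mammal")
graph.add("mammal", "is_a", "animal")
graph.add("Facebook", "CEO", "Mark Zuckerberg")

print(graph.is_a("lion", "animal"))      # prints: True
print(answer(graph, "CEO of Facebook"))  # prints: ['Mark Zuckerberg']
```

The hard part is turning free-form questions into lookups like these; the single regular expression here handles only one rigid phrasing, which is why natural language processing matters so much.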

With Google Scholar, Google could also mine information from peer-reviewed papers, which makes Google Search a good candidate to transform itself into a Virtual Scientist. I’m sure we can expect to see their Star Trek Computer take off in a few years!

The biological approach, on the other hand, tries to understand how the brain works and how to recreate it in code. It uses connectomes (wiring diagrams of the brain) to understand what intelligence is. Neuroscientists hope that connectomes will help us understand not only what makes us intelligent, but also to what degree other species (chimps, whales, octopuses, etc.) are intelligent too.

The Human Brain Project in Switzerland aims to create, in less than ten years, a simulation of the brain down to the molecular level. It is very unlikely (to say the least) to achieve that goal, even with funding from the European Union (a billion euros over ten years) and from private partners, and with the help of some top-notch scientists. If this controversial project were to succeed, the main questions would be: how would a simulation of the human brain behave? Would it be conscious? Could it communicate? Could it learn new things? If so, it could also be capable of reasoning like a scientist, benefiting both from the amazing ability of the human brain to interpret things and from the ability of silicon to crunch enormous data sets. If a simulated brain is realizable (and I believe it is), and if it behaves like a real brain, the Human Brain Project could also lead to the creation of a Virtual Scientist, but it would raise big philosophical questions about what a Virtual Scientist is. After all, if a simulated brain is capable of critical thinking, do we have the right to turn it off? Is it a human being? Or at least a non-human person?

Other projects are simply mapping the brain, and not only the human brain.

The Human Connectome Project, backed by the Obama administration, aims to map the connections of the human brain and to become the most precise atlas of the human brain on Earth. But is intelligence explainable only by the way our neurons are connected? I hope so. I think that to understand intelligence we need many connectomes from many different subjects. As in genetics, we need a broad set of subjects to recognize patterns. Every connectome will be different, just as every genome is different, but we will hopefully be able to distinguish common patterns in the way the human brain is wired, and that could explain why we are intelligent: because we all share certain circuits. We could even find out which circuits make some people smarter than others.

It would also be amazing to have connectomes for many other species on Earth (not just for AI research, but because it would be a great way to understand the very complex machine that the brain is). Think of chimps and whales (I’m really looking forward to understanding cetacean brains), but also of animals like octopuses, which are extremely intelligent and whose brains differ so greatly from those of us mammals (Wired recently published a great piece on the quest to understand the intelligence of cephalopods). Some projects, such as connectomes.org and openconnectomeproject.org, already offer open data on partial connectomes of the mouse and the human. You can even find the complete connectome of C. elegans, a very simple roundworm frequently used in biology (it is the only animal for which we have a full connectome, but it has only 302 neurons, while the human brain is made of about 85 000 000 000 neurons). Understanding the brain of every species on this planet would be a tremendous achievement for modern science. It would take an international collaboration, an entire network of neuroscientists and technicians working toward a global database of connectomes for every species on Earth. Such a database would be awesome to play with.

Whatever approach we use, mimicking the brain with code will be extremely tough. David Deutsch wrote a good article in Aeon Magazine about the challenges the AI field faces. He wrote: “What is needed is nothing less than a breakthrough in philosophy, a new epistemological theory that explains how brains create explanatory knowledge and hence defines, in principle, without ever running them as programs, which algorithms possess that functionality and which do not”. In other words, we don’t know how the brain “computes” new knowledge, and without knowing how it does that (epistemologically, not biologically), we can’t expect to simulate the brain in software.

The real problem with computers and AIs is that they are not programmed to grasp the concept of absurdity. For example, a typical AI doesn’t realize that answering “green” to the question “How many varieties of kiwis exist?” is absurd, because green is a color, not a number, and has nothing to do with the topic of the question. The answer won’t seem absurd to the AI because the AI doesn’t “understand” what it is talking about. Recognizing absurdity is our ability to eliminate false answers by analyzing what we know. If I read a book on kiwis (the fruit) and understand it, I can answer a question about the diversity of kiwis without saying absurd things, because I clearly understand the knowledge I gained from the book. The sense of absurdity is peculiar to humankind, and it comes from actually understanding a notion.
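One mechanical piece of this, eliminating answers whose type clashes with the question, can be sketched in a few lines. Question-answering systems such as Watson use a related idea: they detect what type of answer a question asks for and penalize candidates of the wrong type. The sketch below assumes a tiny hand-written lexicon and two hypothetical question rules (`WORD_TYPES`, `QUESTION_RULES`); it rejects “green” for a “How many” question, but it understands nothing about kiwis, so it is a filter, not a sense of absurdity.

```python
import re

# Hypothetical, hand-written lexicon assigning a semantic type to a few words.
WORD_TYPES = {
    "green": "color",
    "red": "color",
    "kiwi": "fruit",
    "lion": "animal",
}

# Hypothetical rules mapping how a question starts to the type of answer it expects.
QUESTION_RULES = [
    (r"^how many\b", "number"),
    (r"^what colou?r\b", "color"),
]


def expected_type(question):
    """Return the answer type a question calls for, or None if no rule applies."""
    q = question.strip().lower()
    for pattern, answer_kind in QUESTION_RULES:
        if re.search(pattern, q):
            return answer_kind
    return None


def answer_type(answer):
    """Classify a candidate answer as a number, or look it up in the lexicon."""
    a = answer.strip().lower()
    if re.fullmatch(r"\d+(\.\d+)?", a):
        return "number"
    return WORD_TYPES.get(a, "unknown")


def is_absurd(question, answer):
    """Flag an answer whose type clashes with the type the question expects."""
    expected = expected_type(question)
    actual = answer_type(answer)
    if expected is None or actual == "unknown":
        return False  # not enough knowledge to judge either way
    return expected != actual


question = "How many varieties of kiwis exist?"
print(is_absurd(question, "green"))  # prints: True
print(is_absurd(question, "3"))      # prints: False
```

The last line shows the limit: “3” passes because it is a number, whether or not it is the right count. Knowing the actual answer still requires having read, and understood, the book on kiwis.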

If I had the chance to work on this AI, this Digital Aristotle, I would probably take the Google route, using machine learning to teach the computer the meaning of words and sentences. For me, the Virtual Scientist is impossible if the computer running the program can’t access a Universal Library. You can’t claim to have an all-knowing AI if it can’t look through every book ever written to find information. That’s why Google Books would probably be the backbone of what I would build (provided that Google lets individual developers use its APIs to dig through the Google Books database).

I would also train the AI starting with simple text exercises like the ones we do in elementary school: very simple texts, with questions asking the AI to extract information from the text and to deduce its general meaning. Then I would move on to more complicated texts, until it reaches a state where it can deduce the meaning of basically any kind of text. Of course, Google takes a totally different approach, since it has much more data and many more business constraints to deal with, but its technique seems to work well so far, as Google Search keeps getting better. I think Google will be the first to build something really significant.
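The “simple texts first” plan resembles what machine learning researchers call curriculum learning. Below is a minimal sketch of only the ordering step: the model and the training itself are left out, and the difficulty measure (average words per sentence) is a crude, hypothetical proxy chosen for illustration, not a recommendation. `difficulty` and `curriculum` are illustrative names.

```python
import re


def difficulty(text):
    """Hypothetical difficulty proxy: the average number of words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return len(words) / max(len(sentences), 1)


def curriculum(texts, n_stages):
    """Order texts from easiest to hardest and split them into at most n_stages stages."""
    if not texts or n_stages < 1:
        return []
    ordered = sorted(texts, key=difficulty)
    stage_size = -(-len(ordered) // n_stages)  # ceiling division
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]


texts = [
    "Although it was raining heavily, the children, who had been waiting all morning, went outside to play.",
    "The dog ran home fast.",
    "The cat sat.",
]

for number, stage in enumerate(curriculum(texts, 2), start=1):
    # Show each stage's difficulty scores rather than the full texts.
    print(number, [round(difficulty(t), 1) for t in stage])
# prints:
# 1 [3.0, 5.0]
# 2 [17.0]
```

A real curriculum would need a far better measure of difficulty than sentence length, for example how well the model currently answers questions about each text.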

We probably have all the computing power we need to make the Virtual Scientist a reality. I truly hope Daniel Dewey and I will be around to see it in action.