Contemplating The Flood leads us to some interesting conclusions.

No one person can link together the existing web of literature about a single topic. And that web may never stop growing. It’s at the point where an author who publishes 72 papers a year cannot even link together their own papers.

Which means there is no single person who will know “the answer” to a given complex problem. So “the answer” can come in one of two guises.

The first is a computational model of the problem. We see this epitomised in climate research. Huge computer models of the Earth’s climate synthesise an extraordinary amount of data, and bring together a vast array of individual bits of knowledge — about cycles of glacial melt and run-off, of reflections and trapping of sunlight, of feedback between carbon dioxide, temperature, and foliage, to name but a few. The model itself becomes the culmination of a vast research enterprise.

Running those models gives us answers to complex problems: they might answer the questions of how temperature depends on carbon dioxide, of where those temperature changes will hit hardest and first, and of what changes to the Earth’s environment will stop those temperature changes from careening out of control. And building these models tells us what we don’t know, where we need to fill in the gaps to make the models better, leaner, smarter.

The second guise is an AI. Groan. But the idea that an AI will someday “know” the answer should come as no surprise. After all, we already use machine-learning extensively to make sense of data-sets that are too big for one person to comb through. And what bigger data-set is there than the collective scientific knowledge of humanity? (Answer: Lego’s database of all possible permutations of small plastic bricks that make money).

It’s already happening. Do you not use Google or PubMed to search the literature? People are already using machine-learning to do systematic literature searches (Iris.ai, Semantic Scholar), to find links between research findings, and show them in a comprehensible form. These are fancy classifiers, learning to group together published work and data-sets by key words. Developments in language processing are beginning to let the machines link findings together to suggest hypotheses, like linking gene expression changes to mental disorders. The next step after that will be to have the machines write the literature reviews, synthesising existing knowledge into a form we mere mortals can understand, and pointing out to us what we don’t know.
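At heart, that keyword grouping is simple. Here is a minimal sketch of the idea in Python; the keyword list and abstracts are toy stand-ins, and real tools like Iris.ai or Semantic Scholar use far richer models than this:

```python
# A toy vocabulary of concepts to tag papers with (hypothetical;
# a real system would mine thousands of terms from the corpus).
KEYWORDS = {"cortex", "dopamine", "memory"}

def tag(abstract):
    """Return the set of known keywords appearing in an abstract."""
    words = set(abstract.lower().split())
    return KEYWORDS & words

def group_by_keyword(abstracts):
    """Invert the tagging: map each keyword to the papers mentioning it,
    so papers sharing a keyword end up grouped together."""
    groups = {}
    for title, text in abstracts.items():
        for kw in tag(text):
            groups.setdefault(kw, []).append(title)
    return groups

abstracts = {
    "paper A": "Dopamine release in the cortex during reward",
    "paper B": "Working memory and dopamine in the prefrontal cortex",
}
print(group_by_keyword(abstracts))
```

Everything a modern literature-mapping tool adds — learned embeddings, synonym handling, citation graphs — is refinement on top of this basic inversion of papers into the concepts they share.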

Building a machine to tell us what we don’t know is exactly what Jessica and Bradley Voytek did. Scraping 3.5 million abstracts from PubMed, and linking them by key-words for brain regions, disorders, and cognitive functions, they built a model of neuroscientific knowledge. This model naturally has a hierarchy: “cortex”, “thalamus”, and “striatum” are all children of “brain”, for example. Which opened up a simple but effective hypothesis generator: find two concepts that share a parent, but have not been linked together in the existing literature. That pair of concepts is then a candidate for linking together. (One can imagine generalising this even to a flat web of links, by seeking a pair of concepts that are each strongly associated with a third concept, but not (yet) strongly associated with each other.) Here is a dumb machine that already gives answers no human could possibly find on their own.
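Both versions of that generator fit in a few lines of Python. This is a sketch of the idea, not the Voyteks’ actual code: the data below is a toy stand-in for their mined hierarchy, and the function names and the 0.5 association threshold are my own inventions:

```python
from itertools import combinations

def hierarchy_candidates(parents, linked):
    """Concept pairs that share a parent but are not yet linked
    in the literature -- candidates for a new hypothesis."""
    known = {tuple(sorted(pair)) for pair in linked}
    return [
        (a, b)
        for a, b in combinations(sorted(parents), 2)
        if parents[a] == parents[b] and (a, b) not in known
    ]

def flat_candidates(strength, threshold=0.5):
    """Flat-web generalisation: pairs only weakly associated with
    each other, but each strongly associated with some third concept."""
    weights = {}
    concepts = set()
    for (a, b), s in strength.items():
        weights[frozenset((a, b))] = s
        concepts.update((a, b))

    def w(a, b):
        return weights.get(frozenset((a, b)), 0.0)

    return [
        (a, b)
        for a, b in combinations(sorted(concepts), 2)
        if w(a, b) < threshold
        and any(w(a, c) >= threshold and w(b, c) >= threshold
                for c in concepts - {a, b})
    ]

# Toy fragment of the knowledge map (hypothetical data).
parents = {"cortex": "brain", "thalamus": "brain", "striatum": "brain"}
linked = {("cortex", "thalamus")}
print(hierarchy_candidates(parents, linked))
# [('cortex', 'striatum'), ('striatum', 'thalamus')]

assoc = {("cortex", "dopamine"): 0.9,
         ("striatum", "dopamine"): 0.9,
         ("cortex", "striatum"): 0.1}
print(flat_candidates(assoc))
# [('cortex', 'striatum')]
```

The hard part is not this search, but building the map it searches over: scraping millions of abstracts and scoring which concepts co-occur. Once that map exists, finding the gaps in it is trivial for a machine, and impossible for a person.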

And as The Flood grows, then even in narrow disciplines the machines will be the only ones that know all the links, the only ones that can put together the big picture. So even if we never develop a true AI that can by itself infer new hypotheses and create new ideas, even if that sci-fi scientist is off the table, we will still become dependent on dumb AI for the “answers”.