Two years after the breakup of the Soviet Union, British economist Paul Seabright was talking with a senior Russian official who was visiting the UK to learn about the free market. “Please understand that we are keen to move towards a market system,” the official said. “But we need to understand the fundamental details of how such a system works. Tell me, for example: who is in charge of the supply of bread to the population of London?” [1]

The familiar but still astonishing answer to this question is that in a market economy, everyone is in charge. As the market price of bread goes up and down, it informs our collective behaviour: whether to plant a new wheat field or leave it fallow; whether to open the bakery on the corner you’ve been thinking about; or simply whether to buy two or three loaves of bread this week. The price thus aggregates an enormous amount of otherwise hidden knowledge from everyone interested in the production or consumption of bread, which is to say nearly everyone. By using prices to aggregate this knowledge and guide further action, the market produces outcomes superior to anything even the brightest and best-informed individual could achieve.

Unfortunately, markets don’t always aggregate knowledge accurately. When participants in a market are mistaken in systematic ways, markets don’t so much aggregate knowledge as they aggregate misunderstanding. The result can be an enormous collective error in judgement; when the misjudgement is revealed, the market crashes.

My subject in this essay is not economics but science. So what does all this have to do with science?

The connection involves the question of what it means to understand something. In economics, many basic facts, such as prices, have an origin that no single person completely understands, no matter how bright or well informed, because no one has access to all the hidden knowledge that determines those prices.

By contrast, until quite recently the complete justification for even the most complex scientific facts could be understood by a single person.

Consider, for example, astronomer Edwin Hubble’s discovery in the 1920s of the expansion of the Universe. By the standards of the time, this was big science, requiring a complex web of sophisticated scientific ideas and equipment – an advanced telescope, spectroscopic equipment, and even Einstein’s special theory of relativity. To understand all those things in detail requires years of hard work, but a dedicated person like Hubble could master it all, and so in some sense he completely understood his own discovery of the expansion of the Universe.

Science is no longer so simple; many important scientific facts now have justifications that are beyond the comprehension of a single person.

For example, in 1983 mathematicians announced the solution of an important longstanding mathematical problem, the classification of the finite simple groups. The work on this proof extended from 1955 to 1983 and required approximately 500 journal articles by 100 mathematicians. Many minor gaps were subsequently found in the proof, as well as at least one serious gap, which some now think has been resolved; the resolution involved a two-volume, 1300-page supplement to the proof. Although mathematicians are working to simplify the proof, even the simplified version is expected to be exceedingly complex, beyond the grasp of any single person.

Understanding the results from the Large Hadron Collider (LHC) will be similarly challenging, requiring deep knowledge of elementary particle physics, of many clever ideas in the engineering of the accelerator and the particle detectors, and of complex algorithms and statistical techniques. No single person understands all of this, except in broad outline. If the discovery of the Higgs particle is announced next year, there won’t be any single person in the world who can say “I understand how we discovered this” in the same way Hubble understood how he discovered the expansion of the Universe. Instead, there will be a large group of people who collectively claim to understand all the separate pieces that go into the discovery, and how those pieces fit together.

Two clarifications are in order. First, when I say that these are examples of scientific facts beyond individual understanding, I’m not saying a single person can’t understand the meaning of the facts. Understanding what the Higgs particle is requires several years of hard work, but there are many people in the world who’ve done this work and who have a solid grasp of what the Higgs is. I’m talking about a deeper type of understanding, the understanding that comes from grasping the justification of the facts.

Second, I don’t mean that to understand something you need to have mastered all the rote details. If we require that kind of mastery, then there’s no one person who understands the human genome, for certainly no one has memorized the entire DNA sequence. But there are people who deeply understand all the techniques used to determine the human genome; all that is missing from their understanding is the rote work of identifying all the DNA base pairs. The examples of the LHC and the classification of the finite simple groups go beyond this, for in both cases there are many distinct deep ideas involved, too many to be mastered by any single person.

Science as complex as the LHC and the classification of finite simple groups is a recent arrival on the historical scene. But there are two forces that will soon make science beyond individual understanding far more common.

The first of these forces is rapid internet-fueled growth in the number of large-scale scientific collaborations. In the short term, these collaborations will mostly just crowdsource rote work, as is being done, for example, by the galaxy classification project Galaxy Zoo, and so the results will pose no challenge to individual understanding. But as the collaborations get more sophisticated we can expect to see many more online collaborations that delegate large amounts of specialized work, building up to a whole whose details aren’t fully understood by any single person.

The second of these forces is the use of computers to do scientific work. An early example is the proof of the four-colour theorem in mathematics. A small group of mathematicians outlined a proof, but to complete it they had to check a large number of cases, more than they could check by hand. Instead, a computer was used to check those cases. This isn’t an instance of science beyond individual understanding, though, because mathematicians familiar with the proof feel the computer was simply doing rote work. But the people doing computational science are getting cleverer in how they use computers to make discoveries. Machine learning, data mining and artificial intelligence techniques are being used in increasingly sophisticated ways to produce real insights, not just rote work. As the techniques improve, the number of insights found will increase, and we can expect to see examples of science beyond individual understanding generated this way: “I don’t understand how this discovery was made, but my computer and I do together”.

More powerful than either of these forces will be their combination: large-scale computer-assisted collaboration. The discoveries from such collaboration may well not be understood by any single individual, or even by a group. Instead, the understanding will reside in the combination of the group and their networked computers.

Such scientific discoveries raise challenging issues. How do we know whether they’re right or wrong? The traditional process of peer review and the criterion of reproducibility work well when experiments are cheap and one scientist can explain to another what was done. But they work less well as experiments become more expensive, when no one person fully understands how an experiment was done, and when experiments and their analyses involve reams of data or ideas.

Might we one day find ourselves in a situation like that of a free market, where systematic misunderstandings can infect our collective conclusions? How can we be sure the results of large-scale collaborations or computing projects are reliable? Are there results from this kind of science that are already widely believed, maybe even influencing public policy, but are, in fact, wrong?

These questions bother me a lot. I believe wholeheartedly that new tools for online collaboration are going to change and improve how science is done. But such collaborations will be no good if we can’t assess the reliability of the results. And it would be disastrous if erroneous results were to have a major impact on public policy. We’re in for a turbulent and interesting period as scientists think through what’s needed to arrive at reliable scientific conclusions in the age of big collaborations.

Acknowledgements

Thanks to Jen Dodd for providing feedback that greatly improved an early draft of this essay. The essay was stimulated in part by the discussion during Kevin Kelly’s session at Science Foo Camp 2008. Thanks to all the participants in that discussion.

Further reading

This essay is adapted from a book I’m currently working on about “The Future of Science”.

Footnote

[1] “Who is in charge of the supply of bread to the population of London?”: see Paul Seabright’s The Company of Strangers.