Building a better collective memory

In your high school science classes you may have learnt Hooke’s law, the law of physics which relates a spring’s length to how hard you pull on it. What your high school science teacher probably didn’t tell you is that when Robert Hooke discovered his law in 1676, he published it as an anagram, “ceiiinosssttuv”, which he revealed two years later as the Latin “ut tensio, sic vis”, meaning “as the extension, so the force”. This ensured that if someone else made the same discovery, Hooke could reveal the anagram and claim priority, thus buying time in which he alone could build upon the discovery.

Hooke was not unusual. Many great scientists of the age, including Leonardo, Galileo and Huygens, used anagrams or ciphers for similar purposes. The Newton-Leibniz controversy over who invented calculus occurred because Newton claimed to have invented calculus in the 1660s and 1670s, but didn’t publish until 1693. In the meantime, Leibniz developed and published his own version of calculus. Imagine modern biology if the human genome had been announced as an anagram, or if publication had been delayed thirty years.

Why were Hooke, Newton, and their contemporaries so secretive? In fact, up until this time discoveries were routinely kept secret. Alchemists intent on converting lead into gold or finding the secret of eternal youth would often take their discoveries with them to their graves. A secretive culture of discovery was a natural consequence of a society in which there was often little personal gain in sharing discoveries.

The great scientific advances in the time of Hooke and Newton motivated wealthy patrons such as the government to begin subsidizing science as a profession. Much of the motivation came from the public benefit delivered by scientific discovery, and that benefit was strongest if discoveries were shared. The result was a scientific culture which to this day rewards the sharing of discoveries with jobs and prestige for the discoverer.

This cultural transition was just beginning in the time of Hooke and Newton, but a little over a century later the great physicist Michael Faraday could advise a younger colleague to “Work. Finish. Publish.” The culture of science had changed so that a discovery not published in a scientific journal was not truly complete. Today, when a scientist applies for a job, the most important part of the application is their published scientific papers. But in 1662, when Hooke applied for the job of Curator of Experiments at the Royal Society, he certainly was not asked for such a record, because the first scientific journals weren’t created until three years later, in 1665.

The adoption and growth of the scientific journal system has created a body of shared knowledge for our civilization, a collective long-term memory which is the basis for much of human progress. This system has changed surprisingly little in the last 300 years. The internet offers us the first major opportunity to improve this collective long-term memory, and to create a collective short-term working memory, a conversational commons for the rapid collaborative development of ideas. The process of scientific discovery – how we do science – will change more over the next 20 years than in the past 300 years.

This change will not be achieved without great effort. From the outside, scientists currently appear puzzlingly slow to adopt many online tools. We’ll see that this is a consequence of some major barriers deeply embedded within the culture of science. The first part of this essay is about these barriers, and how to overcome them. The second part of the essay illustrates these ideas, with a proposal for an online collaboration market where scientists can rapidly outsource scientific problems.

Part I: Toward a more open scientific culture

How can the internet benefit science?

How can the internet improve the way we do science? There are two useful ways to answer this question. The first is to view online tools as a way of expanding the range of scientific knowledge that can be shared with the world.

Many online tools do just this, and some have had a major impact on how scientists work. Two successful examples are the physics preprint arXiv, which lets physicists share preprints of their papers without the months-long delay typical of a conventional journal, and GenBank, an online database where biologists can deposit and search for DNA sequences. But most online tools of this type remain niche applications, often despite the fact that many scientists believe broad adoption would be valuable. Two examples are the Journal of Visualized Experiments, which lets scientists upload videos which show how their experiments work, and open notebook science, as practiced by scientists like Jean-Claude Bradley and Garrett Lisi, who expose their working notes to the world. In the coming years we’ll see a proliferation of tools of this type, each geared to sharing different types of knowledge.

There is a second and more radical way of thinking about how the internet can change science, and that is through a change to the process and scale of creative collaboration itself, a change enabled by social software such as wikis, online forums, and their descendants.

There are already many well-known but still striking instances of this change in parts of culture outside of science [1]. For example, in 1991 an unknown Finnish student named Linus Torvalds posted a short note in an online forum, asking for help extending a toy operating system he’d programmed in his spare time; a volunteer army responded by assembling Linux, one of the most complex engineering artifacts ever constructed. In 2001 another young unknown named Larry Sanger posted a short note asking for help building an online encyclopedia; a volunteer army responded by assembling the world’s most comprehensive encyclopedia. In 1999, Garry Kasparov, the greatest chess player of all time, played and eventually won a game of chess against a “World Team” which decided its moves by the votes of thousands of chess players, many of them rank amateurs; instead of the easy victory he expected, he got the most challenging game of his career, a game he called “the greatest game in the history of chess”.

These examples are not curiosities, or special cases; they are just the leading edge of the greatest change in the creative process since the invention of writing.

Science is an example par excellence of creative collaboration, yet scientific collaboration still takes place mainly via face-to-face meetings. With the exception of email, few of the new social tools have been broadly adopted by scientists, even though it is these tools which have the greatest potential to improve how science is done.

Why have scientists been so slow to adopt these remarkable tools? Is it simply that they are too conservative in their habits, or that the new tools are no better than what we already have? Both these glib answers are wrong. We’ll resolve this puzzle by looking in detail at two examples where excellent online tools have failed to be adopted by scientists. What we’ll find is that there are major cultural barriers which are preventing scientists from getting involved, and so slowing down the progress of science.

A failure of science online: online comment sites

Like many people, when I’m considering buying a book or electronic gadget, I often first browse the reviews at amazon.com. Inspired by the success of amazon.com and similar sites, several organizations have created comment sites where scientists can share their opinions of scientific papers. Perhaps the best-known was Nature’s 2006 trial of open commentary on papers undergoing peer review at Nature. The trial was not a success. Nature’s final report terminating the trial explained:

There was a significant level of expressed interest in open peer review… A small majority of those authors who did participate received comments, but typically very few, despite significant web traffic. Most comments were not technically substantive. Feedback suggests that there is a marked reluctance among researchers to offer open comments.

The Nature trial is just one of many attempts at comment sites for scientists. The earliest example I’m aware of is the Quick Reviews site, built in 1997, and discontinued in 1998. Physics Comments was built a few years later, and discontinued in 2006. A more recent site, Science Advisor, is still active, but has more members (1139) than reviews (1008). It seems that people want to read reviews of scientific papers, but not write them [2].

The problem all these sites have is that while thoughtful commentary on scientific papers is certainly useful for other scientists, there are few incentives for people to write such comments. Why write a comment when you could be doing something more “useful”, like writing a paper or a grant? Furthermore, if you publicly criticize someone’s paper, there’s a chance that that person may be an anonymous referee in a position to scuttle your next paper or grant application.

To grasp the mindset here, you need to understand the monklike intensity that ambitious young scientists bring to the pursuit of scientific publications and grants. To get a position at a major university the most important thing is an impressive record of scientific papers. These papers will bring in the research grants and letters of recommendation necessary to be hired. Competition for positions is so fierce that work weeks of 80 hours or more are common. The pace relaxes after tenure, but continued grant support still requires a strong work ethic. It’s no wonder people have little inclination to contribute to the online comment sites.

The contrast between the science comment sites and the success of the amazon.com reviews is stark. To pick just one example, you’ll find approximately 1500 reviews of Pokemon products at amazon.com, more than the total number of reviews on all the scientific comment sites I described above. The disincentives facing scientists have led to a ludicrous situation where popular culture is open enough that people feel comfortable writing Pokemon reviews, yet scientific culture is so closed that people will not publicly share their opinions of scientific papers. Some people find this contrast curious or amusing; I believe it signifies something seriously amiss with science, something we need to understand and change.

A failure of science online: Wikipedia

Wikipedia is a second example where scientists have missed an opportunity to innovate online. Wikipedia has a vision statement to warm a scientist’s heart: “Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s our commitment.” You might guess Wikipedia was started by scientists eager to collect all of human knowledge into a single source. In fact, Wikipedia’s founder, Jimmy Wales, had a background in finance and as a web developer for an “erotic search engine”, not in science. In the early days few established scientists were involved. Just as for the scientific comment sites, to contribute aroused suspicion from colleagues that you were wasting time that could be spent writing papers and grants.

Some scientists will object that contributing to Wikipedia isn’t really science. And, of course, it’s not if you take a narrow view of what science is, if you’ve bought into the current game, and take it for granted that science is only about publishing in specialized scientific journals. But if you take a broader view, if you believe science is about discovering how the world works, and sharing that understanding with the rest of humanity, then the lack of early scientific support for Wikipedia looks like an opportunity lost. Nowadays, Wikipedia’s success has to some extent legitimized contribution within the scientific community. But how strange that the modern day Library of Alexandria had to come from outside academia.

The challenge: achieving extreme openness in science

These failures of science online are all examples where scientists show a surprising reluctance to share knowledge that could be useful to others. This is ironic, for the value of cultural openness was understood centuries ago by many of the founders of modern science; indeed, the journal system is perhaps the most open system for the transmission of knowledge that could be built with 17th century media. The adoption of the journal system was achieved by subsidizing scientists who published their discoveries in journals. This same subsidy now inhibits the adoption of more effective technologies, because it continues to incentivize scientists to share their work in conventional journals, and not in more modern media.

The situation is analogous to the government subsidies for corn-based ethanol in the United States. In the early days these seemed to many people to be a good idea, encouraging the use of what people hoped would be a more efficient fuel. But now we understand that there are more energy-efficient alternatives, such as grass-based cellulose ethanol. Unfortunately, the subsidies for corn-based ethanol are still in place, and now inhibit the adoption of the more efficient technologies.

We should aim to create an open scientific culture where as much information as possible is moved out of people’s heads and labs, onto the network, and into tools which can help us structure and filter the information. This means everything – data, scientific opinions, questions, ideas, folk knowledge, workflows, and everything else – the works. Information not on the network can’t do any good.

Ideally, we’ll achieve a kind of extreme openness. This means: making many more types of content available than just scientific papers; allowing creative reuse and modification of existing work through more open licensing and community norms; making all information not just human readable but also machine readable; providing open APIs to enable the building of additional services on top of the scientific literature, and possibly even multiple layers of increasingly powerful services. Such extreme openness is the ultimate expression of the idea that others may build upon and extend the work of individual scientists in ways they themselves would never have conceived.

The challenge of achieving a more open culture is also being confronted in popular culture. People such as Richard Stallman, Lawrence Lessig, Yochai Benkler, Cory Doctorow, and many others have described the benefits openness brings in a networked world, and developed tools such as Creative Commons licensing and free and open source software to help promote a more open culture, and fight the forces inhibiting it. As we have seen, however, science faces a unique set of forces that inhibit open culture – the centuries-old subsidy of old ways of sharing knowledge – and this requires a new understanding of how to overcome those forces.

How can we open up scientific culture?

To create an open scientific culture that embraces new online tools, two challenging tasks must be achieved: (1) build superb online tools; and (2) cause the cultural changes necessary for those tools to be accepted. The necessity of accomplishing both these tasks is obvious, yet projects in online science often focus mostly on building tools, with cultural change an afterthought. This is a mistake, for the tools are only part of the overall picture. It took just a few years for the first scientific journals (a tool) to be developed, but many decades of cultural change before journal publication was accepted as the gold standard for judging scientific contributions.

None of this is to discount the challenge of building superb online tools. To develop such tools requires a rare combination of strong design and technical skills, and a deep understanding of how science works. The difficulty is compounded because the people who best understand how science works are scientists themselves, yet building such tools is not something scientists are typically encouraged to do, or well suited to. Scientific institutions reward scientists for making discoveries within the existing system of discovery; there is little place for people working to change that system. A technologically challenged Head of Department is unlikely to look kindly on a scientist who suggests that instead of writing papers they’d like to spend their research time developing general-purpose tools to improve how science is done.

What about the second task, achieving cultural change? As any revolutionary can attest, that’s a tough order. Let me describe two strategies that have been successful in the past, and that offer a template for future success.

The first is a top-down strategy that has been successfully used by the open access (OA) movement [3]. The goal of the OA movement is to make scientific research freely available online to everyone in the world. It’s an inspiring goal, and the OA movement has achieved some amazing successes. Perhaps most notably, in April 2008 the US National Institutes of Health (NIH) mandated that every paper written with the support of their grants must eventually be made open access. The NIH is the world’s largest grant agency; this decision is the scientific equivalent of successfully storming the Bastille.

The second strategy is bottom-up. It is for the people building the new online tools to also develop and boldly evangelize ways of measuring the contributions made with the tools. To understand what this means, imagine you’re a scientist sitting on a hiring committee that’s deciding whether or not to hire some scientist. Their curriculum vitae reports that they’ve helped build an open science wiki, and also write a blog. Unfortunately, the committee has no easy way of understanding the significance of these contributions, since as yet there are no broadly accepted metrics for assessing such contributions. The natural consequence is that such contributions are typically undervalued.

To make the challenge concrete, ask yourself what it would take for a description of the contribution made through blogging to be reported by a scientist on their curriculum vitae. How could you measure the different sorts of contributions a scientist can make on a blog – outreach, education, and research? These are not easy questions to answer. Yet they must be answered before scientific blogging will be accepted as a valuable professional scientific contribution.

A success story: the arXiv and SPIRES

Let’s look at an example illustrating the bottom-up strategy in action. The example is the well-known physics preprint arXiv. Since 1991 physicists have been uploading their papers to the arXiv, often at about the same time as they submit to a journal. The papers are made available within hours for anyone to read. The arXiv is not refereed, although a quick check is done by arXiv moderators to remove crank submissions. The arXiv is an excellent and widely-used tool, with more than half of all new papers in physics appearing there first. Many physicists start their day by seeing what’s appeared on the arXiv overnight. Thus, the arXiv exemplifies the first step for achieving a more open culture: it is a superb tool.

Not long after the arXiv began, a citation tracking service called SPIRES decided they would extend their service to include both arXiv papers and conventional journal articles. SPIRES specializes in particle physics, and as a result it’s now possible to search on a particle physicist’s name, and see how frequently all their papers, including arXiv preprints, have been cited by other physicists.

SPIRES has been run since 1974 by one of the most respected and highly visible institutions in particle physics, the Stanford Linear Accelerator Center (SLAC). The effort SLAC has put into developing SPIRES means that their metrics of citation impact are both credible and widely used by the particle physics community. It’s now possible for a particle physicist to convincingly demonstrate that their work is having a high impact, even if it has only been submitted to the arXiv, and has not been published in a conventional scientific journal. When physics hiring committees meet to evaluate candidates in particle physics, people often have their laptops out, examining and comparing the SPIRES citation records of candidates.

The arXiv and SPIRES have not stopped particle physicists from publishing in peer-reviewed journals. When you’re applying for jobs, or up for tenure, every ounce of ammunition helps, especially when the evaluating committee may contain someone from another field who is reluctant to take the SPIRES citation data seriously. Still, particle physicists have become noticeably more relaxed about publication, and it’s not uncommon to see a CV which includes preprints that haven’t been published in conventional journals. This is an example of the sort of cultural change that can be achieved using the bottom-up strategy. In the next part, we’ll see how far these ideas can be pushed in pursuit of new tools for collaboration.

Part II: Collaboration Markets: building a collective working memory for science

The problem of collaboration

Even Albert Einstein needed help occasionally. Einstein’s greatest contribution to science was his theory of gravity, often called the general theory of relativity. He worked on and off on this theory between 1907 and 1915, often running into great difficulties. By 1912, he had come to the astonishing conclusion that our ordinary conception of geometry, in which the angles of a triangle add up to 180 degrees, is only approximately correct, and a new kind of geometry is needed to correctly describe space and time. This was a great surprise to Einstein, and also a great challenge, since such geometric ideas were outside his expertise. Fortunately for Einstein and for posterity, he described his difficulties to a mathematician friend, Marcel Grossmann. Grossmann said that many of the ideas Einstein needed had already been developed by the mathematician Bernhard Riemann. It took Einstein three more years of work, but Grossmann was right, and this was a critical point in the development of general relativity.

Einstein’s conundrum is familiar to any scientist. When doing research, subproblems constantly arise in unexpected areas. No-one can be expert in all those areas. Most of us instead stumble along, picking up the skills necessary to make progress towards our larger goals, grateful when the zeitgeist of our research occasionally throws up a subproblem in which we are already truly expert. Like Einstein, we have a small group of trusted collaborators with whom we exchange questions and ideas when we are stuck. Unfortunately, most of the time even our collaborators aren’t that much help. They may point us in the right direction, but rarely do they have exactly the expertise we need. Is it possible to scale up this conversational model, and build an online collaboration market [4] to exchange questions and ideas, a sort of collective working memory for the scientific community?

It is natural to be skeptical of this idea, but an extremely demanding creative culture already exists which shows that such a collaboration market is feasible – the culture of free and open source software. Scientists browsing for the first time through the development forums of open source programming projects are often shocked at the high level of the discussion. They expect amateur hour at the local Karaoke bar; instead, they find professional programmers routinely sharing their questions and ideas, helping solve each other’s problems, often exerting great intellectual effort and ingenuity. Rather than hoarding their questions and ideas, as scientists do for fear of being scooped, the programmers revel in swapping them. Some of the world’s best programmers hang out in these forums, swapping tips, answering questions, and participating in the conversation.

Innocentive

I’ll now describe two embryonic examples which suggest that collaboration markets for science may be valuable. The first is Innocentive, a service that allows companies like Eli Lilly and Procter & Gamble to pose Challenges over the internet: scientific research problems with associated prizes for their solution, often many thousands of dollars. For example, one of the Challenges currently on Innocentive asks participants to find a biomarker for motor neuron disease, with a one million dollar prize. If you register for the site, it’s possible to obtain a detailed description of the Challenge requirements, and attempt to win the prize. More than 140,000 people from 175 countries have registered, and prizes for more than 100 Challenges have been awarded.

Innocentive is an example of how a market in scientific problems and solutions can be established. Of course, it has shortcomings as a model for collaboration in basic research. Only a small number of companies are able to pose Challenges, and they may do so only after a lengthy vetting process. Innocentive’s business model is aimed firmly at industrial rather than basic research, and so the incentives revolve around money and intellectual property, rather than reputation and citation. It’s certainly not a rapid-fire conversational tool like the programming forums; one does not wake up in the morning with a problem in mind, and post it to Innocentive, hoping for help with a quick solution.

FriendFeed

FriendFeed is a much more fluid tool which is being used by scientists as a conversational medium to discuss scientific research problems. What FriendFeed allows users to do is set up what’s called a lifestream. As an example, my lifestream is set up to automatically aggregate pretty much everything I put on the web, including my blog posts, del.icio.us links, YouTube videos, and several other types of content:

[Screenshot of my FriendFeed lifestream]

I also subscribe to a list of about one hundred or so “friends” (a few are listed on the right in the screenshot above) whose lifestreams I can see aggregated into one giant river of information – all their Flickr photos, blog posts, and so on. These people aren’t necessarily real friends – I’m not personally acquainted with my “friend” Barack Obama – but it’s a fantastic way of tracking a high volume of activity from a large number of people.

As part of the lifestream, FriendFeed allows messages to be passed back and forth in a lightweight way, so communities can form around common interests and shared friendships. In April 2008, Cameron Neylon, a chemist from the University of Southampton, used FriendFeed messaging to post a request for assistance in building molecular models. Pretty quickly Pawel Szczesny replied, and said he could help out. A scientific collaboration was now underway. The original request and discussion is shown here:

[Screenshot of the original request and discussion on FriendFeed]

FriendFeed is a great service, but it suffers from many of the same problems that afflict the comment sites and Wikipedia. Lacking widely accepted metrics to measure contribution, scientists are unlikely to adopt FriendFeed en masse as a medium for scientific collaboration. And without widespread adoption, the utility of FriendFeed for scientific collaboration will remain relatively low.

The economics of collaboration

How much is lost due to inefficiencies in the current system of collaboration? To answer this question, imagine a scientist named Alice. Like most scientists, many of Alice’s research projects spontaneously give rise to problems in areas in which she isn’t expert. She juggles hundreds or thousands of such problems, re-examining each occasionally, and looking to make progress, but knowing that only rarely is she the person best suited to solve any given problem.

Suppose that for a particular problem, Alice estimates that it would take her 4-5 weeks to acquire the required expertise and solve the problem. That’s a long time, and so the problem is on the backburner. Unbeknownst to Alice, though, there is another scientist in another part of the world, Bob, who has just the skills to solve the problem in less than a day. This is not at all uncommon. Quite the contrary; my experience is that this is the usual situation. Consider the example of Grossmann, who saved Einstein what might otherwise have been years of extra work.

Do Alice and Bob exchange questions and ideas, and start working towards a solution to Alice’s problem? Unfortunately, nine times out of ten they never even meet, or if they meet, they just exchange small talk. It’s an opportunity lost for a mutually beneficial trade, a loss that may cost weeks of work for Alice. It’s also a great loss for the society that bears the cost of doing science, a loss that must run to billions of dollars each year in total. Expert attention, the ultimate scarce resource in science, is very inefficiently allocated under existing practices for collaboration.

An efficient collaboration market would enable Alice and Bob to find this common interest, and exchange their know-how, in much the same way eBay and craigslist enable people to exchange goods and services. However, in order for this to be possible, a great deal of mutual trust is required. Without such trust, there’s no way Alice will be willing to advertise her questions to the entire community. The danger of free riders who will take advantage for their own benefit (and to Alice’s detriment) is just too high.

In science, we’re so used to this situation that we take it for granted. But let’s compare to the apparently very different problem of buying shoes. Alice walks into a shoestore, with some money. Alice wants shoes more than she wants to keep her money, but Bob the shoestore owner wants the money more than he wants the shoes. As a result, Bob hands over the shoes, Alice hands over the money, and everyone walks away happier after just ten minutes. This rapid transaction takes place because there is a trust infrastructure of laws and enforcement in place that ensures that if either party cheats, they are likely to be caught and punished.

If shoestores operated like scientists trading ideas, first Alice and Bob would need to get to know one another, maybe go for a few beers in a nearby bar. Only then would Alice finally say “you know, I’m looking for some shoes”. After a pause, and a few more beers, Bob would say “You know what, I just happen to have some shoes I’m looking to sell”. Every working scientist recognizes this dance; I know scientists who worry less about selling their house than they do about exchanging scientific information.

In economics, it’s been understood for hundreds of years that wealth is created when we lower barriers to trade, provided there is a trust infrastructure of laws and enforcement to prevent cheating and ensure trade is uncoerced. The basic idea, which goes back to David Ricardo in 1817, is to concentrate on areas where we have a comparative advantage, and to avoid areas where we have a comparative disadvantage.

Although Ricardo’s work was in economics, his analysis works equally well for trade in ideas. Indeed, even were Alice to be far more competent than Bob in every area, Ricardo’s analysis shows that both Alice and Bob benefit if Alice concentrates on the areas where she has the greatest comparative advantage, and Bob on the areas where he has the least comparative disadvantage. Unfortunately, science currently lacks the trust infrastructure and incentives necessary for such free, unrestricted trade of questions and ideas.
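Ricardo’s claim is counterintuitive enough that it’s worth checking with numbers. Here is a small worked example in Python. Everything in it is an assumption made up for illustration: the two kinds of problem (“theory” and “simulation”), the hours each scientist needs per problem, and the 12 hours each has available. Alice is faster than Bob at both kinds of problem, yet the pair still solve more problems in total when they trade.

```python
from fractions import Fraction

# Hypothetical hours needed per solved problem (illustrative numbers only).
# Alice is faster than Bob at both kinds of problem.
HOURS_PER_PROBLEM = {
    "Alice": {"theory": 1, "simulation": 2},
    "Bob": {"theory": 6, "simulation": 3},
}


def opportunity_cost(person, task, other_task):
    """Number of other_task problems given up to solve one task problem."""
    hours = HOURS_PER_PROBLEM[person]
    return Fraction(hours[task], hours[other_task])


def problems_solved(allocation):
    """Total problems solved, given the hours each person spends on each task."""
    totals = {"theory": Fraction(0), "simulation": Fraction(0)}
    for person, hours_by_task in allocation.items():
        for task, hours in hours_by_task.items():
            totals[task] += Fraction(hours, HOURS_PER_PROBLEM[person][task])
    return totals


# Each scientist has 12 hours. Working alone, each splits the time evenly.
working_alone = problems_solved({
    "Alice": {"theory": 6, "simulation": 6},
    "Bob": {"theory": 6, "simulation": 6},
})

# Trading: Bob does only simulation, where his disadvantage is smallest,
# and Alice spends most of her time on theory.
trading = problems_solved({
    "Alice": {"theory": 10, "simulation": 2},
    "Bob": {"theory": 0, "simulation": 12},
})

print("working alone:", working_alone)
print("trading:      ", trading)
```

With these made-up numbers, working alone yields 7 theory problems and 5 simulation problems in total, while trading yields 10 theory problems and the same 5 simulation problems. The reason is the opportunity cost: a theory problem costs Alice half a simulation problem but costs Bob two, so theory is where Alice’s comparative advantage lies, even though she is also the faster of the two at simulation.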

An ideal collaboration market will enable just such an exchange of questions and ideas. It will bake in metrics of contribution so participants can demonstrate the impact their work is having. Contributions will be archived, timestamped, and signed, so it’s clear who said what, and when. Combined with high quality filtering and search tools, the result will be an open culture of trust which gives scientists a real incentive to outsource problems, and contribute in areas where they have a great comparative advantage. This will change science.
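To make “archived, timestamped, and signed” a little more concrete, here is one minimal sketch in Python. It is an illustration under stated assumptions, not a design for the market itself: the record fields and the function names are hypothetical, and I use a shared-secret HMAC from the standard library purely to keep the example self-contained. A real system would more plausibly use public-key signatures, so that anyone could verify a contribution without holding the author’s key, and would take its timestamps from a trusted clock rather than from the contributor.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    """A hypothetical archived record: who said what, and when."""
    author: str
    text: str
    timestamp: str  # ISO 8601 string, assumed to come from a trusted clock
    signature: str  # hex HMAC-SHA256 over the three fields above


def _payload(author, text, timestamp):
    # Serialize the fields in a fixed order so signing is reproducible.
    record = {"author": author, "text": text, "timestamp": timestamp}
    return json.dumps(record, sort_keys=True).encode("utf-8")


def sign_contribution(author, text, timestamp, key):
    """Create a signed record using the author's secret key (bytes)."""
    digest = hmac.new(key, _payload(author, text, timestamp), hashlib.sha256)
    return Contribution(author, text, timestamp, digest.hexdigest())


def verify_contribution(contribution, key):
    """Return True only if the record is unchanged and was signed with key."""
    expected = hmac.new(
        key,
        _payload(contribution.author, contribution.text, contribution.timestamp),
        hashlib.sha256,
    ).hexdigest()
    return hmac.compare_digest(expected, contribution.signature)


# Usage: Bob answers Alice's question, and the answer is archived with a signature.
bob_key = b"bob-secret-key-for-illustration-only"
answer = sign_contribution(
    "Bob", "Try Riemannian geometry.", "2008-06-26T10:00:00Z", bob_key
)
print(verify_contribution(answer, bob_key))
```

If anyone later alters the text, the author, or the timestamp of the archived record, verification fails, which is the property that lets contributors prove who said what, and when.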

Further reading

The ideas explored in this essay are developed at much greater length in my book Reinventing Discovery: The New Era of Networked Science.



Acknowledgments

Based on a keynote talk by Michael Nielsen at the New Communication Channels for Biology workshop, San Diego, June 26 and 27, 2008. Thanks to Krishna Subramanian and John Wooley for organizing the workshop, and all the participants for an enjoyable event. Thanks to Eva Amsen, Jen Dodd, Danielle Fong, Peter Rohde, Ben Toner, and Christian Weedbrook for providing feedback that greatly improved early drafts of this essay.



Footnotes

[1] Clay Shirky’s “Here Comes Everybody” is an excellent book that contains much of interest on new ways of collaborating.

[2] An ongoing experiment which incorporates online commentary and many other innovative features is PLoS ONE. It’s too early to tell how successful its commentary will be.

[3] I strongly recommend Peter Suber’s Open Access News as a superb resource on all things open access.

[4] Shirley Wu and Cameron Neylon have stimulating blog posts where they propose ideas closely related to collaboration markets.