In this excerpt from my new book, Too Big To Know, we'll look at a key property of the networking of knowledge: hugeness.



In 1963, Bernard K. Forscher of the Mayo Clinic complained in a now-famous letter printed in the prestigious journal Science that scientists were generating too many facts. Titled Chaos in the Brickyard, the letter warned that the new generation of scientists was too busy churning out bricks -- facts -- without regard to how they go together. Brickmaking, Forscher feared, had become an end in itself. "And so it happened that the land became flooded with bricks. ... It became difficult to find the proper bricks for a task because one had to hunt among so many. ... It became difficult to complete a useful edifice because, as soon as the foundations were discernible, they were buried under an avalanche of random bricks."

If science looked like a chaotic brickyard in 1963, Dr. Forscher would have sat down and wailed had he been shown the Global Biodiversity Information Facility at GBIF.org. Over the past few years, GBIF has assembled thousands of collections of fact-bricks about the distribution of life over our planet, from the bacteria collection of the Polish National Institute of Public Health to the Weddell Seal Census of the Vestfold Hills of Antarctica. GBIF.org is designed to be just the sort of brickyard Dr. Forscher deplored -- information presented without hypothesis, theory, or edifice -- except far larger, because the good doctor could not have foreseen the networking of brickyards.

Indeed, networked fact-based brickyards are a growth industry. For example, at ProteomeCommons.org you'll find information about the proteins specific to various organisms. An independent project by a grad student, Proteome Commons makes available almost 13 million data files, for a total of 12.6 terabytes of information. The data come from scientists around the world, and are made available to everyone, for free. The Sloan Digital Sky Survey -- under the modest tag line Mapping the Universe -- has been assembling and releasing maps of the skies gathered from 25 institutions around the world. Its initial survey, completed in 2008 after eight years of work, published information about 230 million celestial objects, including 930,000 galaxies; each galaxy contains millions of stars, so this brickyard may grow to a size where we have trouble naming the number. The best known of the new data brickyards, the Human Genome Project, in 2001 completed mapping the entire genetic blueprint of the human species; it has been surpassed in terms of quantity by the International Nucleotide Sequence Database Collaboration, which as of May 2009 had gathered 250 billion pieces of genetic data.

There are three basic reasons scientific data have increased to the point that the brickyard metaphor now looks 19th century. First, the economics of deletion have changed. We used to throw out most of the photos we took with our pathetic old film cameras because, even though they were far more expensive to create than today's digital images, photo albums were expensive, took up space, and required us to invest considerable time in deciding which photos would make the cut. Now, it's often less expensive to store them all on our hard drive (or at some website) than it is to weed through them.