At the meeting of the American Association for the Advancement of Science, Atul Butte gave a talk entitled "Translational Medical Discoveries Through Data Transparency and Reuse." It could just as easily have been called "How to run a successful research lab without having a lab." Butte, who is on the faculty at Stanford, was part of a panel that discussed the open sharing of data. He used his own experience as a compelling case study, showing that when researchers share their data, others can use it to drive a field forward in new ways.

Butte focused on a specific type of data, generated by what are called DNA or gene chips. A chip that carries sequences from every single human gene can now be had for only about $250, and each one can survey the expression of all these genes in a single cell type: say, cancer cells, or nerve cells from a Parkinson's patient.

Because this data is entirely digital, the National Institutes of Health has entered what Butte termed "government in library keeping mode." It set up a repository where any researcher can deposit data from a gene chip experiment, and a European organization has done something similar. Because the chips are so inexpensive, data has poured into these repositories at an enormous rate. Butte said that growth has only recently slowed to something like Moore's law, with the amount of data doubling every 18 months.

The two major repositories now hold over 850,000 data sets between them. Butte told the audience that anyone can download data from over 22,000 breast cancer experiments. Researchers can also grab data from cells before and after they were exposed to a huge variety of drugs.

Butte's group has used this data to identify possible disease-drug combinations. For example, if a disease causes a set of 30 genes to be expressed more, and another 40 to be expressed less, Butte's group will look for a drug that does the converse, raising the expression of the 40 genes while lowering that of the 30. It's usually not possible to find a drug that precisely reverses every gene, but it's not too difficult to pull out promising candidates.
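To make the matching idea concrete, here is a minimal toy sketch, not the group's actual method. It assumes each gene is simply labeled as going up (+1) or down (-1); real analyses generally work with quantitative, ranked expression changes instead. The function names `reversal_score` and `rank_drugs`, and all gene and drug labels, are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: gene name -> direction of change
# (+1 = expressed more, -1 = expressed less; missing = unchanged).
Signature = Dict[str, int]


def reversal_score(disease: Signature, drug: Signature) -> float:
    """Score in [-1, 1]: 1 means the drug reverses every disease gene,
    -1 means it pushes every gene the same way the disease does.
    Genes the drug leaves unchanged contribute 0."""
    if not disease:
        return 0.0
    total = 0
    for gene, direction in disease.items():
        # Opposite directions multiply to -1, so negate to reward reversal.
        total += -direction * drug.get(gene, 0)
    return total / len(disease)


def rank_drugs(disease: Signature,
               drugs: Dict[str, Signature]) -> List[Tuple[str, float]]:
    """Return (drug name, score) pairs, best reversal first."""
    scored = [(name, reversal_score(disease, sig)) for name, sig in drugs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy disease signature: 30 genes up, 40 genes down.
    disease = {f"up_{i}": +1 for i in range(30)}
    disease.update({f"down_{i}": -1 for i in range(40)})

    drugs = {
        # Full reversal: lowers all 30, raises all 40.
        "drug_A": {**{f"up_{i}": -1 for i in range(30)},
                   **{f"down_{i}": +1 for i in range(40)}},
        # Partial reversal: only half of each set.
        "drug_B": {**{f"up_{i}": -1 for i in range(15)},
                   **{f"down_{i}": +1 for i in range(20)}},
        # Mimics the disease instead of reversing it.
        "drug_C": dict(disease),
    }
    for name, score in rank_drugs(disease, drugs):
        print(f"{name}: {score:.2f}")
```

Run as a script, it ranks drug_A first (1.00), then drug_B (0.50), then drug_C (-1.00). The partial match still scores well, which mirrors the point that an imperfect reversal can still be a promising candidate.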

The next step in the research process, testing the drugs in an animal model, would seem to require a lab. But Butte manages to get by without one. He pointed the audience to a site called Assay Depot, where you can basically specify the experiment you'd like to see done and have labs submit bids for performing it. For example, Butte requested an experiment that would test a potential treatment for inflammatory bowel disease in rats, and got 200 bids from industrial and academic labs that had unused capacity.

Most researchers would be a bit leery of handing key experiments over to strangers, but Butte said, "If you don't trust one, buy two of them. That's what we do." In this case, he hired two labs, one on each coast. Each offered an additional feature: one tested the rats' blood for inflammation markers, while the other actually performed a colonoscopy on the rats.

The results of this sort of work have ranged from the usual research publications to the launch of startup companies and talk of clinical trials, and all of it has been done using publicly available data.

Although his approach builds on other people's data, Butte said that there is still a role for real scientists in the process. After all, he said, "You can never outsource asking good questions."