How Open Data for Science Will Change How Businesses Compete

Today, even an ordinary teenager with a smartphone has almost godlike power over information. With a few swipes and clicks, anybody can access the world’s knowledge, use advanced tools to analyze its meaning and share it with anyone else. That has profoundly changed how we innovate.

So it’s strange that the practice of science has, for the most part, been stuck in the dark ages. The process of research, peer review and publication remains almost as slow and cumbersome as it was decades ago, which hinders our ability to turn new discoveries into useful applications.

That may be changing, though. Taking a page from the open source movement, a number of efforts are underway to aggregate the latest knowledge and make it available to anyone who wants to use it. From cancer research and materials science to psychological profiles, these new data sets will enable and empower innovation like never before.

A Periodic Table For Cancer

Typically, cancer research has followed the traditional scientific method. A scientist would come up with a theory about how a particular mutation causes a tumor, apply for a grant, perform a study and then publish the results. But in 2005, researchers at the National Cancer Institute (NCI) saw an opportunity to go in another direction.

“We said, ‘Let’s gather data along with some basic analysis, publish it and allow the scientific community to study it,’” Jean Claude Zenklusen, a biologist at NCI, told me. “We did this because we believed by releasing the data in this way, we could tap into the collective expertise of thousands of researchers across a number of fields and accelerate innovation.”

This approach formed the basis for The Cancer Genome Atlas (TCGA), a joint project between NCI and the National Human Genome Research Institute, which began in 2006 and has since sequenced the tumors of over 10,000 patients encompassing 33 types of cancer. “Cancer data has now become open data,” Zenklusen told me proudly.

Today, a decade later, its effect on cancer science has been profound. “It essentially gives us a periodic table,” says Ron DePinho, President of MD Anderson Cancer Center, “which has provided us with both diagnostic and therapeutic value as well as helped us design clinical trials to accelerate the development of new cancer drugs.”

Yet the impact of the program goes far beyond major institutions like MD Anderson. Much like the original periodic table, it has greatly democratized scientific knowledge. Many of the researchers who use the data are first-time grantees from small institutions who likely wouldn’t have gotten their studies off the ground without TCGA as a resource.

A Genome For 21st Century Manufacturing

Like a biological organism, every product is made up of materials, and the properties of those materials largely determine how the product functions and performs. So a key way to improve a product is to improve its materials by making them lighter, stronger, better at conducting electricity and so on.

Traditionally, improving a product has been a process of trial and error. You changed the ingredients or the process by which you made it and saw what happened. For example, at some point medieval blacksmiths figured out that heat-treating iron would make better swords.

Yet today, coming up with better materials is a multibillion-dollar business. Consider a carmaker that wants to improve fuel economy. It could use a smaller, less powerful engine, but that would sacrifice performance. A much better solution would be to develop a lighter material that is strong enough not to compromise safety.

With this in mind, the Materials Genome Initiative is building databases of material properties such as strength and density, along with computer models that predict which processes will achieve the qualities a manufacturer is looking for. Like The Cancer Genome Atlas, it is making the data available to anyone who can find a use for it.

“Our goal is to speed up the development of new materials by making clear the relationship between materials, how they are processed and what properties are likely to result,” Jim Warren, Director of the Materials Genome program, told me. “My hope is that the Materials Genome will accelerate innovation in just about every industry America competes in.”

Genomes Of The Mind

Artificial intelligence has long been an area of intense interest at IBM, particularly deriving meaning from language. That, after all, is how the company created its revolutionary Watson system, which beat humans at Jeopardy! and is now assisting professionals in fields ranging from medicine and finance to music.

IBM Fellow and Vice President of Healthcare and Life Sciences Research Ajay Royyuru, however, thinks artificial intelligence can go even further and help us understand the most complex entity on the planet — ourselves. “Language is a means to transfer cognitive state,” he told me. “While I’m talking to you, I’m effectively trying to make a link between what’s going on in my mind with what’s going on in yours.”

So his team at IBM’s Healthcare and Life Sciences division began studying chess players to see if they could find a correlation between their brain activity and their proficiency. Indeed, they found one. They later had similar success in evaluating musicians. Now IBM is working on a system that could evaluate mental health through language processing.

“Our hope is that this technology, when combined with the expertise of a trained therapist, can help recognize early indications of mental illnesses and enable the opportunity for more effective treatment before more acute symptoms present themselves,” Royyuru says.

A New Era Of Mass Collaboration

In the past, only major institutions could do large-scale innovation. Sure, a few guys in a garage somewhere could rearrange existing technology and build a disruptive product, but to make a fundamental change in how things work, you had to work in a major lab with a large budget. The quaint notion of a lone scientist at the bench has been defunct for a long time.

The traditional practice of science reflected these realities. To do any significant research, you had to secure a budget or a government grant, which required you to state your purpose clearly. Findings that aligned with that stated purpose got published, but most of what didn’t was discarded or lost in a notebook or on a hard drive somewhere.

Yet with the cost of storage and search now negligible, the economics of science are changing. “Having huge amounts of data becomes much more interesting when we can classify it in some way and can even be the first step towards creating a generalized model, which drives further innovation,” the complexity theorist Samuel Arbesman told me.

Essentially, when scientific data becomes open data, the power of fundamental research becomes available to just about anyone with an idea. You no longer need a billion-dollar budget to make a breakthrough; you can draw on the collective knowledge of the world’s scientists to imagine a new future.

– Greg

An earlier version of this article first appeared on Inc.com.