To distill a clear message from growing piles of unruly genomics data, researchers often turn to meta-analysis — a tried-and-true statistical procedure for combining data from multiple studies. But the studies that a meta-analysis might mine for answers can diverge endlessly. Some enroll only men, others only children. Some are done in one country, others across a region like Europe. Some focus on milder forms of a disease, others on more advanced cases. Even if statistical methods can compensate for these kinds of variations, studies rarely use the same protocols and instruments to collect the data, or the same software to analyze it. Researchers performing meta-analyses go to untold lengths trying to clean up the hodgepodge of data to control for these confounding factors.

Purvesh Khatri, a computational immunologist at Stanford University, thinks they’re going about it all wrong. His approach to genomic discovery calls for scouring public repositories for data collected at different hospitals on different populations with different methods — the messier the data, the better. “We start with dirty data,” he says. “If a signal sticks around despite the heterogeneity of the samples, you can bet you’ve actually found something.”

This strategy seems too easy, but in Khatri’s hands, it works. Analyzing troves of public data, Khatri and colleagues have uncovered signature genes that could allow clinicians to detect life-threatening infections that cause sepsis, classify infections as bacterial or viral, and tell if someone has a specific disease such as tuberculosis, dengue or malaria. Last year Khatri and two other scientists launched a company to develop a device for measuring these gene signatures at a patient’s bedside. In short, they’re deciphering the host immune response and turning key genes into diagnostics.

Over the past year Khatri discussed his ideas with Quanta Magazine over the phone, by email and from his whiteboard-lined Stanford office. An edited and condensed version of the conversations follows.

What turned you on to biology?

I left India and came to the U.S. in the “fix the Y2K bug” rush with plans to get a master’s in computer science and become a software engineer. Months after arriving at Wayne State University in Detroit I realized that writing software for the rest of my life was going to be really boring. I joined a lab working on neural networks.

But then my adviser switched to bioinformatics and said he’d pay my tuition if I switched with him. I was a poor Indian grad student. I thought, “You’re going to pay my salary? I’ll do whatever you are doing.” That’s how I moved into biology.

You made a splash pretty quickly. How did that happen?

While my adviser was away on sabbatical in 2000-2001, I worked in the lab doing bioinformatics analyses with a postdoc in our collaborator’s lab, a gynecologist studying genes involved in male fertility. Microarrays for running assays on large numbers of genes at once were brand-new. From a recent experiment, he’d gotten a list of some 3,000 genes of interest, and he was trying to figure out what they were doing.

One day I saw him going from one website to another, copying and pasting text into Excel spreadsheets. I said to him, “You know, I can write software for you that will do all of that automatically. Just tell me what you are doing.” So I wrote a script for him — it took me three days — and with the results we wrote a Lancet paper.

We put the software on the web. There was huge interest. They presented it at some conference, and Pfizer wanted to buy it. I thought, wow, this is such low-hanging fruit. I can be a millionaire soon.

What does the software do?

It takes the set of genes you specify and searches annotation databases to tell you what biological processes and molecular pathways those genes are involved in. If you have a list of 100 genes, it could tell you that 15 are involved in immune response, another 15 are involved in angiogenesis and 50 play a role in glucose metabolism. Let’s say you’re studying Type 1 diabetes. You could look at these results and say, “I’m on the right path.”
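[Editor’s note: The kind of lookup described above can be sketched in a few lines of Python. This is a minimal illustration and not the actual Onto-Tools code. The gene names, the `ANNOTATIONS` table and the `summarize_processes` function are all hypothetical, and a real tool would query curated annotation databases rather than a hard-coded dictionary.]

```python
from collections import Counter

# Hypothetical annotation table: gene -> biological processes.
# Placeholder names only; a real tool would pull these from annotation databases.
ANNOTATIONS = {
    "GENE_A": ["immune response"],
    "GENE_B": ["immune response", "angiogenesis"],
    "GENE_C": ["glucose metabolism"],
    "GENE_D": ["glucose metabolism"],
}


def summarize_processes(genes, annotations):
    """Count how many of the input genes are annotated to each process.

    Returns a Counter of process -> gene count, plus a list of genes
    for which the annotation table has no entry.
    """
    counts = Counter()
    unannotated = []
    for gene in genes:
        processes = annotations.get(gene)
        if not processes:
            unannotated.append(gene)
            continue
        # Use a set so a gene listed twice under one process counts once.
        for process in set(processes):
            counts[process] += 1
    return counts, unannotated


gene_list = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
counts, unannotated = summarize_processes(gene_list, ANNOTATIONS)

for process, n in counts.most_common():
    print(f"{n} gene(s) involved in {process}")
print("No annotation found for:", unannotated)
```

[A gene can belong to more than one process, so the per-process counts may add up to more than the number of genes submitted.]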

This was 15 years ago, when I was getting my master’s degree. I developed more tools and expanded the work into a Ph.D. It’s now an open-access, web-based suite of tools called Onto-Tools. When I last checked, a few years ago, it had 15,000 users from many countries, analyzing an average of 100 data sets a day.

Although the tools became very popular, they weren’t telling me how the results get used, how they help people. I wanted to see how research progresses from bioinformatics analyses to lab experiments and ultimately to something that could help patients.

How did you make that switch?

When I came to Stanford as a postdoc in 2008, one of my conditions was that somebody with a wet lab — someone running experiments on samples from mice or actual patients, not just analyzing data in silico — would pay half my salary, because I wanted their skin in the game. I wanted to make predictions using methods I’d develop in one lab, and then work with another lab to validate those predictions and tell me what’s clinically important. That’s how I ended up working with Atul Butte, a bioinformatician, and Minnie Sarwal, a renal transplant physician. [Editor’s note: Butte and Sarwal have both since moved from Stanford to the University of California, San Francisco.]