Microbes are mighty. Diverse communities of these single-celled organisms can have far-reaching effects in larger systems including soil, the human body and global climate.

But scientists still have a lot to learn about how these communities work. A research collaboration that started at Oak Ridge National Laboratory (ORNL) is piecing together part of that puzzle: how soil microbes process chemicals. The work is made possible by supercomputers used to analyze high-throughput sequencing data sets of genes and proteins. The team’s results could lead to approaches that improve how next-generation global climate models represent key microbial functions.

ORNL environmental scientist Melanie Mayes began her career working on soil contamination, but a decade ago a project studying dissolved organic carbon led her to think about climate. Soil carbon is a major contributor to global carbon totals; microbes constantly burp out the greenhouse gases carbon dioxide and methane as they process decaying leaves, roots and other debris. But most of today’s terrestrial climate models oversimplify these soil carbon processes, Mayes says.

“Global climate models might incorporate soil decay data from laboratory experiments or even small field experiments,” she says. But that information is incomplete and lacks detail on microbial processes. “That reservoir is larger than the atmosphere and all of the above-ground biomass combined. And so if you can’t quantify that very well, it can impact predictions.”

Her team built a simple model that includes microbes, but it distinguished only two kinds of extracellular enzymes, the catalysts that process environmental chemicals. It was a step forward, Mayes says, “but ultimately we realized that there was a lot more that we could learn.” She wanted to make more detailed connections between microbes and soil chemicals: How do the tiny organisms do what they do? Who does what?

To tackle those questions, she needed far more genomic and proteomic data, along with sophisticated tools and computational time to analyze that information. Genomic data comes from microbial gene sequences; proteomic data comes from the sequences of the proteins, including enzymes, expressed from those genes.

Five years ago, she teamed with bioinformatics researcher Chongle Pan, who has developed algorithms and tools to analyze such data. They used an ORNL laboratory-directed research and development grant to support the initial work. The Department of Energy’s Joint Genome Institute has done most of the genome sequencing. In 2016, Mayes received a DOE Early Career Research Program grant to support the project. And Pan has logged significant allocations from the ASCR Leadership Computing Challenge, totaling more than 55 million processor hours, to analyze data, initially on Titan and more recently on Summit, an IBM AC922 system at the Oak Ridge Leadership Computing Facility, a DOE national user facility.

“It’s a big data challenge,” says Pan, who is now at the University of Oklahoma. A metagenomics sample – combining sequence information from multiple organisms in a community – can generate a terabyte of data, he notes, which could take months or years to process on a small computer cluster. When they combine metagenomics data with metaproteomics data that can include millions of mass spectra, the analysis problem quickly snowballs.

To tackle these problems, Pan and his colleagues have developed scalable algorithms, sorting molecular fragments by organism and reassembling them into coherent sequences. A decade ago, they developed Sipros, an algorithm that assembles soil microbial proteins from the molecule fragments mass spectrometers produce. They use a different algorithm, Disco, to assemble microbial genomes. A third algorithm, Sigma, can quantify the abundance of genomes observed in the metagenomics mixtures.
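
To make the idea of quantifying genome abundance in a mixture more concrete, here is a minimal Python sketch. It is not Sigma's actual method, which the article does not describe; it assumes a much simpler approach in which each sequencing read has already been assigned to one genome, and read counts are normalized by genome length. The function name, genome names and numbers are all hypothetical.

```python
from collections import Counter


def relative_abundance(read_hits, genome_lengths):
    """Estimate relative genome abundance from per-read genome assignments.

    read_hits: iterable of genome names, one entry per assigned read
               (hypothetical input; real pipelines must handle ambiguous reads).
    genome_lengths: dict mapping genome name -> genome length in base pairs.
    """
    counts = Counter(read_hits)
    # Normalize by genome length so that large genomes, which naturally
    # yield more reads, are not mistaken for more abundant organisms.
    coverage = {g: counts[g] / genome_lengths[g] for g in counts}
    total = sum(coverage.values())
    # Scale so that the abundances sum to 1.
    return {g: c / total for g, c in coverage.items()}


# Toy data: made-up genomes and read counts, for illustration only.
hits = ["genome_a"] * 600 + ["genome_b"] * 400
lengths = {"genome_a": 6_000_000, "genome_b": 2_000_000}
abundance = relative_abundance(hits, lengths)
```

In this toy case genome_a attracts more reads, yet genome_b comes out as the more abundant organism once its smaller genome is taken into account. Real metagenomic samples add complications this sketch ignores, such as reads that match several closely related genomes.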

These methods can be useful for a range of questions about microbe communities. Pan uses them to better understand how plants work with symbiotic soil microbes. He’s also collaborating on a study of human gut microbes, comparing samples from lean and obese African-Americans.

Supercomputers let Pan analyze the data using parallel processing. But this greater computational power doesn’t just speed analysis; it also can improve the quality of the results. As researchers examine the initial metagenome or metaproteome results, they might want to optimize their findings or process the data in a slightly different way to follow up on an initial discovery. With Summit, Pan has flexibility; his team can analyze the data and then check with Mayes to see if the findings make sense. “Then we can process it over again to generate some new results,” he says.

Pan, Mayes and their teams have used these tools to study tropical soils and their microbial activity. They started with samples from the Smithsonian Tropical Research Institute on Panama’s Gigante Peninsula. Researchers there have conducted a 20-year-long experiment, supplying various amounts of phosphorus fertilizer to soil plots and monitoring the effects. The team analyzed 16 samples in total, four from each of four plots, looking at the microbial genes and the proteins produced. They observed variations in microbial species and active genes that depended on whether phosphorus was scarce or plentiful.

No surprise there. But they also found that millions of different microbial species seemed to use the same general playbook to draw necessary nutrients from soil. So the team, including postdoctoral researchers Qiuming Yao and Yang Song, has grouped the approaches into a few enzymatic pathways that help these diverse organisms acquire phosphorus, nitrogen and carbon compounds from soils. It’s still a large number of pathways, Mayes notes, “but it’s tractable.”

That’s led them to focus on these microbial gene types and their functions instead of the microbial species. They’re looking at other sites – wetlands in Panama and a forest location in Puerto Rico – to see if those soils give similar results. If so, those enzymatic pathways may represent the key soil microbial functions that researchers would need to factor into a global climate model. Different land-use factors – such as agriculture, wetlands or forest conditions – will most likely change the relative importance of the various enzymes, Mayes notes. “If you can find those functions everywhere you go, then that’s how you can simplify the representation of microbial decomposition of organic matter and extraction of key nutrients.”