MADISON, Wisconsin - A study that used a new digital library and machine reading system to suck the factual marrow from millions of geologic publications dating back decades has unraveled a longstanding mystery of ancient life: Why did easy-to-see and once-common structures called stromatolites essentially cease forming over the long arc of Earth history?

Stromatolites are contorted layers of sediment formed by microbes, and they are often found in limestone and other ancient sedimentary rocks deposited beneath oceans.

"Geologists have known for a long time that stromatolites were abundant in shallow marine environments during the Precambrian, before the emergence of multi-cellular life" more than 560 million years ago, says Jon Husson, a post-doctoral researcher and co-author of a study now online in the journal Geology. "But, stromatolites are rare in the ocean today."

The new study measures the slide in stromatolite prevalence based on descriptions of rocks sifted from more than 3 million scientific publications.

"Paleontologists have largely attributed the decline in stromatolites to the evolution of animals, starting some 560 million years ago," says Shanan Peters, a professor of geoscience at the University of Wisconsin-Madison and the study's first author. "Many multi-cellular animals, like snails, eat microbes. The evolution of these big microbe-grazing animals hit 'reset' on the stromatolite's world. Or so the story has gone."

The new study found a weak correlation between stromatolite occurrence and the diversity of animals, but a stronger link to seawater chemistry.

"The best predictor of stromatolite prevalence, both before and after the evolution of animals, is the abundance of dolomite in shallow marine sediments," says Husson. Dolomite is a high-magnesium variety of carbonate, the type of sediment that forms limestone. Dolomite is harder to make than low-magnesium carbonate and it forms today in only a narrow range of marine environments.

When the ocean water is super-saturated with carbonate, "that can make it easier for things like stromatolites to form," says Husson. "In Lake Tanganyika [Africa], there are stromatolites forming today, even though there are animals everywhere, snails and fish. The lake is super-saturated with carbonate, and it's begging to be precipitated. The microbes come along and help it to precipitate, and the result is an abundance of stromatolites." Elevated carbonate saturation can also help the formation of dolomite, thereby driving the correlation with stromatolites found in this study.

Measuring the prevalence of stromatolites through all Earth history is difficult because counting the number of stromatolites alone is not sufficient. You must also know how many rocks could potentially have stromatolites, but do not.

The big innovation of this study is the interplay of a new type of digital library and machine reading system called GeoDeepDive with a geological database called Macrostrat. Both were spearheaded by Peters at UW-Madison.

GeoDeepDive is a digital library built on high-throughput computing technology that can "read" millions of papers and siphon off specific information. To date, the GeoDeepDive library contains more than 3 million publications from all scientific disciplines; some 10,000 newly published papers are added daily.

Macrostrat is a database describing the known geological properties of North America's upper crust, at different times and depths.

The massive computing capacity at UW-Madison's Center for High Throughput Computing and HTCondor system, the brainchild of UW-Madison computer scientist Miron Livny, powers GeoDeepDive. Combining the digital library with the geological database allowed the researchers to estimate, at different time periods, the percentage of shallow marine rocks that actually have stromatolites.
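The normalization described above, counting the rocks that contain stromatolites against all the rocks that could have contained them, can be illustrated with a minimal Python sketch. It is not the study's code: the record layout, the time-bin labels and the function name `prevalence_by_bin` are assumptions made for illustration, and the sample records are invented placeholders, not data from Macrostrat or GeoDeepDive.

```python
from collections import defaultdict

# Invented placeholder records, NOT real data: each tuple is
# (rock unit name, time bin, whether a stromatolite mention was linked to it).
# Every unit here is assumed to be a shallow marine rock unit.
units = [
    ("Unit A", "older_bin", True),
    ("Unit B", "older_bin", True),
    ("Unit C", "older_bin", False),
    ("Unit D", "younger_bin", True),
    ("Unit E", "younger_bin", False),
    ("Unit F", "younger_bin", False),
    ("Unit G", "younger_bin", False),
]


def prevalence_by_bin(records):
    """Return the percentage of units in each time bin that have stromatolites."""
    totals = defaultdict(int)  # all candidate units per bin (the denominator)
    hits = defaultdict(int)    # units with a stromatolite mention (the numerator)
    for _name, time_bin, has_stromatolite in records:
        totals[time_bin] += 1
        if has_stromatolite:
            hits[time_bin] += 1
    return {b: 100.0 * hits[b] / totals[b] for b in totals}


result = prevalence_by_bin(units)
```

The point of the sketch is the denominator: a raw count of stromatolite-bearing units would say little without the count of units that lack them in the same time bin.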

The study began in the summer of 2015, when the third author, Julia Wilcots, a Madison native who was then an undergraduate at Princeton, asked Peters for a summer project. "In my typical fashion I gave Julia a few options," Peters says. "She picked stromatolites, so I said, 'Okay, go do it!' With minimal help from us, she developed a working application to discover and extract every mention of stromatolites from our library."

Among 10,200 papers that mentioned stromatolites, "our program was able to extract 1,013 with a name of a rock unit, which enabled us to link stromatolite occurrences to Macrostrat," says Husson.
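By those figures, roughly one in ten of the papers mentioning stromatolites (1,013 of 10,200) could be tied to a named rock unit. The linking step might look something like the following Python sketch. It is a simplified illustration under stated assumptions, not the application Wilcots built: the regular expression, the sentence-level matching rule, the function name `link_mentions` and the sample unit names and texts are all hypothetical.

```python
import re

# Assumed pattern: matches "stromatolite", "stromatolites" or "stromatolitic".
STROM_PATTERN = re.compile(r"\bstromatolit(?:e|es|ic)\b", re.IGNORECASE)


def link_mentions(documents, known_units):
    """Map each document ID to the known rock units named in the same
    sentence as a stromatolite mention. Documents with no link are omitted."""
    links = {}
    for doc_id, text in documents.items():
        found = set()
        # Naive sentence split on whitespace that follows ., ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if not STROM_PATTERN.search(sentence):
                continue
            lowered = sentence.lower()
            for unit in known_units:
                if unit.lower() in lowered:
                    found.add(unit)
        if found:
            links[doc_id] = sorted(found)
    return links


# Hypothetical inputs: invented unit names and invented sentences.
known_units = ["Example Formation", "Sample Dolomite"]
documents = {
    "doc1": "Stromatolites are common in the Example Formation. The section is thick.",
    "doc2": "Stromatolitic textures were noted. The Sample Dolomite lies above.",
    "doc3": "The Sample Dolomite contains no fossils.",
}

links = link_mentions(documents, known_units)
```

In this toy run only `doc1` is linked. `doc2` mentions stromatolites and a rock unit, but not in the same sentence, which hints at why only a fraction of the papers yield a usable link.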

Wilcots did not have to travel to see stromatolites, Peters says. "In Madison, we are sitting on top of rocks recording one of the biggest rises in stromatolite abundance - at least during the age of animals."

Scientists long ago observed that stromatolites started a long decline just before the start of the Cambrian period, but the cause of that decline remained a "fundamental question of paleobiology," Husson says. "Stromatolites are the oldest fossils that are visible to the naked eye. If you look at rock that is a billion years old, the chance for seeing evidence of life equals the chance of seeing stromatolites."

Beyond answering a fundamental question of Earth's history, the new study "allows us to do the kind of analyses that scientists used to only dream about," Peters says. "'If we could just compile all the published information on... anything!'"

"Doing this study without GeoDeepDive would be all but impossible," Peters adds. "Reading thousands of papers to pick out references to stromatolites, and then linking them to a certain rock unit and geologic period, would take an entire career, even with Google Scholar. Here we got started with a talented undergrad working on a summer project. GeoDeepDive has greatly lowered the barrier to compiling literature data in order to answer many questions."

Another beauty of the big data, machine-reading approach is the baked-in capability for replication and improvement. "Now that this study has been done, we can run the stromatolite application again and again. We can refine the searches, and they will evaluate the new data that is being published all the time," Peters says. "So a rerun could make a better study, with minimal effort."

For centuries, "geologists have transferred hard-to-get information from the field to hard-to-get information in the literature," Peters says. "To achieve a broad-scale synthesis, you have to survey all of the published knowledge. There are new discoveries waiting in the scientific literature, if you can see the big picture and get all the data into one place."

###

David Tenenbaum, 608-265-8549, djtenenb@wisc.edu