My dad is a wildlife biologist, and on the road trips we took when I was growing up, he spent a lot of time talking about the grasses and trees along the highway. It was a game he played: trying to correctly identify the passing greenery from the driver’s seat of a moving car. As a carsick-prone kid wedged into the back seat of a Ford F-150, I found this supremely lame. As an adult (specifically, one who just spoke with a paleobotanist), I now know something about my father’s road-tripping habit: Identifying leaves isn’t easy.

“I’ve looked at tens of thousands of living and fossil leaves,” says that paleobotanist, Peter Wilf of Penn State’s College of Earth and Mineral Sciences. “No one can remember what they all look like. It’s impossible—there’s tens of thousands of vein intersections.” There are also patterns in vein spacing, different tooth shapes, and a whole host of other features that distinguish one leaf from the next. Unable to commit all these details to memory, botanists rely instead on a manual method of identification developed in the 1800s. That method, called leaf architecture, hasn’t changed much since. It relies on a fat reference book filled with “an unambiguous and standard set of terms for describing leaf form and venation,” and it’s a painstaking process: Wilf says correctly identifying the taxonomy of a single leaf can take two hours.

That’s why, for the past nine years, Wilf has worked with a computational neuroscientist from Brown University on software that does what the human eye cannot: identify families of leaves in mere milliseconds. The software, which Wilf and his colleagues describe in detail in a recent issue of Proceedings of the National Academy of Sciences, combines computer vision and machine learning algorithms to find patterns in leaves and link each one to the plant family it most likely belongs to, with 72 percent accuracy. In doing so, Wilf has built a user-friendly solution to a once-laborious part of paleobotany. The program, he says, “is going to really change how we understand plant evolution.”


The project began in 2007, after Wilf read an article in The Economist titled “Easy on the eyes.” It documented the work of Thomas Serre, the neuroscientist from Brown, on image-recognition software. Serre was at MIT at the time and had taught a computer to distinguish photos with animals from photos without animals with 82 percent accuracy. That was better than his (human) students, who pulled it off only 80 percent of the time. “An alarm went off in my head,” says Wilf, who cold-called Serre and asked whether the program could be taught to recognize patterns in leaves. Serre said yes, and the two scientists cobbled together a preliminary image set of leaves from about five families and started running recognition tests on the computer. They quickly reached 35 percent accuracy.

By now, Wilf and Serre have fed the program a database of 7,597 images of leaves that were chemically bleached and then stained to make details like vein patterns and toothed edges pop. Small imperfections like bug bites and tears were purposely included, since those details provide clues to the plant’s origins. Once the software processes these ghostly images, it overlays a heat map on them. Red dots show the importance of different codebook elements: tiny images that each illustrate one of some 50 different leaf characteristics. Together, the red dots highlight the areas most relevant to the family the leaf may belong to.

Identifying families, rather than species, is Wilf’s broader goal. He wants to start feeding the software tens of thousands of images of unidentified fossil plants. If you’re trying to identify a fossil, Wilf says, it’s almost always of an extinct species, “so finding the evolutionary family is one of our motivators.” Knowing a leaf’s species isn’t as helpful as knowing where the leaf came from or which living plants it’s related to, information that is invaluable to a paleobotanist.

In this way, Wilf and Serre’s tool builds a stronger bridge between the taxonomic side of paleobotany and the ecological side. Ellen Currano, an assistant professor in the Department of Geology and Geophysics at the University of Wyoming, says that bridge has been sorely lacking. “You could go into a herbarium and look at leaves, or say, ‘I see big leaves, it must be from a wet place,’ but that’s less than efficient.” Currano, who has studied with Wilf in the past but did not work on this study, also points out that modern botanists can often determine a plant’s taxonomy from its flowers or fruit, but that flowers, fruit, and leaves usually fossilize separately. “It’s a tremendous challenge to have the leaf, but not flower or fruit,” she says. “So [Wilf’s tool] is an important breakthrough in that it’s taxonomy based on leaves.”

It’s also taxonomy based on machine learning and image recognition. “Everyone” (or at least every paleobotanist) “has had that dream in their head: if only I could just take a picture of this and get an identity,” Currano says. In seeking to fulfill that wish, Wilf has taken the same approach to studying fossils that Google engineers have taken to streamlining your search results or teaching a computer to dominate at Go. Wilf even goes so far as to call his tool “an assistant.”

“Assistant” is an apt description. After all, Wilf’s creation doesn’t always provide hard answers (the software, he reiterates, is 72 percent accurate, not 100 percent), but it does serve up helpful prompts and ideas. The computer can quickly, and without bias, spot what even a well-trained botanist might overlook; once it presents a promising line of inquiry, human analysis can resume. Wilf is optimistic that the tool will unleash “a flood of new botanical information,” but he’s definitely not worried about his job. “It’s not going to replace botanists,” he says, “but it is going to show them where to look.”