It’s often said that we’re in, or will soon enter, the era of Big Data. We’ll have all the data we could possibly want, and so we’ll no longer be data-limited. Instead, the rate of scientific progress will be limited by other factors, like our ability to think of good questions.

But as Jeremy Yoder and David Hembry asked in the comment thread on this old post: what sorts of Big Data do ecologists (and evolutionary biologists) actually have? We certainly don’t have Big Data on everything (whatever that might mean!). Rather, we have Big Data on the things that technological advances have made easy to measure. Gene sequences, for instance. Records of where and when species have been observed, thanks to camera trap networks, citizen science projects and smartphone apps, and the digitization of museum records. Information that can be remotely sensed, like land cover. Probably other sorts of data I’m forgetting.

What don’t we have Big Data on, even though we really wish we did? What data that we would really like to have has not gotten any easier to obtain thanks to smartphones, satellites, drones, cheap PCR, citizen science, etc.? I’d say demographic data is a big one: data on the births and deaths (and, for mobile organisms, movements) of lots of individuals. Ideally along with environmental data sampled at the spatial and temporal grains and extents that matter to those individuals. And it’d sure be nice to have this information for many generations, but of course there’s no way for technology to speed that up.* And to have it for many different species, so that we could do community ecology and not just population ecology.**

Here’s another sort of Big Data we mostly don’t have: data from controlled, manipulative, randomized experiments. A lot of Big Data is observational data. Which is great. But no matter how much observational data you have, on whatever variables you have it on, inferring causality without experimental data is going to be difficult at best. The great thing about NutNet (the Nutrient Network) is that it’s Big Experimental Data. Not that technological advances are irrelevant for NutNet; the internet facilitates collaboration, for instance. But information technology doesn’t make it any easier to fence plots, add fertilizer, remove a species of interest, etc.

So, what do you think are the biggest and most difficult-to-close gaps in ecologists’ collective data collection efforts?

Hat tip to Peter Adler, who got me thinking about this.

*For this reason, I wonder if there will be a long-term trend for ecologists to focus more on spatial variation and less on temporal variation. Technological advances can improve the spatial extent of our sampling, but not the temporal extent.

**And as long as I’m dreaming, I’d like my free pony to be a palomino.