Ecological data is constantly being collected worldwide, but how accessible is it? (Image Credit: GBIF, CC BY 4.0, Image Cropped)

This week Trondheim played host to a seminar by Living Norway, a Norwegian collective that aims to promote FAIR data use and management. It might sound dry from an ecological perspective, but I was told I’d see my supervisor wearing a suit jacket, an opportunity too preposterous to miss. While that was certainly a highlight, the seminar itself proved fascinating, and underlined just how important FAIR data is for ecology, and for science in general. So why is it so important, what can we do to help, and why do I keep capitalising FAIR?

The last 12 months have seen the release of two vital global documents: the 2018 IPCC Climate Report and the more recent IPBES Global Assessment of Biodiversity and Ecosystem Services. Both painted grim pictures of the state of the world. But both also suffered from a common problem – lack of data. Whilst they highlighted alarming trends, those trends could easily have been bolstered by more case studies and more examples, which would have supported stronger conclusions. All of these depend on large amounts of data.

Some of the data is unavailable for a simple reason – it hasn’t been collected. There are declining populations of species everywhere, but we don’t have the time, money or people to make the observations. That’s understandable – we can’t do everything. But there are also large amounts of ecological data out there that have been collected, yet are locked up on someone’s hard drive, hidden on a difficult-to-find and even more difficult-to-access web service, or stored in some incomprehensible format understandable only to the researcher who collected them.

It’s this data that Living Norway is trying to unlock, so to speak. That’s where the concept of FAIR comes into play. It stands for Findable, Accessible, Interoperable and Reusable, and it means that data used to produce scientific literature remains available for meta-studies, for re-analysis, and for inclusion in large global reports. That ‘re-analysis’ bit is key. The ‘reproducibility crisis’ is only getting worse these days, as studies become almost impossible to replicate, and thus verify. FAIR data not only means better access to more data, but also improves our ability to validate the research that ecological theories may depend upon.

Who is Collating FAIR Data?

Organisations like the Living Atlas network and the Global Biodiversity Information Facility (GBIF) have made enormous amounts of data available for consumption by the scientific community. GBIF alone currently houses 1.3 billion records and 45,000 datasets, with around 31,000 more records arriving every hour. Yet quantity does not translate to quality, and organisations like GBIF are currently working to improve the standards of data licensing and description to ensure more transparent data management. I spoke with GBIF’s Head of Informatics, Tim Robertson, during the seminar, and will publish the interview within the coming weeks.

Citizen science (data collected by non-scientists) plays a huge role in expanding scientific databases, with apps like iNaturalist and Artsobservasjoner allowing people to record sightings of species wherever they go. iNaturalist even allows users to upload photos, which means the records can be verified. But this data has problems too. Observed locations are often biased towards population centres, and less charismatic species are often overlooked. Artsdatabanken reported that of the 21 million records in their system, 87% were bird observations. Creating incentives for observations of more obscure species in more remote places is a major challenge for those involved in citizen science.

Educating the Next Generation of Scientists on FAIR Data

My favourite talk of the seminar was given by Vigdis Vandvik, who demonstrated the methods and advantages of teaching FAIR data practices to science students. Vigdis noted that whilst many Bachelor’s students are taught plenty of ecological theory, they are given little instruction on how to actually be a scientist. Over the last two decades, Vigdis has been taking students to workshops all around the world, enabling them to collect data (which then gets used in long-term studies) and even publish scientific papers. She also teaches them how to process and document that data correctly.

The advantages here are numerous. I have several gripes with my own science education, one of them being that I was given little to no experience with scientific publication and data handling early on. But aside from the benefits of students learning the more practical sides of science, teaching students how to manage data correctly produces better, and more honest, scientists further down the line. If you’ve ever heard of the Nine Circles of Scientific Hell (a paper I cannot recommend highly enough), you’ll know that the deepest circles of scientific hell are reserved for those who manipulate or even invent data. Proper data management makes this all but impossible. Emptying those circles of hell is a noble goal.

My interview with Tim Robertson will be available soon. In the meantime, you can check out Living Norway here.