Peter Murray-Rust launched the openNotebook resource at last week’s #eLifeSprint2019*. openNotebook is a framework for data mining, searching, and reusing research publications. Below, he walks through how to use the framework in the context of climate change and of opening up research to the public. Peter Murray-Rust, GenR and the Open Science Lab at TIB have initiated an open research collaboration, Open Climate Knowledge, to address the low rates of open access (OA) publishing on climate change. Together we want to change this: firstly, by establishing better statistics on OA rates, and secondly, by drawing up a plan and recommendations for an accelerated transition to 100% OA for climate change.

Image: NOAA | From Earth System Models https://www.gfdl.noaa.gov/earth-system-model/

* See Infobox: ‘eLifeSprint 2019’ at end of article

Greetings to GenR!

I’m Peter Murray-Rust, a retired chemistry academic from Cambridge University, and I feel that the most important thing in our lives now is climate change.

But what can I do that’s the most effective response for me and the world?

However we solve climate change, one thing seems certain: we need global collaboration based on facts. Emotions will keep us going, but facts will decide what we do.

I don’t know all the facts that I should. If I lectured on climate change to a first-year university course, I couldn’t give an accurate picture of the facts and the actions they dictate. So I’m going to try to learn what is common knowledge. But my special contribution comes from a technology and a philosophy that allow us to get huge numbers of facts from reliable sources: the scientific literature.

There are literally hundreds of thousands of articles (papers) that are about climate change to some degree. Here is an example search of the biomedical literature (I explain it below, and you’ll find it’s very simple):

getpapers -q "climate change" -n -a
info: Searching using eupmc API
info: Running in no-execute mode, so nothing will be downloaded
info: Found 135931 results

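This uses Rik Smith-Unna’s getpapers software to search Europe PubMed Central (EuropePMC) for papers with “climate change” in the text. The -n option only reports how many papers match, without downloading anything, and -a searches all papers rather than only the open access ones. Of the 135,931 papers found, about half (65,516) are “open access” and can be downloaded (though this proportion is likely to be lower outside biomedicine).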
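To make the open access comparison concrete, here is a minimal Python sketch that asks EuropePMC for two hit counts, one for all papers and one for the open access subset, and computes the open access share. It is not getpapers’ own code. It assumes the EuropePMC REST search endpoint, its JSON reply with a top-level hitCount field, and the OPEN_ACCESS:y query filter; the function names are hypothetical.

```python
# Sketch: compare total and open-access hit counts on EuropePMC.
# Assumptions (not getpapers' own code): the EuropePMC REST search
# endpoint below, its JSON reply with a top-level "hitCount", and the
# OPEN_ACCESS:y query filter. Function names are hypothetical.
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query: str, open_access_only: bool) -> str:
    """Return a search URL that asks for one result, since only the count matters."""
    if open_access_only:
        query = f"({query}) AND OPEN_ACCESS:y"
    params = {"query": query, "format": "json", "pageSize": 1}
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_hit_count(url: str) -> int:
    """Fetch a search URL and return its hit count (needs a network connection)."""
    with urllib.request.urlopen(url) as response:
        return int(json.load(response)["hitCount"])


def oa_share(total_hits: int, open_access_hits: int) -> float:
    """Fraction of all hits that are open access."""
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    return open_access_hits / total_hits


if __name__ == "__main__":
    # Offline check using the two counts quoted in the text.
    print(f"{oa_share(135931, 65516):.1%}")  # prints 48.2%
```

Run offline, it only does the arithmetic on the figures quoted above. Calling fetch_hit_count on build_search_url('"climate change"', False) and on build_search_url('"climate change"', True) would need a network connection and would return current counts, which will differ from the ones in this article.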

The papers are about everything:

species extinction

sea level rise

spread of parasite vectors

weather changes

engineering responses

response by society,

and they’re about everywhere on the planet.

So if you want to find out about crops and West Africa…

getpapers -q "((climate change) AND (west africa) AND (crops))" -n -a
info: Found 1628 results

1,628: that’s a lot of papers! But if you have enough disk space and a reasonably good connection, you can download them in about 5 minutes (to download rather than just count, leave out -n and give an output directory with -o).

Are they useful? That’s where our AMI software comes in. Within a minute or two, AMI searches the papers on your disk for things you might be interested in:

species

vectors

tropical diseases

chemicals

countries

funders

international organizations,

and lots more.
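AMI’s own dictionaries and commands are not reproduced here. As a rough illustration of the idea behind dictionary search, the Python sketch below matches each term of a small, hypothetical dictionary against the text of every downloaded paper, using case-insensitive whole-word regular expressions. It assumes each paper has been saved as a folder named after its ID containing a fulltext.xml file, and it searches the raw XML without parsing it. DICTIONARY, compile_dictionary, search_terms and search_corpus are illustrative names, not AMI’s API.

```python
# Sketch of dictionary-based term search (hypothetical; not AMI's API or
# dictionary format). Assumes papers are stored as <paper id>/fulltext.xml.
import re
from collections import Counter
from pathlib import Path

# Hypothetical mini-dictionary: canonical term -> surface forms to match.
DICTIONARY = {
    "Anopheles": ["anopheles"],
    "malaria": ["malaria"],
    "Ghana": ["ghana"],
    "maize": ["maize", "corn"],
}


def compile_dictionary(dictionary):
    """Build one case-insensitive, whole-word regex per canonical term."""
    return {
        term: re.compile(
            r"\b(?:" + "|".join(map(re.escape, forms)) + r")\b", re.IGNORECASE
        )
        for term, forms in dictionary.items()
    }


def search_terms(text, patterns):
    """Count how often each dictionary term occurs in one paper's text."""
    counts = Counter()
    for term, pattern in patterns.items():
        hits = len(pattern.findall(text))
        if hits:
            counts[term] = hits
    return counts


def search_corpus(corpus_dir, patterns):
    """Search every <paper id>/fulltext.xml under corpus_dir (raw XML, unparsed)."""
    results = {}
    for path in sorted(Path(corpus_dir).glob("*/fulltext.xml")):
        results[path.parent.name] = search_terms(
            path.read_text(encoding="utf-8"), patterns
        )
    return results


if __name__ == "__main__":
    patterns = compile_dictionary(DICTIONARY)
    text = "Maize yields in Ghana fell; corn losses and malaria both rose."
    print(search_terms(text, patterns))
```

Mapping several surface forms (here "maize" and "corn") to one canonical term is what lets counts from different papers be compared. The word boundaries stop "corn" from matching inside "acorn".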

The great thing is that anyone who can run a program can do this! Lars, a 15-year-old in the Netherlands, learned how to do this and went on to develop more software. If you love computers (and have access to one), data, tackling scientific problems, or combatting climate change, that’s all you need.

This makes a great citizen science project. Anyone anywhere with a Net connection can do it. The software, data and dictionaries are all open (no restrictions on use, no fee, and you can change them without permission). We’ll share the data we find (probably on GitHub) as soon as we capture it. This is “Open Notebook Science” (no insider knowledge), as promoted by Jean-Claude Bradley.

Don’t think that because you aren’t a “scientist” you can’t understand scientific papers. Of course you’re not going to understand all of them (I can’t either), or some parts of them, but there are many whose key points you can understand. If you like maps, graphs, and similar data, you’ll feel right at home.

We’ve set up a project called Open Climate Knowledge (OCK) on GitHub. The technology is packaged as openNotebook and is used in several projects (most notably on plants and their medicinal products), which means that bugs get reported and, hopefully, fixed. Whatever your interests and skills, you’re welcome.

See Infobox: ‘OCK’ at the end of the article

See Infobox: ‘openNotebook’ at the end of the article, including installation and user instructions

There’s a lot of useful material in the sister project on essential oils. The data we extract is also open and well organized, so a wide range of other software can be used to analyse it.

If you are a techie, there’s a (rather XML-heavy!) tutorial I’m giving next week at the XML Summer School (Oxford); you’ll have to download it. We’re also starting a communal article for the Beilstein Journal of Organic Chemistry at https://github.com/petermr/CEVOpen/blob/master/BJOC

Infoboxes

Infobox: eLifeSprint 2019 – InstruMinetal team

An example use case of searching for and identifying scientific instruments in a corpus of research papers.

Project: SaWaMine (working title)
#eLifeSprint2019, 4–5 September 2019, Cambridge, UK and online.

InstruMinetal was a two-day eLife sprint group of seven (Sabine Weber, Michael Owonibi, Tiago Lubiana, Peter Murray-Rust, Sophia K. Cheng, Wambui Karuga, and Leonie Mueck). The goal was to automatically search the open access literature for plants and the instruments used to extract and analyze essential oils. A thousand papers from EuropePMC were automatically searched for and downloaded. ContentMine dictionaries (plants, countries, species, and funders) were then used to find terms in the text and to plot their frequencies and co-occurrences. Because apparatus and methodology are important, we also started to build an “instrument” dictionary (e.g., mass-spec). To automate the process, the team explored machine learning and natural language processing to help users identify scientific instruments among candidate search results extracted from CEVOpen, a phytochemistry corpus of papers. The sprint GUI is built on ContentMine’s FOSS software getpapers and AMI.

Goals of the Project:

Create a way of automatically extracting candidates for scientific equipment terms from scientific papers.

Create a GUI to display the paragraph of a paper that contains the candidate equipment terms, allowing the user to select the ones that are actually instruments.

Find out what kind of scientific equipment the papers in the CEVOpen corpus used and add the terms to Wikidata.

Long-term goal: connect the tool and the GUI.

NB: Content is partly based on https://github.com/caffiendFrog/elife2019 from Sophia Cheng @caffiendFrog
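The frequency and co-occurrence counting mentioned in this infobox can be sketched in a few lines of Python. This is an illustration, not the sprint team’s code: it assumes that, for each paper, we already have the set of dictionary terms found in it, and it counts how many papers mention each term and how many papers mention each pair of terms together. The corpus below is made-up toy data.

```python
# Sketch of term frequencies and co-occurrences across papers (illustrative,
# not the sprint code). Input: paper id -> set of dictionary terms found.
from collections import Counter
from itertools import combinations


def frequencies(hits_per_paper):
    """Number of papers in which each term appears."""
    return Counter(term for terms in hits_per_paper.values() for term in set(terms))


def cooccurrences(hits_per_paper):
    """Number of papers in which each unordered pair of terms appears together."""
    pairs = Counter()
    for terms in hits_per_paper.values():
        # Sorting gives each pair a single canonical order, e.g. (a, b) not (b, a).
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    # Made-up toy corpus: plants, an instrument and a country per paper.
    corpus = {
        "paper1": {"Mentha piperita", "GC-MS", "Iran"},
        "paper2": {"Mentha piperita", "GC-MS"},
        "paper3": {"Thymus vulgaris", "Iran"},
    }
    print(frequencies(corpus).most_common())
    print(cooccurrences(corpus)[("GC-MS", "Mentha piperita")])  # prints 2
```

Counting per paper rather than per mention keeps one very repetitive paper from dominating the totals; the pair counts are what a co-occurrence plot or network would be drawn from.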

Infobox: openNotebook

openNotebook was launched at eLifeSprint Cambridge 2019 by Peter Murray-Rust. Read more about the project here: https://github.com/petermr/openNotebook

openNotebook is a top-level resource that supports the general concept of literature-based Open Notebook Science. The foundational software it employs, getpapers and AMI, enables the searching and retrieval of research papers from open repositories such as EuropePMC and the further processing of the data: refining the search, identifying elements in and across the papers, or semantically enriching the papers. One goal of openNotebook is to provide a pathway to semantic open access publishing, meaning machine-readable, reusable, and computational documents and publications.

getpapers – Software and installation guide https://github.com/contentmine/getpapers | User tutorial

AMI – Software and installation guide https://github.com/ContentMine/ami | User tutorial

Infobox: Open Climate Knowledge (OCK)

An open initiative to bring 100% open access publishing to climate change related topics. Current open access rates are below 30–40%, which is unacceptably low.

Open Climate Knowledge (OCK) has two activities: firstly, to build up statistics on OA rates in climate change research publishing, primarily using the openNotebook software; secondly, to make an actionable plan and recommendations to accelerate the transition to ‘100% OA for climate change’.

Using the openNotebook software, the next steps for OCK are to build up these statistics and to examine three areas related to climate change: energy modeling, the question of ‘runaway climate change’, and species distribution and migration. OCK has been initiated by Peter Murray-Rust and Simon Worthington of GenR and TIB – German National Library of Science and Technology. OCK is looking for volunteers to help in a number of areas, and for users to try out and play with the software: https://github.com/petermr/climate