There are now > 750 metabolomics studies indexed on MetabolomeXchange (http://www.metabolomexchange.org/) and > 1300 on OmicsDI (Perez-Riverol et al. 2017). These studies with data publicly available in dedicated repositories [MetaboLights (Haug et al. 2013), Metabolomics Workbench (Sud et al. 2016), MetaPhen (Carroll et al. 2015), MeRy-B (Ferry-Dumazet et al. 2011) and GNPS (Wang et al. 2016)], directly link to 368 unique journal articles (as of 15th September 2017). 45.4% of these 368 journal articles are published in ten journals (Fig. 1), with 58 (14.4%) being published in the Metabolomics journal.

Fig. 1 The ten journals with the highest frequency of publications directly linked from a publicly available metabolomics study, in a dedicated repository (MetaboLights, Metabolomics Workbench, MetaPhen, MeRy-B and GNPS) Full size image

Of these 10 journals, PLoS One, PNAS, Scientific Reports and Nature Communications require data availability statements to be included within submitted manuscripts. PLoS One specifically requires data be submitted to an appropriate public repository, whereas Nature Journals (Nature Communications and Scientific Reports) and PNAS only encourage it. Metabolomics and Plant Physiology require that authors make materials available to investigators for non-commercial research purposes, with Metabolomics specifying raw data must be shared and suggesting users deposit their data in a repository.

There are more than 17,000 journal articles on metabolomics indexed in PubMed (when searching for “metabolome” OR “metabolomics”). The ten journals that have published the highest number of papers returned when searching PubMed using this criteria are shown in Fig. 2. Whilst PLoS One appears to have the highest number of metabolomics papers, it is worth noting that only ~ 25% (304/1178) of articles published in the journal Metabolomics are indexed in PubMed.

Fig. 2 The ten journals with the highest frequency of publications when searching PubMed for “metabolome” OR “metabolomics” Full size image

Searching PubMed like this provides only a very rough estimate of the total number of metabolomics journal articles, as metabolomics papers may not contain the words “metabolome” or “metabolomics” in their title or abstract, may not be indexed by PubMed, or non-metabolomics papers may be returned. In fact, 44% of publications directly associated with metabolomics studies are not returned when searching for “metabolome” OR “metabolomics” in PubMed. Of these, 12.6% were not indexed on PubMed and 31.8% were indexed, but not returned. Despite this, it can be assumed that the majority of articles returned when searching for “metabolome” OR “metabolomics” are of metabolomics research.

Two of the journals with highest number of metabolomics papers on PubMed have no data sharing policies (Analytical Chemistry and Methods in Molecular Biology). The Journal of Proteome Research encourages users to deposit proteomics data in ProteomeXchange (Vizcaíno et al. 2014), however, unsurprisingly, has no policy on metabolomics data. Molecular BioSystems, the Journal of Chromatography B and the Journal of Pharmaceutical and Biomedical Analysis all encourage data sharing.

Given the number of journal articles published in the field of metabolomics, it would be expected that far more studies make their data open than actually do. The current data sharing policy of PLoS One has been in place since March 2014 (Bloom et al. 2014) and Springer Nature have had their policy since September 2016 (Announcement: Where are the data? 2016). Since 2015 PLoS One has published ~ 400 metabolomics papers and since 2017 Scientific Reports has published > 140. Despite not requiring data sharing via a dedicated repository, articles published by the metabolomics journal share data in dedicated repositories at a higher rate than those in PLoS One.

Although MetaboLights, is one of PLoS One’s recommended repositories for omics data, users may remain unaware of dedicated metabolomics repositories and instead publish their data in general repositories such as Dryad (https://datadryad.org/), figshare (https://figshare.com/) or Zenodo (https://zenodo.org/). As, PLoS One’s data sharing policy specifically states “authors do not need to submit the raw data collected during an investigation if the standard in the field is to share data that have been processed”, authors may feel that metabolomics is one such field where sharing only preprocessed data or an annotated list of identified metabolites is sufficient, rather than raw spectral data.

Another possibility is that journals such as PLoS One and Scientific Reports publish a higher percentage of clinical research or other studies with human participants. Due to concerns of patient privacy and consent, both journals have different data sharing requirements for clinical studies compared to those for studies including non-human subjects. Only summary, rather than raw data, must be reported for clinical studies.

The Ethical, Legal and Social Implications (ELSI) of sharing data from research involving human participants should always be considered, and protecting patient privacy must be a priority. However, except potentially in the case of rare diseases, there is currently no known means of identifying a patient from their metabolic profile. This is especially true for large cohort studies that include many patients with the same disease. There is a far greater risk of patient identification from genetic data than metabolomic. Despite this, a repository for sensitive genomics data has been developed, the European Genome-phenome Archive (EGA) (Lappalainen et al. 2015), which allows controlled access to datasets that cannot be made publically available. A similar repository could be established for clinical metabolomics data, allowing researchers to identify studies containing data relevant to their research by searching for diseases, metabolites or pathways of interest (metadata). Users could then apply to the studies’ data access committee for access to the dataset of interest. As recommended by the H2020 PhenoMenAl guidelines (http://phenomenal-h2020.eu/home/) for encoding data terms of use in the ISA format (and adopted by EGA), metadata describing terms of use, consent availability and additional ancillary information in the repository should be encoded using the Data Use Ontology (https://www.ebi.ac.uk/ols/ontologies/duo).

An additional concern is the number of publicly available metabolomics studies with raw data that have no associated publication: > 800. Associated journal articles probably exist for much of this open data, however there is no direct link between the data and the literature. This hinders the re-use of data, as papers likely contain more detailed experimental design descriptions and additional metadata, and data alone are insufficient for reanalysis (Kind and Fiehn 2009).

The connection between the publication review process at journals and the deposition of data to public repositories (such as MetaboLights or Metabolomics Workshop) must be improved. Potential methods to enhance this connection include Research Resource Identifiers (RRID) and project preregistration. RRID are unique, persistent identifiers that can be used for referencing a research resource, such as software, organisms or cell lines. Publishers could use RRID to link publications to data. Following the generation of the experimental design, projects can be preregistered—outlining what data and analysis will be performed prior to observing the research outcomes. Examples of repositories that allow project preregistration include the European Bioinformatics Institute’s BioSamples (Faulconbridge et al. 2014) and the National Center for Biotechnology Information’s BioProject (Barrett et al. 2012).

For metabolomics to become an established clinical tool, meta-analyses must be performed. This is necessary in order to demonstrate that quantitative metabolite measurements are reliable and accurate across studies. Meta-analysis cannot be performed without available data. It is also worth noting that data are valuable research outputs in their own right. However, a cultural shift is required to citing data themselves, rather than citing journal articles, in order to give data their full credit. The metabolomics community must move away from accepting non-interoperable summary tables as an acceptable way of disseminating data and towards requiring data sharing.