Editor’s note: Research integrity and reproducibility remain a major concern for the academic world. In terms of image misuse, a review of 20,000 published papers containing the phrase “Western blots” found that nearly 4 percent contained inappropriately duplicated images – and at least half of those cases suggested deliberate manipulation.

A number of universities and companies are actively seeking solutions to this problem. In January, we published an article about a team at Harvard Medical School (HMS) combining expertise from the Image Data and Analysis Core (IDAC) and Harvard Medical School’s Office for Academic and Research Integrity (ARI) that is developing an open-source tool to flag potential image manipulation prior to publication. That tool may ultimately feature in Elsevier’s editorial submission system.

But according to Dr. Mary C. Walsh, Chief Scientific Investigator at ARI, there is still a key resource missing: a shared “test” dataset that can be used to determine how effective each version of the tool might be. She explained: “… a benchmark dataset that enables comparable tests of effectiveness across different platforms will be necessary not only for our work here but for the continued development and refinement of these efforts in the community.”

Germany’s HEADT Centre (Humboldt-Elsevier Advanced Data and Text Centre) has been working on an image integrity database (IIDB) that promises to fill that gap. A beta version of the database is now ready, and the HEADT Centre is planning a series of “competitions” in which researchers will be invited to explain how they would use it to develop and improve manipulation detection algorithms. In this article, Dr. Thorsten S. Beck, one of the researchers behind the tool, explains more about the origins of the project and its plans.

How do researchers use and change images to make their results look more consistent or convincing? What is considered “appropriate” image manipulation, and when does a scientist cross the line?

These are some of the questions I’ve been trying to answer since I started writing my PhD thesis on scholarly image manipulation back in 2013.

Inappropriate image manipulation is not good for the ecosystem of science. Science builds on science, and if there’s something wrong with a published paper, then you are poisoning that well.

What is the HEADT Centre?

Launched in 2016, the HEADT Centre (Humboldt-Elsevier Advanced Data and Text Centre) aims to become a national and international resource on all aspects of research integrity. Detection mechanisms, replication studies, behavioral aspects and legal considerations are all in scope. Experts from Elsevier and researchers from Humboldt-Universität zu Berlin currently focus on two key areas:

- Creating infrastructures and algorithms to improve computer-aided discovery and enhance the efficiency of text and data mining (TDM).
- Research integrity issues such as plagiarism, image manipulation, data falsification and fabrication.

Results of the centre's research will be made readily available, supporting its goal to establish broader international cooperation among universities and institutions with similar research interests.

Many of the prominent cases of scientific misconduct you see in the news involve image manipulation to some degree. These cases are only the tip of the iceberg. That said, I'd say the majority of cases are accidental: for example, wrongly labelled files submitted by inexperienced staff, or perhaps a lack of awareness about what's acceptable – although that's not really an excuse!

For my PhD, I interviewed a number of biomedical journal editors, who quickly made it clear that for them, the misuse of scientific images is a major concern.

Beyond the structured process of peer review, it's mainly editors who are in charge of screening articles, and right now, spotting any kind of manipulation is complex, time-consuming and costly, requiring both expertise and experience. It's largely a manual process that involves importing the file back into the software used to make the edits (for example, Photoshop) so that any changes can be tracked.

There is also little consistency in how journals are tackling this problem. Some screen all images, while others do spot checking, or no checking at all. Some provide repositories where researchers are required to submit all original image files, including the data behind their images, for later comparison if doubts are raised.

At the moment, there are a number of teams around the world working on developing image manipulation detection software – a sort of Crossref Similarity Check for images. To test the success of each iteration, these teams create and manipulate fictitious datasets, but that doesn’t allow them to judge the tool’s effectiveness in a real-life situation. And even if they did use real data, with no consistency across teams, it’s impossible to compare accuracy levels.
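A shared, labeled benchmark is what makes accuracy comparisons meaningful: every tool is scored against the same ground truth with the same metrics. As a minimal sketch (the tool names and numbers here are purely illustrative, not real results):

```python
# Minimal sketch: scoring two hypothetical detection tools against one
# shared, labeled benchmark. All names and values are illustrative.

def precision_recall(predictions, labels):
    """Compute precision and recall for binary manipulated/clean labels."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Ground truth from the benchmark: True = image is manipulated.
labels = [True, True, False, False, True]

# Flags raised by two different detection tools on the same images.
tool_a = [True, False, False, False, True]   # cautious: misses one case
tool_b = [True, True, True, False, True]     # aggressive: one false alarm

print("Tool A:", precision_recall(tool_a, labels))
print("Tool B:", precision_recall(tool_b, labels))
```

Because both tools are evaluated on identical data, the resulting precision/recall figures are directly comparable – exactly what isolated, team-specific test sets cannot provide.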

Together with Elsevier, and taking inspiration from research carried out by the Text REtrieval Conference (TREC), we decided to create a gold-standard database of real data that these teams could test against. But before we could create that database, we had to decide on the criteria, and there were a lot of questions we needed to answer.

The first question was: What kind of images do we want the database to contain? We decided to begin by focusing on the biomedical sciences – for example, Western and Northern blots – since many of the retraction cases we looked at occurred in those fields. Also, in biology and medicine, there are very clear standards, which provide a framework for understanding what went wrong, why an article was retracted, and what is or isn't considered appropriate. Still, even with guidelines at hand, drawing the line is a very interesting exercise, as the research of Emma Frow has shown.

In the biomedical fields, it's generally acceptable to change the contrast or brightness levels over the entire image. But when I was conducting research for my PhD, I spoke with a scientist who had an image with a scratch on it. She asked someone with Photoshop expertise to remove that scratch for her and ended up with a good, clear image. She hadn't really changed the data, but she had altered the image, so the question is: was that acceptable manipulation? Erasing artefacts or dust from images in biomedicine is generally not considered good scholarly practice.

With our database, we didn't have to make those kinds of judgments, as we decided early on to focus on images from retracted articles. We started by analyzing image manipulation-related cases in the Retraction Watch database. We only included cases where the manipulation was clear and established – some retraction cases are still pending and may take years to resolve, and many others remain undetected and only surface years later. Elsevier also provided us with a lot of original data, which gave us a very strong start.

In this three-minute video, Dr. Beck talks about the HEADT Centre's image manipulation research.

Another question was: What kind of metadata should we include? We decided to provide a rich dataset, using whatever details were available in the retraction notices. This includes the article title, publisher, author, image caption, any entries in Retraction Watch or PubPeer, and the reason the article was retracted. We've captured as much information as we could find.
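The metadata described above could be modeled as one record per image. The sketch below is an assumption for illustration only – these field names are not the IIDB's actual schema:

```python
# Illustrative sketch of a single database entry built from the metadata
# listed above. Field names are hypothetical, not the real IIDB schema.
entry = {
    "article_title": "...",           # from the retracted article
    "publisher": "...",
    "authors": ["..."],
    "image_caption": "...",
    "retraction_watch_entry": None,   # link, if a case entry exists
    "pubpeer_entry": None,            # link, if a discussion exists
    "retraction_reason": "...",       # as stated in the retraction notice
}

# Not every retraction notice supplies every field, so optional
# entries default to None rather than being omitted.
```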

We also had to establish a structure and wanted our users to be able to browse images by category. So we identified 18 common types of manipulation and turned those into categories: for example, duplication, rotation, erasing or introducing elements, recycling or plagiarizing images.
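A category scheme like this maps naturally onto an enumeration that browsing code can filter on. In this sketch, only the category names actually mentioned above are listed; the real database defines 18 types in total, and the identifiers here are assumptions:

```python
from enum import Enum, auto

# Hypothetical browsing taxonomy. Only the categories named in the text
# appear here; the actual database distinguishes 18 manipulation types.
class ManipulationType(Enum):
    DUPLICATION = auto()
    ROTATION = auto()
    ERASING_ELEMENTS = auto()
    INTRODUCING_ELEMENTS = auto()
    RECYCLING = auto()       # reusing one's own earlier image
    PLAGIARISM = auto()      # taking an image from someone else's work
    # ...remaining categories omitted in this sketch

def browse(images, category):
    """Filter (image_id, category) pairs down to one manipulation type."""
    return [image_id for image_id, cat in images if cat is category]

catalog = [
    ("blot_001", ManipulationType.DUPLICATION),
    ("blot_002", ManipulationType.ROTATION),
    ("gel_003", ManipulationType.DUPLICATION),
]
print(browse(catalog, ManipulationType.DUPLICATION))
```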

The selection and curation of the image and article data took a lot of manual work, but we now have a constantly growing database with images that can help teams around the globe develop and train algorithms for image manipulation detection.

Next steps

There is still so much to learn about this topic, and by creating the ability to comment on, annotate and exchange opinions about the images and their manipulations in the database, our goal is to build up an online community to share knowledge and skills.

Next, we plan to reach out to research groups that work on image manipulation detection and invite them to work with our data and tell us how they would use it. These invitations will take the form of competitions or grand challenges (again inspired by TREC). Once we've read their proposals, we may end up working with one or more of these research groups. We are already in touch with the Harvard Medical School team, who are interested in discovering how effective their algorithms are at spotting the manipulations in our data. We also expect to hear from some software companies.

Another key goal is to speak with other publishers to see if they would be willing to share images and data, or to support our project with expertise, so we can build this resource to benefit the whole academic community. We have already had some early discussions, but the challenge is that not every publisher keeps the original submitted images on file; images are often converted to publisher-ready formats and the originals are lost.