After years of threats, abuse, complaints with forged documentation, crude attempts at blackmail and more, I can tell you that journalists can be quite sensitive about criticism. But there is one valid objection to this column: that I cherry pick the worst examples to write about.

This, of course, is true. When scientific claims are wrong, they're often interestingly wrong. That makes them a good teaching tool to explain how real science works. But there's also a broader worry. People make real-world decisions about health-risk behaviour based on information from newspapers, and if that information is routinely misleading, there are real-world consequences.

So how much reporting, overall, is unreliable? To find out, you'd have to take a systematic and unbiased sample – perhaps a whole week's worth of stories – and then check the evidence behind every claim. This would be an enormous job, but a new paper in the journal Public Understanding of Science does exactly that. I'm in a strange position to be writing about it, since the study was my idea, and I'm one of the authors.

Here's what we did. First, we needed a representative, unbiased sample of news stories, so we bought every one of the top 10 bestselling UK newspapers, every day for one week. The top 10 is basically all the newspapers you've heard of, and they weigh a ton when they're stacked in one place.

We went through these to pull out every story with any kind of health claim, about any kind of food or drink, which could be interpreted by a reader as health advice. So "red wine causes breast cancer" was in, but "oranges contain vitamin C" was not.

Then the evidence for every claim was checked. At this point, I will cheerfully declare that the legwork was not done by me: a heroic medical student called Ben Cooper completed this epic task, checking each claim against the best currently available evidence on PubMed, the searchable archive of academic papers, and against current systematic reviews on the relationships between food and health.

Finally, to produce data for spotting patterns, the evidence for each claim was graded using two standard systems for categorising the strength of evidence. We worked with the Scottish Intercollegiate Guidelines Network (SIGN) grading system and the World Cancer Research Fund (WCRF)'s scale, because they're simple, widely used, and balance the conflicting requirements of ease of use and rigour. They're not perfect, but they're pretty good.

Here's what we found: 111 health claims were made in UK newspapers over one week. The vast majority of these claims were supported only by evidence categorised as "insufficient" (62% under the WCRF system). After that, 10% were "possible", 12% were "probable", and in only 15% was the evidence "convincing". Fewer low-quality claims ("insufficient" or "possible") were made in broadsheet newspapers, but there wasn't much in it.

There are some clear limitations to this paper. The grading of the evidence could perhaps have been more comprehensive, or done by people who were blinded to the hypothesis of the study (or the source); the evidence could have been rated twice, by two raters, and the level of agreement between them assessed afterwards.

But overall, I think this is quite an interesting finding, a new one, and a worrying one. It seems that the majority of health claims made, in a large representative sample of UK national newspapers, are supported only by the weakest possible forms of evidence.

People who work in public health bend over backwards to disseminate evidence-based information to the public. I wonder if they should also focus on documenting and addressing the harm done by journalists. And for the people who have denied there is a real problem here: I think the onus is now on you to produce evidence justifying your dismissiveness.

• This article has been amended to add a link to the research mentioned. Thanks to commenters for pointing this out.