"It seems that the majority of health claims made, in a large representative sample of UK national newspapers, are supported only by the weakest possible forms of evidence."

That is how my colleague Ben Goldacre summarised the finding of an interesting study into the quality of dietary health claims reported in the UK print media, which he initiated and co-authored and which was published in the journal Public Understanding of Science. The research team found that 72% (or 68%, depending on which scale you use) of such claims are based on the flimsiest category of scientific evidence.

That's a pretty shocking figure and the research is a welcome and useful addition to an important debate. But on closer examination, I believe there are significant limitations in the way the study was conducted that make the 72% headline figure unreliable.

First though, let me be clear. I am not one of the deniers that Ben refers to in his column. Trying to help improve the quality of science in the media was one of the reasons that I became a journalist after finishing my PhD.

I would not seek to defend every health/diet piece that appeared in every corner of the UK quality papers, let alone every flaky tabloid supplement. Journalists (particularly non-specialists) too often fall for implausible claims from quack "experts" and marketeers, or they give too much weight to preliminary findings. Dieting pieces are particularly susceptible to the "quirky quick fix" fad, which can leave readers confused. And there are some head-in-hands corkers in the articles the team looked at ("Got a headache? Reach for the paprika" says the Express, for example, in an article entitled "Spice up your life: your health" on 4 November 2008).

But apart from its important limitations, Ben's study – which was led by Prof Thomas Sanders, head of diabetes and nutritional sciences at King's College London's School of Medicine – appears to demand a standard of evidence for writing about science that is self-defeatingly high. It would exclude almost all science from newspapers and leave the public with an impoverished understanding of research, much of which is publicly funded.

So what did they do? Here's how Ben describes the study:



First, we needed a representative, unbiased sample of news stories, so we bought every one of the top 10 bestselling UK newspapers, every day for one week. The top 10 is basically all the newspapers you've heard of, and they weigh a ton when they're stacked in one place. We went through these to pull out every story with any kind of health claim, about any kind of food or drink, which could be interpreted by a reader as health advice. So "red wine causes breast cancer" was in, but "oranges contain vitamin C" was not. Then the evidence for every claim was checked. At this point, I will cheerfully declare that the legwork was not done by me: a heroic medical student called Ben Cooper completed this epic task, researching the evidence behind every claim using the best currently available evidence on PubMed, the searchable archive of academic papers, and current systematic reviews on the relationships between food and health. Finally, to produce data for spotting patterns, the evidence for each claim was graded using two standard systems for categorising the strength of evidence. We worked with the Scottish Intercollegiate Guidelines Network grading system (SIGN), and the World Cancer Research Fund (WCRF)'s scale, because they're simple, widely used, and balanced the conflicting requirements for ease of use and rigour. They're not perfect, but they're pretty good.

Let me add my tribute to Ben Cooper for undertaking the truly Herculean task of extracting every diet claim from a week's newspapers. Take a bow, sir. And my thanks also to Prof Sanders for sharing with me the list of 37 articles that contain the 111 claims they examined. The study is impossible to evaluate properly without that information – and how those claims were ranked – and it should have appeared as an appendix in the original paper.

Only 17 of the 37 articles the team looked at are news stories. The rest Cooper describes as "reader advice". One is from a TV guide. The real problem, though, is with the week chosen at random to sample the articles. The authors settled on 2-8 November 2008. As they acknowledge with impressive understatement, "There are reasons to suppose that it was not a typical week."

You bet! Unfortunately, their chosen week coincided with one of the biggest global stories of the decade. It was the week when the US elected its first black president, and also the week of Lewis Hamilton's success in the Formula 1 championship. Now I'm not sure whether that affected the results systematically, but my suspicion is that it would have knocked a large number of science and health stories out of the papers (particularly the broadsheets).

That seems to be borne out by the fact that the sample of 37 articles included just two from the Guardian, none from the Observer, none from the Independent, none from the Independent on Sunday, one from the Times and one from the Sunday Times. The Daily Express alone accounts for 31 of the claims (with 16 appearing in a single article).

I am not suggesting that the team deliberately chose that week, simply that it is unfortunate that it had such an obvious potential problem with it. I don't think the claim that this is a "representative, unbiased sample of news stories" stands up to scrutiny.

The second problem concerns what constitutes a dietary health "claim", for the purposes of the study. The authors sensibly include statements that could be construed as direct advice (for instance a statement like, "don't drink red wine – it causes breast cancer") and statements from which one might infer health advice (like, "red wine causes breast cancer"). But by feeding every such claim into the two evidence grading systems, the authors miss some very important context that is present in the articles and which, I believe, gives readers a chance to judge the quality of the evidence for themselves.

Take for example the claims that appeared in the two articles from the Guardian (both of which were, incidentally, written by doctors not journalists). In "What's bugging you", Dr Tom Nolan – a medical professional who has worked as a hospital doctor – writes:

Most colds get better within a few days. Paracetamol is best for a sore throat and temperature. For a blocked nose, try decongestants, but do not use them for longer than a week or your nose could block up again when you stop, even if your cold has gone. There is some evidence to support taking the herbal remedy echinacea, but preparations vary so it is hard to tell what you are getting. Vitamin C tablets will not make you better any sooner.

The diet study authors told me that they rated the echinacea claim as "possible" and the vitamin C claim as "convincing". To my reading, Nolan expresses the uncertainty in the evidence around echinacea and hardly offers it a ringing endorsement, so to lump it in with more bald statements doesn't seem very useful.

In the other Guardian article, from the regular Doctor, Doctor column, Dr Tom Smith (a general practitioner) writes:

A recent Italian study linked the combination of Italian food and dark chocolate with lower levels of a protein in the blood related to inflammation – C-reactive protein (CRP). Basically, the lower your CRP, the lower your risk of heart attack. There's one snag: the lowest risk is at a level of 20g every three days; below and above this level the risk rises. So eat chocolate, by all means, but make it dark, and don't overdo it. The fact that you're not overweight should in theory help to lower your risk further.

Again, this is a very contextualised response from a GP, yet the diet study authors told me it was placed in the second-lowest evidence category of the study.

I sent the text of this critique to the authors prior to publication. Prof Sanders responded:

I think you misrepresent the article because we had a clear definition of what was a claim and what was not. We excluded many articles published in that week because they did not meet the criteria...The bottom line is we need better standards of evidence on which to make dietary advice to the public and this is not helped by the way in which papers report health claims for food.

He also pointed out that the article underwent extensive peer review prior to publication. But I do not believe that the reviewers would have had an opportunity to scrutinise the grading of the claims selected by the team because those claims and the articles they came from are not listed in the paper. If the reviewers did have sight of them, I'd be very happy to be corrected in the comments below.

And that brings me to the final issue with the study. The grading systems for "reliability" of evidence that the authors employ are not sophisticated enough to be much use in this context. Deciding to write a news story with appropriate caveats and explanations is very different from a clinical decision on whether to prescribe a particular treatment.

Take for example a story by my colleague, Guardian science correspondent Ian Sample, from March 2009. In it he reported controversial claims from the government's chief scientist Sir John Beddington.

A "perfect storm" of food shortages, scarce water and insufficient energy resources threaten to unleash public unrest, cross-border conflicts and mass migration as people flee from the worst-affected regions, the UK government's chief scientist will warn tomorrow. In a major speech to environmental groups and politicians, Professor John Beddington, who took up the position of chief scientific adviser last year, will say that the world is heading for major upheavals which are due to come to a head in 2030. He will tell the government's Sustainable Development UK conference in Westminster that the growing population and success in alleviating poverty in developing countries will trigger a surge in demand for food, water and energy over the next two decades, at a time when governments must also make major progress in combating climate change.

I doubt that Prof Beddington's conclusions would rank highly if put through the grading systems adopted by the Goldacre/Sanders diet study (one of the grading systems puts "expert opinion" in the lowest category of evidence). But whether Beddington is right or wrong and whether his claims reach the highest evidential standards, his expert view is worthy of reporting because of who he is and the fact that he has the ear of the prime minister.

Another example would be the advice given by the Department of Health and the Health Protection Agency (HPA) regarding mobile phone use.

Despite the lack of good evidence for a link between radiation from handsets and health problems, both warn against "excessive use" by children. This reflects essentially a precautionary approach because it is hard to prove a negative – ie that something is not harmful.

Using one of the rating systems in the diet study (the World Cancer Research Fund grading system), the claim that children may be at risk from using mobile phone handsets would, I think, fit into the lowest, "insufficient" category. This is how the category is described:

Evidence based on findings of a few studies which are suggestive, but are insufficient to establish an association between exposure and disease. Limited or no evidence is available from randomised controlled trials. More well designed research is required to support the tentative associations.

This HPA advice would not fare well in evidence terms because the best studies suggest that there are no health problems linked to handset exposure, and yet the advice is absolutely worth reporting.

Finally, this recent story, again by Ian Sample, is about a fascinating scientific advance that has allowed researchers to edit the genome of living animals (mice in this case) to correct mutations that cause an inherited blood disorder. The story says:

The work raises the prospect of powerful new therapies that can target and repair the genetic defects behind a wide range of human diseases that cannot be tackled with modern medicines.

This was high-quality research reported in Nature, but it would have languished at the bottom of the Goldacre/Sanders criteria along with quack claims about loganberries, because it stems from a study on animals.

Even though this technique is a long way off making it into the clinic, I would argue that it is an interesting and significant advance in gene therapy and is worthy of inclusion in a national newspaper.

The important point here is that if journalists adopted a self-imposed editorial ban on any scientific claim that did not fit into, say, the top two evidence categories used in the Goldacre/Sanders diet study, then fascinating and potentially important pieces of research would be invisible to most of the public.

I'm not arguing that journalists should cover every animal research paper or case-control study – far from it. But the exclusion of numerous stories about research at various stages on the evidential road would leave readers with little idea of the vast bulk of work scientists are doing.

Many people are interested in the process and findings of science, and newspapers are right to feed that interest. Readers pay for much of this work through their taxes, after all, and the people who spend that money should talk openly about what they've done, how they've done it and what the implications are, and yes, even speculate on where things might go next.

If newspapers were to filter out all claims made by experts and all studies that aren't meta-analyses or gold standard clinical trials, then journalists would be helping to create a society where scientists are isolated from public scrutiny and people are even more ignorant of the process, potential and probable directions of science.

The Goldacre/Sanders study is an interesting first stab at quantifying the credibility of dietary health stories in the UK press, but it has some significant limitations that mean the headline figure cannot be relied on. I hope they and others build on it.

But anyone using that headline figure in the debate about the quality of media reporting on science and health should be aware of those limitations.

• Ben Goldacre responds to James Randerson's criticisms of his paper.