The controversies surrounding the effectiveness of cognitive behavioural therapy and graded exercise therapy for chronic fatigue syndrome are explained using Cohen’s d effect sizes rather than arbitrary thresholds for ‘success’. This article shows that the treatment effects vanish when switching to objective outcomes. The preference for subjective outcomes by the PACE trial team leads to false hope. This article provides a more realistic view, which will help patients and their doctors to evaluate the pros and cons.

The ‘PACE-gate’ editorial by Geraghty and the subsequent response by White et al. (Geraghty, 2016; White et al., 2017) made me smile and shake my head at the same time. White et al. (2007, 2011) deviated substantially from the trial protocol of their randomized controlled study on treatments for chronic fatigue syndrome (CFS). Geraghty argued that therefore the effects of cognitive behavioural therapy (CBT) and graded exercise therapy (GET) were overstated by the authors and in the press. These therapies were not curative and should be downgraded to adjunct support-level status. White et al. (2017) responded that Geraghty’s views are based on ‘misunderstandings and misrepresentations’, which they would ‘correct’.

In my opinion, White et al. have failed to show that Geraghty is wrong. They provided additional information on their trial and decisions and repeated their findings that CBT and GET are more effective than specialist medical care (SMC). They defended the use of these therapies with arguments based on a series of false dilemmas: treatments are either effective or ineffective; the result is either black or white; the opponents are wrong and they are right. Unfortunately, they have not shown how effective CBT and GET are. I believe this is the crucial point in the debate between Geraghty and White et al. Let us consider the shades of grey by studying Cohen’s d effect sizes.

Effect sizes

I was not able to find the statistics behind the effect sizes that White et al. (2017) have reported in their response to Geraghty. Therefore, I re-computed Cohen’s d from data in their Lancet article, comparing CBT and GET versus SMC (White et al., 2011).

The first issue I addressed was whether to calculate the effect sizes from the primary outcome variables described in the trial protocol (White et al., 2007) or from the primary outcome variables published in the final article (White et al., 2011). The trial protocol has received criticism since its publication, for example, in the online comment section accompanying that document. Given that White et al. have abandoned it themselves, it appears that neither the authors nor others have shown faith in the primary outcome variables defined in the trial protocol. I believe that some commentators, such as Geraghty, considered the trial protocol not because they actually supported it, but because they wanted to show the consequences of White et al.’s faux pas of redefining outcomes during the trial.

White et al.’s primary outcome variables were subjective fatigue and subjective physical functioning after 12 months. The trial protocol prescribed a 0011 coding scheme for the first primary outcome, which is the score on the Chalder fatigue scale, yet the authors switched to a 0123 coding scheme in the final article. My opinion is that, regardless of the coding scheme, the Chalder fatigue scale should be abandoned as a primary outcome. I refer the reader to my letter and its pre-publication history for more information (Stouten, 2005). For pragmatic reasons, I decided to use the 0123 coding scheme in my effect size analysis: the data are readily available from White et al., and it produces more precise results for fatigue than the 0011 scheme. The other primary outcome, subjective physical functioning, was measured using the physical functioning subscale of the Short Form 36.
For this scale, there was no difference in scoring method between the trial protocol and final publication. Table 1 shows the effect of CBT and GET compared to SMC on the primary outcomes after 12 months. Cohen’s d varies between 0.45 and 0.48 for subjective fatigue and between 0.27 and 0.30 for subjective physical functioning. This indicates that the additional benefits of CBT and GET over SMC vary between small and medium, which contrasts with the positive stories in the press.

Table 1. Treatment effect size for CBT and GET versus SMC after 12 months.
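To illustrate why the 0123 coding scheme gives more precise fatigue scores than the 0011 scheme, consider the minimal sketch below. It is an illustration only and not White et al.’s scoring procedure. It assumes an 11-item questionnaire with four response options per item, and both response patterns are invented.

```python
# Illustrative sketch only: the item count (11) and both response
# patterns are assumptions, not data from the PACE trial.

BIMODAL = (0, 0, 1, 1)  # the "0011" coding scheme
LIKERT = (0, 1, 2, 3)   # the "0123" coding scheme


def score(responses, scheme):
    """Sum per-item scores; each response is an option index from 0 to 3."""
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be an option index from 0 to 3")
    return sum(scheme[r] for r in responses)


# Two hypothetical patients on an assumed 11-item questionnaire.
patient_a = [2] * 11  # third response option on every item
patient_b = [3] * 11  # fourth (most severe) option on every item

print(score(patient_a, BIMODAL), score(patient_b, BIMODAL))  # 11 11
print(score(patient_a, LIKERT), score(patient_b, LIKERT))    # 22 33
```

Under the 0011 scheme the two invented patients receive the same score, whereas the 0123 scheme separates them. This is the sense in which the 0123 scheme retains more information about the severity of fatigue.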

The more objective the outcome, the worse the result for CBT and GET

Questionnaires that assess fatigue require the patient to rate subjective experiences, such as feeling tired, feeling weak and not having enough energy. In contrast, questionnaires assessing physical functioning ask the patient to estimate the ability to perform objective physical activities, such as dressing oneself, climbing the stairs and going for a walk. Consequently, the outcome of a questionnaire for physical functioning represents a more objective quantity than the outcome of a questionnaire for fatigue. As Table 1 shows, CBT and GET resulted in smaller effect sizes for physical functioning than for fatigue. This leads to the interesting hypothesis that the effect size of CBT and GET decreases as the objectivity of the outcome increases.

To investigate this hypothesis, I added to the analysis the only objective test which I could find in White et al.’s study: the distance covered in a 6-minute walking test after 12 months. Table 1 shows the effect sizes. For GET, there was no difference in the results when using the data from the objective test of physical functioning. In other words, there was a small positive effect favouring GET over SMC. For CBT, the beneficial effect over SMC vanished when using the objective outcome measure. In other words, though patients think they are able to walk more after CBT, they fail to actually do so.
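For readers who wish to repeat this kind of comparison, the following minimal sketch shows one common way to compute Cohen’s d from group summary statistics, using the pooled standard deviation. The function name, the `higher_is_better` flag and all numbers are hypothetical and chosen for illustration. They are not the PACE data, and the sketch is not necessarily the exact calculation behind Table 1.

```python
import math


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c, higher_is_better=True):
    """Cohen's d for treatment versus control, using the pooled SD.

    The sign is aligned so that a positive d always favours the
    treatment arm, whichever direction counts as improvement.
    """
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    d = (mean_t - mean_c) / math.sqrt(pooled_var)
    return d if higher_is_better else -d


# Invented numbers, NOT taken from White et al. (2011).
# Outcome where a higher score is better (e.g. physical functioning):
d_function = cohens_d(60, 20, 100, 54, 20, 100)
# Outcome where a lower score is better (e.g. fatigue):
d_fatigue = cohens_d(20, 6, 100, 23, 6, 100, higher_is_better=False)

print(round(d_function, 2))  # 0.3
print(round(d_fatigue, 2))   # 0.5
```

The flag matters when outcomes are placed side by side: lower scores are better for fatigue, whereas higher scores are better for physical functioning and walking distance. Cohen’s conventional benchmarks of 0.2, 0.5 and 0.8 for small, medium and large effects then apply to every outcome in the same direction.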

Conclusion: where to go from here?

The results above lead me to conclude that White et al. systematically overestimate the effectiveness of CBT because they focus on subjective rather than objective outcomes. Their vigorous defence of their findings gives me the impression that they are not open to constructive criticism. This impression is strengthened by their statement that Geraghty misunderstands and misrepresents their work, without providing sound evidence. I would appreciate a more constructive debate, where they attempt to understand why others do not share their views, and subsequently advance findings in this field in a more scientific way. Given the evidence that the objective improvements reported for CBT and GET are at most modest, I agree with Geraghty that these should be downgraded to adjunct support-level status.

I presented three other cases where the trial protocols have been questioned. In the first example, the trial protocol was influenced by reviewers of the funding source. In the second example, the final analysis seems inconsistent with the trial protocol. In the third example, the authors agreed after publication of the final analysis that it would have been better to use a different coding scheme for the primary outcome. I believe that issues with a poor trial protocol cannot be solved within a study. Changing the protocol during the study is regarded as a faux pas. On the other hand, continuing with a poor trial protocol is not helpful either. We have to await meta-studies for the final verdict, since these are allowed to deviate from the protocols of individual studies and to choose primary outcomes on their own.

We are living in the era of the Internet and big data, where information is more accessible than ever before. It is refreshing to see patients asking critical questions and claiming access to data that are generated by publicly funded studies.
I hope they will use parts of my contribution to further investigate PACE-gate and other CFS studies. I admire their perseverance and look forward to seeing their upcoming publications.

Acknowledgements

The author wishes to thank Ms Kathy Rose Williams for proofreading the final submission.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.