Trial By Error, Continued: Did the PACE Study Really Adopt a ‘Strict Criterion’ for Recovery?

By David Tuller, DrPH

David Tuller is academic coordinator of the concurrent master's degree program in public health and journalism at the University of California, Berkeley.

First, some comments: When Virology Blog posted my very, very, very long investigation of the PACE trial two weeks ago, I hoped that the information would gradually leak out beyond the ME/CFS world. So I’ve been overwhelmed by the response, to say the least, and technologically unprepared for my viral moment. I didn’t even have a photo on my Twitter profile until yesterday.

Given the speed at which events are unfolding, I thought it made sense to share a few thoughts, prompted by some of the reactions and comments and subsequent developments.

I approached this story as a journalist, not an academic. I read as much as I could and talked to a lot of people. I did not set out to write the definitive story about the PACE trial, document every single one of its many oddities, or credit everyone involved in bringing these problems to light. My goal was to explain what I recognized as some truly indefensible flaws in a clear, readable way that would resonate with scientists, public health and medical professionals, and others not necessarily immersed in the complicated history of this terrible disease.

To do that most effectively and maximize the impact, I had to find a story arc, some sort of narrative, to carry readers through 14,000 words and many dense explanations of statistical and epidemiologic concepts. After a couple of false starts, I settled on a patient and advocate, Tom Kindlon, as my “protagonist”—someone readers could understand and empathize with. Tom is smart, articulate, and passionate about good science–and he knows the PACE saga inside out. He was a terrific choice whose presence in the story, I think, made reading it a lot more bearable.

That decision in no way implied that Tom was the only possible choice or even the best possible choice. I built my work on the work of others, including many people whom James Coyne recently referred to as “citizen-scientists.” Tom’s dedication to tracking and critiquing the research has been heroic, given his health struggles. But the same could be said, and should be said, of many others who have fought to raise awareness about the problems with PACE since the trial was announced in 2003.

The PACE study has generated many peer-reviewed publications and a healthy paper trail. My account of the story, notwithstanding its length, has significant gaps. I haven’t finished writing about PACE, so I hope to fill in some of them myself—as with today’s story on the 2011 Lancet commentary written by colleagues of Peter White, the lead PACE investigator. But I have no monopoly on this story, nor would I want one—the stakes are too high and too many years have already been wasted. Given the trial’s wealth of problems and its enormous influence and ramifications, there are plenty of PACE-related stories left for everyone to tackle.

I am, obviously, indebted to Tom—for his good humor, his willingness to trust me given so many unfair media portrayals of ME/CFS, and his patience when I peppered him with question after question via Facebook, Twitter, and e-mail.

I am also indebted to my friend Valerie Eliot Smith. We met when I began research on this project in July 2014; since then, she has become an indispensable resource, offering transatlantic support across multiple domains. Valerie has given me invaluable legal counsel, making sure that what I was writing was verifiable and, just as important, defensible, especially in the U.K. (I don’t want to know how many billable hours she has invested!) She has provided keen strategic advice. She has been a terrific editor, whose input greatly improved the story’s flow and readability. She has done all this, I realize, at some risk to her own health. I am lucky she decided to join me on this unexpected journey.

I would like to thank, as well, Dr. Malcolm Hooper, Margaret Williams, Dr. Nigel Speight, Dr. William Weir, Natalie Boulton, Lois Addy, and the Countess of Mar for their help and hospitality while I was in England researching the story last year. I will always cherish the House of Lords plastic bag that I received from the Countess. (The bag was stuffed with PACE-related reports and documents.)

So far, Richard Horton, the editor of The Lancet, has not responded to the criticisms documented in my story. As for the PACE investigators, they provided their own response last Friday on Virology Blog, followed by my rebuttal.

In seeking that opportunity for the PACE investigators to respond, a public relations representative from Queen Mary University of London, or QMUL, had approached Virology Blog. In e-mails to Dr. Racaniello, the public relations representative had suggested that “misinformation” and “inaccuracies” in my article had triggered social media “abuse” and could cause “reputational damage.”

These are serious charges, not to be taken lightly. Last Friday’s exchange has hopefully put an end to such claims. It seems unlikely that calling rituximab an “anti-inflammatory” rather than an “immunomodulatory” drug would trigger social media abuse or cause reputational damage.

Last week, in an effort to expedite Virology Blog’s publication of the PACE investigators’ response, the QMUL public relations representative further charged that I had not sought their input before the article was posted. This accusation goes to the heart of my professional integrity as a journalist. It is also untrue—as the public relations representative would have known had he read my piece or talked to the PACE investigators themselves. (Whether earlier publication of their response would have helped their case is another question.)

Disseminating false information to achieve goals is not usually an effective PR strategy. I have asked the QMUL public relations representative for an explanation as to why he conveyed false information to Dr. Racaniello in his attempt to advance the interests of the PACE investigators. I have also asked for an apology.

Since 2011, the PACE investigators have released several papers, repeatedly generating enthusiastic news coverage about the possibility of “recovery”–coverage that has often drawn conclusions beyond what the publications themselves have reported.

The PACE researchers can’t control the media and don’t write headlines. But in at least one case, their actions appeared to stimulate inaccurate media accounts–and they made no apparent effort immediately afterwards to correct the resulting international coverage. The misinformation spread to medical and public health journals as well.

(I mentioned this episode, regarding the Lancet “comment” that accompanied the first PACE results in 2011, in my excruciatingly long series two weeks ago on Virology Blog. However, that series focused on the PACE study, and the comment itself raised additional issues that I did not have the chance to explore. Because the Lancet comment had such an impact on media coverage, and ultimately most likely on patient care, I felt it was important to return to it.)

The Lancet comment, written by Gils Bleijenberg and Hans Knoop from the Expert Centre for Chronic Fatigue at Radboud University Nijmegen in the Netherlands, was called “Chronic fatigue syndrome: where to PACE from here?” It reported that 30 percent of those receiving the two rehabilitative interventions favored by the PACE investigators (cognitive behavior therapy and graded exercise therapy) had “recovered.” Moreover, these participants had “recovered” according to what the comment stated was the “strict criterion” used by the PACE study itself.

Yet the PACE investigators themselves did not make this claim in their paper. Rather, they reported that participants in the two rehabilitative arms were more likely to improve and to be within what they referred to as “the normal range” for physical function and fatigue, the study’s two primary outcome measures. (“Normal range” is a statistical concept that has no inherent connection to “normal functioning” or “recovery.” More on that below.)

In addition, the comment did not mention that 15 percent of those receiving only the baseline condition of “specialist medical care” also “recovered” according to the same criterion. Thus, only half of this 30 percent “recovery” rate could actually be attributed to the interventions.

The PACE investigators themselves reviewed the comment before publication.

Thanks to this inaccurate account of the PACE study’s reported findings, the claim of a 30 percent “recovery” rate dominated much of the news coverage. Trudie Chalder, one of the key PACE investigators, reinforced the message of the Lancet comment when she declared at the press conference announcing the PACE results that participants in the two rehabilitative interventions got “back to normal.”

Just as the PACE paper did not report that anyone had “recovered,” it also did not report that anyone got “back to normal.”

Three months later, the PACE authors acknowledged in correspondence in The Lancet that the paper did not discuss “recovery” at all and that they would be presenting “recovery” data in a subsequent paper. They did not explain, however, why they had not taken earlier steps to correct the apparently inaccurate news coverage about how patients in the trial had “recovered” and gotten “back to normal.”

*****

It is not unusual for journals, when they publish studies of significance, to also commission commentaries or editorials that discuss the implications of the findings. It is also not unusual for colleagues of a study’s authors to be asked to write such commentaries. In this case, Bleijenberg and Knoop were colleagues of Peter White, the lead PACE investigator. In 2007, the three had published, along with two other colleagues, a paper called “Is a full recovery possible after cognitive behavior therapy for chronic fatigue syndrome?” in the journal Psychotherapy and Psychosomatics.

(In their response last Friday to my Virology Blog story, the PACE investigators noted that they had published a “correction” to clarify that the 2011 Lancet paper was not about “recovery”; presumably, they were referring to the Lancet correspondence three months later. In their response to Virology Blog, they blamed the misconception on an “editorial…written by others.” But they did not mention that those “others” were White’s colleagues. In their response, they also did not explain why they did not “correct” this “recovery” claim during their pre-publication review of the comment, nor why Chalder spoke at the press conference of participants getting “back to normal.”)

In the Lancet comment, Bleijenberg and Knoop hailed the PACE team for its work. And here’s what they wrote about the trial’s primary outcome measures for physical function and fatigue: “PACE used a strict criterion for recovery: a score on both fatigue and physical function within the range of the mean plus (or minus) one standard deviation of a healthy person’s score.”

This statement was problematic for a number of reasons. Given that the PACE paper itself made no claims for “recovery,” Bleijenberg and Knoop’s assertion that it “used” any criterion for “recovery” at all was false. The PACE study protocol had outlined four specific criteria that constituted what the investigators referred to as “recovery.” Two of them were thresholds on the physical function and fatigue measures, but the Lancet paper did not present data for the other criteria and so could not report “recovery” rates.

Instead, the Lancet paper reported the rates of participants in all the groups who finished the study within what the researchers referred to as “the normal ranges” for physical function and fatigue. But as noted immediately by some in the patient community, these “normal ranges” featured a bizarre paradox: the thresholds for being “within the normal range” on both the physical function and fatigue scales indicated worse health than the entry thresholds required to demonstrate enough disability to qualify for the trial in the first place.

*****

To many patients and other readers, for the Lancet comment to refer to “normal range” scales in which entry and outcome criteria overlapped as a “strict criterion for recovery” defied logic and common sense. (According to data not included in the Lancet paper but obtained later by a patient through a freedom-of-information request, 13 percent of the total sample was already “within normal range” for physical function, fatigue or both at baseline, before any treatment began.)

In the Lancet comment, Bleijenberg and Knoop also noted that these “normal ranges” were based on “a healthy person’s score.” In other words, the “normal ranges” were purportedly derived from responses to the physical function and fatigue questionnaires by population-based samples of healthy people.

But this statement was also at odds with the facts. The source for the fatigue scale was a population of attendees at a medical practice, a population that could easily have had more health issues than a sample from the general population. And as the PACE authors themselves acknowledged in the Lancet correspondence several months after the initial publication, the SF-36 population-based scores they used to determine the physical function “normal range” were from an “adult” population, not the healthier, working-age population they had inaccurately referred to in The Lancet. (An “adult” population includes the elderly.)

The Lancet has never corrected this factual mistake in the PACE paper itself. The authors had inaccurately described how they derived a key outcome for one of their two primary measures. This error indisputably made the results appear better than they were, but only those who scrutinized the correspondence were aware of this discrepancy.

The Lancet comment, like the Lancet paper itself, has also never been corrected to indicate that the source population for the SF-36 responses was not a “healthy” population after all, but an “adult” one that included many elderly. The comment’s parallel claim that the source population for the fatigue scale “normal range” was “healthy” as well has also not been corrected.

Richard Horton, the editor of The Lancet, did not respond to a request for an interview to discuss whether he agreed that the “normal range” thresholds represented “a strict criterion for recovery.” Peter White, Trudie Chalder and Michael Sharpe, the lead PACE investigators, and Gils Bleijenberg, the lead author of the Lancet comment, also did not respond to requests for interviews for this story.

*****

How did the PACE study end up with “normal ranges” in which participants could get worse and still be counted as having achieved the designated thresholds?

Here’s how: The investigators committed a major statistical error in determining the PACE “normal ranges.” They used a standard statistical formula designed for normally distributed populations — that is, populations in which most people score somewhere in the middle, with the rest falling off evenly on each side. When normally distributed populations are graphed, they form the classic bell curve. In PACE, however, the data they were analyzing was far from normally distributed. The population-based responses to the physical function and fatigue questionnaires were skewed—that is, clustered toward the healthy end rather than symmetrically spread around a mean value.

With a normally distributed set of data, a “normal range” calculated using the standard formula used in PACE (the mean plus or minus one standard deviation) contains 68 percent of the values. But when the values are clustered toward one end, as in the source populations for physical function and fatigue, a larger percentage ends up being included in a “normal range” calculated using this same formula. Other statistical methods, such as working directly from percentiles, can be used to identify the range containing 68 percent of the values when a dataset does not form a normal distribution.

If the standard formula is used on a population-based survey with scores clustered toward the healthier end, the result is an expanded “normal range” that pushes the lower threshold even lower, as happened with the PACE physical function scale. And in PACE, the threshold wasn’t just low–it was lower than the score required for entry into the trial. This score, of course, already represented severe disability, not “recovery” or being “back to normal”—and certainly not a “strict criterion” for anything.
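To make the statistical point concrete, here is a minimal Python sketch. The scores in it are invented for illustration: a hypothetical 0-100 questionnaire on which most respondents cluster near the healthy top of the scale. They are not the SF-36 or fatigue-scale population data used in PACE. The sketch applies the mean-plus-or-minus-one-standard-deviation formula to those made-up scores, counts how many actually fall inside the resulting “normal range,” and compares its lower bound with a 16th-percentile cut-off. That percentile cut-off (computed with the nearest-rank method) is simply one alternative assumed here, not a method taken from the PACE papers.

```python
from statistics import NormalDist, mean, pstdev

# Share of values within mean +/- 1 SD for a true normal (bell-curve) distribution.
bell = NormalDist(mu=0, sigma=1)
normal_coverage = bell.cdf(1) - bell.cdf(-1)

# HYPOTHETICAL scores on a 0-100 scale, clustered at the healthy (top) end.
# Invented for illustration; these are NOT the PACE source-population data.
counts = {100: 40, 95: 20, 90: 12, 85: 8, 80: 5, 70: 4,
          60: 3, 50: 3, 40: 2, 30: 2, 20: 1}
scores = [score for score, n in counts.items() for _ in range(n)]


def nearest_rank_percentile(values, p):
    """Return the p-th percentile (integer p, 1-100) by the nearest-rank method."""
    ordered = sorted(values)
    rank = -(-p * len(ordered) // 100)  # ceil(p * n / 100) with integer arithmetic
    return ordered[rank - 1]


# The standard formula: mean plus or minus one (population) standard deviation.
m = mean(scores)
sd = pstdev(scores)
lower, upper = m - sd, m + sd

# Count how many scores actually land inside that "normal range".
within = sum(lower <= s <= upper for s in scores)
skewed_coverage = within / len(scores)

# A percentile-based alternative for the lower cut-off (an assumed choice).
p16 = nearest_rank_percentile(scores, 16)

print(f"Normal distribution: mean +/- 1 SD covers {normal_coverage:.1%}")
print(f"Skewed scores: mean {m:.1f}, SD {sd:.1f}")
print(f"Skewed scores: 'normal range' {lower:.1f} to {upper:.1f}")
print(f"Skewed scores: {skewed_coverage:.0%} fall inside that range")
print(f"Skewed scores: 16th percentile (nearest rank) is {p16}")
```

With these made-up numbers, the formula produces a range of roughly 70.7 to 105.9. It captures 85 percent of the scores rather than the 68 percent expected for a bell curve, its upper end exceeds the scale's maximum of 100, and its lower end sits well below the 16th-percentile score of 80.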

Bleijenberg and Knoop, the comment authors, were themselves aware of the challenges faced in calculating accurate “normal ranges,” since the issue was addressed in the 2007 paper they co-wrote with Peter White. In this paper, White, Bleijenberg, and Knoop discussed the concerns related to determining a “normal range” from population data that was heavily clustered toward the healthy end of the scale. The paper noted that using the standard formula “assumed a normal distribution of scores” and generated different results under the “violation of the assumptions of normality.”

*****

Despite the caveats the three scientists included in this 2007 paper, Bleijenberg and Knoop’s 2011 Lancet comment did not mention these concerns about distortion arising from applying the standard statistical formula to values that were not normally distributed. (White and his colleagues also did not mention this problem in the PACE study itself.)

Moreover, the 2007 paper from White, Bleijenberg, and Knoop had identified a score of 80 on the SF-36 as representing “recovery,” a much higher “recovery” threshold than the SF-36 score of 60 that Bleijenberg and Knoop now declared to be a “strict criterion.” In the Lancet comment, the authors did not mention this major discrepancy, nor did they explain how and when they had changed their minds about whether an SF-36 score of 60 or 80 best represented “recovery.” (In 2011, White and his colleagues also did not mention this discrepancy between the score for “recovery” in the 2007 paper and the much lower “normal range” threshold in the PACE paper.)

Along with the PACE paper, The Lancet comment caused an uproar in the patient and advocacy communities–especially since the claim that 30 percent of participants in the rehabilitative arms “recovered” per a “strict criterion” was widely disseminated.

The comment apparently caused some internal consternation at The Lancet as well. In an e-mail to Margaret Williams, the pseudonym for a longtime clinical manager in the National Health Service who had complained about the Lancet comment, an editor at the journal, Zoe Mullan, agreed that the reference to “recovery” was problematic.

“Yes I do think we should correct the Bleijenberg and Knoop Comment, since White et al explicitly state that recovery will be reported in a separate report,” wrote Mullan in the e-mail. “I will let you know when we have done this.”

No correction was made, however.

*****

In 2012, to press the issue, the Countess of Mar pursued a complaint about the comment’s claim of “recovery” with the (now-defunct) Press Complaints Commission, a regulatory body established by the media industry that was authorized to investigate the conduct of news organizations. The countess, who frequently championed the cause of the ME/CFS patient community in Parliament’s House of Lords, had long questioned the scientific basis for cognitive behavior therapy and graded exercise therapy, and she believed the Lancet comment’s claims of “recovery” contradicted the study itself.

In defending itself to the Press Complaints Commission, The Lancet acknowledged the earlier suggestion by a journal editor that the comment should be corrected.

“I can confirm that our editor of our Correspondence section, Zoe Mullan, did offer her personal opinion at the time, in which she said that she thought that we should correct the Comment,” wrote Lancet deputy editor Astrid James to the Press Complaints Commission, in an e-mail.

“Zoe made a mistake in not discussing this approach with a more senior member of our editorial team,” continued James in the e-mail. “Now, however, we have discussed this case at length with all members of The Lancet’s senior editorial team, and with Zoe, and we do not agree that there is a need to publish a correction.”

The Lancet now rejected the notion that the comment was inaccurate. Despite the explicit language in the comment identifying the “normal range” thresholds as the PACE trial’s own “strict criterion for recovery,” The Lancet argued in its response to the Press Complaints Commission that the authors were only expressing their personal opinion about what constituted “recovery.”

In other words, according to The Lancet, Bleijenberg and Knoop were not describing—wrongly–the conclusions of the PACE paper itself. They were describing their own interpretation of the findings. Therefore, the comment was not inaccurate and did not need to be corrected.

(In its response to the Press Complaints Commission, The Lancet did not explain why thresholds that purportedly represented a “strict criterion for recovery” overlapped with the entry criteria for disability.)

*****

The Press Complaints Commission issued its findings in early 2013. The commission agreed with the Countess of Mar that the statement about “recovery” in the Lancet comment was inaccurate. But the commission gave a slightly different reason. The commission accepted the Lancet’s argument that Bleijenberg and Knoop were trying to express their own opinion. The problem, the commission ruled, was that the comment itself didn’t make that point clear.

“The authors of the comment piece were clearly entitled to take a view on how ‘recovery’ should be defined among the patients in the trial,” wrote the commission. The decision continued: “The authors of the comment had failed to make clear that the 30 per cent figure for ‘recovery’ reflected their view that function within ‘normal range’ was an appropriate way of ‘operationalising’ recovery–rather than statistical analysis by the researchers based on the definition for recovery provided. This was a distinction of significance, particularly in the context of a comment on a clinical trial published in a medical journal. The comment was misleading on this point and raised a breach of Clause 1 (Accuracy) of the Code.”

However, this determination seemed based on a misreading of what Bleijenberg and Knoop had actually written: “PACE used a strict criterion for recovery.” That phrasing did not suggest that the authors were expressing their own opinion about “recovery.” Rather, it was a statement about how the PACE study itself purportedly defined “recovery.” And the statement was demonstrably untrue.

Compounding the confusion, the Press Complaints Commission decision noted that the Lancet comment had been discussed with the PACE investigators prior to publication. Since the phrase “strict criterion for recovery” had thus apparently been vetted by the PACE team itself, it remained unclear why the commission determined that Bleijenberg and Knoop were only expressing their own opinion.

The commission’s response left other questions unanswered. The commission noted that the Countess had pointed out that the “recovery” score for physical function cited by the commenters was lower than the score required for entry. Despite this obvious anomaly, the commission did not indicate whether it had asked The Lancet or Bleijenberg and Knoop to explain how such a nonsensical scale could be used to assess “recovery.”

*****

Notwithstanding the inaccuracy of the Lancet comment’s “recovery” claim, the commission also found that the journal had already taken “sufficient remedial action” to rectify the problem. The commission noted that the correspondence published after the trial had provided a prominent forum to debate concerns over the definition of “recovery.” The decision also noted that the PACE authors themselves had clarified in the correspondence that the actual “recovery” findings would be published in a subsequent paper.

In ruling that “sufficient remedial action” had already been taken, however, the commission did not mention the potential damage that already might have been caused by this inaccurate “recovery” claim. Given the comment’s declaration that 30 percent of participants in the cognitive behavior and graded exercise therapy arms had “recovered” according to a “strict criterion,” the message received worldwide dissemination—even though the PACE paper itself made no such claim.

Medical and public health journals, conflating the Lancet comment and the PACE study itself, also transmitted the 30 percent “recovery” rate directly to clinicians and others who treat or otherwise deal with ME/CFS patients.

The BMJ referred to the approximately 30 percent of patients who met the “normal range” thresholds as “cured.” A study in BMC Health Services Research cited PACE as having demonstrated “a recovery rate of 30-40%”—months after the PACE authors had issued their “correction” that their paper did not report on “recovery” at all. (Another mystery about the BMC Health Services Research report is the source of the 40 percent figure for “recovery.”) A 2013 paper in PLoS One similarly cited the PACE study—not the Lancet comment—and noted that 30 percent achieved a “full recovery.”

Given that relapsing after too much exertion is a core symptom of the illness, it is impossible to calculate the possible harms that could have arisen from this widespread dissemination of misinformation to health care professionals—all based on the flawed claim from the comment that 30 percent of participants had recovered according to the PACE study’s “strict criterion for recovery.”

And that “strict criterion,” it should be remembered, allowed participants to get worse and still be counted as better.