Trial By Error: The SMILE Trial’s Undisclosed Outcome-Swapping

By David Tuller, DrPH

So let’s talk about Professor Esther Crawley’s SMILE trial, published in September by the journal Archives of Disease in Childhood, one of the BMJ Publishing Group’s titles. The study reported that a commercial intervention called the Lightning Process was an effective treatment for children with CFS/ME when offered along with what was called “specialist medical care.”

SMILE was an open-label trial relying on subjective responses, a study design notoriously vulnerable to bias. In this case, self-reported physical function was the primary outcome, just as it was one of two primary outcomes in PACE. (The full name of the trial is: “Clinical and cost-effectiveness of the Lightning Process in addition to specialist medical care for paediatric chronic fatigue syndrome: randomised controlled trial.”)

The Lightning Process, which calls itself a “training programme,” is a goulash of osteopathy, life coaching, and neurolinguistic programming. Much of it consists of lessons and exercises involving positive affirmations; as with the form of cognitive behavior therapy used to treat ME/CFS, participants are told that they can overcome their illness by changing their thought patterns. Lightning Process practitioners have asserted that this approach has found success with a wide range of illnesses, including multiple sclerosis, eating disorders, and addiction, among others. But the U.K. Advertising Standards Authority has found such medical claims to be misleading and unsupported by the available evidence.

Phil Parker, the creator of the Lightning Process, appears to have been involved in some other interesting projects. One of these was something called the “European College of Holistic Medicine Healing Course,” which he co-taught. According to an archived website, this course was designed to help others develop their skills as healers. It included training, for example, on such modalities as “divination medicine cards and tarot,” since “divination is useful in creating a strong connection with healing/spirit guides.” It also trained student healers in how to “prepare a space appropriately so that any energy polluting the room will not interfere with the work you are doing” and in “the use of auras for diagnosis of a client’s problems.”

Here is Phil Parker’s biography from the same archived website: “Phil Parker is already known to many as an inspirational teacher, therapist, healer and author. His personal healing journey began when, whilst working with his patients as an osteopath. [Sentence glitch from the original.] He discovered that their bodies would suddenly tell him important bits of information about them and their past, which to his surprise turned out to be factually correct! He further developed this ability to step into other people’s bodies over the years to assist them in their healing with amazing results. After working as a healer for 20 years, Phil Parker has developed a powerful and magical program to help you unlock your natural healing abilities. If you feel drawn to these courses then you are probably ready to join.”

(If anything in the above two paragraphs is inaccurate, I urge Phil Parker to contact me by e-mailing Virology Blog at virology@virology.ws and I will immediately correct any documented errors.)

I will let others debate whether Professor Crawley should have received ethical approval to study Phil Parker’s trademarked Lightning Process in children. I want to discuss instead a methodological anomaly that conscientious investigators, not to mention responsible peer-reviewers and journal editors, would recognize as a terrific way to bias results. As is often the case, I can’t take credit for having noticed this problem myself. I was alerted to the issue by comments from some of the sharp-eyed sleuths on a patient forum.

This is another long and very complicated post. (Sorry!!) Here are the highlights:

*More than half the participants in the SMILE trial were apparently participants in an earlier feasibility trial. That means most if not all were recruited and provided data before the full-trial protocol was approved. Since SMILE lumped together these earlier data with those from participants recruited later, the full trial itself was not an independent investigation of the information generated by the feasibility trial.

*Based on the results of the feasibility trial, Professor Crawley swapped her primary and secondary outcome measures. The original primary outcome in the feasibility trial—school attendance at six months—was relegated to the status of a secondary outcome. The subjective measure of self-reported physical function, which was a secondary measure for the feasibility trial, became the primary outcome for the full trial. (In the full-trial protocol, self-reported fatigue was also listed as a primary outcome. For unexplained reasons, it was downgraded to a secondary outcome in the full-trial report.)

*Swapping the outcomes based on the feasibility study findings while simultaneously extending the feasibility study into the full study could easily have introduced significant bias in the final paper. How much bias cannot be ascertained at this point, since Professor Crawley has not provided a separate analysis of the feasibility study results for physical function and school attendance. That bias would have compounded the bias already introduced by relying on a subjective outcome, self-reported physical function, in an open-label trial.

*Professor Crawley promised to seek verification of self-reported school attendance by requesting official school attendance records. Although she mentioned this in the protocols for both the feasibility trial and the full trial, these school records are not mentioned anywhere in the full-trial report. Nor did she discuss the feasibility of accessing these records in the logical place: the feasibility trial report. One possible and very logical conclusion is that she obtained these objective data but decided not to mention them because they did not provide optimal results.

*The trial registration indicated that SMILE was a prospective trial. But the registration application date of June 7, 2012, coincided almost exactly with the end of the recruitment time frame for the feasibility trial, which provided more than half of those who ended up being included in the final sample. The full-trial paper did not mention that more than half the participants were from the feasibility study and that their data led to the decision to swap the outcomes. By definition, a prospective trial must not include data from previously assessed participants. If it does, it is obviously not a prospective trial.

*Based on the revised primary outcome of self-reported physical function, the full-trial paper reported that the Lightning Process combined with specialist medical care was effective in treating kids with CFS/ME. The full-trial paper also reported that school attendance at six months, the original primary outcome in the feasibility study, produced null results. Thus, the outcome-swapping that occurred after more than half the full-trial sample had already been recruited for the feasibility study allowed Professor Crawley to report more impressive results than she could have reported had she retained the six-month school attendance measure as the primary outcome.

*Not surprisingly, media reports focused largely on the positive results for the self-reported physical function outcome and not the null results for the original primary outcome. Without the outcome-swapping that took place after more than half of the participants for the full-trial paper had provided data as part of the feasibility study, the final report would not have been able to present such an optimistic perspective.

*Given these major flaws and many additional problems cited by others, the inescapable conclusion is that the SMILE trial should never have been approved, much less published.

For understandable reasons, I have not contacted Professor Crawley to seek answers to my concerns about SMILE. But let me once again state very, very clearly that I would welcome Professor Crawley’s rebuttal. If she can document any errors or inaccuracies, I will of course correct them immediately. If she cannot document any errors or inaccuracies but simply objects to my tone and to my interpretation of the facts, I urge her to send me her response, at whatever length she chooses, and I will post it all on Virology Blog.

By the way, this offer also includes anything I have published about Professor Crawley’s work since my November 22, 2016, post called “The New FITNET Trial for Kids”—the one she featured in her slide about my “libellous blogs.”

**********

Now for the long version of the story.

In July, 2010, Professor Crawley submitted a protocol for a feasibility trial to assess the possibility of conducting a full trial on the use of the Lightning Process in kids. A feasibility trial is a pilot study designed to generate preliminary information upon which to base a larger investigation, if warranted. For this feasibility study, here is how Professor Crawley described the primary and secondary outcomes:

“The primary outcome measure for the interventions will be school attendance/home tuition at 6 months. Secondary outcome measures will be school attendance at 6 weeks, 3 months and 12 months; the SF36 (physical function) at 6 weeks, 3 months, 6 months and 12 months and pain visual analogue scale at 6 months.” (Our concern here is only with the school attendance and physical function measures.)

Self-reported school attendance can be highly influenced by inaccurate recall as well as other factors, so it is also arguably prone to significant bias, like self-reported physical function. Presumably that is why Professor Crawley added the following information about how school attendance would be measured:

“Children and young people are asked about school attendance and home tuition in a two item inventory. We will ask for consent to check school attendance using school records and will do this at assessment, 3 months 6 months and 12 months.” (The expected comma after “3 months” is not present in the feasibility study protocol.)

In other words, the protocol promised to make an effort to vet the self-reports of school attendance against actual data from the schools. This was a smart decision designed to help ensure the objectivity and accuracy of the reported findings. The protocol was approved.

According to the published feasibility trial, 56 participants were enrolled between September, 2010, and June, 2012. The paper described in detail the process of conducting the study and included discussion of recruitment efforts, attitudes toward the Lightning Process, and other issues related to the feasibility of pursuing a full trial. It did not include the quantitative results for school attendance, physical function and other outcome measures. Based on the preliminary findings, a protocol for a full study was written. This protocol was submitted for publication in December, 2012, to the journal Trials, which published it a year later.

In the new protocol, the primary and secondary outcomes were swapped. Self-reported physical function at six months was now a primary outcome. School attendance at six months was relegated to the status of a secondary outcome, along with school attendance at the three-month and 12-month assessment periods. The six-week assessment was dropped. As in the feasibility trial protocol, the protocol for the full trial promised that the investigators would attempt to access objective data from the schools. It included this sentence: “We will ask for consent to check school attendance using school records at assessment, 3, 6 and 12 months.” (Self-reported fatigue was also listed in the new protocol as another primary outcome, although for unexplained reasons it was downgraded to a secondary outcome in the SMILE trial report.)

Then Professor Crawley conducted the full study—except not exactly. What she did instead was to seamlessly extend the feasibility study into the full study, folding the results from these initial participants into the analysis for the final SMILE report in Archives of Disease in Childhood. There were 100 participants in the total sample for the full study. Given that 56 participants were enrolled in the feasibility study, presumably only 44 of those in the full study were enrolled after the protocol changes were approved. (I write “presumably” because the number is based solely on a simple calculation using the available information. If this calculation is wrong, it is not because of my rudimentary statistical training but because the published record is opaque on the entire matter.)

Why is that a problem? Well, it might not have been if Professor Crawley hadn’t swapped the outcome measures in the protocol for the full study and then included in that study the feasibility trial participants whose results led to these major changes. The outcome-swapping exerted a big impact on how the findings were reported in the Archives of Disease in Childhood—which in turn exerted a major impact on how news media covered the study. In circumstances like this, the proper approach would have been to conduct a completely new trial in order to test whether the findings from the feasibility study could be verified and sustained with an independent sample of participants.

Moreover, the fact that this outcome-swapping occurred after more than half the total sample had already provided data for the feasibility trial was not mentioned in the full study itself. Readers would understandably draw the logical but apparently false conclusion that the results presented were all from patients recruited after the protocol changes for the full trial were approved, rather than that more than half of the results were data from the feasibility trial participants. And the standard presumption would be that Professor Crawley did not know the results of any of her subjects when the full trial began. In fact, that wasn’t the case, since the protocol changes were based on the experience of the feasibility trial participants.

I am assuming that Professor Crawley actually scrutinized the feasibility trial’s quantitative results for school attendance and physical function before swapping the outcomes, even though I cannot find a direct statement as to whether she did or did not. Perhaps she will claim, as did the PACE authors, that the outcome-swapping occurred before any quantitative outcome data were reviewed and that the new outcomes were hence “pre-specified.” This argument would strain credulity, to say the least. A draft of the feasibility paper was submitted along with the application seeking ethical approval to swap the outcome measures and extend the trial rather than start a new one from scratch. It would be unusual for such a paper to be written without checking the quantitative results.

But even if Professor Crawley relied only on recruitment results and qualitative interviews conducted for the feasibility study to write the feasibility trial paper, she as well as the participants knew who was in which arm of the trial. In such circumstances, investigators are generally aware of trends in the results for subjective measures even before looking at any quantitative data at all. So Professor Crawley would likely have recognized that swapping the outcome measures could produce more impressive-looking results.

The full-trial protocol itself also featured multiple phrases suggesting that Professor Crawley would conduct a completely independent prospective trial. It included, as one of many examples, the following sentence: “Children and young people aged 12 to 18 years inclusive will be recruited after assessment by the Bath/Bristol paediatric CFS/ME service.” And this: “Potentially eligible children and their families will be identified by the clinician conducting the initial clinical assessment who will inform them about the study.” And this: “Allowing for 10 to 20% non collection of primary outcome data at six months, we aim to recruit 80 to 112 participants to the study.”

The full-trial protocol did note that “this trial continues from the SMILE feasibility trial” and that the analysis of qualitative interview data, in particular, “will continue from the feasibility study.” A small note at the end of the protocol stated that “full trial randomisation after conversion to full trial” began on September 19, 2012. But the language throughout the protocol certainly would have implied to any reasonable reader that actions involved in pursuing the full trial would take place in the future—not that past participants from the feasibility study would constitute the majority of the full-trial sample.

Moreover, the trial registration with the ISRCTN registry, with an application date of June 7th, 2012, referred to SMILE as “prospectively registered.” It listed the start date of the trial as August 1st, 2012—that is, after the end of the feasibility study. The registration document noted the plan to convert the feasibility study into the full trial, but that declaration flatly contradicted the claim that it would be a prospective trial. By definition, a prospective trial cannot include previously assessed participants. After all, that’s why it’s called a prospective trial.

Since it is not apparent from either the full-study protocol or the SMILE trial paper itself that more than half the subjects were retroactively enrolled as participants, how did this peculiarity come to light? Well, when the study was published, the Phoenix Rising squad of bulltwaddle-detectors sprang into action. Forum members quickly noticed something strange. The SMILE paper reported that the study began in September, 2010—the same month as the reported start of the feasibility trial. (Many of these bulltwaddle-detectors have since moved to another forum, Science for ME.)

In other words, the September, 2010, start date indicated that the full study officially began almost two years before Professor Crawley wrote the protocol upon which the final analyses were based and almost two years before the start date listed in the trial registration. It is rather unusual to have two different start dates for the same trial, or to start a trial years before the protocol has been written.

Bruce Levin, the Columbia biostatistics professor who called the PACE trial “the height of clinical trial amateurism,” expressed strong opinions about this strategy. It is impermissible to lump participants from both a feasibility study and a full study into one sample when outcomes have been changed mid-stream, he said. That would apply even if the changes only involved swaps in the primary and secondary outcomes but not alterations in the assessment methods. The only exception, he added, would be if the method of combining these data had been outlined in the feasibility trial protocol. (Professor Levin did not specifically analyze the SMILE trial.)

Here’s what Professor Levin told me:

The problem with folding pilot data into a following study is that serious bias can arise if any decisions or analytic choices were made on the basis of the pilot data. Another way to say the same thing is that the purpose of a follow-up study is to replicate a pilot observation. A replication must be independent of the results of the study that suggested the finding in the first place.

The reason I mention “decisions” is that an enormous latitude exists when looking at pilot data. Should we analyze this endpoint measure or that one? Should we look at this follow-up period or that one? Should we use this enrollment or exclusion criterion or that one? Even if we retain the same assessment methods, should we swap our primary and secondary outcomes? And so on and on. This is all legitimate inquiry for the pilot. But unless the feasibility trial protocol pre-specifies the methods to be used for combining the evidence and guarantees they are statistically valid, one must NEVER use those data together with the data from the confirmatory study whose design was based on the pilot results. A.K.A. bias!

When Professor Levin says “NEVER,” he means NEVER. (He himself capitalized the word in his e-mail.) Yet that’s what Professor Crawley did, despite the likelihood of generating a self-fulfilling prophecy. The feasibility study protocol did not mention or anticipate the possibility of future outcome-swapping and did not pre-specify the methods for combining the feasibility trial data with the data collected later.

Perhaps it should not then be a surprise that Professor Crawley was able to report in the full study, based on the revised primary outcome of self-reported physical function, that the Lightning Process was effective when delivered along with specialist medical care. Perhaps it should also not be a surprise that, in the full study, the original primary outcome measure—school attendance at six months—demonstrated no statistically significant effects from the intervention.

But school attendance at six months was now officially a secondary outcome. Neither Professor Crawley nor the press reports highlighted the outcome-swapping or focused on the fact that the original primary outcome produced null results. In contrast, Professor Crawley was able to report that those who received the Lightning Process had better school attendance at 12 months, a fact that was cited in the study abstract and mentioned in news coverage of the trial. However, the full-trial report indicated that no more than 70 out of the 100 participants provided data for the 12-month school attendance measure, a fact not included in the study abstract. Without knowing what happened with the other 30 participants, it is hard if not impossible to interpret the purported improvement touted for this measure.

Significantly, the full trial did not provide separate analyses for those recruited for the feasibility study and those recruited after the full study protocol was written. So readers have no way of knowing the results for physical function and school attendance for just the feasibility study participants, and therefore no way to assess the extent to which these initial findings impacted the analyses presented in the final report. In other words, Professor Crawley appears to have designed and written up the full study in a way that could clearly have biased her findings and allowed her to report positive results.

Moreover, Professor Crawley described the study in a way that seems to have obscured what actually happened. Here’s how the final paper explained the sequence of events: “Having shown that recruitment, randomisation and data collection were feasible and acceptable, we conducted a randomised trial to investigate the effectiveness and cost-effectiveness of LP in addition to specialist medical care (SMC), compared with SMC alone, for children with CFS/ME.”

This phrasing leaves readers with the false impression that the full trial was conducted entirely after the feasibility study. Whatever explanation Professor Crawley might offer, the SMILE trial’s flawed approach to data analysis and the evident lack of transparency in the description of what occurred represent serious contraventions of basic scientific principles. Given that Professor Crawley has argued in high-profile presentations that many freedom of information requests for data are “vexatious,” it is possible or even likely that anyone interested in conducting separate analyses of the results from the feasibility trial and those from participants recruited afterwards will have a hard time accessing the necessary information.

In fairness to Professor Crawley, she sought ethical approval for her unorthodox maneuver. She filed an application for a substantial amendment to her initial protocol in early August, 2012, and received approval not long after. To figure out what exactly had happened, I filed a freedom of information request with the regional ethical review board to obtain the application for this protocol amendment. This document outlined the request to extend the feasibility study into the full study and also to swap the primary and secondary outcomes.

In seeking to convert the feasibility trial into the full trial, the application declared that “recruitment is slower than expected.” And here’s how the application explained the downgrading of the six-month school attendance measure from a primary to a secondary outcome:

“The reason for this is that many of the participants are transitioning from GCSEs to A levels in this study and therefore % of school attendance does not necessarily reflect illness severity. For example, a teenager may have decided to take 2 A levels and be attending school for 2-3 hours a day. This would be recorded as 100% school attendance but this does not equate to 6.5 hours a day of normal school attendance.”

I’m not familiar enough with the British educational system to grasp whether this explanation makes sense. But in any event, as Professor Levin noted, the point of a full study is to test preliminary findings from a feasibility study with a larger sample of participants. And as he also pointed out, it is unacceptable to change or swap outcome measures and then pool the earlier and later sets of data without a pre-specified plan for the statistical methodology for doing so.

So why did the ethics committee—in this case, only two committee members attended the meeting—simultaneously approve both the swap in outcome measures and the extension of the feasibility study? Who knows? It is obvious from the entire PACE saga that something is broken in the U.K.’s ethical review and peer-review systems, at least when it comes to this particular illness. After all, both The Lancet and Psychological Medicine published PACE papers in which key outcome thresholds represented worse health status than the entry thresholds designated as demonstrating significant disability—a perplexing dereliction of editorial responsibility that the journals have compounded by continuing to refuse to address the issue.

So the fact that Professor Crawley received ethical approval and was able to find a journal to publish her results does not surprise me. That it would be a journal from the BMJ Publishing Group, which has a sorry history of accepting problematic studies from Professor Crawley and her comrades in the CBT/GET ideological brigades, could also have been predicted.

But that’s not all. Remember how Professor Crawley promised in both protocols to make an effort to check the self-report measure of school attendance against official school records? In fact, the application for extending the feasibility study while swapping the outcomes featured a draft of the letter to be sent to school officials.

That letter included the following: “Both the patient and their parent/s or guardian have provided us with written consent to participate in the study which is kept in their medical notes. As part of their consent, they have given us permission to check their school attendance record during their involvement with the study. This is important information as it is the principle outcome measure for the study.” (Of course, school attendance was no longer the “principle outcome measure,” but let’s put that inaccuracy aside.)

These school records and the “important information” they were designed to elicit were not mentioned in the feasibility study report. Nor were they mentioned in the full-trial study report. Either the letters were not sent, or they were sent and Professor Crawley decided for whatever reason to ignore the data she received. In either case, she had an obligation to explain what happened, given her protocol promises. Moreover, the peer reviewers and editors for Archives of Disease in Childhood had an obligation to ask why the full-trial paper excluded this key outcome measure. One logical explanation for the absence of these data is that Professor Crawley performed this review of school records but did not cite the findings because they proved to be worse than the positive outcomes for self-reported school attendance at 12 months.

Jonathan Edwards, an emeritus professor of medicine at University College London, has called the PACE trial “a mass of uninterpretability.” Here’s what he had to say about the SMILE trial and its reported results:

[*The following quote from Professor Edwards replaced an earlier version on December 15th, 2017. It appears that transatlantic miscommunication led me to include the wrong version of his quote. At the end of this post, I have provided the quote as it initially appeared.]

All it shows is that whether you call it ‘CBT’ or ‘Lightning Process’ or ‘Brainwashing Therapy,’ if you tell people that saying they are better when they are asked if they are better will actually make them better, and then you ask them if they are better, some will say they are better, if only to avoid hassle.

Other critics have been equally scathing. Here’s what the ME Association had to say about the issue: “The SMILE trial is one of the worst examples of a clinical trial supposedly designed to assess the acceptability, effectiveness and safety of a treatment for ME/CFS…In fact, in several ways it is a lesson in how not to conduct a clinical trial in people who have ME/CFS.”

As for me, I’ve been advised by a wise colleague that “taking the high road” is the best approach. So I’ll end on this positive note: Print-outs of the SMILE trial are excellent new props for future engagements on my “tear-it-up” performance art tour. Professor Crawley deserves my deepest thanks for providing me with such a gift.

[*For full transparency, here is the version of Professor Edwards’ quote that appeared in the original post on December 13th: “All it shows is that whether you call it ‘CBT’ or ‘Lightning Process’ or ‘Brainwashing Therapy,’ if you tell people they will get better and then ask if they are better, some will say they are better, if only to avoid hassle.”]