When virtually all of the data from your study are negative — e.g., people are getting worse at the task — what does a good researcher do?

Show that your intervention helped people decline less than doing nothing at all. Then ensure the news release talks about “improvements” in these cognitive training tasks — echoing the language you used in your study.

Welcome to the wonderful world of cognitive training, where squishy “neuroscience” means drawing every last bit of statistical significance from your data… even when it has little meaning for real-world impact.

First, let me start by saying this is a well-designed and fairly rigorous scientific study. It enrolled a large sample, representative of the older American population, drawn from six different metropolitan areas. A total of 2,832 participants began the study, with an average age of 73.6; 76 percent were female. At the ten-year follow-up, 1,220 subjects could still be assessed.

Here’s what the researchers said they found in their data:

Each training intervention produced large and significant improvements in the trained cognitive ability. [Emphasis added.]

Most of us understand an improvement to be something better than what we started with (at baseline). If we start with a score of 100, an improvement to most of us means going to 110, or 120. Let’s look at the raw means reported in the study to give this hypothetical some legs:

Whoa, wait a minute. Look at all those negatives! Not a single improvement in this group… Worse, the control group lost the fewest mean points on the memory task: -9.4 versus -10.6 for the memory-training group (the fewer points you lose, the better). Sorry, I’m not seeing the “improvement” here.

That’s okay, though, because that’s not where the real problems lie. The researchers pretty clearly and reliably demonstrate that specific cognitive training tasks in an elderly population seem to maintain their impact 10 years later, at least for two of the cognitive tasks.

Here’s the real problem:

After 10 years, participants in each of the training groups reported less difficulty in conducting activities of daily living than those in the control groups.

Based on these data, I’d say that’s a stretch (at best):

What the highlights show is that the daily-living measures for everyday problem solving and everyday speed of processing — even in the speed training group! — did not improve significantly over the control group (the last column of data; the first three columns are the intervention groups).

Where the researchers show a bigger effect size is on the “instrumental activity of daily living difficulty” measure. This rates 19 daily tasks people perform regularly, scoring whether a person can perform each task with no difficulty, with some difficulty (needing assistance), or with great difficulty. You want a score as close to 0 as possible on this measure.

At baseline, all four groups scored around -1.0. Ten years later, the mean scores ranged from -3.4, -4.1, and -4.1 (in the intervention groups) to -4.5 (the control group). A 0.4- or even 1.1-point difference on this measure is clinically meaningless; the scores are basically the same, since a 1-point difference on a 38-point scale doesn’t translate into anything different in the real world.

You can trumpet that this is a statistically significant difference. But it has no connection to reality.
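The arithmetic is worth making explicit. Here is a quick back-of-the-envelope sketch in Python (the ten-year group means are taken from the study; the 38-point range is the span of the IADL difficulty scale):

```python
# Ten-year mean IADL difficulty scores reported in the study
control = -4.5
interventions = {"memory": -3.4, "reasoning": -4.1, "speed": -4.1}

SCALE_RANGE = 38  # the IADL difficulty measure spans 38 points

for name, mean in interventions.items():
    diff = mean - control              # points "saved" relative to control
    pct = 100 * diff / SCALE_RANGE     # as a share of the full scale
    print(f"{name}: {diff:.1f} points = {pct:.1f}% of the 38-point scale")
```

Even the best-case gap (memory training’s 1.1 points) amounts to under 3 percent of the scale — the kind of difference large samples can render statistically significant without it mattering to anyone’s daily life.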

To show how default graph settings can shape your perception of the data, let’s take a look at the Mean IADL Difficulty Score, a 38-point scale in real life. Here’s Figure 3 reproduced as it appears in the study:

[Figure 3 from the study: Mean IADL Difficulty Score by year, plotted on a truncated y-axis]
Wow, look at the decline in Year 10. That’s pretty dramatic, with the control group clearly in the most trouble.

But let’s look at those same data plotted on a scale that shows all 38 points, so you get a better perspective of the data:

[The same Mean IADL Difficulty Score data replotted on the full 38-point scale]
Suddenly the data don’t look all that different. That’s the point — in the real world, they aren’t. A one-point difference simply isn’t meaningful here.
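How much a truncated axis exaggerates a gap can be quantified: the share of the plot’s height a difference occupies depends entirely on the axis range. A minimal sketch (the roughly 5-point truncated range is my assumption from how Figure 3 appears; the 38-point range comes from the scale itself):

```python
def visual_fraction(diff, axis_range):
    """Share of the plot's height occupied by a difference of `diff` points."""
    return diff / axis_range

diff = 1.1  # best-case year-10 gap between an intervention group and control

truncated = visual_fraction(diff, 5)   # y-axis spanning ~5 points (assumed)
full = visual_fraction(diff, 38)       # y-axis spanning the full 38-point scale

print(f"truncated axis: {truncated:.0%} of plot height")
print(f"full axis: {full:.0%} of plot height")
print(f"exaggeration factor: {truncated / full:.1f}x")
```

Under these assumptions the same 1.1-point gap fills about 22 percent of the truncated plot but only about 3 percent of the full-scale plot — a seven-fold visual exaggeration from axis choice alone.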

When Is a Decline an “Improvement”?

In the entirety of Table 2, where 18 mean scores are reported for the three intervention groups, only two scores showed actual improvement from baseline: the speed training group on the speed training task, and the memory group on everyday speed of processing. Every other score declined from baseline.

The researchers offered no explanation for why the control group suffered the least decline from baseline on the memory test. Memory is thought to be one of the easier cognitive skills to improve, so this is a surprising result.

The one truly robust finding from the study is that speed-of-processing training produced the most significant (and genuine) improvements among the groups. An effect size of 0.66 makes this a pretty solid data point. The speed-of-processing task requires identifying and locating information with 75 percent accuracy.
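For readers unfamiliar with effect sizes: a Cohen’s d of 0.66 falls between the conventional “medium” (0.5) and “large” (0.8) benchmarks. Here is a sketch of how d is computed; the group means, standard deviations, and sample sizes below are hypothetical numbers chosen for illustration — only the 0.66 figure comes from the study:

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d: the mean difference divided by the pooled standard deviation."""
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    )
    return (mean1 - mean2) / pooled_sd

# Hypothetical groups: a 6.6-point gap with a pooled SD of 10 yields d = 0.66
d = cohens_d(mean1=56.6, sd1=10.0, n1=300, mean2=50.0, sd2=10.0, n2=300)
print(f"d = {d:.2f}")
```

Note that d is a standardized measure: it tells you the gap in units of variability, not whether that gap matters on the outcome’s real-world scale — which is exactly the distinction this post is drawing.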

A secondary finding was that reasoning training nearly kept people at their baseline mean scores at the 10-year follow-up, suggesting that engaging in regular reasoning tasks may also help.

What does this mean? That these exercises may or may not help you over the 10-year long run. This study clearly demonstrates that we lose cognitive abilities no matter what we do. But we may be able to stave off some of those losses by engaging, for a time-limited period, in specific kinds of cognitive training tasks.

Reference

Rebok et al. (2014) Ten-Year Effects of the Advanced Cognitive Training for Independent and Vital Elderly Cognitive Training Trial on Cognition and Everyday Functioning in Older Adults. Journal of the American Geriatrics Society. DOI: 10.1111/jgs.12607

Other views

Study finds long-lasting results from brain exercises

Brain training courses may keep seniors sharper for 10 more years

Fun with Data! Spinning the Beneficial Effects of Cognitive Training, Brain Games
