27 Nov 2018

As researchers push toward testing Alzheimer’s drugs at presymptomatic stages, the hurdle before them—how to show efficacy and get a drug approved—looks different than before. In its February 2018 draft guidance for industry, the U.S. Food and Drug Administration signaled a willingness to consider a slowing of cognitive decline alone as the basis for approval. At the 11th Clinical Trials on Alzheimer’s Disease conference, held October 24–27 in Barcelona, Spain, leaders in the field wrestled with how this new guideline might be put into play, and what cognitive tests might fit the agency’s bill.

FDA guidance anoints cognition as sole outcome for preclinical trials.

Researchers at CTAD debated how to put this into practice.

Statistical analysis identifies outcome measures that best reflect disease progression.

Suzanne Hendrix of Pentara Corp., Salt Lake City, made a case for analyzing longitudinal data to select the measures that change the most as disease gets worse. Such measures should be tailored to the specific disease stage and population under investigation. A slowing of decline on those scales would reflect slowing in the underlying disease process, Hendrix argued. “If you can measure progression with a test, then you can measure a treatment effect that is disease-related,” she said. Researchers broadly agreed on the need for such customized measures.

New Guidance Engenders Debate

The FDA guidance defines four stages of sporadic Alzheimer’s disease based solely on clinical and cognitive criteria, without reference to biomarkers. In this scheme, stage 1 has no detectable impairment, stage 2 is marked by subtle cognitive decline, stage 3 by cognitive decline and functional impairment, and stage 4 is dementia. Stages 1 and 2 would be considered preclinical disease, and stage 3 prodromal, researchers noted in Barcelona. Maria Carrillo of the Alzheimer’s Association said the guidance represents two years of effort from a working group that convened in 2015 and used the best data available at that time. As new research refines scientists’ understanding of AD, however, the picture may change again. “The guidance is meant to encourage the field to test hypotheses,” Carrillo said. “Some of the revisions in the guidance are already paying off, and some already need revising.”

Case in point: It is not so clear, after all, whether stage 1 disease actually exists, noted Paul Aisen of the University of Southern California, San Diego. Recent studies have found measurable cognitive decline even in people with sub-threshold levels of amyloid accumulation (Aug 2018 conference news). In people with amyloid encroaching on the brain, performance slides slightly faster over time than it does in age-matched controls, even while cognitive scores remain in the normal range. Thus, all preclinical disease may effectively be stage 2.

However, Aisen noted that the difference in decline is slight in preclinical disease, detectable over three years on a sensitive cognitive battery such as the PACC, and over six years on the CDR-SB. In other words, detecting this modest decline would require either long trials or even more sensitive cognitive measures.
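The trade-off Aisen describes, longer trials versus more sensitive measures, can be made concrete with a standard two-sample sample-size formula: halving the detectable difference in decline quadruples the required enrollment. A minimal sketch with hypothetical effect sizes (none of these numbers come from the trials discussed here):

```python
from statistics import NormalDist

def n_per_arm(delta, sd, alpha=0.05, power=0.8):
    """Participants per arm to detect a mean difference `delta`
    between two groups with common SD `sd` (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2

# Hypothetical: a sensitive measure sees a 0.5-point treatment
# difference; a blunter one sees only 0.25 points at the same SD.
n_sensitive = n_per_arm(delta=0.5, sd=2.0)   # ≈ 251 per arm
n_blunt = n_per_arm(delta=0.25, sd=2.0)      # ≈ 1005 per arm, fourfold more
```

The same arithmetic applies to trial length: a measure that accumulates twice the separation over the same follow-up needs a quarter of the participants.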

Samantha Budd Haeberlein of Biogen in Cambridge, Massachusetts, would rather see the latter than the former. She celebrated the FDA’s willingness to consider new neuropsychological tools. “Industry welcomes this openness, but we’re not as enthused about long trials,” she noted drily. Others agreed. “The guidance is encouraging,” said Gary Romano of Janssen Pharmaceuticals in Philadelphia, adding, “The catch is to demonstrate a relationship between the early and late manifestations of disease.”

Researchers are committed to going early. As Ron Petersen of the Mayo Clinic in Rochester, Minnesota, put it, “Stage 2 is where the money is.” These are people whose cognitive decline can be measured, but who are still early enough in disease that interventions could stem the worst of AD. In the draft guidance, the FDA says it will accept cognitive measures alone as the basis for approval in stage 2. It also says a drug’s benefit must be robust and consistent across multiple domains, and that the application will be stronger if supported by biomarker evidence. Petersen suggested mining longitudinal data to identify other early impairments, such as mild neurobehavioral symptoms, that could strengthen the case for a clinical benefit. “Perhaps we could marry subtle behavioral features with tau pathology,” Petersen said.

Industry faces another hurdle in demonstrating that even a slight slowing of cognitive decline is meaningful to people’s daily lives. Chris Edgar of the U.S./Australian testing company Cogstate noted that the previous FDA guidance used the term “meaningful” only twice, but the new draft guidance mentions it 25 times. “This indicates a shift in focus by the FDA,” he said in Barcelona. Alas, the FDA does not define the term. “We’re some ways from having a consensus on what it means,” Edgar said.

He noted that the FDA gives great weight to patient and caregiver perspectives, implying that these should be incorporated into outcome measures. Jason Hassenstab of Washington University in St. Louis agreed. “We’re taking that approach in DIAN,” he said. DIAN investigators are attempting to measure qualities that might be considered intangible, such as a person’s contentment, confidence, and sense of identity, to evaluate whether the benefit of a therapy was meaningful to them, Hassenstab said.

Aisen suggested that the Cognitive Function Index (CFI), a brief survey that asks about small functional deficits in people at preclinical stages, may help get at the question of meaningfulness (Mar 2015 news). The CFI distinguishes between amyloid-positive and -negative people at baseline. “However, I’m not confident it will perform well longitudinally,” Aisen said. Currently, many companies are moving toward single measures that incorporate cognitive and functional tests, such as the CDR-SB, ADCOMS, and the integrated Alzheimer’s Disease Rating Scale (iADRS, Wessels et al., 2015). Overall, Aisen believes more work is needed to translate the FDA guidance into concrete outcome measures. “The framework is not that easy to operationalize,” he said.

Which Test for Progression? Average scores for normal people (blue dots) cluster together, while MCI scores (pink dots in middle) worsen over two years, and mild AD scores (pink dots on left) worsen from baseline to 12 and 24 months. These values define disease progression (red axis). Scores for individual tests (yellow dots) often diverge from this axis, with ADAS-Cog scores (yellow line from upper right to lower left) farthest from it, CDR-SB scores (yellow line from bottom right to upper left) closer, and ADCOMS scores (middle yellow line) fitting best. Vertical axis (green line) shows cognition-related scores on top, functional tests on bottom. [Courtesy of Suzanne Hendrix.]

The Hunt for Best Cognitive Tests Is On

So how to find that most sensitive, meaningful test? In Barcelona, Hendrix proposed one way. She visualized neuropsychological test scores in three-dimensional space to find the measures that correlate most closely with disease progression. Using the ADNI data set as an example, she graphed baseline, 12-, and 24-month data from cognitively normal controls, people with mild cognitive impairment, and people with mild AD. Scores for controls clustered at one side of the space, those for MCI ended up in the middle, and the mild AD group landed on the far side. For the controls, all three time points bunched up, but for the MCI group they separated, and for the mild AD group they were spaced far apart, in keeping with the rate of decline at each disease stage.

Hendrix showed a line drawn from the heart of the clustered baseline control scores through the MCI group to end in the center of the 24-month mild AD scores. This line defines the axis of disease progression from normal to mild AD, Hendrix said. With the axis established, researchers can look at lines drawn through specific patient populations longitudinally and see how well performance on each individual test over time parallels the axis of disease progression—in other words, how well that test reflects an actual worsening of disease.
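This alignment criterion can be sketched geometrically: define the progression axis from group centroids, then measure the angle between a test’s longitudinal change vector and that axis. The coordinates below are hypothetical stand-ins, not actual ADNI values:

```python
import math

def centroid(points):
    """Mean position of a cluster of score coordinates."""
    return [sum(axis) / len(points) for axis in zip(*points)]

def angle_deg(u, v):
    """Angle in degrees between two vectors; small = well aligned."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / norms))

# Hypothetical 3-D score coordinates for two groups.
controls_baseline = [[0.1, 0.0, 0.2], [0.0, 0.1, 0.0], [-0.1, 0.0, 0.1]]
mild_ad_24mo = [[3.1, 2.4, 2.0], [2.9, 2.6, 1.9], [3.0, 2.5, 2.1]]

# Axis of disease progression: from the heart of the control
# cluster to the center of the mild-AD 24-month scores.
start, end = centroid(controls_baseline), centroid(mild_ad_24mo)
progression_axis = [b - a for a, b in zip(start, end)]

# One test's longitudinal change vector over the same interval (hypothetical).
test_change = [1.5, 1.2, 1.0]
alignment = angle_deg(test_change, progression_axis)  # small angle = tracks progression
```

A test whose change vector runs nearly parallel to the axis reflects disease progression; one at a wide angle mostly measures something else.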

A complication arises because the axis of disease progression is not perfectly straight. Scores on some tests change more during the MCI stage and others during the AD stage, resulting in different slopes or directions for progression at each stage. This means that most tests work best at a specific stage (Nov 2018 conference news).

For example, in the ADNI data, the ADAS-Cog aligns with the axis of progression at the mild AD stage, but not with progression in MCI. The CDR-SB matches the slope of progression in MCI a little better, but still not well. To find a better measure for MCI, Hendrix collaborated with researchers at Eisai to develop ADCOMS. They fit a partial least squares (PLS) regression model to neuropsychological data from four different cohorts, then looked for the individual test items that correlated best with disease progression in people with MCI over one year; the three-dimensional graphical representation corresponds to this PLS model and makes the math easier to grasp. The model selected four items from the ADAS-Cog, two from the MMSE, and all six from the CDR-SB, weighted them according to their relative contributions to disease progression, and combined them into ADCOMS (Wang et al., 2016). The resulting composite includes both cognitive and functional measures. In Barcelona, Hendrix showed that ADCOMS performed in the ADNI data set as it was designed to do, lining up well with the axis of disease progression in MCI.
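The weighting step can be sketched with a one-component PLS fit, in which each item’s weight is proportional to its covariance with the progression measure (w ∝ Xᵀy after centering). The toy data below are illustrative only; this is not the actual ADCOMS derivation or its weights:

```python
import math

def pls1_weights(X, y):
    """First-component PLS weights: proportional to the covariance
    of each (centered) item with the (centered) outcome y."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def composite(item_scores, weights):
    """Combine weighted item scores into a single composite value."""
    return sum(s * w for s, w in zip(item_scores, weights))

# Toy data: rows = participants, columns = test items (stand-ins
# for ADAS-Cog / MMSE / CDR-SB items); y = one-year progression.
X = [[1, 0, 2], [2, 1, 3], [3, 1, 5], [4, 2, 6]]
y = [0.5, 1.0, 2.0, 2.5]

weights = pls1_weights(X, y)
scores = [composite(row, weights) for row in X]  # tracks y's ordering
```

Items that covary strongly with progression dominate the composite; items that barely move with disease contribute little, which is the intuition behind dropping or down-weighting them.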

The ADCOMS was used successfully in the recent BAN2401 trial (Nov 2018 conference news).

However, ADCOMS is not a panacea either. It performs poorly in preclinical disease, because this relatively healthy population falls outside the dynamic range of its functional measures. For preclinical disease, Hendrix teamed up with colleagues at the Alzheimer’s Prevention Initiative to use the same PLS approach to develop the API Preclinical Cognitive Composite (APCC). This battery was optimized to detect the cognitive change that occurs over the decade before people receive an MCI diagnosis; it includes sensitive cognitive measures but no functional ones (Ayutyanont et al., 2014).

The researchers adjusted APCC scoring to account for the effects of normal aging-related cognitive decline. At the MCI stage, aging-related decline has little relative effect, but at preclinical stages, it can be a significant confounder, Hendrix told Alzforum. An earlier version of the APCC, called the API-LOAD, did not include this correction for normal aging. She noted that this aging correction, and the longitudinal derivation, are the main differences between the APCC and the Preclinical Alzheimer Cognitive Composite (PACC) developed by A4 researchers (Jun 2014 news). The PACC includes a broader range of cognitive performance and does not correct for age-related cognitive decline.

So which test is best for preclinical disease? Hendrix said it depends on what researchers want to measure. Some treatments, such as cognitive enhancers, might improve both progressive and non-progressive aspects of cognition; in that case, the PACC might be good for evaluating efficacy. Nutritional or lifestyle interventions might be expected to slow age-related decline as well as Alzheimer’s pathology, making the API-LOAD perhaps the best test of their efficacy (Langbaum et al., 2014). For a therapy targeting a specific AD pathology, such as amyloid or tau, the APCC is likely to paint a clearer picture of its effects. “It comes down to figuring out what kind of change you have in an untreated population, and what aspects of that change you’re targeting with your treatment,” Hendrix told Alzforum.

How Might This Work?

In Barcelona, Hendrix showed an example of how this method could help a clinical trial. Danone Nutricia Research recently reported that the LipiDiDiet trial of its nutraceutical drink Souvenaid missed its primary endpoint but met two secondary ones, hinting at some activity (Mar 2016 conference news; Nov 2017 news). Nutricia contracted with Hendrix to perform a post hoc analysis assessing Souvenaid’s performance on the ADCOMS.

Souvenaid showed statistical significance on the ADCOMS. Graphing the data in three-dimensional space, Hendrix found that LipiDiDiet’s primary outcome measure, the neuropsychological test battery (NTB), bore almost no relationship to disease progression: the axis of NTB scores angled 75 degrees away from the axis of disease progression. The NTB measures several aspects of cognition, such as processing speed and attention, that vary greatly from day to day with a person’s physical and mental state, that is, how rested or focused they are. These factors are not expected to change in a consistent way at this stage of disease, Hendrix noted. Some of the NTB’s executive function tasks may also measure inherent abilities that change little with disease progression at this stage.

By contrast, ADCOMS scores lined up more closely with disease progression in the LipiDiDiet data set, as they had in the ADNI dataset. When Hendrix reanalyzed LipiDiDiet data using ADCOMS, she saw a 36 percent slowing of cognitive decline, with a p value of 0.023. Cohen’s d was 0.31, considered a relatively small effect size. Breaking ADCOMS down by its component tests, Hendrix found that the CDR-SB items provided most of the discrimination in this study. Had the LipiDiDiet trial used ADCOMS, or another measure optimized for disease progression in MCI, it would have posted a positive result, Hendrix concluded.
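The two statistics reported here, percent slowing and Cohen’s d, can be computed directly from group change scores. The inputs below are illustrative placeholders, not the LipiDiDiet data:

```python
import math

def percent_slowing(decline_placebo, decline_treated):
    """Percent slowing of decline in the treated arm vs. placebo."""
    return 100 * (decline_placebo - decline_treated) / decline_placebo

def cohens_d(mean_diff, sd1, sd2, n1, n2):
    """Cohen's d: mean difference divided by the pooled SD."""
    pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                       / (n1 + n2 - 2))
    return mean_diff / pooled

# Hypothetical mean worsening (points) over the trial period.
placebo_decline, treated_decline = 1.00, 0.64
slowing = percent_slowing(placebo_decline, treated_decline)  # 36 percent
# Hypothetical SDs and group sizes:
d = cohens_d(placebo_decline - treated_decline, 1.2, 1.2, 150, 150)  # d = 0.3, a small effect
```

Note that the same absolute slowing yields a larger d, hence easier statistical detection, on a measure with a smaller standard deviation, which is why tailored composites can rescue an otherwise borderline result.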

Hendrix claims that other trials, too, would have performed better with a more tailored outcome measure. She performed post hoc analysis on the public results from the EXPEDITION3 solanezumab trial, in which solanezumab slowed decline on the ADAS-Cog, but not by enough to achieve statistical significance (Jan 2018 news). A global test statistic that is similar to ADCOMS does show statistical significance in this data set, although solanezumab’s benefit remains small, Hendrix said.

Hendrix ran a similar analysis for seven other recent trials (see table above). Each trial that had a significant effect or a trend on the ADAS-Cog, MMSE, or CDR-SB turned out to be clearly significant on ADCOMS, while trials without a significant benefit were more clearly negative on ADCOMS. Using the right measure gives more accurate results for both positive and negative trials, Hendrix argues. “Some studies we think have failed may not have failed, and studies we think succeeded may not have succeeded. We’re in muddy waters a lot of the time,” she told Alzforum.

The key is to use longitudinal data to evaluate neuropsychological tests and find those that actually reflect disease progression, Hendrix stressed. Measures should combine a large change over time with a small standard deviation, giving them the best chance to detect a slowing of cognitive decline. In studies where different outcome measures conflict, a three-dimensional analysis can show which measure more closely reflects disease progression and deserves more weight, she added.
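The selection criterion Hendrix describes, large change over time combined with small standard deviation, amounts to ranking candidate measures by a mean-to-SD ratio of their change scores. The numbers below are hypothetical, for illustration only:

```python
# measure -> (mean change over the trial, SD of that change);
# hypothetical values, not drawn from any data set discussed here.
measures = {
    "ADAS-Cog": (2.0, 6.0),
    "CDR-SB": (1.0, 1.8),
    "ADCOMS": (0.12, 0.15),
}

def progression_snr(mean_change, sd):
    """Mean-to-SD ratio: a big, consistent change scores highest."""
    return abs(mean_change) / sd

ranked = sorted(measures, key=lambda m: progression_snr(*measures[m]),
                reverse=True)
# ranked[0] is the measure best placed to detect slowed decline
```

With these made-up inputs, a composite with a small but consistent change outranks a scale with a large but noisy one, which is the point of the criterion.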

This strategy may help identify tests that satisfy the FDA, Hendrix believes. “We need to find the most sensitive measures of progression in a given population, and use those as outcomes,” she said. She believes measures that closely reflect disease progression are more likely to be clinically meaningful. “We need to make sure our outcome measures reflect the symptoms that progress,” Hendrix said.—Madolyn Bowman Rogers