Illustration by Claire Welsh/Nature

Mice take the blame for one of the most uncomfortable truths in translational research. Even after animal studies suggest that a treatment will be safe and effective, more than 80% of potential therapeutics fail when tested in people. Animal models of disease are frequently condemned as poor predictors of whether an experimental drug can become an effective treatment. Often, though, the real reason is that the preclinical experiments were not rigorously designed1, 2.

The series of clinical trials for a potential therapy can cost hundreds of millions of dollars. The human costs are even greater: patients with progressive terminal illnesses may have just one shot at an unproven but promising treatment. Clinical trials typically require patients to commit to a year or more of treatment, during which they are precluded from pursuing other experimental options. Launching a clinical trial without the backing of robust animal data keeps patients out of tests for therapies that may have a better chance of success.

One such group of patients is those with amyotrophic lateral sclerosis (ALS), the fatal neurodegenerative condition also known as Lou Gehrig's or motor neuron disease. Over the past decade, about a dozen experimental treatments have made their way into human trials for ALS. All had been shown to ameliorate disease in an established animal model. All but one failed in the clinic, and the survival benefits of that one are marginal.

At the ALS Therapy Development Institute (TDI) in Cambridge, Massachusetts, we have tested more than 100 potential drugs in an established mouse model of this disease (mostly unpublished work). Many of these drugs had been reported to slow down disease in that same mouse model; none was found to be beneficial in our experiments (see 'Due diligence, overdue'). Eight of these compounds ultimately failed in clinical trials, which together involved thousands of people. One needs to look no further than potential blockbuster indications such as Alzheimer's and cancer to see that the problem persists across diseases.

After nearly a decade of validation work, the ALS TDI introduced guidelines that should reduce the number of false positives in preclinical studies and so prevent unwarranted clinical trials. The recommendations, which pertain to other diseases too, include: rigorously assessing animals' physical and biochemical traits in terms of human disease; characterizing when disease symptoms and death occur and being alert to unexpected variation; and creating a mathematical model to aid experimental design, including how many mice must be included in a study. It is astonishing how often such straightforward steps are overlooked. It is hard to find a publication, for example, in which a preclinical animal study is backed by statistical models to minimize experimental noise.

The experiments necessary for this type of characterization are expensive, time-consuming and will not, in themselves, lead to new treatments. But without this upfront investment, financial resources for clinical trials are being wasted and lives are being lost.

Know your animals

Investigations at the ALS TDI exemplify how initial physiological descriptions of an animal model rarely encompass all salient features, including how closely the model captures what is observed in patients. Such models are often inadequate for studying how a drug affects various aspects of disease.

ALS progression is characterized by a deterioration in the neurons that innervate skeletal muscles. Sequencing and genetic studies implicate RNA-binding proteins as crucial for maintaining the health of motor neurons3. Mouse models expressing a mutant form of the RNA-binding protein TDP43 show hallmark features of ALS: loss of motor neurons, protein aggregation and progressive muscle atrophy4.

But further study of these mice revealed key differences. In patients (and in established mouse models), paralysis progresses over time. However, we did not observe this progression in TDP43-mutant mice. Measurements of gait and grip strength showed that their muscle deficits were in fact mild, and post-mortem examination found that the animals died not of progressive muscle atrophy, but of acute bowel obstruction caused by deterioration of smooth muscles in the gut5. Although the existing TDP43-mutant mice may be useful for studying drugs' effects on certain disease mechanisms, a drug's ability to extend survival would most probably be irrelevant to people.

Scientists who use animal models for translational research must proceed with caution, and be prepared to do further characterizations themselves.

Cancel the noise

ALS TDI scientists performed a meta-analysis on nearly 5,500 mice that had been used in treatment or control groups over four years1. All mice expressed a specific defective version of the SOD1 gene, which is mutated in about 10% of people with inherited ALS. This work, and that of others6, revealed both unexpected variation in the animals, and ways to control for it.

Almost 90% of the mice had an average lifespan of 134 days, give or take 10 days. Careful inspection of animals that lived shorter or longer revealed four factors that produced considerable noise in the data and could have led to spurious conclusions (see ‘Four ways to fight noise’). Crucially, understanding such variation requires careful monitoring of hundreds of mice over several generations.

Four ways to fight noise: simple steps to avoid spurious conclusions

Exclude irrelevant animals. As often done in clinical trials, subjects that die for reasons unrelated to disease (such as mishandling) should not be counted in results. Reasons for exclusion should be well documented.

Balance for gender. Males and females can show differences in symptoms that obscure modest drug effects.

Split littermates among experimental groups. Putting siblings into the same treatment group can bias results.

Track genes. Genes that induce disease are often not inherited reliably. When copies are lost, symptoms can be less severe and drugs can seem more effective than they are.
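The two allocation rules in the box (balance for gender, split littermates) can be combined in a short script. The Python below is a minimal sketch, not the ALS TDI's actual procedure, and it assumes each mouse record carries hypothetical `id`, `litter` and `sex` fields. Animals are grouped by sex and then by litter, shuffled within each litter, and dealt round-robin across the groups. Because each sex forms one contiguous run of the deal, group sizes within a sex differ by at most one animal, and any litter with two or more animals of the same sex is spread across groups.

```python
import random
from collections import defaultdict


def allocate(mice, groups=("control", "treated"), seed=0):
    """Assign mice to groups, balancing sex and splitting litters.

    `mice` is a list of dicts with hypothetical keys "id", "litter", "sex".
    Returns a dict mapping each mouse id to a group name.
    """
    rng = random.Random(seed)  # seeded so the allocation is reproducible

    # Stratify by (sex, litter); sorting later keeps each sex contiguous.
    strata = defaultdict(list)
    for mouse in mice:
        strata[(mouse["sex"], mouse["litter"])].append(mouse["id"])

    assignment = {}
    position = 0  # running index into the round-robin deal
    for key in sorted(strata):
        ids = strata[key]
        rng.shuffle(ids)  # random order within a litter
        for mouse_id in ids:
            assignment[mouse_id] = groups[position % len(groups)]
            position += 1
    return assignment


# Hypothetical colony: 4 litters, each with 3 males and 3 females.
mice = [
    {"id": f"L{litter}-{sex}{k}", "litter": litter, "sex": sex}
    for litter in range(4)
    for sex in "MF"
    for k in range(3)
]
print(allocate(mice))
```

Animals later excluded for documented, disease-unrelated reasons would simply be dropped from the analysis; the allocation itself does not need to change.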

One factor is the failure to exclude animals whose deaths are unrelated to the disease being studied. Other factors are failing to split littermates between control and treatment groups, and not taking gender into account. Male SOD1 mice show symptoms as much as a week before females and die about a week earlier. Given that a week is roughly 5% of the average 134-day lifespan, such differences could easily be misconstrued as a drug effect.
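To see how an uneven sex ratio alone can mimic a drug effect, the Python sketch below simulates two groups that receive no treatment at all. The lifespan means (males 130.5 days, females 137.5 days, one week apart) and the 6-day standard deviation are illustrative assumptions loosely based on the SOD1 figures above, not measured values.

```python
import random
import statistics

# Hypothetical lifespans: males die about a week earlier than females.
MALE_MEAN, FEMALE_MEAN, SD = 130.5, 137.5, 6.0


def group_mean_survival(n, frac_male, rng):
    """Mean simulated lifespan of an untreated group of n mice."""
    n_males = round(n * frac_male)
    lifespans = [
        rng.gauss(MALE_MEAN if i < n_males else FEMALE_MEAN, SD)
        for i in range(n)
    ]
    return statistics.mean(lifespans)


def spurious_effect(n=20, control_frac_male=0.75, treated_frac_male=0.25,
                    trials=2000, seed=1):
    """Average 'treated minus control' survival gap with no drug given."""
    rng = random.Random(seed)
    diffs = [
        group_mean_survival(n, treated_frac_male, rng)
        - group_mean_survival(n, control_frac_male, rng)
        for _ in range(trials)
    ]
    return statistics.mean(diffs)


print(spurious_effect())                                               # unbalanced groups
print(spurious_effect(control_frac_male=0.5, treated_frac_male=0.5))  # balanced groups
```

Under these assumptions, a control group that is 75% male and a "treated" group that is 25% male differ by about 3.5 days on average even though nothing was given; balancing the sexes removes the gap.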

The fourth factor concerns the genes introduced to induce disease. All too often, a disease phenotype is lost as a colony of breeding mice is built up. For many diseases, including ALS, animal models carry multiple copies of the disease-causing gene, and these repeated genes are often not passed on in a stable fashion as cells divide to make gametes. Regular genotyping assays are essential to make sure that mice in subsequent generations do not have fewer copies of the transgene, and therefore less severe disease.
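One common way to check for lost transgene copies is quantitative PCR with the comparative Ct (ΔΔCt) method. Each animal is compared with a calibrator animal of known, stable copy number, and a single-copy reference gene is used to normalize for the amount of DNA. The Python below is a sketch that assumes Ct values are already in hand and that amplification efficiency is close to 100%. The 0.8 cut-off and the sample values are hypothetical, and a real assay would need its own validation.

```python
def relative_copy_number(ct_transgene, ct_reference, calibrator_delta_ct):
    """Transgene copy number relative to a calibrator animal (ΔΔCt method).

    ΔCt = Ct(transgene) - Ct(reference gene); relative copies = 2^-(ΔCt - ΔCt_cal).
    Assumes roughly 100% PCR efficiency for both amplicons.
    """
    delta_ct = ct_transgene - ct_reference
    return 2 ** -(delta_ct - calibrator_delta_ct)


def flag_low_copy(samples, calibrator_delta_ct, threshold=0.8):
    """Return ids of mice whose relative copy number falls below threshold.

    `samples` maps a (hypothetical) mouse id to (Ct transgene, Ct reference).
    """
    return [
        mouse_id
        for mouse_id, (ct_tg, ct_ref) in samples.items()
        if relative_copy_number(ct_tg, ct_ref, calibrator_delta_ct) < threshold
    ]


# Hypothetical Ct values; the calibrator animal has ΔCt = 2.0.
samples = {"A": (22.0, 20.0), "B": (23.0, 20.0), "C": (21.9, 20.0)}
print(flag_low_copy(samples, calibrator_delta_ct=2.0))  # ['B']
```

A one-cycle rise in ΔCt corresponds to roughly half the transgene copies, which is why mouse B is flagged.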

At the ALS TDI we have seen this several times. When first described in 2010, all TDP43-mutant mice died within 200 days7. When we ordered mice from a breeding colony established from those used in this initial publication, the mice lived for up to 400 days without showing signs of disease. To perform the characterization work on TDP43 described above, we first spent several months backcrossing the strain to create a stable phenotype.


Characterization can flag more subtle potential problems for translation. This is exemplified by a study showing that lithium can boost survival of SOD1 mice by 30 days, an astoundingly long time8. A small clinical trial showed that it also extended life in people with ALS8. Lithium is already sold to treat bipolar disorder, and many people with ALS began taking the drug off label in the hope of slowing their disease progression. Three separate phase III clinical trials were launched in parallel to assess the drug's effects. These enrolled hundreds of patients at a total cost of well over US$100 million. None of the three trials showed any therapeutic benefit9, 10, 11.

Concurrently, other groups attempted to reproduce the preclinical data and could not12, 13. Although it is difficult to determine why the first study showed such a dramatic effect, its initial results are curious. The median survival time of untreated animals was 20 days shorter than that observed elsewhere, suggesting other anomalies.

For studies that aim to predict treatment benefits, such as extended survival or a delay of symptom progression, a mathematical simulation is in order. This incorporates the variation typically observed in an animal model to calculate how many animals should be assigned to the experimental groups. According to our calculations, highly variable animal models could require hundreds of animals per group; even homogeneous ones require as many as ten.
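The text describes simulating a model's variation. As a simpler stand-in, the standard normal-approximation formula for comparing two group means shows how variability drives group size: n = 2σ²(z₁₋α/₂ + z₁₋β)² / δ², where σ is the standard deviation of lifespan and δ is the survival extension one hopes to detect. The Python sketch below applies that formula. The SD and effect values are illustrative assumptions. Because survival data are usually analysed with methods such as the log-rank test, treat the numbers as rough.

```python
import math
from statistics import NormalDist


def n_per_group(sd_days, effect_days, alpha=0.05, power=0.8):
    """Animals per group to detect a mean survival difference of effect_days.

    Two-sided, two-sample normal approximation; not a full simulation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    n = 2 * sd_days**2 * (z_alpha + z_beta) ** 2 / effect_days**2
    return math.ceil(n)


# Hypothetical scenarios: detect a 5-day survival extension.
print(n_per_group(sd_days=6, effect_days=5))   # homogeneous model
print(n_per_group(sd_days=20, effect_days=5))  # highly variable model
```

With a 6-day SD this gives 23 animals per group; tripling the spread to 20 days pushes the requirement past 250, because n grows with the square of σ.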


And before assessing a drug's efficacy, researchers should investigate what dose animals can tolerate, whether the drug reaches the relevant tissue at the required dose and how quickly the drug is metabolized or degraded by the body. We estimate that it takes about $30,000 and 6–9 months to characterize the toxicity of a molecule and assess whether enough reaches the relevant tissue and has a sufficient half life at the target to be potentially effective.
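Whether a molecule has "a sufficient half life at the target" can be estimated roughly with a one-compartment model and first-order elimination. In that model, concentration halves every half-life: C(t) = C₀·2^(−t/t½). Solving for the time at which C falls to a minimum effective level gives t = t½·log₂(C₀/C_min). The Python helper below is a hypothetical illustration of that arithmetic, not a substitute for measured tissue pharmacokinetics.

```python
import math


def hours_above_target(c_peak: float, c_target: float, half_life_h: float) -> float:
    """Hours a drug stays above c_target after peaking at c_peak.

    Assumes first-order (exponential) elimination from a single compartment,
    so the concentration halves every half_life_h hours.
    """
    if c_peak <= c_target:
        return 0.0  # the target level is never reached
    return half_life_h * math.log2(c_peak / c_target)


# Hypothetical example: peak at 8x the target level, 3-hour half-life.
print(hours_above_target(c_peak=8.0, c_target=1.0, half_life_h=3.0))  # 9.0
```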

If those results are promising, then experiments to test whether a drug can extend an animal's survival are warranted — this will cost about $100,000 per dose and take around 12 months. At least three doses of the molecule should be tested; this will help to establish that any drug responses are real and suggest what a reasonable dosing level might be.

Thus, even assuming the model has been adequately characterized, an investment of $330,000 is necessary just to determine whether a single drug has reasonable potential to treat disease in humans. This seems worthwhile given that it could take thousands of patients, several years and hundreds of millions of dollars to move a drug through the clinical development process.
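The $330,000 figure follows directly from the per-step estimates above: $30,000 for characterization plus $100,000 for each of at least three doses. The short Python sketch below reproduces that sum. The timeline function adds an assumption the text does not state, namely whether the dose arms run in parallel or one after another.

```python
# Estimates quoted in the text (US dollars, months).
CHARACTERIZATION_COST = 30_000      # toxicity, tissue exposure, half-life
SURVIVAL_COST_PER_DOSE = 100_000    # one survival study per dose
CHARACTERIZATION_MONTHS = (6, 9)    # low and high estimate
SURVIVAL_MONTHS_PER_DOSE = 12


def preclinical_budget(n_doses=3):
    """Total cost to characterize one molecule and test n_doses doses."""
    return CHARACTERIZATION_COST + n_doses * SURVIVAL_COST_PER_DOSE


def preclinical_timeline(n_doses=3, parallel_doses=True):
    """(low, high) months; parallel vs sequential dosing is an assumption."""
    survival = SURVIVAL_MONTHS_PER_DOSE * (1 if parallel_doses else n_doses)
    low, high = CHARACTERIZATION_MONTHS
    return (low + survival, high + survival)


print(preclinical_budget())                           # 330000
print(preclinical_timeline())                         # (18, 21)
print(preclinical_timeline(parallel_doses=False))     # (42, 45)
```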

Community effort

As academic labs shift their focus to translational research, the burden to characterize animal models will fall on them. Although the costs are meagre compared with those of clinical trials, the investment required in time and funds is far beyond what any one lab should be expected to do. This burden and the resulting mouse models should be shared. At the very least, researchers should place new animal models in a public repository so that other teams can repeat the characterization, and share the costs of doing it well.

Public and private agencies should fund characterization studies as a specific project. A good example is the Alzheimer's Disease Neuroimaging Initiative, a large, collaborative study to find diagnostic biomarkers of the disease. Competitive bidding and milestone-driven payments could persuade qualified groups to perform the necessary experiments and to make results publicly available. This is unglamorous work that will never directly lead to a breakthrough or therapy, and is hard to mesh with the aims of a typical grant proposal or graduate student training programme. However, without these investments, more patients and funds will be squandered on clinical trials that are uninformative and disappointing.