The whole purpose of a manual of psychiatric diagnosis is to promote diagnostic agreement. The great value of DSM-III to the field was that it established reliability and preserved the credibility of psychiatry at a time when the profession was becoming irrelevant because psychiatrists seemingly could not agree on diagnoses. Everyone knew that the reliability achieved in DSM field testing far exceeded what is possible in clinical practice, but DSM-III took the major step of proving that reliability could be achieved at all. Until now, the DSMs have facilitated communication across the clinical/research interface, promoted research, and provided credibility in the courtroom.

But bad news was recently reported from the annual meeting of the American Psychiatric Association in Philadelphia. The hard-won credibility of psychiatric diagnosis has been compromised by the abysmal results of the DSM-5 field trials. This failure was clearly predictable from the start:

The writing of the DSM-5 criteria sets was far too raw and imprecise to be ready for the rigors of field testing. The ambiguity cried out for expert editing, without which reasonable reliability is impossible.

The design of the field trial was byzantine in complexity and could never be done on schedule.

Constant delays in starting and completing Stage 1 of the study forced DSM-5 to cancel the planned Stage 2, which was meant to clean up the poorly performing criteria sets identified in the first stage.

With Stage 2 cancelled without explanation, it looks as if even the worst-performing diagnoses are being given a free pass.

Most absurdly, the design was totally off-point, failing to ask the only question that really counted: the impact of the DSM-5 changes on rates of diagnosis.

The results of the DSM-5 field trials are a disgrace to the field. For context, in previous DSMs a diagnosis had to have a kappa reliability of about 0.6 or above to be considered acceptable. A reliability of 0.2 to 0.4 has always been considered completely unacceptable, not much above chance agreement.
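For readers who want the arithmetic behind these thresholds: kappa measures agreement corrected for chance, where 0 is chance-level agreement and 1 is perfect agreement. A minimal sketch in Python, using invented toy ratings purely for illustration (none of this reflects the actual field-trial data or methodology):

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement (Cohen's kappa) between two raters' labels."""
    n = len(rater_a)
    # Observed agreement: fraction of cases where the two raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical raters diagnosing ten patients as "case"/"non".
a = ["case", "case", "non", "non", "case", "non", "non", "case", "non", "non"]
b = ["case", "non", "non", "non", "case", "non", "case", "case", "non", "non"]
print(round(cohen_kappa(a, b), 2))  # prints 0.58
```

Here the raters agree on 8 of 10 patients (80% raw agreement), yet kappa is only about 0.58, just below the traditional 0.6 bar, because much of that agreement is expected by chance alone. This is why a kappa of 0.2 to 0.4 indicates agreement barely better than random.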

No predetermined publication date justifies business as usual in the face of these terrible field trial results, which are even more striking given that they were obtained in academic settings with trained and skilled interviewers, highly selected patients, and no time pressure (the results in real-world settings would be much lower). Reliability this low for so many diagnoses gravely undermines the credibility of DSM-5 as a basis for administrative coding, treatment selection, and clinical research.

What can be done to salvage this deplorable mess?

DSM-5 has never had anyone on board who could write a clean, consistent, unambiguous criteria set. DSM-5 appears to have received either no editing at all, or amateur editing at best. Getting the words right is certainly not enough, but if you can't even get them right, nothing else can ever be safe.

For DSM-5 to retrieve credibility, it must complete the second planned stage of its field testing. If doing the job right must delay publication, so be it. Public trust must trump private publishing profits, and it is self-defeating for APA to publish a book no one can trust.

I have been consistently pessimistic and critical about DSM-5 since my first piece on it three years ago. The sad thing is that I can still be surprised. At each step of the way I have predicted that it would fail in one way or another, only to discover that DSM-5 has managed to fail in ways that went beyond my poor imagination. This assault on reliability was predicted, but its scope exceeds even my jaundiced fears and creates a DSM-5 emergency.

Allen Frances is a professor emeritus at Duke University and was the chairman of the DSM-IV task force.