Researchers often categorize continuous variables. For example, the birth weight of human infants is often categorized as “low birth weight” vs. “normal”; sometimes the categories are “very low birth weight”, “low birth weight”, and “normal”. The cutoff for low birth weight is usually 2.5 kg. IQ scores are given labels such as “gifted”. Scores on depression tests are categorized. And so on. This rarely makes sense, either statistically or substantively.

From a statistical point of view, categorizing a continuous (or nearly continuous) variable throws away information, leading (nearly always) to less power. It also assumes that we have placed every cutoff exactly right.
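The loss of power can be seen in a small simulation. The sketch below is only an illustration under assumptions of my own choosing: a normally distributed predictor, a linear effect with a hypothetical slope of 0.25, samples of 100, and a median split as the cutoff. It tests the same simulated data twice, once with the predictor left continuous and once after dichotomizing it, and counts how often each analysis detects the effect.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rejects_null(xs, ys, t_crit=1.984):
    """Two-sided t-test of zero correlation.

    t_crit is the approximate 0.975 quantile of a t distribution with
    98 degrees of freedom, so it is only appropriate for n = 100.
    With a 0/1 predictor this is equivalent to a pooled two-sample t-test.
    """
    n = len(xs)
    r = pearson_r(xs, ys)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return abs(t) > t_crit


def simulate_power(n=100, slope=0.25, n_sims=1000, seed=1):
    """Estimate power with a continuous predictor and with a median split."""
    rng = random.Random(seed)
    hits_continuous = 0
    hits_dichotomized = 0
    for _ in range(n_sims):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [slope * x + rng.gauss(0, 1) for x in xs]
        cutoff = sorted(xs)[n // 2]  # median split: half "low", half "high"
        xs_binary = [1.0 if x >= cutoff else 0.0 for x in xs]
        if rejects_null(xs, ys):
            hits_continuous += 1
        if rejects_null(xs_binary, ys):
            hits_dichotomized += 1
    return hits_continuous / n_sims, hits_dichotomized / n_sims


power_continuous, power_dichotomized = simulate_power()
print(f"power, continuous predictor:   {power_continuous:.2f}")
print(f"power, dichotomized predictor: {power_dichotomized:.2f}")
```

The exact figures depend on the assumed slope, sample size, and seed, but the continuous analysis should reject the null noticeably more often. A median split is also close to the most favorable cutoff; a cutoff far from the center of the distribution would be expected to cost more.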

Substantively, it invokes “magical thinking”: the belief that something huge happens at the cutoff. In the birth weight example, a baby of 2.49 kg is treated as being very different from one of 2.51 kg, but very like one of 1.51 kg (unless there is a third category of “very low birth weight”). There’s no biological reason to think this happens. Substantive experts sometimes argue for cutoffs because they aid diagnosis and treatment, but this is true only if diagnosis is based on a single symptom and treatment is dichotomous. This may sometimes be the case, but it is rare. Neonates are not diagnosed based solely on weight (there are also the Apgar score, length of pregnancy, and other factors), and treatment is not dichotomous. It is true that a baby will either be in intensive care or not. But even if a baby is not in the NICU, hospital staff can be informed that the baby is at some risk. Babies can stay in the hospital for longer or shorter times. New parents can be given varied advice, and so on.

Another example is depression. We can diagnose depression based on psychological tests such as the Beck Depression Inventory. But categorizing people still doesn’t make much sense. Depression varies along a continuum. Treatment also varies. Neither psychotherapy nor drug treatment is dichotomous. Therapy can be more or less frequent. Dosages of drugs can be higher or lower. Nor should we diagnose depression based solely on the scores on one test, not when other symptoms are available.

So, does categorization ever make sense? Yes. There may be some situations where treatment really is dichotomous (although it is surprisingly difficult to think of them). In addition, sometimes there really is a big gap. For example, in a study of alcohol consumption among teens and young adults, the age at which drinking becomes legal would be key. Even here, though, there may be better models, such as spline regression.
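As a rough illustration of the spline idea, the sketch below fits a linear spline with a single knot instead of splitting people into “under age” and “of age” groups. Everything specific in it is an assumption made for the example: the knot at 21, the simulated drinks-per-week outcome and its slopes, and the helper names `hinge_basis`, `solve_linear_system`, and `fit_least_squares`. The data are simulated, not real. In practice one would more likely use a statistics package and consider smoother splines or an explicit jump at the knot.

```python
import random


def hinge_basis(age, knot):
    """Intercept, linear term, and a hinge term that switches on past the knot."""
    return [1.0, age, max(0.0, age - knot)]


def solve_linear_system(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(rows, ys):
    """Ordinary least squares via the normal equations (adequate for a tiny design)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


KNOT = 21.0  # hypothetical legal drinking age, assumed for this example

# Simulated data: consumption rises with a slope of 0.5 per year before the
# knot and 0.5 + 1.0 = 1.5 per year after it, plus random noise.
rng = random.Random(0)
ages = [rng.uniform(15, 30) for _ in range(500)]
drinks = [2.0 + 0.5 * a + 1.0 * max(0.0, a - KNOT) + rng.gauss(0, 1) for a in ages]

design = [hinge_basis(a, KNOT) for a in ages]
intercept, slope_before, slope_change = fit_least_squares(design, drinks)

print(f"slope before the knot:    {slope_before:.2f}")
print(f"change in slope at knot:  {slope_change:.2f}")
print(f"slope after the knot:     {slope_before + slope_change:.2f}")
```

Unlike a two-group comparison, this model keeps age continuous on both sides of the knot and lets the data say how much the trend changes there.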

