In a recent post, Scott linked an interesting paper about controlling for statistical confounders. The paper draws some pretty damning conclusions, all based on the simple idea that you’re never really controlling for X, you’re controlling for your imperfect proxy for X. Since the proxy is imperfect, if you’ve measured some associated variable Z, it’ll usually give you information about the true value of X above and beyond what your proxy tells you, and the usual approach mistakes this for an independent effect of Z above and beyond its association with X.
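To make the proxy problem concrete, here is a minimal simulation sketch. It is not taken from the paper; every number in it (the noise levels, the coefficient of 2.0, the sample size) is an assumption chosen for illustration. Y depends only on the true X, Z is merely correlated with X, and we "control for X" using either the true X or a noisy measurement of it.

```python
import numpy as np


def proxy_control_demo(n=50_000, proxy_noise=1.0, seed=0):
    """Return the fitted coefficient on Z when controlling for true X vs. a noisy proxy."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                          # true confounder (unobserved in practice)
    x_proxy = x + proxy_noise * rng.normal(size=n)  # what actually gets measured
    z = x + rng.normal(size=n)                      # associated with X, no effect on Y
    y = 2.0 * x + rng.normal(size=n)                # Y depends on X only

    def ols(columns):
        # Ordinary least squares with an intercept column.
        design = np.column_stack([np.ones(n)] + columns)
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        return coef

    coef_true = ols([x, z])         # controlling for the real X
    coef_proxy = ols([x_proxy, z])  # controlling for the imperfect proxy
    return coef_true[2], coef_proxy[2]


z_true, z_proxy = proxy_control_demo()
print(f"coefficient on Z, controlling for true X: {z_true:.3f}")
print(f"coefficient on Z, controlling for proxy:  {z_proxy:.3f}")
```

With the true X in the model, the coefficient on Z should sit close to zero. With the proxy, Z picks up a clearly nonzero coefficient even though it has no effect on Y in this setup, because it carries information about X that the proxy lost.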

That’s very interesting, but it strikes me as just one facet of a bigger issue with statistical controls which has always unsettled me. There is something oddly backwards about the whole idea.

You, the scientist, want to publish an exciting new study about some variable of interest called Y. Everyone knows about ten different variables that “obviously” affect Y; call these, collectively, X. A study saying “X affects Y!” would not be new or exciting. No, you want to say that some other variable, Z, affects Y. No one has discovered that yet.

A problem arises: Z is also associated, in various ways, with various of the ten components of X. What if the correlation between Z and Y (or nonzero regression coefficient, or whatever) is just due to the already known X-to-Y association? How can you tell?

The usual answer is: make some sort of model predicting Y from both X and Z, and show that the model uses some information from Z to predict Y, even though it knows about X, too. Success! Now you can claim that Z is associated with Y. You are now free to forget about your model, which was merely a tool you used to draw this conclusion. You didn’t really care about predicting Y, and you don’t care whether your model is the best model for predicting Y, or even a good one. It has served its purpose, and into the dumpster it goes.

As I said, there is something backwards about this. Your claim about Z and Y depended entirely on Z helping some model predict Y. Clearly, the strength of your argument must depend on the quality of this model. If the model is a bad model of the relationship between X and Y, before Z is even added to the picture, then it’s hard to conclude anything from what happens when you add in Z; if your model doesn’t capture the relationships we think are there in the first place, its use of Z could just be an attempt to “put them back in.”

(For example, at a given weight, someone’s BMI is inversely proportional to the square of their height. The electrostatic force between an electron on someone’s head and an electron on their heel is also inversely proportional to the square of their height. Suppose, absurdly, that someone tries to model the relationship between height and BMI by doing linear regression on the two. This will fare poorly, because the relationship is inverse-square, not linear. But if they add in the electrostatic force as a regressor, it will of course have a nonzero coefficient, and predict BMI much better than the height term. This does not show that this force is associated with BMI “even controlling for height”!)
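The BMI example can be checked with a small sketch. The setup is a toy under stated assumptions: heights are drawn uniformly between 1.5 and 2.0 metres, weight is drawn independently of height (unrealistic, but it isolates the inverse-square relationship), and the "force" regressor is just 1 / height² with the physical constants dropped. Over an adult height range the straight line is not a terrible fit, so the gap in R² is modest, but the force term still absorbs the curvature the linear model missed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Assumptions for this toy: adult heights in metres, and weight drawn
# independently of height so that BMI varies with height only as 1 / height**2.
height = rng.uniform(1.5, 2.0, size=n)
weight = rng.normal(70.0, 1.0, size=n)
bmi = weight / height**2
force = 1.0 / height**2  # head-to-heel Coulomb force, up to a constant


def fit(columns, target):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    design = np.column_stack([np.ones(len(target))] + columns)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    r2 = 1.0 - residual.var() / target.var()
    return coef, r2


_, r2_linear = fit([height], bmi)
coef_both, r2_both = fit([height, force], bmi)
force_coef = coef_both[2]  # order is intercept, height, force

print(f"R^2, height only:       {r2_linear:.4f}")
print(f"R^2, height plus force: {r2_both:.4f}")
print(f"coefficient on force:   {force_coef:.1f}")
```

The coefficient on the force term should land near the average weight, since BMI here is just weight times 1 / height². The regression is using the "force" to put the inverse-square shape back in, which says nothing about electrons.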

This was brought forcefully to my attention when I was reading a recent study about alcohol consumption and mortality. The big punchline was that, in a huge meta-analysis, it only took something like 7 standard drinks / week (not the 14 specified in the US guidelines) to raise mortality risk.

There was a big problem with this claim that has nothing to do with this post, namely that the researchers meant “the confidence interval for 7 drinks / wk just barely excluded no effect” (it was nearly symmetric about a hazard ratio of 1.0). This is the same old problem where people try to figure out when an effect “turns on” or “turns off” by noticing when they start being able to reject the null, which is the kind of thing you are taught not to do in Stats 101 but which is nonetheless endemic in the medical literature.
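Here is a quick sketch of why "the dose where we can first reject the null" is not a property of the effect. It assumes a purely hypothetical linear dose-response with no threshold at all (0.02 units of harm per drink), a difference-in-means standard error, and, for simplicity, a point estimate that equals the true effect. None of these numbers come from the study.

```python
import math


def first_significant_dose(effect_per_drink, sd, n_per_group, doses):
    """Smallest dose whose 95% interval excludes zero, or None if there is none."""
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of a difference in means
    for dose in doses:
        lower_bound = effect_per_drink * dose - 1.96 * se
        if lower_bound > 0:
            return dose
    return None


doses = range(1, 22)  # drinks per week
for n in (200, 2_000, 20_000):
    print(n, first_significant_dose(0.02, 1.0, n, doses))
# prints:
# 200 10
# 2000 4
# 20000 1
```

The underlying effect is identical in all three cases. The apparent "threshold" moves from 10 drinks to 1 drink purely because the sample got bigger.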

But anyway, even after facepalming over that, I was curious about how the study adjusted for confounders. So many things are associated with mortality, and so many things are associated with alcohol consumption – how do you disentangle it all? And the authors clearly tried to do their due diligence on this front. My eyes started to glaze over as I read the list of confounders they controlled for:

HRs were adjusted for usual levels of available potential confounders or mediators, including body-mass index (BMI), systolic blood pressure, high-density-lipoprotein cholesterol (HDL-C), low-density-lipoprotein cholesterol (LDL-C), total cholesterol, fibrinogen, and baseline measures for smoking amount (in pack-years), level of education reached (no schooling or primary education only vs secondary education vs university), occupation (not working vs manual vs office vs other), self-reported physical activity level (inactive vs moderately inactive vs moderately active vs active), self-reported general health (scaled 0–1 where low scores indicate poorer health), self-reported red meat consumption, and self-reported use of anti-hypertensive drugs.



My first reaction upon reading this was to think, “okay, some of these may or may not have been poorly operationalized, and that may have affected their results in problematic ways not captured in the sensitivity analyses in their appendix, or maybe not, because how the fuck would I know when there’s so much going on in their mortality model?”



And then I was like, wait. They have a “mortality model.” They’re only focusing on the coefficients for one variable, but it’s got a zillion variables in it. It sounds like it could be the sort of model used by the people who are actually interested in predicting mortality as accurately as possible – say, insurance companies – as opposed to people who are just interested in making claims about alcohol.

But they aren’t telling me how good their model is. I have no idea if it’s similar to the models the insurance company people use, or if the insurance company people would turn up their noses at it. Their model was created on the spot to make some claims about alcohol, and even if I spent a day scratching my head and trying to understand it, the next day I might read a paper with another mortality model, and have to repeat the process. There must be hundreds of models like this, invented on the spot for the purposes of statistical controls, and then discarded.

It feels like there should be someone in charge of maintaining our best models of things like mortality. Questions about individual variables, like alcohol, could be investigated on a common footing. Instead, we have hundreds of claims about how some Z affects some other Y, derived from different models, which might not all be true if stitched together into a single framework.