Let me say this right off the bat. Research ethics are a good thing. Seriously.

But have you ever thought of an idea for a study that you sort of wish you could do, but know you couldn't?

Like randomizing high school kids to pot-smoking?

Yeah, that's not getting through the IRB -- but come on -- we'd finally figure out if it's a gateway drug!

Observational studies are biased, confounded, and often plain wrong, but there are many reasons to do them. Randomized trials are fabulously expensive, for one. But, in many cases, randomized trials just aren't ethical. We're stuck. And some information is better than none, right?

In fact, one could argue that the whole of modern statistical methods is a vast apology for observational research. All of our adjustment techniques, our propensity-score matching, and our sensitivity analyses are attempts to make an observational study give us the same information a randomized trial would.

These statistical techniques have had some impressive failures, like the oft-cited case in which observational research suggested that hormone replacement therapy in post-menopausal women would prevent heart attacks, only for randomized trials to find, if anything, the opposite.

But one technique for making observational data look like a randomized controlled trial just might have the chops to let us conduct all the unethical studies we want and still get valid results.

It's called instrumental variable analysis.

But before we get to that, we need to realize exactly why a randomized trial is so special. Why is it different from all the other medical research out there? (As an aside, this is question number 1 in my statistics-themed Passover Seder.)

The answer is that we introduce this artificial thing -- randomization -- that is strongly associated with the exposure of interest (typically a drug) but has nothing to do with the outcome. Of course it doesn't -- the outcome hasn't happened yet. Randomization to group A or group B may turn out to be associated with the outcome, but the only possible mechanism for that association is the action of the drug you were randomized to. That's the secret sauce. There is no pathway for randomization to lead to the outcome except through the drug (barring fraud, of course).

Figure 1: The secret sauce of a randomized trial is introducing a factor that is strongly, almost perfectly associated with the exposure, but has no relationship to the outcome except through the exposure.
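To see the secret sauce in action, here is a minimal simulation sketch. Everything in it is invented for illustration: a hidden risk factor `u` raises both the chance of exposure and the outcome, and the true effect of exposure is set to 1.0. When people self-select into exposure, the simple difference in means is badly inflated; when a coin flip assigns exposure, the same comparison lands close to the truth.

```python
import random
from statistics import fmean


def mean_difference(n, randomized, true_effect=1.0, seed=0):
    """Simulate n people and return mean outcome (exposed) minus mean outcome (unexposed)."""
    rng = random.Random(seed)
    exposed, unexposed = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # hidden risk factor: raises the outcome by 2 units per SD
        if randomized:
            x = rng.random() < 0.5  # coin flip, so exposure is unrelated to u
        else:
            # self-selection: people with high u are much more likely to be exposed
            x = rng.random() < (0.8 if u > 0 else 0.2)
        y = true_effect * x + 2.0 * u + rng.gauss(0, 1)
        (exposed if x else unexposed).append(y)
    return fmean(exposed) - fmean(unexposed)


obs = mean_difference(100_000, randomized=False)
rct = mean_difference(100_000, randomized=True)
print(f"observational: {obs:.2f}  randomized: {rct:.2f}  truth: 1.00")
```

With these made-up parameters the observational comparison comes out near 3, roughly triple the true effect, while the randomized one sits near 1.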

Think of instrumental variables as natural randomizers. Instead of a computer generating random numbers, the randomization is done by genetics (see Mendelian randomization), by changes in state laws, or by some other "natural experiment."

Let's start with an example.

You want to know if pot (marijuana, Cannabis sativa) is a gateway drug. We've established that randomizing 14-year-olds to pot versus placebo is not a virtuous act.

So we use observational data. The 2013 National Survey on Drug Use and Health is available for free online (thanks, Obama). This is a survey of about 55,000 people aged 12 and older from around the U.S. Looking at the file, 44% reported ever smoking marijuana.

Figure 2: Kids having more fun than I did in high school

But is pot a gateway drug? Is there, say, a pot:cocaine connection?

Figure 3: The "gateway drug" hypothesis states that marijuana use leads directly to harder drugs (like cocaine). Proposed mechanisms include euphoria-seeking behavior or increased tolerance for breaking laws and rules.

Looking at our data, we find some pretty dramatic results:

If you had smoked marijuana, your odds of trying cocaine were 80 times higher than if you had never smoked marijuana. In fact, in the entire dataset of 55,000 people, there are only 135 who have reported trying cocaine but never the ganja.

Figure 4: This Venn diagram illustrates the substantial overlap between pot and cocaine use. Still, most pot smokers never try cocaine.
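For readers who want the arithmetic behind a statement like "80 times higher odds," here is a small sketch of how an odds ratio comes out of a 2x2 table. The counts below are hypothetical, chosen only to total 55,000 and to reuse the 135 cocaine-but-no-pot respondents mentioned above; they are not the survey's actual cell counts, so the result does not match the 80 from the real file.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table laid out as:

                    cocaine   no cocaine
        pot            a          b
        no pot         c          d
    """
    return (a / b) / (c / d)


def risk_ratio(a, b, c, d):
    """Risk (probability) ratio from the same 2x2 layout."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts (NOT the real NSDUH cells): 24,000 pot users, 31,000 non-users.
a, b = 6_000, 18_000  # pot users who did / did not try cocaine
c, d = 135, 30_865    # non-users who did / did not try cocaine

print(f"odds ratio: {odds_ratio(a, b, c, d):.1f}")  # prints: odds ratio: 76.2
print(f"risk ratio: {risk_ratio(a, b, c, d):.1f}")  # prints: risk ratio: 57.4
```

Because trying cocaine is common among the pot users in this made-up table (25%), the odds ratio is noticeably larger than the risk ratio, a reminder that "odds" and "risk" are not interchangeable.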

Case closed? Of course not. We know there are going to be systematic differences between pot smokers and those who have never tried it. For example, pot smokers had lower family income, were more likely to have moved in the past year, and were more than twice as likely not to be attending school regularly. Pot smokers were also more often employed (50% held down a job compared to 35% of abstainers).

So maybe pot doesn't lead to cocaine. Maybe people who are at risk of trying cocaine (due to social isolation, for example) are also at risk of smoking pot.

Figure 5: In this framework, pot isn't a gateway drug, it's just a marker of something else bad going on that also puts you at risk of trying cocaine.

We can adjust for factors that we think might be confounding the observed pot:cocaine relationship. Well, we can if they were measured in the first place. I've written before about how adjustment can go very, very wrong.

But we might miss something. Wait, strike that. We ALWAYS miss something. There are always other confounding factors that we didn't think about or didn't measure. For instance, nowhere in the dataset I'm looking at do I see a variable for "drug dealer lives within walking distance," which might be important to know.
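Here is a minimal sketch of why adjustment can't rescue us from what we didn't measure. It simulates a world, invented purely for illustration, in which pot has NO effect on cocaine, but both depend on a measured confounder (think family income) and an unmeasured one (the drug dealer down the street). Pot and cocaine are treated as hypothetical continuous scores so that plain linear regression applies. Adjustment for the measured confounder is done with the Frisch-Waugh-Lovell trick (regress residuals on residuals), which gives the same coefficient as a multiple regression.

```python
import random
from statistics import fmean


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (simple regression)."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def residualize(vs, ms):
    """Remove the linear effect of ms from vs (the Frisch-Waugh-Lovell step)."""
    b = slope(ms, vs)
    mv, mm = fmean(vs), fmean(ms)
    return [v - mv - b * (m - mm) for v, m in zip(vs, ms)]


rng = random.Random(1)
n = 100_000
measured = [rng.gauss(0, 1) for _ in range(n)]    # e.g. family income (hypothetical scale)
unmeasured = [rng.gauss(0, 1) for _ in range(n)]  # e.g. "dealer within walking distance"
pot = [0.5 * m + 0.5 * u + rng.gauss(0, 1) for m, u in zip(measured, unmeasured)]
# True gateway effect is ZERO here: cocaine depends only on the two confounders.
cocaine = [1.0 * m + 1.0 * u + rng.gauss(0, 1) for m, u in zip(measured, unmeasured)]

naive = slope(pot, cocaine)
adjusted = slope(residualize(pot, measured), residualize(cocaine, measured))
print(f"naive: {naive:.2f}  adjusted: {adjusted:.2f}  truth: 0.00")
```

With these made-up coefficients the naive slope is about 0.67 and the adjusted slope about 0.40: adjustment shrinks the spurious association but cannot remove the part that runs through the confounder nobody measured.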

Instrumental variable analysis is a fundamentally different approach to observational data, and one that, for me, has much more appeal.

Here's the idea. You find a variable (an "instrument") that is associated with the exposure but NOT with the outcome, except via the exposure.

Does that sound familiar? It's the same secret sauce that makes a randomized trial so great.
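The core calculation is surprisingly simple. With one instrument and one exposure, the IV estimate is the covariance of the instrument with the outcome divided by its covariance with the exposure; in that case this is exactly what two-stage least squares reduces to. The sketch below runs it on simulated data in which every coefficient, including a true gateway effect of 0.3, is invented for illustration. The hypothetical instrument shifts pot use but touches cocaine only through pot, so the ratio recovers the true effect even though an unmeasured confounder fools the ordinary regression.

```python
import random
from statistics import fmean


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def iv_estimate(z, x, y):
    """IV ratio estimate of the effect of x on y using instrument z.

    With one instrument and one exposure this equals two-stage least squares.
    """
    return cov(z, y) / cov(z, x)


rng = random.Random(2)
n = 100_000
true_effect = 0.3  # hypothetical gateway effect we want to recover
u = [rng.gauss(0, 1) for _ in range(n)]           # unmeasured confounder
instrument = [rng.gauss(0, 1) for _ in range(n)]  # e.g. a price shock; independent of u
pot = [-0.7 * z + ui + rng.gauss(0, 1) for z, ui in zip(instrument, u)]
cocaine = [true_effect * x + ui + rng.gauss(0, 1) for x, ui in zip(pot, u)]

naive = cov(pot, cocaine) / cov(pot, pot)  # ordinary regression slope, confounded by u
iv = iv_estimate(instrument, pot, cocaine)
print(f"naive: {naive:.2f}  IV: {iv:.2f}  truth: {true_effect:.2f}")
```

Here the naive slope comes out around 0.70, while the IV estimate sits near 0.30.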

If we are trying to evaluate the relationship between pot smoking and cocaine use, we need to think of a variable that is associated with pot use but not with cocaine use, except, potentially, via pot use. Let's brainstorm:

Poverty: Clearly not. Poverty is associated with our exposure, but also with our outcome. And poverty can lead to cocaine use in the absence of pot use, so it is NOT a valid instrument.

Facebook use: Interesting thought. Maybe Facebook users smoke more pot (I have no dataset to back this up, unfortunately), but I can't offer a clear rationale for why that might be, or for why whatever it is about Facebook wouldn't ALSO put users at risk of cocaine use.

Living in Colorado: Now we're getting somewhere. Colorado has legalized marijuana (though not for teenagers). Living in Colorado, then, might increase access to marijuana, and thus use, without directly affecting cocaine use. Any effect on cocaine would have to run through marijuana, which is exactly what the gateway hypothesis proposes.

Price of pot: I like this one. Higher pot prices should be associated with less consumption but should have nothing to do with cocaine use, except through pot.

As you can probably see, picking the right instrument is tricky. In the end, the way you criticize an instrumental variable paper is by arguing that the authors picked the wrong instrument.
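Here is a sketch of what a bad instrument does. Poverty, from the brainstorm, nudges pot use up but also raises cocaine use directly, which breaks the rule that the instrument may affect the outcome only through the exposure. In this invented simulation there is no gateway effect at all, yet the IV ratio reports a large one. With a unit-variance instrument, the estimate drifts to the true effect plus the direct effect divided by the first-stage effect, so a weak instrument magnifies even a small violation.

```python
import random
from statistics import fmean


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


rng = random.Random(4)
n = 100_000
u = [rng.gauss(0, 1) for _ in range(n)]        # unmeasured confounder
poverty = [rng.gauss(0, 1) for _ in range(n)]  # candidate "instrument" (standardized, hypothetical)
pot = [0.5 * z + ui + rng.gauss(0, 1) for z, ui in zip(poverty, u)]
# No gateway effect (coefficient on pot is 0), but poverty ALSO raises cocaine use directly.
cocaine = [0.0 * x + 0.5 * z + ui + rng.gauss(0, 1) for x, z, ui in zip(pot, poverty, u)]

naive = cov(pot, cocaine) / cov(pot, pot)
iv = cov(poverty, cocaine) / cov(poverty, pot)
# Expected IV value: true effect + direct effect / first-stage effect = 0 + 0.5 / 0.5 = 1.0
print(f"naive: {naive:.2f}  IV with invalid instrument: {iv:.2f}  truth: 0.00")
```

With these numbers the IV estimate is about 1.0, even worse than the naive slope of about 0.56.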

Neither hash sticker price nor state of residence is available in the dataset I had, but state-level data were available to economist Jeffrey DeSimone, PhD, when he published a study using state marijuana-possession penalties as the instrument. The idea here is that states with laxer marijuana penalties will have higher marijuana use. If marijuana use truly has no causal effect on cocaine use, laxer penalties should have no bearing on cocaine use in the state. If there is a causal link, well, states with laxer pot laws should have more cocaine use. Using this approach, he found that prior pot use increases your risk of subsequent cocaine use by about 29%.
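This state-comparison logic can be written down as the Wald estimator for a yes/no instrument: the difference in cocaine use between lax-penalty and strict-penalty states, divided by the difference in pot use between them. The sketch below is not DeSimone's model (his analysis used penalty variables and survey data not reproduced here); it is a toy simulation with invented probabilities, a binary "lax state" instrument, and a hypothetical true effect of 0.10 on the probability of trying cocaine.

```python
import random
from statistics import fmean


def group_mean(values, groups, level):
    """Mean of `values` among units whose entry in `groups` equals `level`."""
    return fmean([v for v, g in zip(values, groups) if g == level])


def wald_estimate(z, x, y):
    """Wald IV estimate for a binary (0/1) instrument z:
    reduced-form difference in y divided by first-stage difference in x."""
    reduced_form = group_mean(y, z, 1) - group_mean(y, z, 0)
    first_stage = group_mean(x, z, 1) - group_mean(x, z, 0)
    return reduced_form / first_stage


rng = random.Random(3)
n = 200_000
true_effect = 0.10  # hypothetical rise in P(try cocaine) caused by pot use
lax = [int(rng.random() < 0.5) for _ in range(n)]    # 1 = lax-penalty state (instrument)
risky = [int(rng.random() < 0.5) for _ in range(n)]  # unmeasured confounder
pot = [int(rng.random() < 0.20 + 0.15 * z + 0.30 * r) for z, r in zip(lax, risky)]
cocaine = [int(rng.random() < 0.02 + true_effect * p + 0.20 * r) for p, r in zip(pot, risky)]

naive = group_mean(cocaine, pot, 1) - group_mean(cocaine, pot, 0)
wald = wald_estimate(lax, pot, cocaine)
print(f"naive: {naive:.3f}  Wald IV: {wald:.3f}  truth: {true_effect:.3f}")
```

With these invented numbers the naive comparison of users and non-users comes out around 0.16, while the Wald ratio lands near the true 0.10. The small first-stage difference in the denominator is also why this kind of estimate is noisy: any wobble in the numerator gets divided by a modest number.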

But another paper, using the presence of medical marijuana laws as the instrument, found no gateway effect. Medical marijuana laws increased marijuana use in the state by 10%-20%, but cocaine consumption was unmoved (and heroin consumption decreased, by the way).

Perhaps the most reasonable paper, though, came out of England. Using marijuana prices as the instrument, the authors identified two subgroups of individuals. The first was sort of an "at-risk youth" group, for whom marijuana use did appear to lead directly to subsequent cocaine use. There was no gateway effect in the much larger "everyone else" group, though.

And doesn't that sort of make sense?

In the final analysis, we still don't really know whether pot is a gateway to harder drugs. For some kids, it appears that it might be. The truth is, we'll probably never know with 100% certainty, because we can't do the unethical trial that opened this post.

If you have an unethical trial in mind, leave a comment -- maybe we can come up with an instrumental variable that gets at the truth in a clever, if roundabout, way.