In a previous blog post I discussed how we can use the idea of potential outcomes to make causal inferences from observational data. Under the Potential Outcomes framework, we treat the counterfactual outcome as missing data and attempt to estimate these missing values from the observed data. To do this we had to make strong assumptions about the data generating process, specifically "Strong Ignorability":

$Y_{i} \perp\!\!\!\perp X \mid Z$

where $Y_{i}$ are the potential outcomes we are trying to estimate, $X$ is the intervention whose effect we are trying to measure, and $Z$ is a set of covariates that allows us to "correct" our estimate. The statement should be read: "$Y_{i}$ is conditionally independent of $X$ given $Z$".
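To make the statement concrete, here is a minimal NumPy simulation sketch. It assumes a hypothetical data generating process (none of these numbers come from the post): a single binary covariate $Z$ drives both the treatment $X$ and the potential outcomes $Y_{0}$, $Y_{1}$, and the treatment effect is a constant 1. Marginally, $Y_{1}$ differs between treated and untreated units, but within each stratum of $Z$ the gap vanishes, which is what strong ignorability asserts. Comparing within strata of $Z$ and averaging then recovers the effect, while the naive comparison does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data generating process (illustrative numbers only):
# Z is a binary covariate that influences both treatment and outcome.
z = rng.binomial(1, 0.5, n)
# Treatment X depends only on Z: units with Z=1 are treated far more often.
x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))
# Potential outcomes depend on Z and independent noise, but never on X.
y0 = 2.0 * z + rng.normal(0.0, 1.0, n)
y1 = y0 + 1.0  # assumed constant treatment effect of 1
# We only ever observe the potential outcome matching the treatment received.
y = np.where(x == 1, y1, y0)


def mean_gap(values, treated, mask):
    """E[values | X=1] - E[values | X=0] among the rows selected by mask."""
    return values[mask & (treated == 1)].mean() - values[mask & (treated == 0)].mean()


everyone = np.ones(n, dtype=bool)
strata = [z == k for k in (0, 1)]

# Y1 is NOT independent of X marginally: its mean differs by about 1.2.
marginal_gap = mean_gap(y1, x, everyone)
# Within each stratum of Z the gap is close to 0 (Y1 independent of X given Z).
stratum_gaps = [mean_gap(y1, x, s) for s in strata]

# The naive comparison of observed outcomes is biased (about 2.2) ...
naive_effect = mean_gap(y, x, everyone)
# ... while weighting within-stratum comparisons by P(Z=k) recovers about 1.
adjusted_effect = sum(mean_gap(y, x, s) * s.mean() for s in strata)

print(f"marginal gap in Y1:       {marginal_gap:.2f}")
print("gaps within Z strata:     " + ", ".join(f"{g:.2f}" for g in stratum_gaps))
print(f"naive effect estimate:    {naive_effect:.2f}")
print(f"adjusted effect estimate: {adjusted_effect:.2f}")
```

Changing the seed or sample size moves the numbers slightly, but the pattern (a large marginal gap, near-zero gaps within strata) holds as long as $X$ depends on the potential outcomes only through $Z$.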

This is a strong statement about the process that generated our data. To understand when strong ignorability holds, we need to make some assumptions about the structure of the data generating process itself. The language we will use to express this structure is that of Causal Graphical Models. In this post I will try to give a light overview of causal graphical models using a Python package of the same name.
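As a dependency-free illustration of the kind of question a graph lets us answer, here is a minimal hand-rolled sketch; it is not the package's API. It encodes a hypothetical graph $Z \rightarrow X$, $Z \rightarrow Y$ as a list of edges and checks d-separation with the standard moralized ancestral graph criterion: keep only the queried nodes and their ancestors, connect parents that share a child, drop edge directions, delete the conditioning set, and ask whether $X$ can still reach $Y$.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(edges, x, y, given):
    """True if x and y are d-separated by the set `given` in the DAG `edges`."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    keep = ancestors(parents, {x, y} | set(given))

    # Moralize the ancestral subgraph: drop directions and "marry" co-parents.
    adj = {v: set() for v in keep}
    for child in keep:
        ps = [p for p in parents.get(child, ()) if p in keep]
        for p in ps:
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)

    # Delete the conditioning set and test whether y is reachable from x.
    blocked = set(given)
    seen, stack = {x}, [x]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen and w not in blocked:
                seen.add(w)
                stack.append(w)
    return y not in seen


# Hypothetical graph: Z is a common cause of X and Y.
edges = [("Z", "X"), ("Z", "Y")]
print(d_separated(edges, "X", "Y", set()))  # False: X and Y are dependent
print(d_separated(edges, "X", "Y", {"Z"}))  # True: independent given Z
```

The graph makes the independence claim a consequence of the structure we wrote down, rather than something we assert about the data directly.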

Compared to my previous post, this post will be less about techniques for making causal inferences and more about gaining intuition for how we can describe the structure of a data generating process, and what statements we can make once we have such a description. I am also not going to be playing fast and loose with some of the maths.