This Tuesday, Professor Xuming He presented his recent work on subgroup analysis, which is very interesting and useful in practice. Think about the following very practical problem (practical since the drug may be expensive or have a certain amount of side effects):

If you are given the drug response, some baseline covariates which have nothing to do with the treatment, the treatment indicator, as well as some post-treatment measurements, how could you come up with a statistical model to tell whether there exist subgroups which respond to the treatment differently?

Think about it for 5 minutes, then continue reading!

Dr. He borrowed a very traditional model in Statistics, the logistic-normal mixture model, to study the above problem. The existence of the two subgroups is characterized by the observed baseline covariates $X_i$, which have nothing to do with the treatment:

$$P(\delta_i = 1 \mid X_i) = \frac{\exp(X_i^\top \gamma)}{1 + \exp(X_i^\top \gamma)},$$

where $\delta_i \in \{0, 1\}$ is the unobserved membership index. And the observed response $Y_i$ follows a normal mixture model

$$Y_i \mid Z_i, \delta_i \sim N\big((1 - \delta_i)\, Z_i^\top \beta_1 + \delta_i\, Z_i^\top \beta_2,\ \sigma^2\big),$$

with different means $Z_i^\top \beta_1$ and $Z_i^\top \beta_2$, where $Z_i$ usually contains $X_i$ but also includes the treatment indicator as well as any post-treatment measurements. Given that there are two subgroups characterized by the baseline covariates (which makes the testing problem regular), they tried to test whether the two groups respond to the treatment differently, that is, to test the component of $\beta_2 - \beta_1$ which corresponds to the treatment indicator.
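To make the model concrete, here is a minimal sketch of simulating data from a logistic-normal mixture of this kind; all parameter values (`gamma`, `beta1`, `beta2`, `sigma`) are hypothetical choices for illustration, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

x = rng.normal(size=n)                        # a single baseline covariate
X = np.column_stack([np.ones(n), x])          # design with intercept

# Latent subgroup membership driven only by the baseline covariates
gamma = np.array([-0.5, 1.0])                 # hypothetical logistic coefficients
p = 1.0 / (1.0 + np.exp(-X @ gamma))
delta = rng.binomial(1, p)                    # unobserved membership index

trt = rng.binomial(1, 0.5, size=n)            # randomized treatment indicator
Z = np.column_stack([np.ones(n), x, trt])     # Z contains X plus the treatment indicator

beta1 = np.array([0.0, 0.5, 0.0])             # subgroup 0: no treatment effect
beta2 = np.array([0.0, 0.5, 1.5])             # subgroup 1: responds to the treatment
sigma = 1.0
mu = np.where(delta == 1, Z @ beta2, Z @ beta1)
y = mu + sigma * rng.normal(size=n)           # observed response
```

The analyst observes only `y`, `X`, and `Z`; the membership `delta` is latent, which is exactly what creates the mixture structure.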

Nice work demonstrating how to come up with a statistical model to study interesting and practical problems!

But the above part has nothing to do with the title, the EM algorithm. Actually, you can imagine that they used EM as a basic tool to study the above mixture model. That's why I came back to revisit this great idea in Statistics.

Given a complete random vector $(X, Z)$ with $X$ observed and $Z$ unobserved, we have the likelihood function $f(x, z; \theta)$. Then the log marginal likelihood has the following property:

$$\log f(x; \theta) = \log \int f(x, z; \theta)\, dz = \log \int q(z)\, \frac{f(x, z; \theta)}{q(z)}\, dz \ \ge\ \int q(z) \log \frac{f(x, z; \theta)}{q(z)}\, dz,$$

where the last inequality is from Jensen's inequality, and $q$ is any density function put on $Z$. In order to make the bound tight, i.e. to make the above inequality an equality, one possible way is to take $q(z) = f(z \mid x; \theta)$, which leads to

$$\log f(x; \theta) = \int f(z \mid x; \theta) \log \frac{f(x, z; \theta)}{f(z \mid x; \theta)}\, dz.$$

Then, given a current estimate $\theta^{(t)}$, we have

$$\log f(x; \theta) \ \ge\ \int f(z \mid x; \theta^{(t)}) \log \frac{f(x, z; \theta)}{f(z \mid x; \theta^{(t)})}\, dz \ =\ Q(\theta \mid \theta^{(t)}) + H(\theta^{(t)}),$$

where $Q(\theta \mid \theta^{(t)}) = E\big[\log f(x, Z; \theta) \mid x; \theta^{(t)}\big]$ and $H(\theta^{(t)})$ collects the entropy term, which does not depend on $\theta$.

In summary, starting from an initial value $\theta^{(0)}$, we have the following EM procedure for $t = 0, 1, 2, \dots$:

E step: get the conditional distribution $f(z \mid x; \theta^{(t)})$ and compute $Q(\theta \mid \theta^{(t)}) = E\big[\log f(x, Z; \theta) \mid x; \theta^{(t)}\big]$; M step: $\theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})$.
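The two steps above can be sketched on the classic textbook case, a two-component univariate normal mixture with common variance (this is not the structured logistic-normal model from the talk, just a minimal illustration with an assumed deterministic initialization):

```python
import numpy as np

def em_two_normals(y, n_iter=100):
    """EM for a two-component univariate normal mixture with common variance."""
    # crude but deterministic initialization
    pi, mu1, mu2, var = 0.5, np.min(y), np.max(y), np.var(y)
    for _ in range(n_iter):
        # E step: posterior probability that each observation is in component 2
        d1 = np.exp(-(y - mu1) ** 2 / (2 * var))
        d2 = np.exp(-(y - mu2) ** 2 / (2 * var))
        w = pi * d2 / ((1 - pi) * d1 + pi * d2)
        # M step: closed-form weighted maximum likelihood updates
        pi = w.mean()
        mu1 = np.sum((1 - w) * y) / np.sum(1 - w)
        mu2 = np.sum(w * y) / np.sum(w)
        var = np.sum((1 - w) * (y - mu1) ** 2 + w * (y - mu2) ** 2) / len(y)
    return pi, mu1, mu2, var

# usage on simulated data: 300 points from N(0, 1), 200 from N(4, 1)
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(4.0, 1.0, 200)])
pi_hat, mu1_hat, mu2_hat, var_hat = em_two_normals(y)
```

On this data the estimates land close to the true values $\mu_1 = 0$, $\mu_2 = 4$, and mixing proportion $0.4$.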

In order to make this procedure effective, the conditional expectation in the M step should be easy to calculate. In fact, since the expectation is taken under the current $\theta^{(t)}$, which does not involve the new $\theta$, we usually first maximize the complete-data log-likelihood to get $\hat\theta(z) = \arg\max_\theta \log f(x, z; \theta)$ in closed form, and then obtain $\theta^{(t+1)}$ by plugging in the conditional expectation of $z$ (or of its sufficient statistics) given $x$ under $\theta^{(t)}$.
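As a concrete instance of this plug-in structure, consider again a two-component normal mixture with means $\mu_1, \mu_2$ (a simpler cousin of the model above). The complete-data log-likelihood is linear in the membership indicators $\delta_i$, so the complete-data MLE, e.g.

$$\hat\mu_2(\delta) = \frac{\sum_i \delta_i y_i}{\sum_i \delta_i},$$

turns into the M-step update simply by replacing each $\delta_i$ with its conditional expectation $w_i^{(t)} = E\big[\delta_i \mid y_i; \theta^{(t)}\big] = P\big(\delta_i = 1 \mid y_i; \theta^{(t)}\big)$ from the E step:

$$\mu_2^{(t+1)} = \frac{\sum_i w_i^{(t)} y_i}{\sum_i w_i^{(t)}}.$$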

And this procedure guarantees the following ascent property, which ensures that the observed-data likelihood values converge monotonically:

$$\log f\big(x; \theta^{(t+1)}\big) \ \ge\ \log f\big(x; \theta^{(t)}\big).$$
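The ascent property is easy to check numerically. Here is a minimal sketch on a toy two-component normal mixture with known unit variance (all settings hypothetical), tracking the observed-data log-likelihood across iterations:

```python
import numpy as np

rng = np.random.default_rng(2)
y = np.concatenate([rng.normal(-2.0, 1.0, 250), rng.normal(2.0, 1.0, 250)])

pi, mu1, mu2 = 0.5, -0.5, 0.5
loglik = []
for _ in range(30):
    d1 = np.exp(-(y - mu1) ** 2 / 2) / np.sqrt(2 * np.pi)
    d2 = np.exp(-(y - mu2) ** 2 / 2) / np.sqrt(2 * np.pi)
    loglik.append(np.sum(np.log((1 - pi) * d1 + pi * d2)))  # observed-data log-likelihood
    w = pi * d2 / ((1 - pi) * d1 + pi * d2)                 # E step
    pi = w.mean()                                           # M step (closed form)
    mu1 = np.sum((1 - w) * y) / np.sum(1 - w)
    mu2 = np.sum(w * y) / np.sum(w)

# each EM iteration can only increase the observed-data log-likelihood
assert all(b >= a - 1e-9 for a, b in zip(loglik, loglik[1:]))
```

The small tolerance `1e-9` only guards against floating-point noise; mathematically the sequence is non-decreasing.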

In summary, the EM algorithm is useful when the marginal problem, maximizing $\log f(x; \theta)$, is difficult while the joint problem, maximizing $\log f(x, z; \theta)$, is easy. However, $z$ is unobservable, so the EM algorithm attempts to maximize $\log f(x, z; \theta)$ iteratively, by replacing it with its conditional expectation given the observed data. This expectation is computed with respect to the conditional distribution of the unobserved data given the observed data, evaluated at the current estimate of $\theta$.

In the talk, Professor Xuming He mentioned a rule of thumb from practical experience: the EM algorithm often produces a good enough estimator within the first few iterations.