
I am reading this ICML 2016 paper and am puzzled by the first inequality (converted to an equality) in Section 2.2.

Assume the model is $P(x,h)$, where $h$ are the hidden variables. Also assume $\hat{I}$ is an unbiased estimator of the likelihood term $P(x)$, so that $$E[\hat{I}]-P(x)=0\Rightarrow E[\hat{I}] = P(x),$$ where the expectation is taken over whatever randomness is used to compute $\hat{I}$. Now assume we want to establish a lower bound on $P(x)$ (similar to the EM approach) and plug the estimator $\hat{I}$ into the lower-bound formulation in place of $P(x)$.

For this, suppose the posterior distribution over the latent variables $h$ is approximated by $Q(h|x)$ (i.e., $Q$ is a variational posterior). Then we can write

\begin{align}
\log P(x) &= \log \sum_h P(x,h)\\
&= \log \sum_h Q(h|x)\,\frac{P(x,h)}{Q(h|x)}\\
&= \log E_{Q(h|x)}\!\left[\frac{P(x,h)}{Q(h|x)}\right]\\
&\ge E_{Q(h|x)}\!\left[\log \frac{P(x,h)}{Q(h|x)}\right],
\end{align}
where the inequality is Jensen's. Plugging in the estimator then gives $$\log \hat{I} \ge E_{Q(h|x)}\!\left[\log \frac{P(x,h)}{Q(h|x)}\right].$$
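As a sanity check before stating my questions, here is a small numerical sketch with toy numbers of my own, assuming $\hat{I}$ is the single-sample importance weight $\hat{I} = P(x,h)/Q(h|x)$ with $h \sim Q(h|x)$ (this may not be exactly the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete model: one fixed observation x, three latent values h.
# These numbers are made up purely for illustration.
P_xh = np.array([0.10, 0.05, 0.15])  # joint P(x, h) for h = 0, 1, 2
P_x = P_xh.sum()                     # marginal likelihood P(x) = 0.30

Q_h = np.array([0.5, 0.2, 0.3])      # variational posterior Q(h | x)

# Single-sample importance weight: h ~ Q(h|x), I_hat = P(x, h) / Q(h | x)
h = rng.choice(3, size=1_000_000, p=Q_h)
I_hat = P_xh[h] / Q_h[h]

print(I_hat.mean(), P_x)                  # approx 0.30 vs 0.30 (unbiased)
print(np.log(I_hat).mean(), np.log(P_x))  # approx -1.29 < -1.20 (Jensen gap)
```

Both the unbiasedness and the Jensen gap show up numerically here, but I cannot reconstruct the paper's general argument.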

Here are the two puzzling parts.

First, I don't understand how they can derive

$$E_{Q(h|x)}[\log \hat{I}]\leq \log E_{Q(h|x)}[\hat{I}] = \log P(x),$$ given everything stated above.
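The inequality on the left is just Jensen's inequality for the concave $\log$; a minimal two-point example (numbers of my own) shows the direction:

$$\tfrac{1}{2}\log 1 + \tfrac{1}{2}\log 3 = \log\sqrt{3} \;\le\; \log\!\left(\tfrac{1}{2}\cdot 1 + \tfrac{1}{2}\cdot 3\right) = \log 2.$$

What I cannot reconstruct is the equality $\log E_{Q(h|x)}[\hat{I}] = \log P(x)$ on the right.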

Second, they say that since $\hat{I}$ is unbiased, it can be written as

$$E_{Q(h|x)}[\hat{I}] = P(x),$$ which is not clear to me, given the definition of an unbiased estimator.
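For the single-sample importance weight above (my assumption about what $\hat{I}$ is), I can verify the claim directly:

$$E_{Q(h|x)}[\hat{I}] = \sum_h Q(h|x)\,\frac{P(x,h)}{Q(h|x)} = \sum_h P(x,h) = P(x).$$

But the paper seems to state this for a general unbiased estimator $\hat{I}$, and in that case I don't see why the expectation should be taken under $Q(h|x)$.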