## Monotonicity of EM Algorithm Proof

Here the monotonicity of the EM algorithm is established.
$$f_{o}(Y_{o}|\theta)=f_{o,m}(Y_{o},Y_{m}|\theta)/f_{m|o}(Y_{m}|Y_{o},\theta)$$
$$\log L_{o}(\theta)=\log L_{o,m}(\theta)-\log f_{m|o}(Y_{m}|Y_{o},\theta) \label{eq:loglikelihood}$$
where $$L_{o}(\theta)$$ is the likelihood under the observed data and $$L_{o,m}(\theta)$$ is the likelihood under the complete data. Taking the expectation of the second line with respect to the conditional distribution of $$Y_{m}$$ given $$Y_{o}$$ and the current parameters $$\theta^{k}$$ yields
$$\log L_{o}(\theta)= \mathbb{E}_{Y_{m}}\left[\log L_{o,m}(\theta)|Y_{o},\theta^{k}\right]-\mathbb{E}_{Y_{m}}\left[\log f_{m|o}(Y_{m}|Y_{o},\theta)|Y_{o},\theta^{k} \right]$$
which is used to construct the difference between the log-likelihood of a new value of $$\theta$$ and the current value $$\theta^{k}$$ as

\begin{split}
\log L_{o}(\theta)-&\log L_{o}(\theta^{k})=\mathbb{E}_{Y_{m}}\left[ \log L_{o,m}(\theta)|Y_{o},\theta^{k}\right]-\mathbb{E}_{Y_{m}}\left[ \log L_{o,m}(\theta^{k})|Y_{o},\theta^{k}\right] \\
+&\mathbb{E}_{Y_{m}}\left[ \log f_{m|o}(Y_{m}|Y_{o},\theta^{k})|Y_{o},\theta^{k} \right]-\mathbb{E}_{Y_{m}}\left[ \log f_{m|o}(Y_{m}|Y_{o},\theta)|Y_{o},\theta^{k} \right],\\
\end{split}

or by adopting common notation as

\log L_{o}(\theta)-\log L_{o}(\theta^{k})=Q(\theta;\theta^{k})-Q(\theta^{k};\theta^{k})+H(\theta^{k};\theta^{k})-H(\theta;\theta^{k}).\\

Consider the last two “$$H$$” terms, then by Jensen’s inequality

\begin{align*}
&-\mathbb{E}_{Y_{m}}\left[ \log f_{m|o}(Y_{m}|Y_{o},\theta)- \log f_{m|o}(Y_{m}|Y_{o},\theta^{k})|Y_{o},\theta^{k} \right]\\
&=-\mathbb{E}_{Y_{m}}\left[\log \frac{ f_{m|o}(Y_{m}|Y_{o},\theta)}{ f_{m|o}(Y_{m}|Y_{o},\theta^{k})}|Y_{o},\theta^{k} \right]\\
&\geq-\log \mathbb{E}_{Y_{m}}\left[ \frac{ f_{m|o}(Y_{m}|Y_{o},\theta)}{ f_{m|o}(Y_{m}|Y_{o},\theta^{k})}|Y_{o},\theta^{k} \right]\\
&=-\log \int f_{m|o}(Y_{m}|Y_{o},\theta)dY_{m}\\
&=0 \; \; \; \; \; \; \;\forall \theta\in \Theta.
\end{align*}
It follows that $$\log L_{o}(\theta)-\log L_{o}(\theta^{k})\geq 0$$ by choosing $$\theta$$ such that $$Q(\theta;\theta^{k})-Q(\theta^{k};\theta^{k})\geq 0$$.

Ruslan R Salakhutdinov, Sam T Roweis, & Zoubin Ghahramani (2012). On the Convergence of Bound Optimization Algorithms arXiv arXiv: 1212.2490v1
Wu C.F.J. (1983). On the Convergence Properties of the EM Algorithm, The Annals of Statistics, 11 (1) 95-103. DOI:
McLachlan G. & Peel D. DOI: