Monotonicity of EM Algorithm Proof


Here the monotonicity of the EM algorithm is established.
$$ f_{o}(Y_{o}|\theta)=f_{o,m}(Y_{o},Y_{m}|\theta)/f_{m|o}(Y_{m}|Y_{o},\theta)$$
$$ \log L_{o}(\theta)=\log L_{o,m}(\theta)-\log f_{m|o}(Y_{m}|Y_{o},\theta) \label{eq:loglikelihood} $$
where \( L_{o}(\theta)\) is the likelihood under the observed data and \(L_{o,m}(\theta)\) is the likelihood under the complete data. Taking the expectation of the second line with respect to the conditional distribution of \(Y_{m}\) given \(Y_{o}\) and the current parameters \(\theta^{k}\) yields
$$\log L_{o}(\theta)= \mathbb{E}_{Y_{m}}\left[\log L_{o,m}(\theta)|Y_{o},\theta^{k}\right]-\mathbb{E}_{Y_{m}}\left[\log f_{m|o}(Y_{m}|Y_{o},\theta)|Y_{o},\theta^{k} \right]$$
which is used to construct the difference between the log-likelihood of a new value of \(\theta\) and the current value \(\theta^{k}\) as
\begin{equation}
\begin{split}
\log L_{o}(\theta)-&\log L_{o}(\theta^{k})=\mathbb{E}_{Y_{m}}\left[ \log L_{o,m}(\theta)|Y_{o},\theta^{k}\right]-\mathbb{E}_{Y_{m}}\left[ \log L_{o,m}(\theta^{k})|Y_{o},\theta^{k}\right] \\
+&\mathbb{E}_{Y_{m}}\left[ \log f_{m|o}(Y_{m}|Y_{o},\theta^{k})|Y_{o},\theta^{k} \right]-\mathbb{E}_{Y_{m}}\left[ \log f_{m|o}(Y_{m}|Y_{o},\theta)|Y_{o},\theta^{k} \right],\\
\end{split}
\end{equation}
or by adopting common notation as
\begin{equation}
\log L_{o}(\theta)-\log L_{o}(\theta^{k})=Q(\theta;\theta^{k})-Q(\theta^{k};\theta^{k})+H(\theta^{k};\theta^{k})-H(\theta;\theta^{k}).\\
\end{equation}
Consider the last two “\( H\)” terms, then by Jensen’s inequality

\begin{align*}
&-\mathbb{E}_{Y_{m}}\left[ \log f_{m|o}(Y_{m}|Y_{o},\theta)- \log f_{m|o}(Y_{m}|Y_{o},\theta^{k})|Y_{o},\theta^{k} \right]\\
&=-\mathbb{E}_{Y_{m}}\left[\log \frac{ f_{m|o}(Y_{m}|Y_{o},\theta)}{ f_{m|o}(Y_{m}|Y_{o},\theta^{k})}|Y_{o},\theta^{k} \right]\\
&\geq-\log \mathbb{E}_{Y_{m}}\left[ \frac{ f_{m|o}(Y_{m}|Y_{o},\theta)}{ f_{m|o}(Y_{m}|Y_{o},\theta^{k})}|Y_{o},\theta^{k} \right]\\
&=-\log \int f_{m|o}(Y_{m}|Y_{o},\theta)dY_{m}\\
&=0 \; \; \; \; \; \; \;\forall \theta\in \Theta.
\end{align*}
It follows that \(\log L_{o}(\theta)-\log L_{o}(\theta^{k})\geq 0\) by choosing \(\theta\) such that \(Q(\theta;\theta^{k})-Q(\theta^{k};\theta^{k})\geq 0\).

Ruslan R Salakhutdinov, Sam T Roweis, & Zoubin Ghahramani (2012). On the Convergence of Bound Optimization Algorithms arXiv arXiv: 1212.2490v1
Wu C.F.J. (1983). On the Convergence Properties of the EM Algorithm, The Annals of Statistics, 11 (1) 95-103. DOI:
McLachlan G. & Peel D. DOI: