Monotonicity of EM Algorithm Proof

Here the monotonicity of the EM algorithm is established.
$$ f_{o}(Y_{o}|\theta)=f_{o,m}(Y_{o},Y_{m}|\theta)/f_{m|o}(Y_{m}|Y_{o},\theta)$$
$$ \log L_{o}(\theta)=\log L_{o,m}(\theta)-\log f_{m|o}(Y_{m}|Y_{o},\theta) \label{eq:loglikelihood} $$
where $ L_{o}(\theta)$ is the likelihood under the observed data and $L_{o,m}(\theta)$ is the likelihood under the complete data. Taking the expectation of the second line with respect to the conditional distribution of $Y_{m}$ given $Y_{o}$ and the current parameters $\theta^{k}$ yields
$$\log L_{o}(\theta)= \mathbb{E}_{Y_{m}}\left[\log L_{o,m}(\theta)|Y_{o},\theta^{k}\right]-\mathbb{E}_{Y_{m}}\left[\log f_{m|o}(Y_{m}|Y_{o},\theta)|Y_{o},\theta^{k} \right]$$
which is used to construct the difference between the log-likelihood of a new value of $\theta$ and the current value $\theta^{k}$ as
\begin{equation}
\begin{split}
\log L_{o}(\theta)-&\log L_{o}(\theta^{k})=\mathbb{E}_{Y_{m}}\left[ \log L_{o,m}(\theta)|Y_{o},\theta^{k}\right]-\mathbb{E}_{Y_{m}}\left[ \log L_{o,m}(\theta^{k})|Y_{o},\theta^{k}\right] \\
+&\mathbb{E}_{Y_{m}}\left[ \log f_{m|o}(Y_{m}|Y_{o},\theta^{k})|Y_{o},\theta^{k} \right]-\mathbb{E}_{Y_{m}}\left[ \log f_{m|o}(Y_{m}|Y_{o},\theta)|Y_{o},\theta^{k} \right],\\
\end{split}
\end{equation}
or by adopting common notation as
\begin{equation}
\log L_{o}(\theta)-\log L_{o}(\theta^{k})=Q(\theta;\theta^{k})-Q(\theta^{k};\theta^{k})+H(\theta^{k};\theta^{k})-H(\theta;\theta^{k}).\\
\end{equation}
Consider the last two “$ H$” terms, then by Jensen’s inequality

\begin{align*}
&-\mathbb{E}_{Y_{m}}\left[ \log f_{m|o}(Y_{m}|Y_{o},\theta)- \log f_{m|o}(Y_{m}|Y_{o},\theta^{k})|Y_{o},\theta^{k} \right]\\
&=-\mathbb{E}_{Y_{m}}\left[\log \frac{ f_{m|o}(Y_{m}|Y_{o},\theta)}{ f_{m|o}(Y_{m}|Y_{o},\theta^{k})}|Y_{o},\theta^{k} \right]\\
&\geq-\log \mathbb{E}_{Y_{m}}\left[ \frac{ f_{m|o}(Y_{m}|Y_{o},\theta)}{ f_{m|o}(Y_{m}|Y_{o},\theta^{k})}|Y_{o},\theta^{k} \right]\\
&=-\log \int f_{m|o}(Y_{m}|Y_{o},\theta)dY_{m}\\
&=0 \; \; \; \; \; \; \;\forall \theta\in \Theta.
\end{align*}
It follows that $\log L_{o}(\theta)-\log L_{o}(\theta^{k})\geq 0$ by choosing $\theta$ such that $Q(\theta;\theta^{k})-Q(\theta^{k};\theta^{k})\geq 0$.

Ruslan R Salakhutdinov, Sam T Roweis, & Zoubin Ghahramani (2012). On the Convergence of Bound Optimization Algorithms arXiv arXiv: 1212.2490v1
Wu C.F.J. (1983). On the Convergence Properties of the EM Algorithm, The Annals of Statistics, 11 (1) 95-103. DOI: 10.1214/aos/1176346060
McLachlan G. & Peel D. DOI: 10.1002/0471721182

Ive Moved

Monotonicity of EM Algorithm Proof

Like this:

Categories

Recent Posts

Sponsors

Share this:

Like this:

Categories

Recent Posts

Sponsors