# Application in Conditional Distribution of Multivariate Normal

The Sherman-Morrison-Woodbury matrix inverse identity can be regarded as a transform between Schur complements. That is, given $V_{22.1}^{-1}$ one can obtain $V_{11.2}^{-1}$ by using the Woodbury matrix identity, and vice versa. Recall the Woodbury identity:

$V_{11.2}^{-1}=V_{11}^{-1}+V_{11}^{-1}V_{12}V_{22.1}^{-1}V_{21}V_{11}^{-1}$

and

$V_{22.1}^{-1}=V_{22}^{-1}+V_{22}^{-1}V_{21}V_{11.2}^{-1}V_{12}V_{22}^{-1}$
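Although the post is purely algebraic, the identity is easy to sanity-check numerically. The sketch below (the dimensions and the random positive definite $V$ are arbitrary choices, not from the post) verifies the first formula with NumPy:

```python
# Numerical check of the Woodbury identity as a transform between
# Schur complements, for a random symmetric positive definite V.
import numpy as np

rng = np.random.default_rng(0)
p, q = 3, 4
A = rng.standard_normal((p + q, p + q))
V = A @ A.T + (p + q) * np.eye(p + q)        # symmetric positive definite

V11, V12 = V[:p, :p], V[:p, p:]
V21, V22 = V[p:, :p], V[p:, p:]

V11_2 = V11 - V12 @ np.linalg.inv(V22) @ V21  # Schur complement of V22
V22_1 = V22 - V21 @ np.linalg.inv(V11) @ V12  # Schur complement of V11

iV11 = np.linalg.inv(V11)
# Woodbury: V11.2^{-1} = V11^{-1} + V11^{-1} V12 V22.1^{-1} V21 V11^{-1}
lhs = np.linalg.inv(V11_2)
rhs = iV11 + iV11 @ V12 @ np.linalg.inv(V22_1) @ V21 @ iV11
print(np.allclose(lhs, rhs))  # True
```

The symmetric check for $V_{22.1}^{-1}$ works the same way with the blocks swapped.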

I recently stumbled across a neat application of this whilst deriving full conditionals for a multivariate normal. Recall that if the data are partitioned into two blocks, $Y_{1},Y_{2}$, then the variance of the conditional distribution $Y_{1}|Y_{2}$ is the Schur complement of the block $V_{22}$ of the total variance matrix $V$; that is, the conditional variance is $V_{11.2}=V_{11}-V_{12}V_{22}^{-1}V_{21}$, which is the variance of $Y_{1}$ minus a term corresponding to the reduction in uncertainty about $Y_{1}$ gained from knowledge of $Y_{2}$. If, however, $V_{22}$ itself has the form of a Schur complement, then it may be possible to exploit the Woodbury identity above to simplify the variance term considerably. I came across this when I derived two very different-looking expressions for the conditional distribution and found them equivalent by the Woodbury identity. Consider the model

$\begin{bmatrix} Y_{1}\\ Y_{2} \end{bmatrix} = \begin{bmatrix} X_{1}\\ X_{2} \end{bmatrix}\beta + \varepsilon$

where

$\varepsilon \sim N\left( \begin{bmatrix}0\\ 0\end{bmatrix}, \sigma^{2} \begin{bmatrix}I_{1} & 0 \\ 0 & I_{2}\end{bmatrix} \right)$

$\beta|\sigma^{2} \sim N(0, \sigma^{2}\Lambda^{-1}).$

I was seeking the distribution $Y_{1}| Y_{2},\sigma^{2}$ and arrived there through two different paths. The distributions derived looked very different, but they turned out to be equivalent upon considering the Woodbury identity.

## Method 1

This simply manipulates properties of the multivariate normal. Marginalizing over $\beta$ one gets

$Cov \begin{bmatrix} Y_{1} \\ Y_{2} \end{bmatrix} = \begin{bmatrix} X_{1} \\ X_{2} \end{bmatrix} Cov(\beta) \begin{bmatrix} X_{1}^{T} & X_{2}^{T} \end{bmatrix} + Cov(\varepsilon)$

$Cov \begin{bmatrix} Y_{1} \\ Y_{2} \end{bmatrix} = \sigma^{2}\begin{bmatrix} X_{1}\Lambda^{-1} X_{1}^{T} & X_{1}\Lambda^{-1} X_{2}^{T} \\ X_{2}\Lambda^{-1} X_{1}^{T} & X_{2}\Lambda^{-1} X_{2}^{T} \end{bmatrix} + \sigma^{2} \begin{bmatrix} I_{1} & 0 \\ 0 & I_{2} \end{bmatrix}$

so that the distribution is

$\begin{bmatrix} Y_{1}\\ Y_{2} \end{bmatrix}|\sigma^{2} \sim N \left( \begin{bmatrix} 0\\ 0 \end{bmatrix}, \sigma^{2} \begin{bmatrix} I_{1}+ X_{1}\Lambda^{-1} X_{1}^{T} & X_{1}\Lambda^{-1} X_{2}^{T} \\ X_{2}\Lambda^{-1} X_{1}^{T} & I_{2}+ X_{2}\Lambda^{-1} X_{2}^{T} \end{bmatrix} \right).$

It follows that the conditional distribution is

$Y_{1}| Y_{2},\sigma^{2} \sim N \left( X_{1}\Lambda^{-1} X_{2}^{T} \left[ I_{2} + X_{2}\Lambda^{-1} X_{2}^{T}\right]^{-1} Y_{2},\ \sigma^{2}\left( I_{1} + X_{1}\Lambda^{-1} X_{1}^{T} - X_{1}\Lambda^{-1} X_{2}^{T} \left[ I_{2} + X_{2}\Lambda^{-1} X_{2}^{T} \right]^{-1} X_{2}\Lambda^{-1} X_{1}^{T}\right)\right).$

This looks a bit nasty, but notice that $V_{22} = I_{2} + X_{2}\Lambda^{-1} X_{2}^{T}$ looks like it too could be a Schur complement of some matrix.
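As a quick sanity check on Method 1, the sketch below builds this conditional mean and variance numerically. The sizes, seed, and diagonal $\Lambda$ are illustrative assumptions, not from the post:

```python
# Sketch of Method 1: build the marginal covariance blocks of (Y1, Y2)
# after integrating out beta, then condition on Y2 via the Schur complement.
import numpy as np

rng = np.random.default_rng(1)
n1, n2, p = 4, 5, 3
sigma2 = 2.0
X1 = rng.standard_normal((n1, p))
X2 = rng.standard_normal((n2, p))
Lam = np.diag(rng.uniform(0.5, 2.0, p))      # prior precision Lambda
iLam = np.linalg.inv(Lam)

V11 = np.eye(n1) + X1 @ iLam @ X1.T          # I1 + X1 Lam^{-1} X1^T
V12 = X1 @ iLam @ X2.T                       # X1 Lam^{-1} X2^T
V22 = np.eye(n2) + X2 @ iLam @ X2.T          # I2 + X2 Lam^{-1} X2^T

Y2 = rng.standard_normal(n2)
iV22 = np.linalg.inv(V22)
cond_mean = V12 @ iV22 @ Y2                  # conditional mean of Y1 | Y2
cond_var = sigma2 * (V11 - V12 @ iV22 @ V12.T)  # sigma^2 * V11.2
```

The conditional variance is symmetric positive definite, as a Schur complement of a positive definite matrix must be.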

## Method 2

An alternative route to this distribution is

$f( Y_{1}| Y_{2},\sigma^{2} )=\int f( Y_{1}|\sigma^{2},\beta)\,\pi(\beta| Y_{2},\sigma^{2})\,d\beta$

where

$\beta| Y_{2},\sigma^{2}\sim N \left( ( X_{2}^{T} X_{2}+\Lambda)^{-1} X_{2}^{T} Y_{2},\ \sigma^{2}( X_{2}^{T} X_{2}+\Lambda)^{-1} \right).$

It follows that

$Y_{1}| Y_{2} ,\sigma^{2} \sim N\left( X_{1}( X_{2}^{T} X_{2}+\Lambda)^{-1} X_{2}^{T} Y_{2}, \sigma^{2} (I_{1} + X_{1} ( X_{2}^{T} X_{2}+\Lambda)^{-1} X_{1}^{T}) \right)$

which looks different from the distribution obtained through Method 1, and whose variance expression is a lot neater. The two are in fact identical by the Woodbury identity.
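The claimed equivalence is easy to confirm numerically. The sketch below computes the conditional mean and variance both ways, using arbitrary illustrative $X_{1}$, $X_{2}$, and $\Lambda$ (not from the post):

```python
# Compare Method 1 (Schur complement of the marginal covariance) against
# Method 2 (integrate Y1 | beta over the ridge-type posterior of beta | Y2).
import numpy as np

rng = np.random.default_rng(1)
n1, n2, p = 4, 5, 3
sigma2 = 2.0
X1 = rng.standard_normal((n1, p))
X2 = rng.standard_normal((n2, p))
Lam = np.diag(rng.uniform(0.5, 2.0, p))
iLam = np.linalg.inv(Lam)
Y2 = rng.standard_normal(n2)

# Method 2: posterior of beta | Y2 pushed through Y1 = X1 beta + eps
M = np.linalg.inv(X2.T @ X2 + Lam)
mean2 = X1 @ M @ X2.T @ Y2
var2 = sigma2 * (np.eye(n1) + X1 @ M @ X1.T)

# Method 1: condition the joint marginal covariance on Y2
V12 = X1 @ iLam @ X2.T
iV22 = np.linalg.inv(np.eye(n2) + X2 @ iLam @ X2.T)
mean1 = V12 @ iV22 @ Y2
var1 = sigma2 * (np.eye(n1) + X1 @ iLam @ X1.T - V12 @ iV22 @ V12.T)

print(np.allclose(mean1, mean2), np.allclose(var1, var2))  # True True
```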

## Comparison

### Mean (Submitted by Michelle Leigh)

$\begin{aligned} \left[\Lambda+ X_{2}^{T}I_{2} X_{2}\right]^{-1} X_{2}^{T} &=\left\{\Lambda^{-1}-\Lambda^{-1} X_{2}^{T}\left[I_{2}+ X_{2}\Lambda^{-1} X_{2}^{T}\right]^{-1} X_{2}\Lambda^{-1}\right\} X_{2}^{T}\\ &=\Lambda^{-1} X_{2}^{T}\left[I_{2}+ X_{2}\Lambda^{-1} X_{2}^{T}\right]^{-1}\left[I_{2}+ X_{2}\Lambda^{-1} X_{2}^{T}\right]-\Lambda^{-1} X_{2}^{T}\left[I_{2}+ X_{2}\Lambda^{-1} X_{2}^{T}\right]^{-1} X_{2}\Lambda^{-1} X_{2}^{T}\\ &=\Lambda^{-1} X_{2}^{T}\left[I_{2}+ X_{2}\Lambda^{-1} X_{2}^{T}\right]^{-1}I_{2} \end{aligned}$

So mean1=mean2.
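This push-through identity can be spot-checked numerically for a random $X_{2}$ and positive definite $\Lambda$ (illustrative sizes, not from the post):

```python
# Spot-check of the mean identity:
# (Lambda + X2^T X2)^{-1} X2^T == Lambda^{-1} X2^T (I2 + X2 Lambda^{-1} X2^T)^{-1}
import numpy as np

rng = np.random.default_rng(2)
n2, p = 6, 3
X2 = rng.standard_normal((n2, p))
Lam = np.diag(rng.uniform(0.5, 2.0, p))
iLam = np.linalg.inv(Lam)

lhs = np.linalg.inv(Lam + X2.T @ X2) @ X2.T
rhs = iLam @ X2.T @ np.linalg.inv(np.eye(n2) + X2 @ iLam @ X2.T)
print(np.allclose(lhs, rhs))  # True
```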

### Variance

By the Woodbury Identity it follows that

$\Lambda^{-1} - \Lambda^{-1} X_{2}^{T} \left[ I_{2} + X_{2}\Lambda^{-1} X_{2}^{T} \right]^{-1} X_{2}\Lambda^{-1} = ( X_{2}^{T}I_{2} X_{2}+\Lambda)^{-1}.$

Therefore

$X_{1}\Lambda^{-1} X_{1}^{T}- X_{1}\Lambda^{-1} X_{2}^{T} \left[ I_{2}+ X_{2}\Lambda^{-1} X_{2}^{T} \right]^{-1} X_{2}\Lambda^{-1} X_{1}^{T}= X_{1} ( X_{2}^{T} X_{2}+\Lambda)^{-1} X_{1}^{T}$

and so variance1=variance2. The trick is to recognize the form of the formulas at the top of the page; once you do, the variance can be written as a much neater expression.
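The variance identity above admits the same kind of numerical spot-check; $X_{1}$, $X_{2}$, and $\Lambda$ below are again illustrative choices:

```python
# Spot-check of the variance identity: applying Woodbury inside the
# Method 1 variance collapses it to the Method 2 form.
import numpy as np

rng = np.random.default_rng(3)
n1, n2, p = 4, 5, 3
X1 = rng.standard_normal((n1, p))
X2 = rng.standard_normal((n2, p))
Lam = np.diag(rng.uniform(0.5, 2.0, p))
iLam = np.linalg.inv(Lam)

B = X1 @ iLam @ X2.T                      # X1 Lambda^{-1} X2^T
S = np.eye(n2) + X2 @ iLam @ X2.T         # I2 + X2 Lambda^{-1} X2^T
lhs = X1 @ iLam @ X1.T - B @ np.linalg.inv(S) @ B.T
rhs = X1 @ np.linalg.inv(X2.T @ X2 + Lam) @ X1.T
print(np.allclose(lhs, rhs))  # True
```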
