Marginal Likelihood and Model Evidence in Bayesian Regression

The marginal likelihood, or model evidence, is the probability of observing the data under a specific model, with the model's parameters integrated out. It is used in Bayesian model selection and comparison when computing the Bayes factor between two models, which is simply the ratio of their respective marginal likelihoods. One application is selecting which covariates to include in a linear model describing the data. Consider the usual normal-inverse-gamma prior specification on the regression coefficients and variance.
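
Concretely, with the densities defined in the next section, the quantity derived below is

m(Y)=\int_{0}^{\infty}\int f(Y|\beta,\sigma^{2})\pi(\beta|\sigma^{2})\pi(\sigma^{2})\,d\beta\,d\sigma^{2}\\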

Normal Inverse-Gamma Priors

\pi(\beta,\sigma^{2})=\pi(\beta|\sigma^{2})\pi(\sigma^{2})\\  \pi(\beta|\sigma^{2})=N(\mu_{0},\sigma^{2} \Lambda_{0}^{-1})\\  \pi(\sigma^{2})=IG(a_{0},b_{0})\\  f(Y|\beta,\sigma^{2})=N(X\beta,\sigma^{2}I_{n})\\
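
A minimal simulation sketch of this generative model, assuming NumPy; the design matrix and hyperparameter values here are arbitrary illustrations, not part of the derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # illustrative design matrix

# illustrative hyperparameter choices
mu0, Lambda0 = np.zeros(k), np.eye(k)
a0, b0 = 2.0, 2.0

# sigma^2 ~ IG(a0, b0): the reciprocal of a Gamma(a0, rate=b0) draw
sigma2 = 1.0 / rng.gamma(shape=a0, scale=1.0 / b0)
# beta | sigma^2 ~ N(mu0, sigma^2 * Lambda0^{-1})
beta = rng.multivariate_normal(mu0, sigma2 * np.linalg.inv(Lambda0))
# Y | beta, sigma^2 ~ N(X beta, sigma^2 I)
Y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
```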

Now construct the joint, out of which the regression coefficients and variance will be integrated.

f(Y|\beta,\sigma^{2})\pi(\beta|\sigma^{2})\pi(\sigma^{2})=  \left( \frac{1}{2\pi} \right)^{\frac{n}{2}} \left( \frac{1}{\sigma^{2}}\right)^{\frac{n}{2}} e^{-\frac{1}{2\sigma^{2}}(Y-X\beta)'(Y-X\beta)}   \left( \frac{1}{2\pi}\right) ^{\frac{k}{2}}\left( \frac{1}{\sigma^{2}} \right)^{\frac{k}{2}} |\Lambda_{0}|^{\frac{1}{2}} e^{-\frac{1}{2\sigma^{2}}(\beta-\mu_{0})'\Lambda_{0}(\beta-\mu_{0})}  \frac{b_{0}^{a_{0}}}{\Gamma(a_{0})}\left(\frac{1}{\sigma^{2}}\right)^{a_{0}+1}e^{-\frac{b_{0}}{\sigma^{2}}}\\

This can be made neater by rearrangement. Consider first the two exponentials above. Their exponents can be combined and rewritten in such a manner that a normal kernel in Beta becomes recognizable. Below, \hat{\beta}=(X'X)^{-1}X'Y is the least-squares estimate and P=X(X'X)^{-1}X' is the corresponding projection (hat) matrix.

(Y-X\beta)'(Y-X\beta)+(\beta-\mu_{0})'\Lambda_{0}(\beta-\mu_{0})=\\  \beta'(X'X+\Lambda_{0})\beta-2\beta'(X'X\hat{\beta}+\Lambda_{0}\mu_{0})   +\mu_{0}'\Lambda_{0}\mu_{0}+\hat{\beta}'X'X\hat{\beta}+Y'(I-P)Y=\\  \beta'(X'X+\Lambda_{0})\beta-2\beta'(X'X\hat{\beta}+\Lambda_{0}\mu_{0})   +\mu_{0}'\Lambda_{0}\mu_{0}+Y'Y
where the last equality uses \hat{\beta}'X'X\hat{\beta}=Y'PY, so the last two terms collapse to Y'Y.

Define:
\Lambda_{n}=X'X+\Lambda_{0}\\  \mu_{n}=(X'X+\Lambda_{0})^{-1}(X'X\hat{\beta}+\Lambda_{0}\mu_{0})=\Lambda_{n}^{-1}(X'X\hat{\beta}+\Lambda_{0}\mu_{0})
then the above can be written as:

(Y-X\beta)'(Y-X\beta)+(\beta-\mu_{0})'\Lambda_{0}(\beta-\mu_{0})=\\  \beta'\Lambda_{n}\beta-2\beta'\Lambda_{n}\mu_{n}+\mu_{0}'\Lambda_{0}\mu_{0}+Y'Y=\\  (\beta-\mu_{n})'\Lambda_{n}(\beta-\mu_{n})-\mu_{n}'\Lambda_{n}\mu_{n}+\mu_{0}'\Lambda_{0}\mu_{0}+Y'Y\\
where the last line was achieved by completing the square.
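
As a quick sanity check, this identity can be verified numerically; a minimal sketch assuming NumPy, with arbitrary test values standing in for X, Y, Beta, and the prior quantities:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 3
X = rng.normal(size=(n, k))
Y = rng.normal(size=n)
beta = rng.normal(size=k)
mu0 = rng.normal(size=k)
A = rng.normal(size=(k, k))
Lambda0 = A @ A.T + k * np.eye(k)              # arbitrary positive-definite prior precision

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # least-squares estimate
Lambda_n = X.T @ X + Lambda0
mu_n = np.linalg.solve(Lambda_n, X.T @ X @ beta_hat + Lambda0 @ mu0)

lhs = (Y - X @ beta) @ (Y - X @ beta) + (beta - mu0) @ Lambda0 @ (beta - mu0)
rhs = ((beta - mu_n) @ Lambda_n @ (beta - mu_n)
       - mu_n @ Lambda_n @ mu_n + mu0 @ Lambda0 @ mu0 + Y @ Y)
assert np.isclose(lhs, rhs)
```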

f(Y|\beta,\sigma^{2})\pi(\beta|\sigma^{2})\pi(\sigma^{2})=f(Y,\beta,\sigma^{2})=\\  \left( \frac{1}{2\pi} \right)^{\frac{n}{2}} \left( \frac{1}{\sigma^{2}}\right)^{\frac{n}{2}} e^{-\frac{1}{2\sigma^{2}}(Y'Y-\mu_{n}'\Lambda_{n}\mu_{n}+\mu_{0}'\Lambda_{0}\mu_{0})}  \left( \frac{1}{2\pi}\right) ^{\frac{k}{2}}\left( \frac{1}{\sigma^{2}} \right)^{\frac{k}{2}} |\Lambda_{0}|^{\frac{1}{2}} e^{-\frac{1}{2\sigma^{2}}(\beta-\mu_{n})'\Lambda_{n}(\beta-\mu_{n})}   \frac{b_{0}^{a_{0}}}{\Gamma(a_{0})}\left(\frac{1}{\sigma^{2}}\right)^{a_{0}+1}e^{-\frac{b_{0}}{\sigma^{2}}}\\

Recognizing the kernel of a normal in Beta, and knowing its normalizing constant, the integral over Beta is now simple. Integrating out Beta yields:

f(Y,\sigma^{2})=  \left( \frac{1}{2\pi} \right)^{\frac{n}{2}} \left( \frac{1}{\sigma^{2}}\right)^{\frac{n}{2}} e^{-\frac{1}{2\sigma^{2}}(Y'Y-\mu_{n}'\Lambda_{n}\mu_{n}+\mu_{0}'\Lambda_{0}\mu_{0})}   \frac{|\Lambda_{0}|^{\frac{1}{2}}}{|\Lambda_{n}|^{\frac{1}{2}}}   \frac{b_{0}^{a_{0}}}{\Gamma(a_{0})}\left(\frac{1}{\sigma^{2}}\right)^{a_{0}+1}e^{-\frac{b_{0}}{\sigma^{2}}}\\
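
The step above uses the standard multivariate normal integral, which is what produces the ratio of determinants:

\int e^{-\frac{1}{2\sigma^{2}}(\beta-\mu_{n})'\Lambda_{n}(\beta-\mu_{n})}\,d\beta=\left(2\pi\sigma^{2}\right)^{\frac{k}{2}}|\Lambda_{n}|^{-\frac{1}{2}}\\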

Following a similar strategy to that used for Beta, this can be rearranged into something containing an Inverse-Gamma kernel in sigma squared.

f(Y,\sigma^{2})=\left( \frac{1}{2\pi} \right)^{\frac{n}{2}} \left( \frac{1}{\sigma^{2}}\right)^{\frac{n}{2}+a_{0}+1} e^{-\frac{b_{0}+\frac{1}{2}(Y'Y-\mu_{n}'\Lambda_{n}\mu_{n}+\mu_{0}'\Lambda_{0}\mu_{0})}{\sigma^{2}}}    \frac{|\Lambda_{0}|^{\frac{1}{2}}}{|\Lambda_{n}|^{\frac{1}{2}}}   \frac{b_{0}^{a_{0}}}{\Gamma(a_{0})}

Define
a_{n}=a_{0}+\frac{n}{2}\\  b_{n}=b_{0}+\frac{1}{2}(Y'Y-\mu_{n}'\Lambda_{n}\mu_{n}+\mu_{0}'\Lambda_{0}\mu_{0})
Then the above reads

f(Y,\sigma^{2})=\left( \frac{1}{2\pi} \right)^{\frac{n}{2}}    \frac{|\Lambda_{0}|^{\frac{1}{2}}}{|\Lambda_{n}|^{\frac{1}{2}}}  \frac{b_{0}^{a_{0}}}{\Gamma(a_{0})}   \left( \frac{1}{\sigma^{2}}\right)^{a_{n}+1} e^{-\frac{b_{n}}{\sigma^{2}}}\\

Recognizing the kernel of an Inverse-Gamma in sigma squared makes the integration over sigma squared easy, which yields the final result for the marginal likelihood below.
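
Explicitly, the integral over sigma squared is the normalizing constant of an Inverse-Gamma density:

\int_{0}^{\infty}\left(\frac{1}{\sigma^{2}}\right)^{a_{n}+1}e^{-\frac{b_{n}}{\sigma^{2}}}\,d\sigma^{2}=\frac{\Gamma(a_{n})}{b_{n}^{a_{n}}}\\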

Marginal Likelihood/Model Evidence

m(Y)=\left( \frac{1}{2\pi} \right)^{\frac{n}{2}}    \frac{|\Lambda_{0}|^{\frac{1}{2}}}{|\Lambda_{n}|^{\frac{1}{2}}}  \frac{b_{0}^{a_{0}}}{b_{n}^{a_{n}}}  \frac{\Gamma(a_{n})}{\Gamma(a_{0})}\\
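
A minimal sketch of this closed form in code, assuming NumPy and SciPy; the function name and the simulated data are illustrative only:

```python
import numpy as np
from scipy.special import gammaln

def log_marginal_likelihood(X, Y, mu0, Lambda0, a0, b0):
    """Log of m(Y) under the normal-inverse-gamma prior, as derived above."""
    n = len(Y)
    Lambda_n = X.T @ X + Lambda0
    # X'X beta_hat = X'Y, so mu_n can be formed without computing beta_hat
    mu_n = np.linalg.solve(Lambda_n, X.T @ Y + Lambda0 @ mu0)
    a_n = a0 + n / 2
    b_n = b0 + 0.5 * (Y @ Y - mu_n @ Lambda_n @ mu_n + mu0 @ Lambda0 @ mu0)
    _, logdet0 = np.linalg.slogdet(Lambda0)   # log-determinants for stability
    _, logdetn = np.linalg.slogdet(Lambda_n)
    return (-0.5 * n * np.log(2 * np.pi)
            + 0.5 * (logdet0 - logdetn)
            + a0 * np.log(b0) - a_n * np.log(b_n)
            + gammaln(a_n) - gammaln(a0))

# Illustrative comparison of two covariate sets via the (log) Bayes factor
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
Y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
X_full = np.column_stack([np.ones(n), x])     # intercept + covariate
X_null = np.ones((n, 1))                      # intercept only
lml_full = log_marginal_likelihood(X_full, Y, np.zeros(2), np.eye(2), a0=2.0, b0=2.0)
lml_null = log_marginal_likelihood(X_null, Y, np.zeros(1), np.eye(1), a0=2.0, b0=2.0)
log_bayes_factor = lml_full - lml_null        # > 0 favours including the covariate
```

Working on the log scale, via gammaln and slogdet, avoids overflow in the Gamma functions and determinants, and the log Bayes factor is then just the difference of the two log evidences.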

An example application of this result to linear model selection can be found here.