# Bayesian Lasso

\begin{align*} p(Y_{o}|\beta,\phi)&=N(Y_{o}|1\alpha+X_{o}\beta,\phi^{-1} I_{n{o}})\\ \pi(\beta_{i}|\phi,\tau_{i}^{2})&=N(\beta_{i}|0, \phi^{-1}\tau_{i}^{2})\\ \pi(\tau_{i}^{2})&=Exp \left( \frac{\lambda}{2} \right)\\ \pi(\phi)&\propto \phi^{-1}\\ \pi(\alpha)&\propto 1\\ \end{align*}

Marginalizing over $$\alpha$$ equates to centering the observations and losing a degree of freedom and working with the centered $$Y_{o}$$.
Mixing over $$\tau_{i}^{2}$$ leads to a Laplace or Double Exponential prior on $$\beta_{i}$$ with rate parameter $$\sqrt{\phi\lambda}$$ as seen by considering the characteristic function
\begin{align*} \varphi_{\beta_{i}|\phi}(t)&=\int e^{jt\beta_{i}}\pi(\beta_{i}|\phi)d\beta_{i}\\ &=\int \int e^{jt\beta_{i}}\pi(\beta_{i}|\phi,\tau_{i}^{2})\pi(\tau_{i}^{2})d\tau_{i} d\beta_{i}\\ &=\frac{\lambda}{2} \int e^{-\frac{1}{2}\frac{t^{2}}{\phi}\tau_{i}^{2}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}\\ &=\frac{\lambda}{\frac{t^{2}}{\phi}+\lambda}=\frac{\lambda\phi}{t^{2}+\lambda\phi} \end{align*}.

# EM Algorithm

The objective is to find the mode of the joint posterior $$\pi(\beta,\phi|Y_{o})$$. It is easier, however, to find the joint mode of $$\pi(\beta,\phi|Y_{o},\tau^{2})$$ and use EM to exploit the scale mixture representation.

\begin{align*} \log \pi(\beta,\phi|Y_{o},\tau^{2})=c+ \frac{n_o+p-3}{2}\log \phi -\frac{\phi}{2}||Y_{o}-X_{o}\beta||^{2}-\sum_{i=1}^{p}\frac{\phi}{2}\frac{1}{\tau_{i}^{2}}\beta^{2}_{i} \end{align*}

## Expectation

The expecation w.r.t. $$\tau_{i}^{2}$$ is handled as by
\begin{align*} &\frac{\lambda}{2}\int \frac{1}{\tau_{i}^{2}}\left( \frac{\phi}{2\pi\tau_{i}^{2}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}\\ &\frac{\lambda}{2}\int \left( \frac{\phi}{2\pi[\tau_{i}^{2}]^{3}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}\\ \end{align*}

This has the kernel of an Inverse Gaussian distribution with shape parameter $$\phi \beta_{i}^{2}$$ and mean $$\sqrt{\frac{\phi}{\lambda}}|\beta_{i}|$$

\begin{align*} &\frac{{\lambda}}{2|\beta_{i}|}\int \left( \frac{\beta_{i}^{2}\phi}{2\pi[\tau_{i}^{2}]^{3}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}\\ &\frac{\lambda}{2|\beta_{i}|}e^{-\sqrt{\lambda\phi\beta_{i}^{2}}}\int \left( \frac{\beta_{i}^{2}\phi}{2\pi[\tau_{i}^{2}]^{3}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}e^{\sqrt{\lambda\phi\beta_{i}^{2}}}d\tau_{i}^{2}\\ &\frac{\lambda}{2|\beta_{i}|}e^{-\sqrt{\lambda\phi\beta_{i}^{2}}}\\ \end{align*}

Normalization as follows

\begin{align*} &\frac{\lambda}{2}\int \left( \frac{\phi}{2\pi\tau_{i}^{2}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}\\ &\frac{\lambda}{2}\int \tau_{i}^{2}\left( \frac{\phi}{2\pi[\tau_{i}^{2}]^{3}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}\\ \end{align*}
\begin{align*} &\frac{\lambda}{2|\beta_{i}|}e^{-\sqrt{\lambda\phi\beta_{i}^{2}}}\sqrt{\frac{\phi}{\lambda}}|\beta_{i}|\\ \end{align*}

$$\Rightarrow \mathbb{E}\left[ \frac{1}{\tau_{i}^{2}} \right]=\sqrt{\frac{\lambda}{\phi^{t}}}\frac{1}{|\beta_{i}^{t}|}$$.
Let $$\Lambda^{t}=diag(\sqrt{\frac{\lambda}{\phi^{t}}}\frac{1}{|\beta_{1}^{t}|}, \dots, \sqrt{\frac{\lambda}{\phi^{t}}}\frac{1}{|\beta_{p}^{t}|})$$.

## Maximization

\begin{align*} &Q(\beta,\phi||\beta^{t},\phi^{t})=c+ \frac{n_o+p-3}{2}\log \phi -\frac{\phi}{2}||Y_{o}-X_{o}\beta||^{2} – \frac{\phi}{2}\beta^{T}\Lambda^{t}\beta\\ &=c+ \frac{n_o+p-3}{2}\log \phi -\frac{\phi}{2}||\beta-(X_{o}^{T}X_{o}+\Lambda^{t})^{-1}X_{o}^{T}Y_{o}||^{2}_{(X_{o}^{T}X_{o}+\Lambda^{t})}-\frac{\phi}{2}Y_{o}^{T}(I_{n_{o}}-X_{o}^{T}(X_{o}^{T}X_{o}+\Lambda^{t})^{-1}X_{o})Y_{o}\\ \end{align*}

\begin{align*} \beta^{t+1}&=(X_{o}^{T}X_{o}+\Lambda^{t})^{-1}X_{o}^{T}Y_{o}\\ \end{align*}

\begin{align*} \phi^{t+1}=\frac{n_{o}+p-3}{Y_{o}^{T}(I_{n_{o}}-X_{o}^{T}(X_{o}^{T}X_{o}+\Lambda^{t})^{-1}X_{o})Y_{o}} \end{align*}

## RCpp C++ Code

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp;
using namespace arma;

double or_log_posterior_density(int no, int p, double lasso, const Col<double>& yo, const Mat<double>& xo, const Col<double>& B,double phi);

// [[Rcpp::export]]
List or_lasso_em(NumericVector ryo, NumericMatrix rxo, SEXP rlasso){

//Define Variables//
int p=rxo.ncol();
int no=rxo.nrow();
double lasso=Rcpp::as<double >(rlasso);

//Create Data//
arma::mat xo(rxo.begin(), no, p, false);
arma::colvec yo(ryo.begin(), ryo.size(), false);
yo-=mean(yo);

//Pre-Processing//
Col<double> xoyo=xo.t()*yo;
Col<double> B=xoyo/no;
Col<double> Babs=abs(B);
Mat<double> xoxo=xo.t()*xo;
Mat<double> D=eye(p,p);
Mat<double> Ip=eye(p,p);
double yoyo=dot(yo,yo);
double deltaB;
double deltaphi;
double phi=no/dot(yo-xo*B,yo-xo*B);
double lp;

//Create Trace Matrices
Mat<double> B_trace(p,20000);
Col<double> phi_trace(20000);
Col<double> lpd_trace(20000);

//Run EM Algorithm//
cout << "Beginning EM Algorithm" << endl;
int t=0;
B_trace.col(t)=B;
phi_trace(t)=phi;
lpd_trace(t)=or_log_posterior_density(no,p,lasso,yo,xo,B,phi);
do{
t=t+1;

lp=sqrt(lasso/phi);

Babs=abs(B);
D=diagmat(sqrt(Babs));
B=D*solve(D*xoxo*D+lp*Ip,D*xoyo);

phi=(no+p-3)/(yoyo-dot(xoyo,B));

//Store Values//
B_trace.col(t)=B;
phi_trace(t)=phi;
lpd_trace(t)=or_log_posterior_density(no,p,lasso,yo,xo,B,phi);

deltaB=dot(B_trace.col(t)-B_trace.col(t-1),B_trace.col(t)-B_trace.col(t-1));
deltaphi=phi_trace(t)-phi_trace(t-1);
} while((deltaB>0.00001 || deltaphi>0.00001) && t<19999);
cout << "EM Algorithm Converged in " << t << " Iterations" << endl;

//Resize Trace Matrices//
B_trace.resize(p,t);
phi_trace.resize(t);
lpd_trace.resize(t);

return Rcpp::List::create(
Rcpp::Named("B") = B,
Rcpp::Named("B_trace") = B_trace,
Rcpp::Named("phi") = phi,
Rcpp::Named("phi_trace") = phi_trace,
Rcpp::Named("lpd_trace") = lpd_trace
) ;

}

double or_log_posterior_density(int no, int p, double lasso, const Col<double>& yo, const Mat<double>& xo, const Col<double>& B,double phi){

double lpd;
lpd=(double)0.5*((double)no-1)*log(phi/(2*M_PI))-0.5*phi*dot(yo-xo*B,yo-xo*B)+0.5*(double)p*log(phi*lasso)-sqrt(phi*lasso)*sum(abs(B))-log(phi);
return(lpd);

}

## An Example in R

rm(list=ls())

#Generate Design Matrix
set.seed(3)
no=100
foo=rnorm(no,0,1)
sd=4
xo=cbind(foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd))
for(i in 1:40) xo=cbind(xo,foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd))

#Scale and Center Design Matrix
xo=scale(xo,center=T,scale=F)
var=apply(xo^2,2,sum)
xo=scale(xo,center=F,scale=sqrt(var/no))

#Generate Data under True Model
p=dim(xo)[2]
b=rep(0,p)
b[1]=1
b[2]=2
b[3]=3
b[4]=4
b[5]=5
xo%*%b
yo=xo%*%b+rnorm(no,0,1)
yo=yo-mean(yo)

#Run Lasso
or_lasso=or_lasso_em(yo,xo,100)

#Posterior Density Increasing at Every Iteration?
or_lasso$lpd_trace[2:dim(or_lasso$lpd_trace)[1],1]-or_lasso$lpd_trace[1:(dim(or_lasso$lpd_trace)[1]-1),1]>=0
mean(or_lasso$lpd_trace[2:dim(or_lasso$lpd_trace)[1],1]-or_lasso$lpd_trace[1:(dim(or_lasso$lpd_trace)[1]-1),1]>=0)

#Plot Results
plot(or_lasso\$B,ylab=expression(beta[lasso]),main="Lasso MAP Estimate of Regression Coefficients")

Park, T., & Casella, G. (2008). The Bayesian Lasso Journal of the American Statistical Association, 103 (482), 681-686 DOI: 10.1198/016214508000000337
Figueiredo M.A.T. (2003). Adaptive sparseness for supervised learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25 (9) 1150-1159. DOI: http://dx.doi.org/10.1109/tpami.2003.1227989
Better Shrinkage Priors:
Armagan A., Dunson D.B. & Lee J. GENERALIZED DOUBLE PARETO SHRINKAGE., Statistica Sinica, PMID:

• Deb

## I get an error while running this
## Error in sourceCpp(“K:/BigData/Rcpp/Bayesian/lasso.cpp”) :
## Error occurred building shared library.

library(Rcpp)
library(RcppArmadillo)

sourceCpp(“K:/BigData/Rcpp/Bayesian/lasso.cpp”)

• admin

Hi Deb, thanks for trying my code! Apologies that it did not work for you, it seems when posting the code up on the blog that I failed copy all of it in its entirety… there was the definition of the log_posterior_density function hiding at the end of the file after the main function, separated by a few lines, that I overlooked (next time I will do select-all). The code updated code is now complete, please try again ðŸ™‚

• Scott Akenhead

Prettier here than at r-bloggers