Partial Application for Functions in Julia
http://www.lindonslog.com/programming/partial-application-functions-julia/ (Thu, 03 Nov 2016)

Julia is purportedly a multi-paradigm language, but I find its support for functional paradigms to be lacking. Two features I looked for were currying, which converts a function of multiple arguments into a sequence of single-argument functions, and partial application, which fixes some of the arguments of a multiple-argument function to produce a new function of lesser arity. In Clojure one has the partial function for the latter. Here is my attempt to emulate this behaviour in Julia.

Partial Application

function partial(f,a...)
        ( (b...) -> f(a...,b...) )
end

The function returns a lambda in which the a… arguments are fixed. If you don’t know what the ellipsis does, check the documentation for “splat”. One can check that the behaviour is as expected:

julia> function testf(x,y,z)
         2*x+y*y+x*y*z
       end
testf (generic function with 1 method)
julia> testf(2,3,4)
37
julia> partial(testf,2)(3,4)
37
julia> partial(testf,2,3)(4)
37

Of course you could just use a lambda.
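For example, fixing the first argument of testf by hand, together with a minimal curry for two-argument functions, looks like this (a sketch in the spirit of the above, not part of the original post):

# fixing the first argument by hand, equivalent to partial(testf,2)
fix2 = (y,z) -> testf(2,y,z)
fix2(3,4)        # 37

# a minimal curry for two-argument functions
curry(f) = x -> y -> f(x,y)
curry(*)(3)(4)   # 12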

Newton's Iteration in Scala, Clojure and Haskell Comparison
http://www.lindonslog.com/programming/newtons-iteration-scala-clojure-haskell/ (Sun, 16 Oct 2016)

Newton’s iteration is an algorithm for computing the square root of a number n. It also makes a nice “Hello World” program if you find yourself doing a lot of optimization. Here is how it looks in Haskell, Clojure and Scala.
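For reference, the update that each implementation below iterates to convergence is
$$x_{k+1}=\frac{1}{2}\left( x_{k}+\frac{n}{x_{k}} \right),$$
which is Newton’s method applied to \(f(x)=x^{2}-n\) and converges to \(\sqrt{n}\) from any positive starting value.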

Haskell

update y x = 0.5*(x+y/x)
foo = (iterate (update 100) 1000)
foobar = zip foo (tail foo)
notconverged tol x = abs (a - b) / abs b > tol
  where
    a = fst x
    b = snd x
map fst $ takeWhile (notconverged 0.000001) foobar

Clojure

(use '(incanter core stats charts io))
 
(defn sqroot [y x] (* 0.5 (+ x (/ y x))))
(def updater (partial sqroot 16))
;; true while successive iterates have NOT yet converged, so take-while keeps iterating
(defn not-converged? [x] (> (abs (- (first x) (last x))) 0.0000001))
(map (fn [x] (first x)) (take-while not-converged? (partition 2 1 (iterate updater 100))))

Scala

import breeze.numerics.abs
 
def UpdateCurry(r: Double)(x: Double) = 0.5*(x+r/x)
 
val tolerance=0.0000001
val desiredSQRoot=16
val Update=UpdateCurry(desiredSQRoot)_
 
val iterates = Iterator.iterate(1.0) { x => Update(x) }
 
def ConvergedCurry(tol: Double)(x: Seq[Double]): Boolean = abs(x(0)-x(1))/abs(x(0))<tol
val Converged=ConvergedCurry(tolerance)_
 
val trace=iterates.sliding(2).takeWhile( x=> !Converged(x)).toList

MALA – Metropolis Adjusted Langevin Algorithm in Julia
http://www.lindonslog.com/mathematics/statistics/mala-metropolis-adjusted-langevin-algorithm-julia/ (Mon, 03 Oct 2016)

Metropolis Adjusted Langevin Algorithm (MALA)

Haven’t dumped much code here in a while. Here’s a Julia implementation of MALA with an arbitrary preconditioning matrix M. Potentially I might use this in the future.

Generic Julia Implementation

Arguments are a function to evaluate the log density, a function to evaluate its gradient, a step size h, a preconditioning matrix M, the number of iterations, and an initial parameter value.
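As written, the code treats gradient as the gradient of the negative log density (it is subtracted in the drift step, and the bivariate normal example below supplies \(\Sigma^{-1}\theta\)). With that convention the proposal and acceptance step implemented below are
$$\begin{align*}
\theta^{*}&\sim N\left( \theta-\tfrac{h}{2}M\nabla U(\theta),\; hM \right), \qquad U(\theta)=-\log\pi(\theta),\\
a(\theta,\theta^{*})&=\min\left\{ 1,\; \frac{\pi(\theta^{*})q(\theta|\theta^{*})}{\pi(\theta)q(\theta^{*}|\theta)} \right\},
\end{align*}$$
where \(q(\cdot|\cdot)\) is the Gaussian proposal density above; the code works with the logarithm of this ratio.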

function mala(logdensity,gradient,h,M,niter,θinit)
        # drift step: θ - t*M*∇U(θ), where gradient() returns ∇U = -∇ log π
        function gradientStep(θ,t)
                θ-t*M*gradient(θ)
        end
        θtrace=Array{Float64}(length(θinit),niter)
        θ=θinit
        θtrace[:,1]=θinit
        for i=2:niter
                θold=θ
                # propose from N(θ - (h/2)M∇U(θ), hM)
                θ=rand(MvNormal(gradientStep(θ,0.5*h),h*M))
                # log Metropolis-Hastings ratio, including the asymmetric proposal correction
                d=logdensity(θ) - logdensity(θold) + logpdf(MvNormal(gradientStep(θ,0.5*h),h*M),θold) - logpdf(MvNormal(gradientStep(θold,0.5*h),h*M),θ)
                if(!(log(rand(Uniform(0,1)))<d))
                        θ=θold   # reject: keep the previous state
                end
                θtrace[:,i]=θ
        end
        θtrace
end

Don’t forget to load Julia’s Distributions package (using Distributions) before running this.
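For example, a minimal smoke test on a univariate standard normal target (a sketch, not part of the original post; it assumes the mala function above has already been defined):

using Distributions
# standard normal target: the gradient of the negative log density is just θ
ld(θ) = -0.5*dot(θ,θ)
gr(θ) = θ
chain = mala(ld, gr, 0.5, eye(1), 1000, [0.0])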

Bivariate Normal Example

Consider a Bivariate Normal as an example

ρ²=0.8
Σ=[1 ρ²;ρ² 1]

function logdensity(θ)
        logpdf(MvNormal(Σ),θ)
end

# gradient of the negative log density: Σ\θ = Σ⁻¹θ
function gradient(θ)
        Σ\θ
end

and now the code to generate the plots and results

niter=1000
# step size: reciprocal of the largest eigenvalue of Σ⁻¹
h=1/eigs(inv(Σ),nev=1)[1][1]
@time draws=mala(logdensity,gradient,h,eye(2),niter,[5,50]);
# empirical acceptance rate
sum(map(t -> draws[:,t]!=draws[:,t-1],2:niter))/(niter-1)
@time pdraws=mala(logdensity,gradient,h,Σ,niter,[5,50]);
sum(map(t -> pdraws[:,t]!=pdraws[:,t-1],2:niter))/(niter-1)
                                                                                                                                                                                                                   
function logdensity2d(x,y)                                                                                                                                                                                         
        logdensity([x,y])                                                                                                                                                                                          
end                                                                                                                                                                                                                
x = -30:0.1:30                                                                                                                                                                                                     
y = -30:0.1:50                                                                                                                                                                                                     
X = repmat(x',length(y),1)                                                                                                                                                                                         
Y = repmat(y,1,length(x))                                                                                                                                                                                          
Z = map(logdensity2d,Y,X)                                                                                                                                                                                          
p1 = contour(x,y,Z,200)                                                                                                                                                                                            
plot(vec(draws[1,:]),vec(draws[2,:]))                                                                                                                                                                              
plot(vec(pdraws[1,:]),vec(pdraws[2,:])) 

pdraws uses the covariance matrix Σ as the preconditioning matrix, whereas draws uses an identity matrix, which gives the original MALA algorithm. The traceplots of draws from MALA and preconditioned MALA are shown in blue and green respectively.

[Figure: traceplots for MALA (blue) and preconditioned MALA (green)]

Effective Sample Sizes in R

We can use the Julia “RCall” package to switch over to R and use the coda library to evaluate the minimum effective sample size for both of these MCMC algorithms.

R> library(coda)

R> min(effectiveSize($(draws')))
[1] 22.02418

R> min(effectiveSize($(pdraws')))
[1] 50.85163

I didn’t tune the step size h in this example at all (you should; for MALA the asymptotically optimal acceptance rate is roughly 0.57).

Passing Julia Type to C Function as Struct
http://www.lindonslog.com/programming/passing-julia-type-to-c-function-as-struct/ (Wed, 24 Feb 2016)

There seem to be hardly any examples to be found of calling C from Julia (perhaps because people feel no need to do so). Moreover the docs are, at least to me, quite cryptic. So if anyone wants to get started, here is a minimal example.

Julia Code:

type mystruct                                                                                                                                                                                                      
        n::Int32                                                                                                                                                                                                   
end 

C Code:

struct mystruct{
	int n;
};
void structure(struct mystruct * input);
void structure(struct mystruct * input){
	(*input).n=99;
}

Compilation:
gcc -fPIC -shared rosetta.c -o rosetta.so

Execution:
julia> test=mystruct(10)
mystruct(10)

julia> ccall((:structure, "rosetta.so"), Void, (Ref{mystruct},), test);

julia> test
mystruct(99)
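If you call this often, a small wrapper keeps the ccall details in one place (a sketch under the same setup as above; Void and the library name follow the original example):

function set_n!(s::mystruct)
        ccall((:structure, "rosetta.so"), Void, (Ref{mystruct},), s)
        s
end

# set_n!(mystruct(10)) returns mystruct(99)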

Send Lines of Code from Vim to R/Julia/Python REPL
http://www.lindonslog.com/linux-unix/send-lines-code-vim-r-julia-python-repl-slime/ (Sun, 29 Mar 2015)

Vim + Tmux + Slime Workflow

Editing code in anything other than Vim seems to drain my coding stamina rapidly, but when working with a language such as R that has an interactive REPL (read evaluate print loop) it is often useful to send lines of code thereto for purposes of debugging. I have used the Vim-R-Plugin for some time but recently my differences with R have led me to Julia, and so I have been looking for a way to send code from Vim to Julia. My search has resulted in vim-slime.

Vim Slime

vim-slime is absolutely great: it’s a plugin for Vim that lets you send text to a GNU screen or tmux session. This is perfect for programming in any language that has a REPL: simply load a session up in tmux, open up Vim, tell it which tmux session and window you want to send text to, and away you go. This will work for any language!
A great post can be found here.

Edit Locally, Execute Remotely

I’ve long desired to be able to do this. It would be great if you could edit the file locally but have the Julia session execute on a remote machine, say my office computer, so that all the heavy lifting is done by the remote machine, allowing me to get on with my own personal things on my local machine. Vim-slime provides an elegant solution: start a tmux session, ssh to your remote machine, and start up an interactive REPL of the language of your choice; then open another terminal locally, start Vim on your code, and now you can send lines to your remote R/Julia/Python session! This has dramatically improved my workflow as my office computer is much more powerful than my laptop.

[Figure: sending lines of code from Vim to an R/Python/Julia REPL]

C++ Merge Sort Algorithm
http://www.lindonslog.com/linux-unix/c-merge-sort-algorithm/ (Mon, 05 Jan 2015)

C++ Merge Sort Algorithm
#include <random>
#include <iostream>
#include <vector>
#include <algorithm>
#include <numeric> //for iota

using namespace std;

vector<int> merge_sort(const vector<int>& input)
{
	if(input.size()<=1) return input;
	vector<int> output(input.size());

	//Split Vector//
	size_t midpoint=input.size()/2;
	vector<int> input_left(input.begin(),input.begin()+midpoint);
	vector<int> input_right(input.begin()+midpoint,input.end());

	//Recursively sort each half, then merge the two sorted halves//
	input_left=merge_sort(input_left);
	input_right=merge_sort(input_right);
	merge(input_left.begin(),input_left.end(),input_right.begin(),input_right.end(),output.begin());

	return output;
}

int main(){

	//Create unsorted vector of ints
	vector<int> unsorted(40);
	iota(unsorted.begin(),unsorted.end(),-20);
	shuffle(unsorted.begin(),unsorted.end(),default_random_engine());

	//Perform merge_sort//
	vector<int> sorted=merge_sort(unsorted);

	//Display results//
	cout << "Unsorted: " <<  endl;
	for(auto value:unsorted)  cout << value << " ";
	cout <<  endl;
	cout << "Sorted: " <<  endl;
	for(auto value:sorted)  cout << value << " ";
	cout <<  endl;

}

stdout

Unsorted:
-2 14 7 -15 -9 -17 -8 -1 13 1 -10 -7 16 -19 6 2 -12 -11 8 -18 -14 10 5 4 17 12 15 -16 -5 18 -4 -3 -6 -20 0 3 9 -13 11 19
Sorted:
-20 -19 -18 -17 -16 -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
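For reference, the running time satisfies the usual divide-and-conquer recurrence
$$T(n)=2T\left( \frac{n}{2} \right)+O(n)=O(n\log n),$$
since each level of the recursion does linear work in std::merge and there are roughly \(\log_{2}n\) levels.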

Generate Random Inverse Gaussian in R
http://www.lindonslog.com/programming/generate-random-inverse-gaussian-r/ (Sat, 20 Sep 2014)

Needed to generate draws from an inverse Gaussian today, so I wrote the following Rcpp code:

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp;
using namespace arma;

// [[Rcpp::export]]
Col<double> rrinvgauss(int n, double mu, double lambda){

	Col<double> random_vector(n);
	double z,y,x,u;

	for(int i=0; i<n; ++i){
		z=R::rnorm(0,1);
		y=z*z;
		//smaller root of the quadratic arising from the transformation method//
		x=mu+0.5*mu*mu*y/lambda - 0.5*(mu/lambda)*sqrt(4*mu*lambda*y+mu*mu*y*y);
		u=R::runif(0,1);
		//accept the smaller root with probability mu/(mu+x), otherwise take mu^2/x//
		if(u <= mu/(mu+x)){
			random_vector(i)=x;
		}else{
			random_vector(i)=mu*mu/x;
		};
	}
	return(random_vector);
}
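The sampler above appears to follow the classic transformation-with-rejection method of Michael, Schucany and Haas (1976): draw a chi-squared variate, take the smaller root of the resulting quadratic, and accept it with probability \(\mu/(\mu+x)\), otherwise return \(\mu^{2}/x\). For reference, the target density with mean \(\mu\) and shape \(\lambda\) is
$$f(x;\mu,\lambda)=\sqrt{\frac{\lambda}{2\pi x^{3}}}\exp\left( -\frac{\lambda(x-\mu)^{2}}{2\mu^{2}x} \right),\qquad x>0.$$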

It seems to be faster than existing implementations such as rig from mgcv and rinvgauss from statmod packages.

library(Rcpp) 
library(RcppArmadillo)
library(rbenchmark)
library(statmod)
library(mgcv)
sourceCpp("rrinvgauss.cpp")
n=10000 
benchmark(rig(n,1,1),rinvgauss(n,1,1),rrinvgauss(n,1,1),replications=100)

[Figure: benchmark results for rig, rinvgauss and rrinvgauss]

Rename rrinvgauss as desired.

Generalized Double Pareto Priors for Regression
http://www.lindonslog.com/mathematics/statistics/generalized-double-pareto-shrinkage-priors-sparse-estimation-regression/ (Thu, 11 Sep 2014)

This post is a review of the “Generalized Double Pareto Shrinkage” paper by Armagan, Dunson and Lee, Statistica Sinica 23 (2013).

Consider the regression model \(Y=X\beta+\varepsilon\) where we put a generalized double pareto distribution as the prior on the regression coefficients \(\beta\). The GDP distribution has density
$$\begin{equation}
f(\beta|\xi,\alpha)=\frac{1}{2\xi}\left( 1+\frac{|\beta|}{\alpha\xi} \right)^{-(\alpha+1)}.
\end{equation}$$

GDP as Scale Mixture of Normals

The GDP distribution can be conveniently represented as a scale mixture of normals. Let
$$\begin{align*}
\beta_{i}|\phi,\tau_{i} &\sim N(0,\phi^{-1}\tau_{i})\\
\tau_{i}|\lambda_{i}&\sim Exp(\frac{\lambda_{i}^{2}}{2})\\
\lambda_{i}&\sim Ga(\alpha,\eta)\\
\end{align*}$$
then \(\beta|\phi \sim GDP(\xi=\frac{\eta}{\sqrt{\phi}\alpha},\alpha)\).
To see this first note that \(\beta_{i}|\phi,\lambda_{i}\) has a Laplace or Double Exponential distribution with rate parameter \(\sqrt{\phi}\lambda_{i}\).
$$\begin{align*}
p(\beta_{i}|\phi,\lambda_{i})&=\int p(\beta_{i}|\phi,\tau_{i})p(\tau_{i}|\lambda_{i})d\tau_{i}\\
\psi(t)&=\int e^{it\beta_{i}} \int p(\beta_{i}|\phi,\tau_{i})p(\tau_{i}|\lambda_{i})d\tau_{i} d\beta_{i}\\
&=\int \int e^{it\beta_{i}}p(\beta_{i}|\phi,\tau_{i})d\beta_{i}p(\tau_{i}|\lambda_{i})d\tau_{i}\\
&=\int e^{-\frac{1}{2}\frac{\tau_{i}}{\phi}t^{2}}p(\tau_{i}|\lambda_{i})d\tau_{i}\\
&=\frac{\lambda_{i}^{2}}{2} \int e^{-\frac{1}{2}(\frac{t^{2}}{\phi}+\lambda_{i}^{2})\tau_{i}}d\tau_{i}\\
&=\frac{\phi\lambda_{i}^{2}}{t^{2}+\phi\lambda_{i}^{2}},
\end{align*}$$
which is the characteristic function of a Double Exponential distribution with rate parameter \(\sqrt{\phi}\lambda_{i}\).
Lastly
$$\begin{align*}
p(\beta_{i}|\phi)&=\int p(\beta_{i}|\phi,\lambda_{i})p(\lambda_{i})d\lambda_{i}\\
&=\frac{1}{2}\sqrt{\phi}\frac{\eta^{\alpha}}{\Gamma(\alpha)}\frac{\Gamma(\alpha+1)}{(\eta+\sqrt{\phi}|\beta_{i}|)^{\alpha+1}}\\
&=\frac{1}{2}\frac{\sqrt{\phi}\alpha}{\eta}\left( 1+\frac{\sqrt{\phi}\alpha}{\eta}\frac{|\beta_{i}|}{\alpha} \right)^{-(\alpha+1)},
\end{align*}$$
which is the density of a \(GDP(\xi=\frac{\eta}{\sqrt{\phi}\alpha},\alpha)\).

EM Algorithm

\(\tau_{i}\) and \(\lambda_{i}\) are treated as missing data for each \(i\).
\begin{align*}
Q(\beta,\phi||\beta^{(t)},\phi^{(t)})&=c+\mathbb{E}_{\tau,\lambda}\left[ \log p(\beta,\phi|Y,\tau,\lambda)|\beta^{(t)},\phi^{(t)} \right]\\
&=\frac{n+p-3}{2}\log\phi - \frac{\phi}{2}||Y-X\beta||^{2}-\frac{\phi}{2}\sum_{i=1}^{p}\beta_{i}^{2}\mathbb{E}\left[ \frac{1}{\tau_{i}} \right]\\
\end{align*}

Expectation

For the iterated expectation one needs the distribution \(\tau_{i}|\lambda_{i},\beta_{i},\phi\) and \(\lambda_{i}|\beta_{i},\phi\).
\begin{align*}
p(\tau_{i}|\beta_{i},\lambda_{i},\phi)&\propto p(\beta_{i}|\phi,\tau_{i})p(\tau_{i}|\lambda_{i})\\
&\propto \left( \frac{1}{\tau_{i}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}(\frac{\phi \beta_{i}^{2}}{\tau_{i}}+\lambda_{i}^{2}\tau_{i})}
\end{align*}
This is the kernel of a Generalized Inverse Gaussian distribution, specifically \(p(\tau_{i}|\beta_{i},\lambda_{i},\phi)=GIG(\tau_{i}:\lambda_{i}^{2},\phi \beta_{i}^{2},\frac{1}{2})\).
By a standard change of variables it follows that \(p(\frac{1}{\tau_{i}}|\beta_{i},\lambda_{i},\phi)=IG(\frac{1}{\tau_{i}}:\sqrt{\frac{\lambda_{i}^{2}}{\phi \beta_{i}^{2}}},\lambda_{i}^{2})\) and so \(\mathbb{E}\left[ \frac{1}{\tau_{i}}|\lambda_{i},\beta^{(t)},\phi^{(t)} \right]=\frac{\lambda_{i}}{\sqrt{\phi^{(t)}}|\beta_{i}^{(t)}|}\).

Recall that \(p(\beta_{i}|\phi,\lambda_{i})\) has a double exponential distribution with rate \(\sqrt{\phi}\lambda_{i}\).
Hence from \(p(\lambda_{i}|\beta_{i},\phi)\propto p(\beta_{i}|\lambda_{i},\phi)p(\lambda_{i})\) it follows that \(\lambda_{i}|\beta_{i},\phi \sim Ga(\alpha+1,\eta+\sqrt{\phi}|\beta_{i}|)\), then performing the expectation with respect to \(\lambda_{i}\) yields
\begin{align*}
\mathbb{E}\left[ \frac{1}{\tau_{i}}|\beta^{(t)},\phi^{(t)} \right]=\left( \frac{\alpha+1}{\eta+\sqrt{\phi^{t}}|\beta_{i}^{(t)}|} \right)\left( \frac{1}{\sqrt{\phi^{(t)}}|\beta_{i}^{(t)}|} \right)
\end{align*}

Maximization

Writing \(D^{(t)}=diag(\mathbb{E}[\frac{1}{\tau_{1}}],\dots,\mathbb{E}[\frac{1}{\tau_{p}}])\) the function to maximize is
\begin{align*}
Q(\beta,\phi||\beta^{(t)},\phi^{(t)})&=c+\mathbb{E}_{\tau,\lambda}\left[ \log p(\beta,\phi|Y,\tau,\lambda)|\beta^{(t)},\phi^{(t)} \right]\\
&=\frac{n+p-3}{2}\log\phi - \frac{\phi}{2}||Y-X\beta||^{2}-\frac{\phi}{2}\beta^{T}D^{(t)}\beta,\\
\end{align*}
which is maximized by letting
\begin{align*}
\beta^{(t+1)}&=(X^{T}X+D^{(t)})^{-1}X^{T}Y\\
\phi^{(t+1)}&=\frac{n+p-3}{Y^{T}(I-X(X^{T}X+D^{(t)})^{-1}X^{T})Y}\\
&=\frac{n+p-3}{||Y-X\beta^{(t+1)}||^{2}+\beta^{(t+1)T}D^{(t)}\beta^{(t+1)}}\\
\end{align*}

R CPP Code

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp;
using namespace arma;

double gdp_log_posterior_density(int no, int p, double alpha, double eta, const Col<double>& yo, const Mat<double>& xo, const Col<double>& B,double phi);

// [[Rcpp::export]]
List gdp_em(NumericVector ryo, NumericMatrix rxo, SEXP ralpha, SEXP reta){

	//Define Variables//
	int p=rxo.ncol();
	int no=rxo.nrow();
	double eta=Rcpp::as<double >(reta);
	double alpha=Rcpp::as<double >(ralpha);

	//Create Data//
	arma::mat xo(rxo.begin(), no, p, false);
	arma::colvec yo(ryo.begin(), ryo.size(), false);
	yo-=mean(yo);

	//Pre-Processing//
	Col<double> xoyo=xo.t()*yo;
	Col<double> B=xoyo/no;
	Col<double> Babs=abs(B);
	Mat<double> xoxo=xo.t()*xo;
	Mat<double> D=eye(p,p);
	Mat<double> Ip=eye(p,p);
	double yoyo=dot(yo,yo);
	double deltaB;
	double deltaphi;
	double phi=no/dot(yo-xo*B,yo-xo*B);
	double lp;

	//Create Trace Matrices
	Mat<double> B_trace(p,20000);
	Col<double> phi_trace(20000);
	Col<double> lpd_trace(20000);

	//Run EM Algorithm//
	cout << "Beginning EM Algorithm" << endl;
	int t=0;
	B_trace.col(t)=B;
	phi_trace(t)=phi;
	lpd_trace(t)=gdp_log_posterior_density(no,p,alpha,eta,yo,xo,B,phi);
	do{
		t=t+1;


		Babs=abs(B);
		D=diagmat(sqrt(((eta+sqrt(phi)*Babs)%(sqrt(phi)*Babs))/(alpha+1)));
		B=D*solve(D*xoxo*D+Ip,D*xoyo);

		phi=(no+p-3)/(yoyo-dot(xoyo,B));

		//Store Values//
		B_trace.col(t)=B;
		phi_trace(t)=phi;
		lpd_trace(t)=gdp_log_posterior_density(no,p,alpha,eta,yo,xo,B,phi);

		deltaB=dot(B_trace.col(t)-B_trace.col(t-1),B_trace.col(t)-B_trace.col(t-1));
		deltaphi=phi_trace(t)-phi_trace(t-1);
	} while((deltaB>0.00001 || deltaphi>0.00001) && t<19999);
	cout << "EM Algorithm Converged in " << t << " Iterations" << endl;

	//Resize Trace Matrices//
	B_trace.resize(p,t);
	phi_trace.resize(t);
	lpd_trace.resize(t);

	return Rcpp::List::create(
			Rcpp::Named("B") = B,
			Rcpp::Named("B_trace") = B_trace,
			Rcpp::Named("phi") = phi,
			Rcpp::Named("phi_trace") = phi_trace,
			Rcpp::Named("lpd_trace") = lpd_trace
			) ;

}



double gdp_log_posterior_density(int no, int p, double alpha, double eta, const Col<double>& yo, const Mat<double>& xo, const Col<double>& B,double phi){

	double lpd;
	double xi=eta/(sqrt(phi)*alpha);
	lpd=(double)0.5*((double)no-1)*log(phi/(2*M_PI))-p*log(2*xi)-(alpha+1)*sum(log(1+abs(B)/(alpha*xi)))-0.5*phi*dot(yo-xo*B,yo-xo*B)-log(phi);
	return(lpd);

}

An Example in R

rm(list=ls())
library(Rcpp)
library(RcppArmadillo)
sourceCpp("src/gdp_em.cpp")

#Generate Design Matrix
set.seed(3)
no=100
foo=rnorm(no,0,1)
sd=4
xo=cbind(foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd))
for(i in 1:40) xo=cbind(xo,foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd))

#Scale and Center Design Matrix
xo=scale(xo,center=T,scale=F)
var=apply(xo^2,2,sum)
xo=scale(xo,center=F,scale=sqrt(var/no))

#Generate Data under True Model
p=dim(xo)[2]
b=rep(0,p)
b[1]=1
b[2]=2
b[3]=3
b[4]=4
b[5]=5
xo%*%b
yo=xo%*%b+rnorm(no,0,1)
yo=yo-mean(yo)

#Run GDP
gdp=gdp_em(yo,xo,100,100)

#Posterior Density Increasing at Every Iteration?
gdp$lpd_trace[2:dim(gdp$lpd_trace)[1],1]-gdp$lpd_trace[1:(dim(gdp$lpd_trace)[1]-1),1]>=0
mean(gdp$lpd_trace[2:dim(gdp$lpd_trace)[1],1]-gdp$lpd_trace[1:(dim(gdp$lpd_trace)[1]-1),1]>=0)

#Plot Results
plot(gdp$B,ylab=expression(beta[GDP]),main="GDP MAP Estimate of Regression Coefficients")

[Figure: GDP MAP estimates of the regression coefficients]

WEST M. (1987). On scale mixtures of normal distributions, Biometrika, 74 (3) 646-648. DOI: http://dx.doi.org/10.1093/biomet/74.3.646

Artin Armagan, David Dunson, & Jaeyong Lee (2011). Generalized double Pareto shrinkage Statistica Sinica 23 (2013), 119-143 arXiv: 1104.0861v4

Figueiredo M.A.T. (2003). Adaptive sparseness for supervised learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25 (9) 1150-1159. DOI: http://dx.doi.org/10.1109/tpami.2003.1227989

Also see this similar post on the Bayesian lasso.

EM Algorithm for Bayesian Lasso R Cpp Code
http://www.lindonslog.com/mathematics/em-algorithm-bayesian-lasso-r-cpp-code/ (Fri, 05 Sep 2014)

Bayesian Lasso

$$\begin{align*}
p(Y_{o}|\beta,\phi)&=N(Y_{o}|1\alpha+X_{o}\beta,\phi^{-1} I_{n_{o}})\\
\pi(\beta_{i}|\phi,\tau_{i}^{2})&=N(\beta_{i}|0, \phi^{-1}\tau_{i}^{2})\\
\pi(\tau_{i}^{2})&=Exp \left( \frac{\lambda}{2} \right)\\
\pi(\phi)&\propto \phi^{-1}\\
\pi(\alpha)&\propto 1\\
\end{align*}$$

Marginalizing over \(\alpha\) equates to centering the observations, losing a degree of freedom, and working with the centered \( Y_{o} \).
Mixing over \(\tau_{i}^{2}\) leads to a Laplace or Double Exponential prior on \(\beta_{i}\) with rate parameter \(\sqrt{\phi\lambda}\) as seen by considering the characteristic function
$$\begin{align*}
\varphi_{\beta_{i}|\phi}(t)&=\int e^{jt\beta_{i}}\pi(\beta_{i}|\phi)d\beta_{i}\\
&=\int \int e^{jt\beta_{i}}\pi(\beta_{i}|\phi,\tau_{i}^{2})\pi(\tau_{i}^{2})d\tau_{i}^{2} d\beta_{i}\\
&=\frac{\lambda}{2} \int e^{-\frac{1}{2}\frac{t^{2}}{\phi}\tau_{i}^{2}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}\\
&=\frac{\lambda}{\frac{t^{2}}{\phi}+\lambda}=\frac{\lambda\phi}{t^{2}+\lambda\phi}
\end{align*}$$.
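This is the characteristic function of a double exponential distribution, so inverting it gives the implied prior explicitly:
$$\pi(\beta_{i}|\phi)=\frac{\sqrt{\lambda\phi}}{2}e^{-\sqrt{\lambda\phi}|\beta_{i}|}.$$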

EM Algorithm

The objective is to find the mode of the joint posterior \(\pi(\beta,\phi|Y_{o})\). It is easier, however, to find the joint mode of \(\pi(\beta,\phi|Y_{o},\tau^{2})\) and use EM to exploit the scale mixture representation.

$$\begin{align*}
\log \pi(\beta,\phi|Y_{o},\tau^{2})=c+ \frac{n_o+p-3}{2}\log \phi -\frac{\phi}{2}||Y_{o}-X_{o}\beta||^{2}-\sum_{i=1}^{p}\frac{\phi}{2}\frac{1}{\tau_{i}^{2}}\beta^{2}_{i}
\end{align*}$$

Expectation

The expectation of \(1/\tau_{i}^{2}\) is handled as follows:
$$
\begin{align*}
&\frac{\lambda}{2}\int \frac{1}{\tau_{i}^{2}}\left( \frac{\phi}{2\pi\tau_{i}^{2}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}\\
&\frac{\lambda}{2}\int \left( \frac{\phi}{2\pi[\tau_{i}^{2}]^{3}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}\\
\end{align*}$$

This has the kernel of an Inverse Gaussian distribution with shape parameter \(\phi \beta_{i}^{2}\) and mean \(\sqrt{\frac{\phi}{\lambda}}|\beta_{i}|\)

$$\begin{align*}
&\frac{{\lambda}}{2|\beta_{i}|}\int \left( \frac{\beta_{i}^{2}\phi}{2\pi[\tau_{i}^{2}]^{3}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}\\
&\frac{\lambda}{2|\beta_{i}|}e^{-\sqrt{\lambda\phi\beta_{i}^{2}}}\int \left( \frac{\beta_{i}^{2}\phi}{2\pi[\tau_{i}^{2}]^{3}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}e^{\sqrt{\lambda\phi\beta_{i}^{2}}}d\tau_{i}^{2}\\
&\frac{\lambda}{2|\beta_{i}|}e^{-\sqrt{\lambda\phi\beta_{i}^{2}}}\\
\end{align*}$$

The normalizing constant is obtained similarly:

$$\begin{align*}
&\frac{\lambda}{2}\int \left( \frac{\phi}{2\pi\tau_{i}^{2}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}\\
&\frac{\lambda}{2}\int \tau_{i}^{2}\left( \frac{\phi}{2\pi[\tau_{i}^{2}]^{3}} \right)^{\frac{1}{2}}e^{-\frac{1}{2}\phi\beta_{i}^{2}\frac{1}{\tau_{i}^{2}}}e^{-\frac{\lambda}{2}\tau_{i}^{2}}d\tau_{i}^{2}\\
\end{align*}$$
$$\begin{align*}
&\frac{\lambda}{2|\beta_{i}|}e^{-\sqrt{\lambda\phi\beta_{i}^{2}}}\sqrt{\frac{\phi}{\lambda}}|\beta_{i}|\\
\end{align*}$$

\( \Rightarrow \mathbb{E}\left[ \frac{1}{\tau_{i}^{2}} \right]=\sqrt{\frac{\lambda}{\phi^{t}}}\frac{1}{|\beta_{i}^{t}|}\).
Let \(\Lambda^{t}=diag(\sqrt{\frac{\lambda}{\phi^{t}}}\frac{1}{|\beta_{1}^{t}|}, \dots, \sqrt{\frac{\lambda}{\phi^{t}}}\frac{1}{|\beta_{p}^{t}|})\).

Maximization

$$\begin{align*}
&Q(\beta,\phi||\beta^{t},\phi^{t})=c+ \frac{n_o+p-3}{2}\log \phi -\frac{\phi}{2}||Y_{o}-X_{o}\beta||^{2} – \frac{\phi}{2}\beta^{T}\Lambda^{t}\beta\\
&=c+ \frac{n_o+p-3}{2}\log \phi -\frac{\phi}{2}||\beta-(X_{o}^{T}X_{o}+\Lambda^{t})^{-1}X_{o}^{T}Y_{o}||^{2}_{(X_{o}^{T}X_{o}+\Lambda^{t})}-\frac{\phi}{2}Y_{o}^{T}(I_{n_{o}}-X_{o}^{T}(X_{o}^{T}X_{o}+\Lambda^{t})^{-1}X_{o})Y_{o}\\
\end{align*}$$

$$\begin{align*}
\beta^{t+1}&=(X_{o}^{T}X_{o}+\Lambda^{t})^{-1}X_{o}^{T}Y_{o}\\
\end{align*}$$

$$\begin{align*}
\phi^{t+1}=\frac{n_{o}+p-3}{Y_{o}^{T}(I_{n_{o}}-X_{o}^{T}(X_{o}^{T}X_{o}+\Lambda^{t})^{-1}X_{o})Y_{o}}
\end{align*}$$

RCpp C++ Code

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp;
using namespace arma;

double or_log_posterior_density(int no, int p, double lasso, const Col<double>& yo, const Mat<double>& xo, const Col<double>& B,double phi);

// [[Rcpp::export]]
List or_lasso_em(NumericVector ryo, NumericMatrix rxo, SEXP rlasso){

	//Define Variables//
	int p=rxo.ncol();
	int no=rxo.nrow();
	double lasso=Rcpp::as<double >(rlasso);

	//Create Data//
	arma::mat xo(rxo.begin(), no, p, false);
	arma::colvec yo(ryo.begin(), ryo.size(), false);
	yo-=mean(yo);

	//Pre-Processing//
	Col<double> xoyo=xo.t()*yo;
	Col<double> B=xoyo/no;
	Col<double> Babs=abs(B);
	Mat<double> xoxo=xo.t()*xo;
	Mat<double> D=eye(p,p);
	Mat<double> Ip=eye(p,p);
	double yoyo=dot(yo,yo);
	double deltaB;
	double deltaphi;
	double phi=no/dot(yo-xo*B,yo-xo*B);
	double lp;

	//Create Trace Matrices
	Mat<double> B_trace(p,20000);
	Col<double> phi_trace(20000);
	Col<double> lpd_trace(20000);

	//Run EM Algorithm//
	cout << "Beginning EM Algorithm" << endl;
	int t=0;
	B_trace.col(t)=B;
	phi_trace(t)=phi;
	lpd_trace(t)=or_log_posterior_density(no,p,lasso,yo,xo,B,phi);
	do{
		t=t+1;

		lp=sqrt(lasso/phi);

		Babs=abs(B);
		D=diagmat(sqrt(Babs));
		B=D*solve(D*xoxo*D+lp*Ip,D*xoyo);

		phi=(no+p-3)/(yoyo-dot(xoyo,B));

		//Store Values//
		B_trace.col(t)=B;
		phi_trace(t)=phi;
		lpd_trace(t)=or_log_posterior_density(no,p,lasso,yo,xo,B,phi);

		deltaB=dot(B_trace.col(t)-B_trace.col(t-1),B_trace.col(t)-B_trace.col(t-1));
		deltaphi=phi_trace(t)-phi_trace(t-1);
	} while((deltaB>0.00001 || deltaphi>0.00001) && t<19999);
	cout << "EM Algorithm Converged in " << t << " Iterations" << endl;

	//Resize Trace Matrices//
	B_trace.resize(p,t);
	phi_trace.resize(t);
	lpd_trace.resize(t);

	return Rcpp::List::create(
			Rcpp::Named("B") = B,
			Rcpp::Named("B_trace") = B_trace,
			Rcpp::Named("phi") = phi,
			Rcpp::Named("phi_trace") = phi_trace,
			Rcpp::Named("lpd_trace") = lpd_trace
			) ;

}




double or_log_posterior_density(int no, int p, double lasso, const Col<double>& yo, const Mat<double>& xo, const Col<double>& B,double phi){

	double lpd;
	lpd=(double)0.5*((double)no-1)*log(phi/(2*M_PI))-0.5*phi*dot(yo-xo*B,yo-xo*B)+0.5*(double)p*log(phi*lasso)-sqrt(phi*lasso)*sum(abs(B))-log(phi);
	return(lpd);

}

An Example in R

rm(list=ls())

#Generate Design Matrix
set.seed(3)
no=100
foo=rnorm(no,0,1)
sd=4
xo=cbind(foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd))
for(i in 1:40) xo=cbind(xo,foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd),foo+rnorm(no,0,sd))

#Scale and Center Design Matrix
xo=scale(xo,center=T,scale=F)
var=apply(xo^2,2,sum)
xo=scale(xo,center=F,scale=sqrt(var/no))

#Generate Data under True Model
p=dim(xo)[2]
b=rep(0,p)
b[1]=1
b[2]=2
b[3]=3
b[4]=4
b[5]=5
xo%*%b
yo=xo%*%b+rnorm(no,0,1)
yo=yo-mean(yo)

#Run Lasso
or_lasso=or_lasso_em(yo,xo,100)

#Posterior Density Increasing at Every Iteration?
or_lasso$lpd_trace[2:dim(or_lasso$lpd_trace)[1],1]-or_lasso$lpd_trace[1:(dim(or_lasso$lpd_trace)[1]-1),1]>=0
mean(or_lasso$lpd_trace[2:dim(or_lasso$lpd_trace)[1],1]-or_lasso$lpd_trace[1:(dim(or_lasso$lpd_trace)[1]-1),1]>=0)

#Plot Results
plot(or_lasso$B,ylab=expression(beta[lasso]),main="Lasso MAP Estimate of Regression Coefficients")

[Figure: Lasso MAP estimates of the regression coefficients]

Park, T., & Casella, G. (2008). The Bayesian Lasso Journal of the American Statistical Association, 103 (482), 681-686 DOI: 10.1198/016214508000000337
Figueiredo M.A.T. (2003). Adaptive sparseness for supervised learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25 (9) 1150-1159. DOI: http://dx.doi.org/10.1109/tpami.2003.1227989
Better Shrinkage Priors:
Armagan A., Dunson D.B. & Lee J. (2013). Generalized double Pareto shrinkage. Statistica Sinica, 23, 119-143.

Compile R and OpenBLAS from Source Guide
http://www.lindonslog.com/linux-unix/compile-r-openblas-source-guide/ (Wed, 16 Jul 2014)

1. Download OpenBLAS
2. Download R
2.1 Specific Instructions for DSS Users
3. Setup Validation
4. Benchmark

This guide is intended to aid any R and Linux user who desires a threaded version of BLAS. In particular I hope this will allow other grad students, who like me do not have many user privileges on their office computer, to follow suit and exploit multiple cores to speed up their linear algebra computations within R. The following was performed on Scientific Linux 6.4 but should be completely general. If you are an Ubuntu user, then there is an elegant and streamlined process for changing BLAS libraries and a recommended post about it here. I use Fedora on my laptop, and the following has also been tested thereupon.

My office computer has a quadcore processor with two threads per core but I also have access to a departmental computer with 4 sockets and 12 cores per socket (1 thread per core), so it really makes sense to use a threaded version of BLAS. If you are curious about the hardware on your own computer you can run the command “cat /proc/cpuinfo” or “lscpu”.

Unfortunately my office computer is part of a network upon which I do not have permissions to change ‘/usr/lib64/R/lib/libRblas.so’. Moreover R appears to be running serially: if you start up R and get the PID (process ID) from ‘top’ or ‘ps aux | grep R’ or something and then execute ‘cat /proc/PID/status | grep Threads’ you can see there is only one thread available.

[msl33@cabbage ~]$ cat /proc/13605/status | grep Threads
Threads:	1

(where 13605 was the process ID of my R process). That is using the default R on the network. One could appeal to the network administrator to change things for you, but they probably won’t, because a parallel BLAS implementation may cause problems for other users who require a serial BLAS, such as those who use the multicore environment to run inherently parallel algorithms like parallel tempering rather than using idle cores to speed up the linear algebra. There are also some known conflicts with the multicore package in R. There is, however, nothing stopping the user from compiling a custom R build in one’s home directory and simply changing the executable path thereto. In addition, you then have the power and freedom to customize R to your needs: at the moment I have some very large matrices which would benefit from a threaded BLAS, but at some point I may want to revert to a tuned serial BLAS such as ATLAS for certain parallel algorithms.

Firstly, go ahead and create a directory in which to keep all your custom software.

[msl33@cabbage ~]$ pwd
/home/grad/msl33
[msl33@cabbage ~]$ mkdir software


Download OpenBLAS

Make a directory “openblas” in the “software” directory.

[msl33@cabbage ~]$ cd software/
[msl33@cabbage software]$ mkdir openblas

Next, grab the tarball from the OpenBLAS homepage. Change directory into where you downloaded the tarball and extract the files from it.

[msl33@cabbage ~]$ cd Downloads/
[msl33@cabbage Downloads]$ tar -xvf xianyi-OpenBLAS-v0.2.9-0-gf773f49.tar.gz 

While this is running, fill a kettle with some water and turn it on, this stage is very important.

Change directory into where you extracted the files and verify that NO_AFFINITY=1 is uncommented in the Makefile.rule. If so proceed and run make.

[msl33@cabbage ~/Downloads]$ cd xianyi-OpenBLAS-347dded/
[msl33@cabbage xianyi-OpenBLAS-347dded]$ cat Makefile.rule | grep NO_AFFINITY
NO_AFFINITY = 1
[msl33@cabbage xianyi-OpenBLAS-347dded]$ make

Now is a good time to “make” some tea with the water prepared earlier. When done successfully one will see
[Screenshot: OpenBLAS build confirmation]
Now, as instructed above, install to the “software” directory made earlier.

[msl33@cabbage xianyi-OpenBLAS-347dded]$ make PREFIX=/home/grad/msl33/software/openblas install
...
Install OK!

In openblas/lib there will be a file “libopenblas.so”, needed for later. That’s it for openblas, next we will do R.


Download R

Let’s create an R directory in software. Go onto the R homepage, then download, then choose a mirror and grab the tarball of the latest version. Download it to your “software” directory and extract it as before with “tar -xvf R-3.1.1.tar.gz”. Once extracted, remove the tarball and change directory into R-3.1.1. Before running the configure script one might consider a few customizations in the name of efficiency, for example upping the optimization level from 2 to 3 and adding march or mtune by editing “config.site” and changing “## CFLAGS=” on line 53 to “CFLAGS='-O3 -march=native'”, making similar changes for FFLAGS and CXXFLAGS. It is noted in the R Installation and Administration documentation that these can produce worthwhile speedups, but they come with a warning that the build will be less reliable, with segfaults and numerical errors creeping in. It is safest to leave things regular (recommended link) but I’ll take the risk. Now, if you are not using a computer on the Duke statistical science network, run the configure script; otherwise see the additional instructions before running configure.

[msl33@cabbage R-3.1.1]$ ./configure --prefix=/home/grad/msl33/software/R --enable-R-shlib --enable-BLAS-shlib --enable-memory-profiling --with-tcltk=no


BEGIN ADDITIONAL INSTRUCTIONS FOR DUKE STATISTICAL SCIENCE STUDENTS

[On the DSS computers some further instructions are required to locate headers and libraries. The first time I tried to make on my office computer I encountered an error: “jni.h” could not be found. The error was resolved by locating it and then exporting the environment variable JAVA_HOME.

[msl33@cabbage software]$ locate jni.h
/usr/lib/jvm/java-1.7.0-sun-1.7.0.11/include/jni.h
[msl33@cabbage software]$ export JAVA_HOME=/usr/lib/jvm/java-1.7.0-sun-1.7.0.11/

In addition, when running the configure script the readline headers/libs could not be found. We’ll just borrow them from some other software. Add to CFLAGS, FFLAGS, CXXFLAGS “-I/opt/EPD_Free/include -L/opt/EPD_Free/lib” in addition to any other flags that you have set. Also make a lib directory and copy them in.

[msl33@cabbage R-3.1.1]$ mkdir lib
[msl33@cabbage R-3.1.1]$ cp /opt/EPD_Free/lib/libreadline.* lib/
[msl33@cabbage R-3.1.1]$ cp /opt/EPD_Free/lib/libncurses* lib/

Now run the configure line above.]

END ADDITIONAL INSTRUCTIONS FOR DUKE STATISTICAL SCIENCE STUDENTS

Once the configure has completed, you’ll see a summary below like
[Screenshot: R configure summary]
Now issue the command “make”; it will take some time. Once make has finished, you can execute “make install” to populate the software/R directory created earlier, but you don’t need to. Change directory to lib, make a backup of libRblas.so, and create a symbolic link to the OpenBLAS library that was made earlier.

[msl33@cabbage ~]$ cd software/R-3.1.1/lib
[msl33@cabbage lib]$ pwd
/home/grad/msl33/software/R-3.1.1/lib
[msl33@cabbage lib]$ mv libRblas.so libRblas.so.keep
[msl33@cabbage lib]$ ln -s /home/grad/msl33/software/openblas/lib/libopenblas.so libRblas.so

That was the last step.


Setup Validation

The R executable in the bin directory should now use OpenBLAS. Note this is the R executable you now need to run in order to use the custom-built R with OpenBLAS. Just typing R in a terminal will load the old /usr/lib64… which we students didn’t have the permissions to alter. You can, however, create an alias in your .bashrc file by inserting the line alias R="/home/grad/msl33/software/R-3.1.1/bin/./R". Now when you type R in a terminal it will load the new R and not the old one. One can check that the R executable depends on the correct linked shared BLAS library with the “ldd” command.

[msl33@cabbage bin]$ pwd
/home/grad/msl33/software/R-3.1.1/bin
[msl33@cabbage bin]$ ./R CMD ldd exec/./R | grep blas
	libRblas.so => /home/grad/msl33/software/R-3.1.1/lib/libRblas.so (0x00007f62e3fb7000)
[msl33@cabbage bin]$ ls -lt ../lib | grep openblas
lrwxrwxrwx  1 msl33 grad      53 Jul 16 15:35 libRblas.so -> /home/grad/msl33/software/openblas/lib/libopenblas.so

In addition, execute “./R” from the “bin” directory (or just R if you set up the alias) and grab the process id.

[msl33@cabbage bin]$ ps aux | grep R | grep software | awk '{print $2}'
2412
[msl33@cabbage bin]$ cat /proc/`ps aux | grep R | grep software | awk '{print $2}'`/status | grep Threads
Threads:	8
[msl33@cabbage bin]$ 

Evidently the R session now has 8 threads available. Finally, let’s perform an eigendecomposition and look at the CPU usage using top. You’ll see it light up all of your cores.
[Screenshot: CPU usage across all cores during the eigendecomposition]


Benchmark

Using this benchmark the reference BLAS took 32.1 seconds whilst openBLAS took 7.1 seconds.
