Compile R and OpenBLAS from Source Guide

1. Get OpenBLAS
2. Get R
2.1 Specific Instructions for DSS Users
3. Validation
4. Benchmark

This guide is intended to aid any R and Linux user who wants a threaded version of BLAS. In particular, I hope this will allow other grad students who, like me, do not have many user privileges on their office computers, to follow suit and exploit multiple cores to speed up their linear algebra computations within R. The following was performed on Scientific Linux 6.4 but should be completely general. If you are an Ubuntu user, then there is an elegant and streamlined process for changing BLAS libraries and a recommended post about it here. I use Fedora on my laptop, and the following has also been tested there.

My office computer has a quad-core processor with two threads per core, but I also have access to a departmental computer with 4 sockets and 12 cores per socket (1 thread per core), so it really makes sense to use a threaded version of BLAS. If you are curious about the hardware on your own computer, you can run the command “cat /proc/cpuinfo” or “lscpu”.
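A quick sketch of these hardware checks, runnable on any Linux box:

```shell
# Count the processing units available to this process:
nproc
# lscpu (if installed) breaks the count down into sockets, cores and threads:
lscpu | grep -E 'Socket|Core|Thread' || true
# /proc/cpuinfo is always present on Linux; count the logical processors:
grep -c ^processor /proc/cpuinfo
```

If `nproc` reports more than one unit, a threaded BLAS has something to work with.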

Unfortunately my office computer is part of a network on which I do not have permission to change ‘/usr/lib64/R/lib/’. Moreover, R appears to be running serially: if you start up R, get its PID (process ID) from ‘top’ or ‘ps aux | grep R’ or similar, and then execute ‘cat /proc/PID/status | grep Threads’, you can see there is only one thread available.

[msl33@cabbage ~]$ cat /proc/13605/status | grep Threads
Threads:	1

(where 13605 was the process ID of my R process). That is using the default R on the network. One could appeal to the network administrator to change things, but they probably won’t, because a parallel BLAS implementation may cause problems for other users who require a serial BLAS, such as those who use the multicore environment to perform inherently parallel algorithms like parallel tempering, rather than using idle cores to speed up the linear algebra. There are also some known conflicts with the multicore package in R. There is, however, nothing stopping the user from compiling a custom R build in their home directory and simply pointing the executable path there. In addition, you then have the power and freedom to customize R to your needs: at the moment I have some very large matrices which would benefit from a threaded BLAS, but at some point I may want to revert to a tuned serial BLAS such as ATLAS for certain parallel algorithms.
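The thread check above can be scripted for any running process; a minimal sketch, using the current shell's own PID ($$) as a stand-in for the PID of an R session:

```shell
# Report how many threads a process is using. Substitute the PID of your
# R session (found via `ps aux | grep R`) for $$ in practice.
pid=$$
grep Threads /proc/${pid}/status
```

A serial process will report "Threads: 1"; a process using a threaded BLAS will report more.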

Firstly, go ahead and create a directory in which to keep all your custom software.

[msl33@cabbage ~]$ pwd
[msl33@cabbage ~]$ mkdir software

Download OpenBLAS

Make a directory “openblas” in the “software” directory.

[msl33@cabbage ~]$ cd software/
[msl33@cabbage software]$ mkdir openblas

Next, grab the tarball from the OpenBLAS homepage. Change directory into where you downloaded the tarball and extract the files from it.

[msl33@cabbage ~]$ cd Downloads/
[msl33@cabbage Downloads]$ tar -xvf xianyi-OpenBLAS-v0.2.9-0-gf773f49.tar.gz 

While this is running, fill a kettle with some water and turn it on, this stage is very important.

Change directory into where you extracted the files and verify that NO_AFFINITY=1 is uncommented in Makefile.rule. If so, proceed and run make.

[msl33@cabbage ~/Downloads]$ cd xianyi-OpenBLAS-347dded/
[msl33@cabbage xianyi-OpenBLAS-347dded]$ cat Makefile.rule | grep NO_AFFINITY
[msl33@cabbage xianyi-OpenBLAS-347dded]$ make

Now is a good time to “make” some tea with the water prepared earlier. When the build completes successfully, one will see
[image: OpenBLAS build confirmation message]
Now, as instructed above, install to the “software” directory made earlier.

[msl33@cabbage xianyi-OpenBLAS-347dded]$ make PREFIX=/home/grad/msl33/software/openblas install
Install OK!

In openblas/lib there will be a file “libopenblas.so”, needed for later. That’s it for OpenBLAS; next we will do R.

Download R

Let’s create an R directory in “software”. Go onto the R homepage, then downloads, then choose a mirror and grab the tarball of the latest version. Download it to your “software” directory and extract it as before with “tar -xvf R-3.1.1.tar.gz”. Once extracted, remove the tarball and change directory into R-3.1.1. Before running the configure script, one might consider some customizations in the name of efficiency. One might consider upping the optimization level from 2 to 3 and adding march or mtune by editing “config.site” and changing “## CFLAGS=” on line 53 to “CFLAGS='-O3 -march=native'”, making similar changes for FFLAGS and CXXFLAGS. It is noted in the R Installation and Administration documentation that these can produce worthwhile speedups, but they come with a warning that the build will be less reliable, with segfaults and numerical errors creeping in. It is safest to leave things regular (recommended), but I’ll take the risk. Now, if you are not using a computer on the Duke statistical science network, run the configure script; otherwise see the additional instructions before running configure.
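The config.site edit can be scripted; a sketch, assuming lines of the form “## CFLAGS=” as shipped with R-3.1.1, demonstrated here on a scratch stand-in file rather than the real source tree:

```shell
# Demonstrate uncommenting and raising the optimization flags in config.site.
# A scratch copy stands in for the real R-3.1.1/config.site.
mkdir -p /tmp/r-config-demo && cd /tmp/r-config-demo
printf '## CFLAGS=\n## FFLAGS=\n## CXXFLAGS=\n' > config.site
# Replace the commented-out defaults; .bak keeps a backup of the original.
sed -i.bak \
    -e "s|^## CFLAGS=.*|CFLAGS='-O3 -march=native'|" \
    -e "s|^## FFLAGS=.*|FFLAGS='-O3 -march=native'|" \
    -e "s|^## CXXFLAGS=.*|CXXFLAGS='-O3 -march=native'|" \
    config.site
# Show the resulting flag lines:
grep -E "^(C|F|CXX)FLAGS=" config.site
```

Run the same sed invocation inside your extracted R-3.1.1 directory to apply the change for real.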

[msl33@cabbage R-3.1.1]$ ./configure --prefix=/home/grad/msl33/software/R --enable-R-shlib --enable-BLAS-shlib --enable-memory-profiling --with-tcltk=no


[On the DSS computers some further instructions are required to locate headers and libraries. The first time I tried to make on my office computer I encountered an error: “jni.h” could not be found. The error was resolved by locating it and then exporting the environment variable JAVA_HOME.

[msl33@cabbage software]$ locate jni.h
[msl33@cabbage software]$ export JAVA_HOME=/usr/lib/jvm/java-1.7.0-sun-

In addition, when running the configure script the readline headers/libraries could not be found. We’ll just borrow them from some other software. Add “-I/opt/EPD_Free/include -L/opt/EPD_Free/lib” to CFLAGS, FFLAGS and CXXFLAGS, in addition to any other flags that you have set. Also make a lib directory and copy the libraries in.

[msl33@cabbage R-3.1.1]$ mkdir lib
[msl33@cabbage R-3.1.1]$ cp /opt/EPD_Free/lib/libreadline.* lib/
[msl33@cabbage R-3.1.1]$ cp /opt/EPD_Free/lib/libncurses* lib/

Now run the configure line above.]


Once the configure has completed, you’ll see a summary like the one below.
[image: configure summary output]
Now issue the command “make”; it will take some time. Once make has finished, you can execute “make install” to populate the software/R directory created earlier, but you don’t need to. Change directory to lib, make a backup of libRblas.so, and create a symbolic link with that name to the OpenBLAS library that was built earlier.

[msl33@cabbage ~]$ cd software/R-3.1.1/lib
[msl33@cabbage lib]$ pwd
[msl33@cabbage lib]$ mv libRblas.so libRblas.so.keep
[msl33@cabbage lib]$ ln -s /home/grad/msl33/software/openblas/lib/libopenblas.so libRblas.so

That was the last step.

Setup Validation

The R executable in the bin directory should now use OpenBLAS. Note this is the R executable you now need to run in order to use the custom-built R with OpenBLAS. Just typing R in a terminal will load the old /usr/lib64… version, which we students didn’t have permission to alter. You can, however, create an alias in your .bashrc file by inserting the line ‘alias R=”/home/grad/msl33/software/R-3.1.1/bin/./R”‘. Now when you type R in a terminal it will load the new R and not the old one. One can check that the R executable depends on the correct linked shared BLAS library with the “ldd” command.

[msl33@cabbage bin]$ pwd
[msl33@cabbage bin]$ ./R CMD ldd exec/./R | grep blas
	libRblas.so => /home/grad/msl33/software/R-3.1.1/lib/libRblas.so (0x00007f62e3fb7000)
[msl33@cabbage bin]$ ls -lt ../lib | grep openblas
lrwxrwxrwx  1 msl33 grad      53 Jul 16 15:35 libRblas.so -> /home/grad/msl33/software/openblas/lib/libopenblas.so
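The same two checks work on any binary and symlink; a generic sketch, with /bin/sh and a throwaway link standing in for the R executable and the libRblas.so link:

```shell
# ldd lists the shared libraries a binary will load at runtime. For the
# custom build you would grep its output for "blas" instead of "libc".
ldd /bin/sh | grep libc
# readlink reveals where a symlink actually points, which is how to confirm
# the libRblas.so link targets the OpenBLAS library:
ln -sf /bin/sh /tmp/blas-link-demo
readlink /tmp/blas-link-demo
```

If readlink prints the OpenBLAS path, the swap took effect.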

In addition, execute “./R” from the “bin” directory (or just R if you set up the alias) and grab the process id.

[msl33@cabbage bin]$ ps aux | grep R | grep software | awk '{print $2}'
[msl33@cabbage bin]$ cat /proc/`ps aux | grep R | grep software | awk '{print $2}'`/status | grep Threads
Threads:	8
[msl33@cabbage bin]$ 

Evidently the R session now has 8 threads available. Finally, let’s perform an eigendecomposition and look at the CPU usage using top. You’ll see it light up all of your cores.
[image: top showing all cores busy during the eigendecomposition]


Using this benchmark, the reference BLAS took 32.1 seconds whilst OpenBLAS took 7.1 seconds.

Woodbury Matrix Inverse Identity

Application in Conditional Distribution of Multivariate Normal

The Sherman–Morrison–Woodbury matrix inverse identity can be regarded as a transform between Schur complements. That is, given V_{22.1}^{-1} one can obtain V_{11.2}^{-1} by using the Woodbury matrix identity and vice versa. Recall the Woodbury identity:

V_{11.2}^{-1}=\left(V_{11}-V_{12}V_{22}^{-1}V_{21}\right)^{-1}=V_{11}^{-1}+V_{11}^{-1}V_{12}\left(V_{22}-V_{21}V_{11}^{-1}V_{12}\right)^{-1}V_{21}V_{11}^{-1}=V_{11}^{-1}+V_{11}^{-1}V_{12}V_{22.1}^{-1}V_{21}V_{11}^{-1}

I recently stumbled across a neat application of this whilst deriving full conditionals for a multivariate normal. Recall that if the data are partitioned into two blocks, Y_{1}, Y_{2}, then the variance of the conditional distribution Y_{1}|Y_{2} is the Schur complement of the block V_{22} of the total variance matrix V; that is, the variance of the conditional distribution is V_{11.2}=V_{11}-V_{12}V_{22}^{-1}V_{21}, which is the variance of Y_{1} reduced by a term corresponding to the reduction in uncertainty about Y_{1} gained from the knowledge of Y_{2}. If, however, V_{22} has the form of a Schur complement itself, then it may be possible to exploit the Woodbury identity above to considerably simplify the variance term. I came across this when I derived two very different-looking expressions for the conditional distribution and found them equivalent by the Woodbury identity. Consider the model

\begin{bmatrix}   Y_{1}\\   Y_{2}  \end{bmatrix}  =  \begin{bmatrix}   X_{1}\\   X_{2}  \end{bmatrix}\beta + \varepsilon


\varepsilon \sim N\left( \begin{bmatrix}0\\ 0\end{bmatrix}, \sigma^{2} \begin{bmatrix}I_{1} & 0 \\ 0 & I_{2}\end{bmatrix} \right)

\beta|\sigma^{2} \sim N(0, \sigma^{2}\Lambda^{-1})

I was seeking the distribution Y_{1}| Y_{2},\sigma^{2} and arrived there through two different paths. The distributions derived looked very different, but they turned out to be equivalent upon considering the Woodbury identity.
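For reference, both methods below lean on the standard conditional distribution for a partitioned multivariate normal, which in the notation above (with general means \mu_{1}, \mu_{2}) reads:

```latex
\begin{aligned}
Y_{1} \mid Y_{2} &\sim N\!\left( \mu_{1} + V_{12}V_{22}^{-1}\left(Y_{2}-\mu_{2}\right),\; V_{11.2} \right),\\
V_{11.2} &= V_{11} - V_{12}V_{22}^{-1}V_{21}.
\end{aligned}
```

In the model above the marginal means are zero, so only the V_{12}V_{22}^{-1}Y_{2} term survives in the conditional mean.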

Method 1

This simply manipulates properties of the multivariate normal. Marginalizing over \beta one gets

Cov \begin{bmatrix}   Y_{1} \\  Y_{2}  \end{bmatrix} = \begin{bmatrix}   X_{1} \\  X_{2}  \end{bmatrix} Cov (\beta)  \begin{bmatrix}   X_{1}^{T} &  X_{2}^{T}  \end{bmatrix} + Cov(\varepsilon)

Cov \begin{bmatrix}   Y_{1} \\  Y_{2}  \end{bmatrix} = \sigma^{2}\begin{bmatrix}   X_{1}\Lambda^{-1} X_{1}^{T} &  X_{1}\Lambda^{-1} X_{2}^{T} \\  X_{2}\Lambda^{-1} X_{1}^{T} &  X_{2}\Lambda^{-1} X_{2}^{T}  \end{bmatrix}  + \sigma^{2}  \begin{bmatrix}  I_{1} & 0 \\ 0 & I_{2}  \end{bmatrix}

Such that the distribution

\begin{bmatrix}   Y_{1}\\   Y_{2}  \end{bmatrix}\Big|\,\sigma^{2} \sim N \left(  \begin{bmatrix}  0\\  0  \end{bmatrix},  \sigma^{2} \begin{bmatrix}  I_{1}+  X_{1}\Lambda^{-1} X_{1}^{T} &  X_{1}\Lambda^{-1} X_{2}^{T} \\  X_{2}\Lambda^{-1} X_{1}^{T} & I_{2}+ X_{2}\Lambda^{-1} X_{2}^{T}  \end{bmatrix}  \right)

It follows that the conditional distribution is
Y_{1}| Y_{2} ,\sigma^{2} \sim N \left(  X_{1}\Lambda^{-1} X_{2}^{T} \left[  X_{2}\Lambda^{-1} X_{2}^{T} + I_{2}\right]^{-1} Y_{2},\; \sigma^{2}\left( I_{1} +  X_{1}\Lambda^{-1} X_{1}^{T} -  X_{1}\Lambda^{-1} X_{2}^{T} \left[ I_{2} +  X_{2}\Lambda^{-1} X_{2}^{T} \right]^{-1}  X_{2}\Lambda^{-1} X_{1}^{T}\right)\right).
This looks a bit nasty, but notice that the variance term has the form of a Schur complement, which suggests the Woodbury identity may simplify it.

Method 2

An alternative route to this distribution is

f( Y_{1}| Y_{2},\sigma^{2} )=\int f( Y_{1}|\sigma^{2} ,\beta)\pi(\beta| Y_{2},\sigma^{2} )d\beta


\beta| Y_{2} ,\sigma^{2}\sim N \left( ( X_{2}^{T} X_{2}+\Lambda)^{-1} X_{2}^{T} Y_{2}, \sigma^{2}( X_{2}^{T} X_{2}+\Lambda)^{-1} \right).

It follows that

Y_{1}| Y_{2} ,\sigma^{2} \sim N\left(  X_{1}( X_{2}^{T} X_{2}+\Lambda)^{-1} X_{2}^{T} Y_{2}, \sigma^{2} (I_{1} +  X_{1} ( X_{2}^{T} X_{2}+\Lambda)^{-1} X_{1}^{T}) \right)

which looks different from the distribution obtained through Method 1, and the expression for the variance is a lot neater. They are in fact identical, by the Woodbury identity.


Mean (Submitted by Michelle Leigh)

\left[\Lambda+  X_{2}^TI_{2} X_{2}\right]^{-1} X_{2}^T\\  =\{\Lambda^{-1}-\Lambda^{-1} X_{2}^T\left[I_{2}+  X_{2}\Lambda^{-1} X_{2}^T\right]^{-1} X_{2}\Lambda^{-1}\} X_{2}^T\\  =\Lambda^{-1} X_{2}^T\left[I_{2}+  X_{2}\Lambda^{-1} X_{2}^T\right]^{-1}\left[I_{2}+  X_{2}\Lambda^{-1} X_{2}^T\right]-\Lambda^{-1} X_{2}^T\left[I_{2}+  X_{2}\Lambda^{-1} X_{2}^T\right]^{-1} X_{2}\Lambda^{-1} X_{2}^T\\  =\Lambda^{-1} X_{2}^T\left[I_{2}+  X_{2}\Lambda^{-1} X_{2}^T\right]^{-1}I_{2}

So mean1=mean2.


By the Woodbury Identity it follows that

\Lambda^{-1} - \Lambda^{-1} X_{2}^{T} \left[ I_{2} +  X_{2}\Lambda^{-1} X_{2}^{T} \right]^{-1}  X_{2}\Lambda^{-1} = ( X_{2}^{T} X_{2}+\Lambda)^{-1},

and hence

X_{1}\Lambda^{-1} X_{1}^{T}- X_{1}\Lambda^{-1} X_{2}^{T} \left[ I_{2}+ X_{2}\Lambda^{-1} X_{2}^{T} \right]^{-1}  X_{2}\Lambda^{-1} X_{1}^{T}= X_{1} ( X_{2}^{T} X_{2}+\Lambda)^{-1} X_{1}^{T}

and so variance1=variance2. The trick is recognizing the form of the formulas at the top of the page, then one can write the variance as a much neater expression.

Invert a Matrix using the Woodbury Matrix Inverse Identity

Previously I wrote about the LDU decomposition and the Schur complement. These can be further used to derive the Sherman–Morrison–Woodbury formula, otherwise known as the matrix inversion lemma, for inverting a matrix. As shown in the previous post, UDL and LDU are two ways of factorizing a matrix:

\begin{pmatrix}          V_{11} & V_{12}\\          V_{21} & V_{22}\\  \end{pmatrix}=    \begin{pmatrix}          1 &    V_{12}V_{22}^{-1}\\  0      & 1\\  \end{pmatrix}  \begin{pmatrix}           V_{11.2} & 0 \\           0 & V_{22} \\   \end{pmatrix}   \begin{pmatrix}            1 & 0 \\  V_{22}^{-1}V_{21} & 1 \\  \end{pmatrix}

\begin{pmatrix}          V_{11} & V_{12}\\          V_{21} & V_{22}\\  \end{pmatrix}    =  \begin{pmatrix}          1 & 0\\          V_{21}V_{11}^{-1} & 1\\  \end{pmatrix}  \begin{pmatrix}            V_{11} & 0 \\  0 & V_{22.1} \\  \end{pmatrix}  \begin{pmatrix}            1 & V_{11}^{-1}V_{12} \\  0 & 1 \\  \end{pmatrix}
Now consider taking the inverse of the matrices above, yielding

\begin{pmatrix}          V_{11} & V_{12}\\          V_{21} & V_{22}\\  \end{pmatrix}^{-1}=  \begin{pmatrix}  1 & 0 \\  -V_{22}^{-1}V_{21} & 1 \\  \end{pmatrix}      \begin{pmatrix}           V_{11.2}^{-1} & 0 \\           0 & V_{22}^{-1} \\   \end{pmatrix}   \begin{pmatrix}          1 &   -V_{12}V_{22}^{-1}\\  0      & 1\\  \end{pmatrix}

\begin{pmatrix}          V_{11} & V_{12}\\          V_{21} & V_{22}\\  \end{pmatrix}^{-1}  =  \begin{pmatrix}  1 & -V_{11}^{-1}V_{12} \\  0 & 1 \\  \end{pmatrix}  \begin{pmatrix}      V_{11}^{-1} & 0 \\  0 & V_{22.1}^{-1} \\  \end{pmatrix}    \begin{pmatrix}          1 & 0\\          -V_{21}V_{11}^{-1} & 1\\  \end{pmatrix}
Multiplying the matrices on the RHS yields
\begin{pmatrix}          V_{11} & V_{12}\\          V_{21} & V_{22}\\  \end{pmatrix}^{-1}  =  \begin{pmatrix}  V_{11.2}^{-1} & -V_{11.2}^{-1}V_{12}V_{22}^{-1}\\  -V_{22}^{-1}V_{21}V_{11.2}^{-1} & V_{22}^{-1}+V_{22}^{-1}V_{21}V_{11.2}^{-1}V_{12}V_{22}^{-1}  \end{pmatrix}

\begin{pmatrix}          V_{11} & V_{12}\\          V_{21} & V_{22}\\  \end{pmatrix}^{-1}  =\begin{pmatrix}          V_{11}^{-1}+V_{11}^{-1}V_{12}V_{22.1}^{-1}V_{21}V_{11}^{-1} & -V_{11}^{-1}V_{12}V_{22.1}^{-1}\\          -V_{22.1}^{-1}V_{21}V_{11}^{-1} & V_{22.1}^{-1}  \end{pmatrix}
These two results can be used to form neat expressions for the inverse of a partitioned block matrix. It follows that
\begin{pmatrix}  V_{11.2}^{-1} & -V_{11.2}^{-1}V_{12}V_{22}^{-1}\\  -V_{22}^{-1}V_{21}V_{11.2}^{-1} & V_{22}^{-1}+V_{22}^{-1}V_{21}V_{11.2}^{-1}V_{12}V_{22}^{-1}  \end{pmatrix}=  \begin{pmatrix}          V_{11}^{-1}+V_{11}^{-1}V_{12}V_{22.1}^{-1}V_{21}V_{11}^{-1} & -V_{11}^{-1}V_{12}V_{22.1}^{-1}\\          -V_{22.1}^{-1}V_{21}V_{11}^{-1} & V_{22.1}^{-1}  \end{pmatrix}
The equality holds block by block, so equating the (1,1) and (2,2) blocks gives two expressions for the Woodbury matrix inverse formula, namely:

V_{11.2}^{-1}=V_{11}^{-1}+V_{11}^{-1}V_{12}V_{22.1}^{-1}V_{21}V_{11}^{-1}\\  V_{22.1}^{-1}=V_{22}^{-1}+V_{22}^{-1}V_{21}V_{11.2}^{-1}V_{12}V_{22}^{-1}

where the dot notation corresponds to the Schur complement, i.e.
V_{11.2}=V_{11}-V_{12}V_{22}^{-1}V_{21}\\  V_{22.1}=V_{22}-V_{21}V_{11}^{-1}V_{12}\\
An application of the Woodbury matrix inverse can be found in deriving conditional distributions for multivariate normals.

Diagonalize a Positive-Definite Symmetric Matrix using the Schur Complement and LDU Decomposition

Diagonalizing a matrix comes up frequently for me when wanting to diagonalize the variance matrix of a multivariate normal to derive conditional distributions. I prefer to proceed by doing an LDU Decomposition and leaving it in terms of the Schur complement as I find it easier to remember. Consider some matrix V and partition it as follows
\begin{pmatrix}          V_{11} & V_{12}\\          V_{21} & V_{22}\\  \end{pmatrix}=V
Consider right and left multiplying this matrix by
\begin{pmatrix}          1 & -V_{12}V_{22}^{-1}\\          0 & 1\\  \end{pmatrix}    \begin{pmatrix}          V_{11} & V_{12}\\          V_{21} & V_{22}\\  \end{pmatrix}  \begin{pmatrix}            1 & 0 \\  -V_{22}^{-1}V_{21} & 1 \\  \end{pmatrix}
These outer matrices correspond to the L and U matrices in the LDU decomposition; note that the matrix on the right is just the transpose of the one on the left. Define the Schur complement of the matrix V with respect to the block V_{22} as

V_{11.2}=V_{11}-V_{12}V_{22}^{-1}V_{21}

then working through the above multiplication it follows that
\begin{pmatrix}          1 & -V_{12}V_{22}^{-1}\\          0 & 1\\  \end{pmatrix}    \begin{pmatrix}          V_{11} & V_{12}\\          V_{21} & V_{22}\\  \end{pmatrix}  \begin{pmatrix}            1 & 0 \\  -V_{22}^{-1}V_{21} & 1 \\  \end{pmatrix}  =  \begin{pmatrix}           V_{11.2} & 0 \\           0 & V_{22} \\   \end{pmatrix}
This result is useful in deriving the Sherman–Morrison–Woodbury matrix inverse formula/identity, otherwise known as the matrix inversion lemma.
One can now simply multiply the above equation by the corresponding inverse matrices to obtain the LDU decomposition of the matrix V. The inverse matrices are easy

\begin{pmatrix}          1 & V_{12}V_{22}^{-1}\\          0 & 1\\  \end{pmatrix}  \begin{pmatrix}          1 & -V_{12}V_{22}^{-1}\\          0 & 1\\  \end{pmatrix}    \begin{pmatrix}          V_{11} & V_{12}\\          V_{21} & V_{22}\\  \end{pmatrix}  \begin{pmatrix}   1 & 0 \\  -V_{22}^{-1}V_{21} & 1 \\  \end{pmatrix}  \begin{pmatrix}            1 & 0 \\  V_{22}^{-1}V_{21} & 1 \\  \end{pmatrix}  =  \begin{pmatrix}          1 &     V_{12}V_{22}^{-1} \\      0 & 1\\  \end{pmatrix}  \begin{pmatrix}           V_{11.2} & 0 \\           0 & V_{22} \\   \end{pmatrix}   \begin{pmatrix}            1 & 0 \\  V_{22}^{-1}V_{21} & 1 \\  \end{pmatrix}
So finally
\begin{pmatrix}          V_{11} & V_{12}\\          V_{21} & V_{22}\\  \end{pmatrix}=    \begin{pmatrix}          1 &    V_{12}V_{22}^{-1}\\  0      & 1\\  \end{pmatrix}  \begin{pmatrix}           V_{11.2} & 0 \\           0 & V_{22} \\   \end{pmatrix}   \begin{pmatrix}            1 & 0 \\  V_{22}^{-1}V_{21} & 1 \\  \end{pmatrix}
Notice how this is a UDL factorization. It is entirely possible to do things a little differently and arrive at the LDU decomposition. Consider instead left and right multiplying by the analogous lower- and upper-triangular matrices, yielding
\begin{pmatrix}          1 & 0\\          -V_{21}V_{11}^{-1} & 1\\  \end{pmatrix}    \begin{pmatrix}          V_{11} & V_{12}\\          V_{21} & V_{22}\\  \end{pmatrix}  \begin{pmatrix}            1 & -V_{11}^{-1}V_{12} \\  0 & 1 \\  \end{pmatrix}  =  \begin{pmatrix}            V_{11} & 0 \\  0 & V_{22.1} \\  \end{pmatrix}
where V_{22.1}=V_{22}-V_{21}V_{11}^{-1}V_{12} is the Schur complement of matrix V with respect to the block V_{11}. So finally
\begin{pmatrix}          V_{11} & V_{12}\\          V_{21} & V_{22}\\  \end{pmatrix}    =  \begin{pmatrix}          1 & 0\\          V_{21}V_{11}^{-1} & 1\\  \end{pmatrix}  \begin{pmatrix}            V_{11} & 0 \\  0 & V_{22.1} \\  \end{pmatrix}  \begin{pmatrix}            1 & V_{11}^{-1}V_{12} \\  0 & 1 \\  \end{pmatrix}