R snippets for vim-SnipMate

Vim is my editor of choice, reasonable so, whether it be for coding C++, LaTeX or even R. I’ve used RStudio, which even has a Vim-Mode, but I still prefer to use Vim. Vim has it’s own R plugin, namely Vim-R-plugin, but this post is about snippets. SnipMate is an awesome auto-completion plugin for Vim which is fully customizable. One simply writes a string, rnorm for example, and presses tab to autocomplete the code to rnorm(n=,mean=,sd=), where repeated press of tab cycles through the placeholders at the function parameters. The strings to recognize, referred to as snippets, are stored in a snippets file “languagetype.snippets” along with the corresponding code to auto-complete. These can be user defined for any language, not just R. It’s usually not necessary to write these snippets files yourself, as there are already existing snippets files within the vim community, including one for R. Here is a github repository containing snippets files for a great many languages, including R. Simply put this into your vim-SnipMate snippets directory. The last thing to do is to tell Vim to recognize an r filetype. If you open an R file and type,

:set ft=r

then this will tell Vim that the file is an R file. Obviously you don’t want to do this all the time, so to get Vim to automatically recognize “.r” and “.R” extensions as R files simply append your .vimrc file with:

au BufNewFile,BufRead *.r set filetype=r
au BufNewFile,BufRead *.R set filetype=r

Writing Custom Snippets

This is an example of how to complete a for loop having written only “for”. Append the r.snippets file with the following code.

snippet for
    for(${1:i} in ${2:begin}:${3:end}){
    ${4}
    }${5}

This defines “for” as the snippet, the string after which we will want to press tab. The dollar signs define the place holders. “${x:text}” defines the x’th placeholder and the highlighted text which will be replaced.

for<tab>

becomes

for(i in begin:end){

}

.

Repeated pressing of tab cycles between i, begin, end, body of foor loop and then breaks out of the for loop.

Calling C++ from R using Rcpp

Why call C/C++ from R?

I really like programming in R. The fact that it is open source immediately wins my favour over Matlab. It can, however, be quite slow especially if you “speak” R with a strong C/C++ accent. This sluggishness, especially when writing unavoidable for loops, has led me to consider other programming languages. I have come to the conclusion that there is no all-round best programming language but a combination of R and C++ is very hard to beat. My resolution is to write the expensive codes in C++ and to call them as functions within R so that I still have the nice dynamic interactivity of an R session with the speed and optimization of C++. There are 3 ways you can get started implementing C code in R, namely, “.C,”, “.Call” and Rcpp. It is generally not recommended to use “.C” but to use “.Call” instead. “.Call”, however, requires quite a bit of boilerplate code and can be tiresome and obfuscating to write. “Rcpp”, while not intrinsically part of R itself, is a package written by D Eddelbuettel and R Fran├žois which relies on “.Call” but allows you to write neater more efficient code by providing an interface between “.Call” and the programmer. There are many things to consider when writing C++ code for R but most likely you will want to get coding as fast as possible and worry about these other tangential matters later.

How to call C/C++ from R using Rcpp [The Hard/Conscientious Way]

Note: This is the longest and hardest way to compile C++ code for R, but it is arguably the most flexible for the conscientious user who requires complex code and desires to know all the details of whats going on. For a simpler no-nonsense approach scroll down to the “sourceCpp” command.
Lets jump right in with an example

#include <Rcpp.h>
#include <cstdlib>
#include <iostream>


using namespace std;
RcppExport SEXP comp(SEXP x, SEXP y){
        int i,n;
        Rcpp::NumericVector vector1(x);
        Rcpp::NumericVector vector2(y);
        n=vector2.size();
        Rcpp::NumericVector product(n);
for(i=0;i<n;i++){
product[i]=vector1[i]*vector2[i];
}
return(product);
}

Note: You do not need R.h and Rdefines.h when you use Rcpp.h.
Note: This is not the only way to compile and call code for R. See bottom for a neater alternative.
The function has two R objects as arguments, which are datatype “SEXP”, and returns an R object. The “SEXP” R objects, which are technically pointers to structures with typedef SEXPREC, are quite complex and so must be coerced into a vector using Rcpp::NumericVector, which makes a vector out of the SEXP object. Similarly, Rcpp::NumeriVector can be used to create a vector of length n. The rest is standard C++ code. Before compiling some compiler flags must be set:

[michael@michael rcpp]$ export PKG_CXXFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`
[michael@michael rcpp]$ export PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`

Compile with:

[michael@michael rcpp]$ R CMD SHLIB comp.cpp

The function is now ready to be used in R. Consider the following R code:

library(Rcpp)
dyn.load('comp.so')
x=rnorm(100,0,1)
y=rnorm(100,0,1)
.Call('comp',x,y)

First the Rcpp library is loaded. Then, C++ code is loaded with “dyn.load”, after which it is ready to be used in R. The “.Call” function calls the “comp” function and the two arguments are supplied after. Now lets consider a more interesting example by parallelizing the C++ with openmp.

Performance with OpenMP

#include <Rcpp.h>
#include <cstdlib>
#include <iostream>
#include <omp.h>

using namespace std;
RcppExport SEXP parad(SEXP x, SEXP y){
	int i,n,max;
	Rcpp::NumericVector vector1(x);
	Rcpp::NumericVector vector2(y);
	n=vector2.size();
	Rcpp::NumericVector product(n);
	max=omp_get_max_threads();
	omp_set_num_threads(max);

#pragma omp parallel for
for(i=0;i<n;i++){
product[i]=vector1[i]/vector2[i];
}
return(product);
}

The purpose of this code is rather trivial, just dividing each element of x by the corresponding element of y. Now the compiler must be told to link against the openmp libraries so to compile we write

[michael@michael rcpp]$ export PKG_LIBS='`Rscript -e "Rcpp:::LdFlags()"` -fopenmp -lgomp'
[michael@michael rcpp]$ export PKG_CXXFLAGS='`Rscript -e "Rcpp:::CxxFlags()"` -fopenmp'
[michael@michael rcpp]$ R CMD SHLIB parad.cpp

How does this fair against the standard R “x/y”? To compare the “rbenchmark” package will be used.

library(Rcpp)
library(rbenchmark)
dyn.load('unpar.so')
dyn.load('parad.so')

x=rnorm(10000000,0,1)
y=rnorm(10000000,0,1)
benchmark(replications=rep(1,0,1),.Call('unpar',x,y),.Call('parad',x,y),x/y)
> benchmark(replications=rep(1,0,1),.Call('unpar',x,y),.Call('parad',x,y),x/y)
                  test replications elapsed relative user.self sys.self
2   .Call("parad", x, y)            1   0.036    1.000     0.112    0.002
1 .Call("unpar", x, y)            1   0.082    2.278     0.081    0.000
3                  x/y            1   0.053    1.472     0.048    0.005

The parallel version of the x-divided-by-y code is even faster than R’s native “x/y”.

Converting between C++ and R datatypes using “as” and “wrap”

If you feel more comfortable seeing familiar C/C++ variable types, one is free to convert all of the function arguments of SEXP type to the desired C++ variable types for use in the body of the code and then to convert them back to SEXP at the end before returning the function. Say, for example, that I have a vector of 100 standard normal draws as one might obtain from x=rnorm(100,0,1). Passing x as an argument to the C++ function and the variable type of x is SEXP. I can then convert it to a vector of doubles using the “as” command, perform any processing with it in the body of the function and then convert it back to SEXP at the end using the “wrap” command. Here is a short example:

#include <Rcpp.h>
#include <cstdlib>
#include <iostream>
#include <vector>

using namespace std;
RcppExport SEXP conv(SEXP x, SEXP y){
	int i,n;
	vector<double> vector1=Rcpp::as<vector<double> >(x);
	vector<double> vector2=Rcpp::as<vector<double> >(y);
n=vector1.size();
vector<double> product(n);

for(i=0;i<n;i++){
product[i]=vector1[i]/vector2[i];
}
return( Rcpp::wrap(product) );
}

The x and y SEXP’s are converted to vector doubles and then converted back at the end. I like this approach because my function remains looking like C++.

Streamlined sourceCpp() and cppFunction()

It is not necessary to compile, link and load the C++ code as above. There exist some wrappers for all of these tasks such as include, sourceCpp and cppFunction. I like sourceCpp() the best. Consider the C++ code stored in parad.cpp shown below:

#include <cstdlib>
#include <iostream>
#include <Rcpp.h>
#include <omp.h>

using namespace std;
// [[Rcpp::export]]
Rcpp::NumericVector parad(Rcpp::NumericVector x, Rcpp::NumericVector y){
	int i,n,max;
	n=x.size();
	Rcpp::NumericVector product(n);
	max=omp_get_max_threads();
	omp_set_num_threads(max);

#pragma omp parallel for
for(i=0;i<n;i++){
product[i]=x[i]/y[i];
}
return(product);
}

This is a neat little parallel C++ function. This can be compiled, linked and loaded in one step using the sourceCpp() function. All we need to do is set some terminal environment variables for the compiler to use openMP.

library(Rcpp)
Sys.setenv("PKG_CXXFLAGS"="-fopenmp")
Sys.setenv("PKG_LIBS"="-fopenmp")
sourceCpp("parad.cpp")
a=rnorm(1000,0,1)
b=rnorm(1000,0,1)
c=parad(a,b)

Lines 2 and 3 are simply setting terminal environment variables from within R so that the compiler compiles as “g++ …. -fopenmp …” which we need for the “omp.h” header. sourceCpp is a wrapper that takes care of everything. After that one can immediately start using the parad() function.