Calling C++ from R using Rcpp

Why call C/C++ from R?

I really like programming in R. The fact that it is open source immediately wins my favour over Matlab. It can, however, be quite slow especially if you “speak” R with a strong C/C++ accent. This sluggishness, especially when writing unavoidable for loops, has led me to consider other programming languages. I have come to the conclusion that there is no all-round best programming language but a combination of R and C++ is very hard to beat. My resolution is to write the expensive codes in C++ and to call them as functions within R so that I still have the nice dynamic interactivity of an R session with the speed and optimization of C++. There are 3 ways you can get started implementing C code in R, namely, “.C,”, “.Call” and Rcpp. It is generally not recommended to use “.C” but to use “.Call” instead. “.Call”, however, requires quite a bit of boilerplate code and can be tiresome and obfuscating to write. “Rcpp”, while not intrinsically part of R itself, is a package written by D Eddelbuettel and R François which relies on “.Call” but allows you to write neater more efficient code by providing an interface between “.Call” and the programmer. There are many things to consider when writing C++ code for R but most likely you will want to get coding as fast as possible and worry about these other tangential matters later.

How to call C/C++ from R using Rcpp [The Hard/Conscientious Way]

Note: This is the longest and hardest way to compile C++ code for R, but it is arguably the most flexible for the conscientious user who requires complex code and desires to know all the details of whats going on. For a simpler no-nonsense approach scroll down to the “sourceCpp” command.
Lets jump right in with an example

#include <Rcpp.h>
#include <cstdlib>
#include <iostream>


using namespace std;
RcppExport SEXP comp(SEXP x, SEXP y){
        int i,n;
        Rcpp::NumericVector vector1(x);
        Rcpp::NumericVector vector2(y);
        n=vector2.size();
        Rcpp::NumericVector product(n);
for(i=0;i<n;i++){
product[i]=vector1[i]*vector2[i];
}
return(product);
}

Note: You do not need R.h and Rdefines.h when you use Rcpp.h.
Note: This is not the only way to compile and call code for R. See bottom for a neater alternative.
The function has two R objects as arguments, which are datatype “SEXP”, and returns an R object. The “SEXP” R objects, which are technically pointers to structures with typedef SEXPREC, are quite complex and so must be coerced into a vector using Rcpp::NumericVector, which makes a vector out of the SEXP object. Similarly, Rcpp::NumeriVector can be used to create a vector of length n. The rest is standard C++ code. Before compiling some compiler flags must be set:

[michael@michael rcpp]$ export PKG_CXXFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`
[michael@michael rcpp]$ export PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`

Compile with:

[michael@michael rcpp]$ R CMD SHLIB comp.cpp

The function is now ready to be used in R. Consider the following R code:

library(Rcpp)
dyn.load('comp.so')
x=rnorm(100,0,1)
y=rnorm(100,0,1)
.Call('comp',x,y)

First the Rcpp library is loaded. Then, C++ code is loaded with “dyn.load”, after which it is ready to be used in R. The “.Call” function calls the “comp” function and the two arguments are supplied after. Now lets consider a more interesting example by parallelizing the C++ with openmp.

Performance with OpenMP

#include <Rcpp.h>
#include <cstdlib>
#include <iostream>
#include <omp.h>

using namespace std;
RcppExport SEXP parad(SEXP x, SEXP y){
	int i,n,max;
	Rcpp::NumericVector vector1(x);
	Rcpp::NumericVector vector2(y);
	n=vector2.size();
	Rcpp::NumericVector product(n);
	max=omp_get_max_threads();
	omp_set_num_threads(max);

#pragma omp parallel for
for(i=0;i<n;i++){
product[i]=vector1[i]/vector2[i];
}
return(product);
}

The purpose of this code is rather trivial, just dividing each element of x by the corresponding element of y. Now the compiler must be told to link against the openmp libraries so to compile we write

[michael@michael rcpp]$ export PKG_LIBS='`Rscript -e "Rcpp:::LdFlags()"` -fopenmp -lgomp'
[michael@michael rcpp]$ export PKG_CXXFLAGS='`Rscript -e "Rcpp:::CxxFlags()"` -fopenmp'
[michael@michael rcpp]$ R CMD SHLIB parad.cpp

How does this fair against the standard R “x/y”? To compare the “rbenchmark” package will be used.

library(Rcpp)
library(rbenchmark)
dyn.load('unpar.so')
dyn.load('parad.so')

x=rnorm(10000000,0,1)
y=rnorm(10000000,0,1)
benchmark(replications=rep(1,0,1),.Call('unpar',x,y),.Call('parad',x,y),x/y)
> benchmark(replications=rep(1,0,1),.Call('unpar',x,y),.Call('parad',x,y),x/y)
                  test replications elapsed relative user.self sys.self
2   .Call("parad", x, y)            1   0.036    1.000     0.112    0.002
1 .Call("unpar", x, y)            1   0.082    2.278     0.081    0.000
3                  x/y            1   0.053    1.472     0.048    0.005

The parallel version of the x-divided-by-y code is even faster than R’s native “x/y”.

Converting between C++ and R datatypes using “as” and “wrap”

If you feel more comfortable seeing familiar C/C++ variable types, one is free to convert all of the function arguments of SEXP type to the desired C++ variable types for use in the body of the code and then to convert them back to SEXP at the end before returning the function. Say, for example, that I have a vector of 100 standard normal draws as one might obtain from x=rnorm(100,0,1). Passing x as an argument to the C++ function and the variable type of x is SEXP. I can then convert it to a vector of doubles using the “as” command, perform any processing with it in the body of the function and then convert it back to SEXP at the end using the “wrap” command. Here is a short example:

#include <Rcpp.h>
#include <cstdlib>
#include <iostream>
#include <vector>

using namespace std;
RcppExport SEXP conv(SEXP x, SEXP y){
	int i,n;
	vector<double> vector1=Rcpp::as<vector<double> >(x);
	vector<double> vector2=Rcpp::as<vector<double> >(y);
n=vector1.size();
vector<double> product(n);

for(i=0;i<n;i++){
product[i]=vector1[i]/vector2[i];
}
return( Rcpp::wrap(product) );
}

The x and y SEXP’s are converted to vector doubles and then converted back at the end. I like this approach because my function remains looking like C++.

Streamlined sourceCpp() and cppFunction()

It is not necessary to compile, link and load the C++ code as above. There exist some wrappers for all of these tasks such as include, sourceCpp and cppFunction. I like sourceCpp() the best. Consider the C++ code stored in parad.cpp shown below:

#include <cstdlib>
#include <iostream>
#include <Rcpp.h>
#include <omp.h>

using namespace std;
// [[Rcpp::export]]
Rcpp::NumericVector parad(Rcpp::NumericVector x, Rcpp::NumericVector y){
	int i,n,max;
	n=x.size();
	Rcpp::NumericVector product(n);
	max=omp_get_max_threads();
	omp_set_num_threads(max);

#pragma omp parallel for
for(i=0;i<n;i++){
product[i]=x[i]/y[i];
}
return(product);
}

This is a neat little parallel C++ function. This can be compiled, linked and loaded in one step using the sourceCpp() function. All we need to do is set some terminal environment variables for the compiler to use openMP.

library(Rcpp)
Sys.setenv("PKG_CXXFLAGS"="-fopenmp")
Sys.setenv("PKG_LIBS"="-fopenmp")
sourceCpp("parad.cpp")
a=rnorm(1000,0,1)
b=rnorm(1000,0,1)
c=parad(a,b)

Lines 2 and 3 are simply setting terminal environment variables from within R so that the compiler compiles as “g++ …. -fopenmp …” which we need for the “omp.h” header. sourceCpp is a wrapper that takes care of everything. After that one can immediately start using the parad() function.

  • I very much enjoyed reading your article. It is a long time ago that I programmed C++ the last time but it looks still familiar. Given the lack of Windows-compatible packages for utilizing my GPU I am eager to give CUDA a try using Rcpp.

    • admin

      schön dass es dir gefällt, Raffael. Ich muss aber zugeben dass ich es halt einfacher finde “.C” zu benutzen als Rcpp. Meine Meinung nach ist Rcpp eher für R-Nutzern als C++ Nutzern zugedacht…

  • I mean in that sense it would be the choice for as I am an R-user who would just consider resorting to C++ for specific purposes. But you also say that you would rather use “.C” than Rcpp. How is that? There is a very interesting SO thread between Hadley, Dirk and Romain and they (Dirk and Romain) make good points in favor of Rcpp. I would like to know more about your take on this. – your command of German is pretty good btw!

    • admin

      I recall that their main selling point was that using “.Call()” really obfuscates the code because you have to “protect” and “unprotect” variables all the time because otherwise the R garbage collector may delete variables that you are using. I tried “.Call()” once and Rcpp is certainly a large improvement on this because it takes care of the protection for you. I use “.C()”, however, which involves some copying of the variables to C++ and then you don’t need to worry about the garbage collector. Most people really object to this transfer “overhead”, but if I am porting something from R to C++ then it would necessarily be something very computationally intensive in the first place, in which case the transfer overhead is negligible in comparison. I guess the people who object to the transfer overhead are those who just want to port a small block of code like a nested for loop, in which case the objection is reasonable. This is what I mean by Rcpp is meant for R-users as opposed to C++ users. I started writing a package for R which I wanted to be as fast as possible, which was the reason I started looking in to this. The only interaction I wanted with R was receiving the initial data and sending the final results – so having nothing to do with R in the mean time, so .C() works just fine for me. Plus with .C() you use your regular C++ datatypes and expressions without worrying about converting them with as.something(). .C() just allows me to code C++ without getting in the way with strange SEXP datatypes and other things. I think you will find that most packages coded in C/C++ with just use .C(). That’s my take on it anyway.

      • I think I get your point! Right now I really don’t have any serious experience with either way. But I will come back to this topic when I do. Of course any introductory articles on your blog on .C/.Call/Rcpp are highly welcome – I guess I am not the only one who would appreciate it!

  • admin

    Cheers 🙂 I have another example of using Rcpp here:
    http://www.lindonslog.com/programming/parallel-random-number-generation-trng/

  • James

    replications=rep(1,0,1) – this is a misleading part of the whole article, could you please change it to sth. else?

    Also, i didn’t find the parad version faster than x/y – curious whether we are comparing 8cpus vs. 1 cpu?