Ralf Stubner — written Mar 7, 2018 — source
The RcppArrayFire package provides an interface between R and the ArrayFire library, an open source library that can make use of GPUs and other hardware accelerators via CUDA or OpenCL.
The official R bindings expose ArrayFire data structures as S4 objects in R, which would require a large amount of code to support all the methods defined in ArrayFire’s C/C++ API. RcppArrayFire, which is derived from RcppFire by Kazuki Fukui, instead follows the lead of packages like RcppArmadillo or RcppEigen and provides seamless communication between R and ArrayFire at the C++ level.
Please note that RcppArrayFire is developed and tested on Linux systems. There is preliminary support for Mac OS X.
In order to use RcppArrayFire you will need the ArrayFire library and header files. While ArrayFire has been packaged for Debian, I currently prefer using upstream’s binary installer or building from source.
RcppArrayFire is not on CRAN, but you can install the current version via drat:
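(A sketch; the daqana drat repository is an assumption.)

```r
# install.packages("drat")        # if drat is not yet available
drat::addRepo("daqana")           # repository name is an assumption
install.packages("RcppArrayFire")
```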
If you have installed ArrayFire in a non-standard directory, you have to use the configure argument --with-arrayfire, e.g.:
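(The installation prefix below is illustrative.)

```r
# assuming ArrayFire was installed under /opt/arrayfire-3 (path is illustrative)
install.packages("RcppArrayFire",
                 configure.args = "--with-arrayfire=/opt/arrayfire-3")
```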
Let’s look at the classical example of calculating \(\pi\) via simulation. The basic idea is to generate a large number of random points within the unit square. An approximation for \(\pi\) can then be calculated from the ratio of points within the unit circle to the total number of points. A vectorized implementation in R might look like this:
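(A sketch; the helper name piR, the seed, and the number of points N are illustrative.)

```r
piR <- function(N) {
    x <- runif(N)
    y <- runif(N)
    4 * sum(x * x + y * y <= 1) / N   # fraction of points inside the unit circle, times 4
}

set.seed(42)
N <- 10^6
cat("pi ~= ", piR(N), "\n")
system.time(piR(N))
```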
pi ~= 3.13999
   user  system elapsed 
  0.047   0.008   0.055 
A simple way to use C++ code in R is to use the inline package or cppFunction() from Rcpp, which are both possible with RcppArrayFire. An implementation in C++ using ArrayFire might look like this:
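(A sketch via Rcpp::cppFunction(); the helper name piAF is illustrative, and RcppArrayFire must be installed for depends = "RcppArrayFire" to resolve.)

```r
Rcpp::cppFunction(depends = "RcppArrayFire", code = '
double piAF(const int N) {
    af::array x = af::randu(N, f32);   // single precision uniform draws on the device
    af::array y = af::randu(N, f32);
    // count the points inside the unit circle and scale
    return 4.0 * af::sum<float>(x * x + y * y <= 1.0) / N;
}')

cat("pi ~= ", piAF(N), "\n")
system.time(piAF(N))
```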
pi ~= 3.1451
   user  system elapsed 
  0.001   0.000   0.001 
Several things are worth noting:
The syntax is almost identical. Besides the need for explicit types and a different function name for generating random numbers, the argument f32 to randu as well as the float type catches the eye. These instruct ArrayFire to use single precision floats, since not all devices support double precision floating point numbers. If you want to use double precision and your device supports it, you have to specify f64 and double instead.
The results are not the same since ArrayFire uses a different random number generator.
The speed-up can be quite impressive. However, the first invocation of a function is often not as fast as expected due to the just-in-time compilation used by ArrayFire. This can be circumvented by using a warm-up call with (normally) fewer computations.
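For instance, a small warm-up call can be issued before taking timings (a sketch reusing the piAF helper from above):

```r
piAF(10^4)             # warm-up: triggers ArrayFire's just-in-time compilation
system.time(piAF(N))   # subsequent calls reflect the actual computation time
```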
Up to now we have only considered simple types like double or int as function parameters and return values. However, we can also use arrays. Consider the case of a European put option that was recently handled with R, Rcpp and RcppArmadillo. The Armadillo-based function from this post reads:
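(A sketch; the function name put_option_pricer_arma, the exact signature, and the use of arma::normcdf are assumptions.)

```r
Rcpp::cppFunction(depends = "RcppArmadillo", code = '
arma::vec put_option_pricer_arma(arma::vec s, double k, double r, double y,
                                 double t, double sigma) {
    arma::vec d1 = (arma::log(s / k) + (r - y + sigma * sigma / 2.0) * t) /
                   (sigma * std::sqrt(t));
    arma::vec d2 = d1 - sigma * std::sqrt(t);
    // Black-Scholes put price; normcdf() is the cumulative normal distribution
    return std::exp(-r * t) * k * arma::normcdf(-d2) -
           s * std::exp(-y * t) % arma::normcdf(-d1);
}')
```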
This function can be applied to a range of spot prices:
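(Illustrative parameter values: spot prices from 55 to 60, a strike of 60, and one year to maturity.)

```r
s <- as.numeric(55:60)   # a small range of spot prices
put_option_pricer_arma(s, 60, 0.01, 0.02, 1, 0.05)
```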
        [,1]
[1,] 5.52021
[2,] 4.58142
[3,] 3.68485
[4,] 2.85517
[5,] 2.11883
[6,] 1.49793
Porting this code to RcppArrayFire is straightforward:
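(A sketch; the function name and the normal CDF expressed via af::erfc are assumptions.)

```r
Rcpp::cppFunction(depends = "RcppArrayFire", code = '
af::array put_option_pricer_af(RcppArrayFire::typed_array<f32> s, double k,
                               double r, double y, double t, double sigma) {
    af::array d1 = (af::log(s / k) + (r - y + sigma * sigma / 2.0) * t) /
                   (sigma * std::sqrt(t));
    af::array d2 = d1 - sigma * std::sqrt(t);
    // ArrayFire lacks a normal CDF, so express it via the (complementary) error function
    af::array cdf_md2 = af::erfc(d2 / std::sqrt(2.0)) / 2.0;   // Phi(-d2)
    af::array cdf_md1 = af::erfc(d1 / std::sqrt(2.0)) / 2.0;   // Phi(-d1)
    return std::exp(-r * t) * k * cdf_md2 - s * std::exp(-y * t) * cdf_md1;
}')
```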
Compared with the implementations in R, Rcpp and RcppArmadillo, the syntax is again almost the same. One exception is that ArrayFire does not provide a function for the cumulative normal distribution. However, the closely related error function is available.
Since an object of type af::array can contain different data types, the templated wrapper class RcppArrayFire::typed_array<> is used to indicate the desired data type when converting from R to C++. Again single precision floats are used with ArrayFire, which leads to differences of the order \(10^{-6}\) compared to the results from R, Rcpp and RcppArmadillo:
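(Called with the same illustrative parameters as above.)

```r
put_option_pricer_af(s, 60, 0.01, 0.02, 1, 0.05)
```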
[1] 5.52021 4.58142 3.68485 2.85517 2.11883 1.49793
The reason to use hardware accelerators is of course the quest for increased performance. How does ArrayFire fare in this respect? Using the same benchmark as in the R, Rcpp and RcppArmadillo comparison:
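(A sketch using the rbenchmark package; the vector of one million spot prices is an assumption.)

```r
library(rbenchmark)
s <- seq(1, 100, length.out = 10^6)   # one million spot prices, values illustrative
benchmark(Arma = put_option_pricer_arma(s, 60, 0.01, 0.02, 1, 0.05),
          AF   = put_option_pricer_af(s, 60, 0.01, 0.02, 1, 0.05),
          order = "relative", replications = 100)[, 1:4]
```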
  test replications elapsed relative
2   AF          100   0.489    1.000
1 Arma          100   1.928    3.943
Here a Nvidia GeForce GT 1030 is used together with ArrayFire’s CUDA backend. With a built-in Intel HD Graphics 520 using the OpenCL backend, the ArrayFire solution is about 6 times faster. Even without a high performance GPU the performance boost from using ArrayFire can be quite impressive. However, the results change dramatically if fewer options are evaluated:
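(Again a sketch, now with only six spot prices.)

```r
s <- as.numeric(55:60)
benchmark(Arma = put_option_pricer_arma(s, 60, 0.01, 0.02, 1, 0.05),
          AF   = put_option_pricer_af(s, 60, 0.01, 0.02, 1, 0.05),
          order = "relative", replications = 1000)[, 1:4]
```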
  test replications elapsed relative
1 Arma         1000   0.007    1.000
2   AF         1000   0.088   12.571
But is it realistic to process \(10^6\) options at once? Probably not in the way used in the benchmark, where only the spot price is allowed to vary. However, one can alter the function to process not only arrays of spot prices but also arrays of strikes, risk-free rates, etc.:
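(A sketch; the name put_option_pricer_af_multi and the use of f32 arrays for all inputs are assumptions.)

```r
Rcpp::cppFunction(depends = "RcppArrayFire", code = '
af::array put_option_pricer_af_multi(RcppArrayFire::typed_array<f32> s,
                                     RcppArrayFire::typed_array<f32> k,
                                     RcppArrayFire::typed_array<f32> r,
                                     RcppArrayFire::typed_array<f32> y,
                                     RcppArrayFire::typed_array<f32> t,
                                     RcppArrayFire::typed_array<f32> sigma) {
    af::array d1 = (af::log(s / k) + (r - y + sigma * sigma / 2.0) * t) /
                   (sigma * af::sqrt(t));
    af::array d2 = d1 - sigma * af::sqrt(t);
    af::array cdf_md2 = af::erfc(d2 / std::sqrt(2.0)) / 2.0;   // Phi(-d2)
    af::array cdf_md1 = af::erfc(d1 / std::sqrt(2.0)) / 2.0;   // Phi(-d1)
    return af::exp(-r * t) * k * cdf_md2 - s * af::exp(-y * t) * cdf_md1;
}')
```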
Note that ArrayFire does not recycle elements if arrays with non-matching dimensions are combined. In this particular case this means that all arrays must have the same length. One can ensure that by using a data frame for the values:
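(A sketch; the random spot prices and the constant parameter values are purely illustrative.)

```r
N <- 10^6
dat <- data.frame(s = runif(N, 10, 90),   # varying spot prices
                  k = 50, r = 0.01, y = 0.02, t = 1, sigma = 0.05)
# data.frame() recycles the scalar parameters, so all columns share the same length
dat$p <- with(dat, put_option_pricer_af_multi(s, k, r, y, t, sigma))
head(dat)
```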
        s  k    r    y t sigma           p
1 87.4192 50 0.01 0.02 1  0.05 7.45937e-29
2 48.7060 50 0.01 0.02 1  0.05 2.09402e+00
3 67.2626 50 0.01 0.02 1  0.05 2.34672e-09
4 72.6573 50 0.01 0.02 1  0.05 6.84430e-14
5 68.0854 50 0.01 0.02 1  0.05 5.26394e-10
6 57.8775 50 0.01 0.02 1  0.05 2.57756e-03
The ArrayFire library provides a convenient way to use hardware accelerators without the need to write low-level OpenCL or CUDA code. The C++ syntax is actually quite similar to properly vectorized R code. The RcppArrayFire package makes this available to useRs. However, one still has to be careful: using hardware accelerators is not a “silver bullet” due to the inherent memory transfer overhead.