This package provides support for the foreach looping construct. Foreach is an idiom for iterating over the elements of a collection without an explicit loop counter. The main reason to use this package is that it supports parallel execution: it can run repeated operations on multiple processors or cores of your computer, or on multiple nodes of a cluster. Many adapters exist for using foreach with a variety of computational backends, including:
- doParallel: execute foreach loops on clusters created with base R's parallel package
- doFuture: execute foreach loops using the future framework
- doRedis: execute foreach loops using a Redis database
- doAzureParallel: execute foreach loops on a compute cluster in Azure
- and more.
A basic for loop in R that fits a set of models:
# One data frame per species
dat_list <- split(iris, iris$Species)
# Preallocate a list to hold the fitted models
mod_list <- vector("list", length(dat_list))
for (i in seq_along(dat_list)) {
  mod_list[[i]] <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = dat_list[[i]])
}
The same using foreach:
library(foreach)
# foreach returns its results as a list, so no preallocation is needed
mod_list2 <- foreach(dat = dat_list) %do% {
  lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = dat)
}
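Unlike a for loop, foreach returns a value: by default, a list with one element per iteration. As a small sketch of what that allows, the `.combine` argument of `foreach()` can collapse the results as they are collected. Here it stacks each model's coefficient vector into a matrix. The name `coef_mat` is only illustrative and is not part of the package.

```r
library(foreach)

# Split iris by species, as in the examples above
dat_list <- split(iris, iris$Species)

# .combine = rbind stacks each iteration's result as a row
coef_mat <- foreach(dat = dat_list, .combine = rbind) %do% {
  coef(lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = dat))
}

# Label the rows by species: one row per species, one column per coefficient
rownames(coef_mat) <- names(dat_list)
print(coef_mat)
```

Without `.combine`, the same loop would return a list of three coefficient vectors.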
The same, but fit in parallel on a background cluster. We change the %do% operator to %dopar% to indicate parallel processing.
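library(doParallel)
# Register a parallel backend with 3 workers
registerDoParallel(3)
mod_list2 <- foreach(dat = dat_list) %dopar% {
  lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = dat)
}
# Shut down the cluster that registerDoParallel() created implicitly
stopImplicitCluster()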
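As a hedged variation on the parallel example, the cluster can also be created explicitly with `makeCluster()` from the parallel package and passed to `registerDoParallel()`. This sketch also uses `getDoParWorkers()` to confirm how many workers are registered, and checks that the parallel results agree with a sequential run. The worker count of 2 and the names `cl`, `n_workers`, `par_coefs` and `seq_coefs` are arbitrary choices made for illustration.

```r
library(doParallel)  # also attaches foreach and parallel

# Create the cluster explicitly (2 workers is an arbitrary choice)
cl <- makeCluster(2)
registerDoParallel(cl)

# Number of workers that %dopar% will use
n_workers <- getDoParWorkers()

dat_list <- split(iris, iris$Species)

# Fit the models on the cluster and keep only the coefficients
par_coefs <- foreach(dat = dat_list) %dopar% {
  coef(lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = dat))
}

# The same computation run sequentially, for comparison
seq_coefs <- foreach(dat = dat_list) %do% {
  coef(lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = dat))
}

# Compare the two sets of results
print(all.equal(par_coefs, seq_coefs))

# An explicitly created cluster is shut down with stopCluster()
stopCluster(cl)
```

Creating the cluster yourself makes its lifetime explicit, which can help when several foreach loops share the same workers.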