The library(parallel) package is the typical way to use all of your cores with R. The package comes installed with your R. I used the predecessor packages called library(snow) and library(snowfall). They are still available and may be good options. In fact, I may go to them instead of library(parallel) even now. More recently, the library(furrr) package has really impressed me. I have created a parallel (no pun intended) example using the library(purrr) package.
library(parallel)
Now we need to set up our computer and R environment to leverage our processors.
# Calculate the number of cores
no_cores <- detectCores() - 1
# Initiate cluster
cl <- makeCluster(no_cores)

We now need to use the clusterExport() function to pass the objects from or main environment to each one of the ‘R’ cores/sessions.
varlist variable in clusterExport().clusterEvalQ()# load libraries
library(tidyverse)
clusterEvalQ(cl, library(tidyverse))
#devtools::install_github("hathawayj/buildings")
library(buildings) # remember that the 'permits' data object is created when the library is loaded.
a <- 4
ff <- function(x){
for (i in 1:1000){
i
}
ggplot() + geom_point(x = permits[x, "value"])
}
clusterExport(cl, varlist = c("a", "ff", "permits"))
Using the clusterExport() is important. We want to push the things we need but we should not push to many things as each process is taking memory. The Win-Vector Blog showed this video to depict what tends to happen.
lapply() and parLapply()Now the magic works best when we think with list objects.
list_object <- as.list(1:7500)
system.time(temp1 <- lapply(list_object, ff))
system.time(temp2 <- parLapply(cl, list_object, ff))
Finally, when we are done running our script, we need to stop the cluster.
stopCluster(cl)