More often than not, our coding flow is interrupted by a block of code that takes a long time to run. If we are unable to parallelize our code, we generally have no option but to wait. We can use the beepr package to signal when our code has finished running, but unfortunately this does not allow us to keep working on the (R) project in the meantime.
Luckily, there is a solution. In RStudio (developed by Posit), we can run a block of code in the background (see https://docs.posit.co/ide/user/ide/guide/tools/jobs.html). A “background job” is an R script that runs in a separate, dedicated R session. By default, the job runs in a clean R session, and its temporary workspace is discarded once the job is complete. Moreover, you first need to write an R script containing the code to be executed. At this point, you are probably wondering whether we can do all of this in a single line, without breaking the flow of our code.
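To make the manual route concrete, here is a minimal sketch of what it involves: writing the code to a temporary R script and passing that script to rstudioapi::jobRunScript(). The code lines and object names are illustrative, and the isAvailable() guard is an assumption added so the snippet also runs outside RStudio (where it only writes the script). The importEnv and exportEnv settings mirror those in the CalibrationCurves source code printed later on.

```r
# Illustrative code that we want to run in the background
job_code <- c(
  "x <- rnorm(1e7)",
  "x_mean <- mean(x)"
)

# Step 1: write the code to a temporary R script (the separate script a job needs)
script_path <- tempfile(fileext = ".R")
writeLines(job_code, script_path)

# Step 2: start the job, but only when the RStudio API is available
in_rstudio <- requireNamespace("rstudioapi", quietly = TRUE) &&
  rstudioapi::isAvailable()

if (in_rstudio) {
  # importEnv = TRUE: copy the global environment into the job
  # exportEnv = "R_GlobalEnv": copy the job's objects back when it finishes
  job_id <- rstudioapi::jobRunScript(
    script_path,
    importEnv = TRUE,
    exportEnv = "R_GlobalEnv"
  )
}
```

Every time you want to run something in the background this way, you have to repeat these steps by hand.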
So… can we do this? The answer is yes, and this is one of the many reasons why I love R! To this end, you can use the infix operators %<=% and %{}% of the CalibrationCurves package (or of the actuaRE package, with its next update). Both packages are available on CRAN and can be installed using install.packages.
Using these operators, you can run (a block of) code in the background without having to set up a separate R script or configure the job yourself. Meanwhile, you can keep working in your current R session. Once the job is finished, the results are imported into your global environment.
# Load package and suppress messages, warnings for outdated packages
suppressWarnings(suppressMessages(library(CalibrationCurves)))
?`%<=%`
## starting httpd help server ... done
Let’s check how the %<=% operator works with a simple example. In essence, it works similarly to the assignment operator <-: the value on the right-hand side is assigned to the variable/object on the left-hand side. In the following block of code, we draw a sample of 10 000 000 values from a standard normal distribution using the rnorm function and assign it to the object x.
x %<=% rnorm(1e7)
## [1] "494FCE56"
When you run this code, the job ID is printed in your console. Once the code has been executed in the background, the value of the right-hand side (i.e., rnorm(1e7)) is assigned to the object on the left-hand side (i.e., x), and x is imported into your global environment. This operator is especially useful when you are fitting a computationally heavy model (such as a Tweedie random effects model or a machine learning model).
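Why does a line like x %<=% rnorm(1e7) return almost immediately? In R, function arguments are passed as unevaluated promises, so an operator can capture the code on both sides with substitute() without ever running it. The minimal sketch below illustrates this with a hypothetical %capture% operator (not part of CalibrationCurves); it uses deparse() to turn the expressions into text, whereas the CalibrationCurves source printed later uses a slightly different conversion.

```r
# Hypothetical operator (not from CalibrationCurves): it captures both sides
# as text with substitute() and never evaluates them
`%capture%` <- function(lhs, rhs) {
  c(lhs = deparse(substitute(lhs)), rhs = deparse(substitute(rhs)))
}

# A "slow" function that records whether it was ever called
evaluated <- FALSE
slow_value <- function() {
  evaluated <<- TRUE
  Sys.sleep(5)
  42
}

captured <- y %capture% slow_value()  # returns immediately
captured   # c(lhs = "y", rhs = "slow_value()")
evaluated  # FALSE: slow_value() was never run
```

Because only the text of the code is captured, it can be handed to another R session and evaluated there.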
The second operator, %{}%, is designed to run longer blocks of code for which you want to export all created objects into your global environment. In the example below, all objects created on the right-hand side of the %{}% operator are exported.
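`%{}%`(NULL, {
# Simulate a linear regression problem with n observations and 5 covariates
n = 5e5
B = runif(5)                                        # true coefficients
X = MASS::mvrnorm(n, runif(5), diag(1, nrow = 5))   # covariates, identity covariance
y = X %*% B + rnorm(n)                              # response with N(0, 1) noise
# Estimate the coefficients by OLS on a random subsample of 10 000 rows
i = sample(seq_along(y), 1e4, FALSE)
Xs = X[i, ]
ys = y[i]
Bhat = solve(t(Xs) %*% Xs) %*% t(Xs) %*% ys
})
## [1] "C4EAEB74"
The example above shows the first way to use the operator: we call it like an ordinary function by enclosing its name in backticks, `%{}%`, and pass NULL as the (unused) left-hand side, followed by the code block in curly brackets as the right-hand side. The code block has to be the second argument: as the source code of %{}% further below shows, the operator only uses its rhs argument, so with a single argument, as in `%{}%`({...}), the block would be matched to lhs and would never end up in the background job. The second way to use the %{}% operator is as follows: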
NULL %{}% {
n = 5e5
B = runif(5)
X = MASS::mvrnorm(n, runif(5), diag(1, nrow = 5))
y = X %*% B + rnorm(n)
i = sample(seq_along(y), 1e4, FALSE)
Xs = X[i, ]
ys = y[i]
Bhat = solve(t(Xs) %*% Xs) %*% t(Xs) %*% ys
}
## [1] "87EF0A97"
Here, we put NULL on the left-hand side because nothing is assigned to it; the operator simply needs a left-hand operand. So far, I haven’t found a more elegant way to define the operator.
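Both ways of calling %{}% work because an infix operator in R is just a function with two arguments: lhs %op% rhs is the same call as `%op%`(lhs, rhs). The sketch below shows this with a hypothetical %rhs% operator that has the same (lhs, rhs) signature as the printed %{}% source and returns the captured right-hand side as text (it does not start any job).

```r
# Hypothetical operator with the same (lhs, rhs) signature as %{}%:
# it ignores lhs and returns the unevaluated right-hand side as text
`%rhs%` <- function(lhs, rhs) {
  paste(deparse(substitute(rhs)), collapse = "\n")
}

# Infix form: NULL is only a placeholder for the left-hand operand
infix_form <- NULL %rhs% {
  a <- 1
}

# Function-call form: the same call written with backticks
call_form <- `%rhs%`(NULL, {
  a <- 1
})

identical(infix_form, call_form)  # TRUE: both are the same function call
```

Since the left-hand side is never used, any value would do; NULL simply makes it explicit that nothing is assigned.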
You might be wondering what happens when you use these infix operators. To demystify the magic, we print the source code of the %<=% operator.
## function(lhs, rhs) {
## Pkgs = names(sessionInfo()$otherPkgs)
## lhs = as.character(enquote(substitute(lhs))[2])
## rhs = as.character(enquote(substitute(rhs))[2])
## Job = c(if(is.null(Pkgs)) NULL else paste0("LibraryM(", paste0(Pkgs, collapse = ", "), ")"),
## paste0(lhs, " <- ", rhs))
## tmpR = tempfile(fileext = ".R")
## writeLines(Job, tmpR)
## if(!file.exists(tmpR))
## stop("Temporary R script not created")
## jobRunScript(tmpR, exportEnv = "R_GlobalEnv", importEnv = TRUE)
## }
## <bytecode: 0x000001fad4d37648>
## <environment: namespace:CalibrationCurves>
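Here’s what happens on each line of code:
- Pkgs: we retrieve the names of all attached packages (other than the base packages) via sessionInfo() and store them in the Pkgs object.
- lhs: substitute() captures the unevaluated expression on the left-hand side, which is then converted to a character string.
- rhs: the same is done for the expression on the right-hand side.
- Job: we build the lines of the job script. If any packages are attached, the first line loads them in the background session with LibraryM (see ?LibraryM). The next line assigns rhs to lhs.
- tmpR: using tempfile, we create the path of a temporary R script, and writeLines writes the lines of Job to this script. If the file does not exist afterwards, the function stops with an error.
- Finally, we run the script as a background job with the jobRunScript function from the rstudioapi package. We set exportEnv = "R_GlobalEnv" to indicate that the objects created by the job should be exported to the global environment. Additionally, by setting importEnv = TRUE, we ensure that a copy of the global environment is imported when the background job starts, so the job can use the objects you have already created.

The code for the %{}% operator is largely the same. The only difference is that we do not assign any value to the left-hand side of the operator: the job script contains only the code on the right-hand side.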
## function(lhs, rhs) {
## Pkgs = names(sessionInfo()$otherPkgs)
## Job = c(if(is.null(Pkgs)) NULL else paste0("LibraryM(", paste0(Pkgs, collapse = ", "), ")"),
## as.character(enquote(substitute(rhs))[2]))
## tmpR = tempfile(fileext = ".R")
## writeLines(Job, tmpR)
## if(!file.exists(tmpR))
## stop("Temporary R script not created")
## jobRunScript(tmpR, exportEnv = "R_GlobalEnv", importEnv = TRUE)
## }
## <bytecode: 0x000001fad5bc1ed0>
## <environment: namespace:CalibrationCurves>
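To see what actually ends up in the temporary script, here is a minimal sketch that mimics the script-building steps of %{}% without starting a background job. The helper build_job_script() is hypothetical and not part of CalibrationCurves; it uses deparse() instead of as.character(enquote(...)) to capture the code, and one library() call per package instead of LibraryM().

```r
# Hypothetical helper (not part of CalibrationCurves) mimicking the
# script-building steps of %{}%; it does NOT start a background job
build_job_script <- function(lhs, rhs, pkgs = character(0)) {
  # Capture the code block without evaluating it
  code <- deparse(substitute(rhs))
  # One library() call per package, followed by the captured code
  job <- c(if (length(pkgs) == 0) NULL else sprintf("library(%s)", pkgs), code)
  # Write the lines to a temporary .R script and check that it exists
  path <- tempfile(fileext = ".R")
  writeLines(job, path)
  if (!file.exists(path)) stop("Temporary R script not created")
  path
}

script <- build_job_script(NULL, {
  n <- 10
  z <- rnorm(n)
}, pkgs = "stats")

writeLines(readLines(script))  # library(stats), then the captured code block
```

In the operators themselves, the package names come from names(sessionInfo()$otherPkgs), and the resulting script path is what gets passed to jobRunScript.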