
Submitting an R job is very similar to submitting a serial job, as shown on Submitting Jobs.


Running R scripts in batch

There are two primary commands to use when submitting R scripts: `Rscript` and `R CMD BATCH`. Both commands will execute the passed script but differ in the way they process output.

Running R scripts using `R CMD BATCH`

When utilizing `R CMD BATCH` all output will be directed to an `.Rout` file named after your script unless otherwise specified. For example:

serial_R.submit

#!/bin/sh
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob

 

module load R/3.4

R CMD BATCH Rcode.R

In the above example, output for the job will be found in the file `Rcode.Rout`. Notice that we did not specify output and error files in our SLURM directives. These are not needed because all R output goes into the `.Rout` file. To direct output to a specific file, follow the script name in your `R CMD BATCH` command with the name of the output file, as follows:


serial_R.submit

#!/bin/sh
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob

 

module load R/3.4

R CMD BATCH Rcode.R Rcodeoutput.txt

In this example, output from running the script `Rcode.R` will be placed in the file `Rcodeoutput.txt`.
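Because `R CMD BATCH` writes to the same file name on every run, repeated jobs replace each other's output. One possible workaround, sketched here, is to build the output name from the job ID. The helper name `batch_outfile` and the `manual` fallback are assumptions made for illustration; `SLURM_JOB_ID` is the variable SLURM sets inside a running job.

```shell
#!/bin/sh
# Hypothetical helper (not part of R or SLURM): build a per-job output
# file name for R CMD BATCH from the script name and a job ID.
batch_outfile() {
    script="$1"
    # Use the ID passed as $2, else the ID SLURM sets in a running job,
    # else the placeholder "manual" (an assumption for use outside SLURM).
    jobid="${2:-${SLURM_JOB_ID:-manual}}"
    # Strip the directory and the .R extension, then append the ID and .Rout.
    echo "$(basename "$script" .R).${jobid}.Rout"
}

# Print the command a submit script would run; remove "echo" to execute it.
echo R CMD BATCH Rcode.R "$(batch_outfile Rcode.R)"
```

In a submit script, the last line without `echo` would place each job's output in its own file, for example `Rcode.<jobid>.Rout`.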

To pass arguments to the script, specify them after `R CMD BATCH` but before the name of the script to be executed, preferably preceded by `--args` and enclosed in quotes, as follows:

serial_R.submit

#!/bin/sh
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob

 

module load R/3.4

R CMD BATCH "--args argument1 argument2 argument3" Rcode.R Rcodeoutput.txt
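Inside the R script, the values given after `--args` can be read with `commandArgs(trailingOnly = TRUE)`. The following is a minimal sketch; the file names `args_demo.R` and `args_demo.txt` are hypothetical, and the R command is only run where R is actually available (for example after `module load R/3.4`).

```shell
#!/bin/sh
# Write a small, hypothetical R script that prints its arguments.
cat > args_demo.R <<'EOF'
# commandArgs(trailingOnly = TRUE) returns only the values after --args,
# as a character vector.
args <- commandArgs(trailingOnly = TRUE)
print(args)
EOF

# Run the script only if R is on the PATH.
if command -v R >/dev/null 2>&1; then
    R CMD BATCH "--args argument1 argument2 argument3" args_demo.R args_demo.txt
fi
```

The printed arguments end up in `args_demo.txt` together with the rest of the R output.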

Running R scripts using `Rscript`

Using `Rscript` to execute R scripts differs from `R CMD BATCH` in that all output and errors from the script are directed to STDOUT and STDERR, in a manner similar to other programs. This gives the user more control over where to direct the output. For example, to run our script using `Rscript`, the submit script could look like the following:

serial_R.submit

#!/bin/sh
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout

 

module load R/3.4

Rscript Rcode.R

In the above example, STDOUT will be directed to the output file `TestJob.%J.stdout` and STDERR to `TestJob.%J.stderr`. You will notice that the example is very similar to the serial example. The important line is the `module load` command, which tells Tusker to load the R framework into the environment so jobs may use it.

To pass arguments to the script when using `Rscript`, the arguments will follow the script name as in the example below:

serial_R.submit

#!/bin/sh
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout

 

module load R/3.4

Rscript Rcode.R argument1 argument2 argument3
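Arguments passed this way reach R as character strings, so numeric values need to be converted before use. The sketch below illustrates this with a hypothetical script name (`rscript_args_demo.R`) and made-up argument values; `Rscript` is only invoked if it is found on the PATH.

```shell
#!/bin/sh
# Write a hypothetical R script that reads two arguments.
cat > rscript_args_demo.R <<'EOF'
args  <- commandArgs(trailingOnly = TRUE)  # character vector of arguments
label <- args[1]                           # stays a string
n     <- as.integer(args[2])               # convert before using as a number
cat("label:", label, "\n")
cat("n + 1:", n + 1L, "\n")
EOF

if command -v Rscript >/dev/null 2>&1; then
    # The quotes keep "first value" together as a single argument.
    Rscript rscript_args_demo.R "first value" 41
fi
```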

Multicore (parallel) R submission

Submitting a multicore R job to SLURM is very similar to Submitting an OpenMP Job, since both are running multicore jobs on a single node. Below is an example:

parallel_R.submit

#!/bin/sh
#SBATCH --ntasks-per-node=16
#SBATCH --nodes=1
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout

 

module load R/3.4

R CMD BATCH Rcode.R

The above example will submit a single job which can use up to 16 cores.

Be sure to limit your R code to the 16 cores requested, or performance will suffer. For example, when using the `mclapply` function from the parallel package:

parallel.R
library("parallel")
...
mclapply(rep(4, 5), rnorm, mc.cores=16)
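Rather than hard-coding the core count in two places, one option is to read it from the job environment. The sketch below assumes the job was submitted with `--ntasks-per-node`, in which case SLURM sets `SLURM_NTASKS_PER_NODE`; the helper name `allocated_cores`, the file name `cores_demo.R`, and the fallback of 1 core are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report the number of cores the job was given.
# SLURM sets SLURM_NTASKS_PER_NODE when --ntasks-per-node is requested;
# the fallback of 1 (an assumption) applies outside a SLURM job.
allocated_cores() {
    echo "${SLURM_NTASKS_PER_NODE:-1}"
}

# Write a hypothetical R script that reads the same variable.
cat > cores_demo.R <<'EOF'
library("parallel")
# Sys.getenv returns a string; "1" is used when the variable is unset.
cores <- as.integer(Sys.getenv("SLURM_NTASKS_PER_NODE", "1"))
result <- mclapply(rep(4, 5), rnorm, mc.cores = cores)
print(length(result))
EOF

echo "cores available to R: $(allocated_cores)"
```

With this approach, changing the `--ntasks-per-node` directive is enough to change how many cores `mclapply` uses.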

Multinode R submission with Rmpi

Submitting a multinode MPI R job to SLURM is very similar to Submitting an MPI Job, since both run multicore jobs on multiple nodes. Below is an example of running Rmpi on Crane on 2 nodes and 32 cores:

Rmpi.submit

#!/bin/sh
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout

module load compiler/gcc/4.9 openmpi/1.10 R/3.3
export OMPI_MCA_mtl=^psm
mpirun -n 1 R CMD BATCH Rmpi.R

When you run an Rmpi job on Crane, please include the line `export OMPI_MCA_mtl=^psm` in your submit script. If you run an Rmpi job on Tusker, you do not need this line, because Tusker and Crane use different Infiniband cards. Regardless of how many cores your job uses, the Rmpi package should always be run with `mpirun -n 1` because it spawns additional processes dynamically.

Below is an example Rmpi R script provided by The University of Chicago Research Computing Center:

Rmpi.R
library(Rmpi)

# initialize an Rmpi environment
ns <- mpi.universe.size() - 1
mpi.spawn.Rslaves(nslaves=ns)

# send these commands to the slaves
mpi.bcast.cmd( id <- mpi.comm.rank() )
mpi.bcast.cmd( ns <- mpi.comm.size() )
mpi.bcast.cmd( host <- mpi.get.processor.name() )

# all slaves execute this command
mpi.remote.exec(paste("I am", id, "of", ns, "running on", host))

# close down the Rmpi environment
mpi.close.Rslaves(dellog = FALSE)
mpi.exit()
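The worker count in the script above (`mpi.universe.size() - 1`) can be related to the submit script with a little arithmetic. The sketch below assumes that the MPI universe size equals nodes times tasks per node, which is typical but depends on the MPI setup; the helper name `rmpi_workers` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: number of Rmpi workers for a given allocation.
# Assumes the MPI universe size equals nodes * tasks per node.
rmpi_workers() {
    nodes="$1"
    tasks_per_node="$2"
    total=$((nodes * tasks_per_node))
    # One slot is taken by the R process started with "mpirun -n 1".
    echo $((total - 1))
}

echo "workers for 2 nodes x 16 tasks: $(rmpi_workers 2 16)"
```

Under that assumption, the 2-node, 16-tasks-per-node example leaves 31 slots for spawned workers.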

Adding packages

There are two options to install packages. The first is to run R interactively on the login node and install packages from within R. The second is to use the `R CMD INSTALL` command.

All R packages must be installed from the login node. R libraries are stored in users' home directories, which are not writable from the worker nodes.

Installing packages interactively

  1. Load the R module with the command `module load R`
    1. Note that each version of R uses its own user libraries. To install packages under a specific version of R, specify which version by using the module load command followed by the version number. For example, to load R version 3.3, you would use the command `module load R/3.3`
  2. Run R interactively using the command `R`
  3. From within R, use the `install.packages()` command to install desired packages. For example, to install the package `ggplot2` use the command `install.packages("ggplot2")`

Some R packages require external compilers or additional libraries. If you see an error when installing a package, you might need to load additional modules to make these compilers or libraries available. For more information, refer to the package documentation.

Installing packages using R CMD INSTALL

To install packages using `R CMD INSTALL`, the compressed package source must already be downloaded to the cluster. You can download the package source using `wget`. Then run `R CMD INSTALL` with the path to the source tar file. For example, to install ggplot2, use the following commands:

# Download the package source:
wget https://cran.r-project.org/src/contrib/ggplot2_2.2.1.tar.gz

# Install the package:
R CMD INSTALL ./ggplot2_2.2.1.tar.gz

Additional information on using the `R CMD INSTALL` command can be found on the R documentation page.
