We often need external R libraries that are not part of the base R installation. This tutorial describes how to create custom R libraries for use in jobs on the OSPool.
The material in this tutorial builds upon the Run R Scripts on the OSPool tutorial. If you are not already familiar with how to run R jobs on the OSPool, please see that tutorial first for a general introduction.
First, we'll need to create a working directory. You can either run
$ git clone https://github.com/OSGConnect/tutorial-R-addlib
or type the following:
$ mkdir tutorial-R-addlib
$ cd tutorial-R-addlib
Similar to the general R tutorial, we will create a script to use as a test example. If you did not clone the tutorial, create a script called hello_world.R that contains the following:
#!/usr/bin/env Rscript
library(cowsay)
say("Hello World!", "cow")
We will run one more command that makes the script executable, meaning that it can be run directly from the command line:
$ chmod +x hello_world.R
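If you would rather not open an editor, the two steps above can be done in one go from the shell. This is only a sketch: it writes the same three-line script shown above using a quoted here-document, then sets the executable bit and prints a quick check.

```shell
# Write hello_world.R with a here-document. Quoting 'EOF' keeps the shell
# from expanding anything inside the script body.
cat > hello_world.R <<'EOF'
#!/usr/bin/env Rscript
library(cowsay)
say("Hello World!", "cow")
EOF

# Make the script executable so it can be run as ./hello_world.R
chmod +x hello_world.R

# Show the permissions and the interpreter line as a sanity check
ls -l hello_world.R
head -n 1 hello_world.R
```

The first line of the file must be the #!/usr/bin/env Rscript line, otherwise running ./hello_world.R will not hand the script to R.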
Using the same container that we used for the general R tutorial, we will add the package we want to use (in this case, the cowsay package) to create a new container that we can use for our jobs.
The new container will be generated from a "definition" file. If it isn't already present, create a file called cowsay.def that has the following lines:
Bootstrap: docker
From: opensciencegrid/osgvo-r:3.5.0
%post
R -e "install.packages('cowsay', dependencies=TRUE, repos='http://cran.rstudio.com/')"
This file basically says that we want to start with one of the existing OSPool R containers and add the cowsay package from CRAN.
To create the new container, set the following variables:
$ export TMPDIR=$HOME
$ export APPTAINER_CACHEDIR=$HOME
And then run this command:
$ apptainer build cowsay-test.sif cowsay.def
It may take 5-10 minutes to run. Once complete, if you run ls, you should see a file in your current directory called cowsay-test.sif. This is the new container.
Building containers may be a new skill, and the process differs slightly from package to package! We recommend looking at our container guides and container training materials to learn more -- these are both linked from our main guides page. There are also some additional tips at the end of this tutorial on building containers with R packages.
Start the container you created by running:
$ apptainer shell cowsay-test.sif
Now we can test our R script:
Singularity :~/tutorial-R-addlib> ./hello_world.R
If this works, we will have a message with a cow printed to our terminal. Once we have this output, we'll exit the container for now with exit:
Singularity :~/tutorial-R-addlib> exit
$
For this job, we want to use the custom container we just created. For efficiency, it is best to transfer this to the job using the OSDF. If you want to use the container you just built, copy it to the appropriate directory listed here, based on which Access Point you are using.
Our submit file, R.submit, should then look like this:
+SingularityImage = "osdf://osgconnect/public/osg/tutorial-R-addlib/cowsay-test.sif"
executable = hello_world.R
# arguments
log = R.log.$(Cluster).$(Process)
error = R.err.$(Cluster).$(Process)
output = R.out.$(Cluster).$(Process)
+JobDurationCategory = "Medium"
request_cpus = 1
request_memory = 1GB
request_disk = 1GB
queue 1
If you are using your own container file, change the osdf:// link in the submit file to match YOUR Access Point and username.
Reminder: Files placed in the OSDF can be copied to other data spaces ("caches") where they are NOT UPDATED. If you make a new container to use with your jobs, make sure to give it a different name or put it at a different path than the previous container. You will not be able to replace the exact path of the existing container.
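One way to follow this advice is to put a time stamp in the file name each time you rebuild. The sketch below is only an illustration: publish_container is a hypothetical helper, not an OSPool tool, and the destination directory in the example call is a placeholder for the OSDF directory on your Access Point.

```shell
# publish_container SRC DEST_DIR   (hypothetical helper, for illustration only)
# Copies SRC into DEST_DIR under a time-stamped name and prints the new path.
publish_container() {
    src="$1"
    dest_dir="$2"
    stamp="$(date +%Y%m%d-%H%M%S)"
    base="$(basename "$src" .sif)"
    dest="$dest_dir/${base}-${stamp}.sif"

    # Stop early if the container has not been built yet.
    [ -f "$src" ] || { echo "No such container: $src" >&2; return 1; }
    # Never overwrite a path that may already be cached elsewhere.
    [ ! -e "$dest" ] || { echo "Refusing to overwrite $dest" >&2; return 1; }

    mkdir -p "$dest_dir" && cp "$src" "$dest" && echo "$dest"
}

# Example call (the destination directory is a placeholder):
# publish_container cowsay-test.sif /path/to/your/osdf/directory
```

After copying, the osdf:// link in the submit file needs to point at the new file name.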
Now we are ready to submit the job:
$ condor_submit R.submit
and check the job status:
$ condor_q
Once the job has finished running, check the output file as before. It should look like this:
$ cat R.out.0000.0
 -----
Hello World!
 ------
    \   ^__^
     \  (oo)\ ________
        (__)\         )\ /\
             ||------w|
             ||      ||
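If you submit more than one job, checking each file by hand gets tedious. As a rough sketch, the loop below relies on the R.err.* naming from the submit file above and lists any error file that is not empty. The function name check_r_errors is hypothetical. A non-empty error file does not always mean the job failed, since R also writes warnings and package startup messages there, so treat the result as a list of files worth reading.

```shell
# check_r_errors [DIR]   (hypothetical helper, for illustration only)
# Lists R.err.* files in DIR (default: current directory) that are not empty.
# Returns 0 if every error file is empty, 1 otherwise.
check_r_errors() {
    dir="${1:-.}"
    rc=0
    for f in "$dir"/R.err.*; do
        [ -e "$f" ] || continue      # the glob matched nothing
        if [ -s "$f" ]; then         # -s: file exists and has size > 0
            echo "non-empty: $f"
            rc=1
        fi
    done
    return "$rc"
}

# Example: report on jobs that were submitted from the current directory
# check_r_errors . || echo "Check the files listed above"
```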
There is a lot of variety in how to build custom containers! The two main decisions you need to make are a) what to use as your "base" or starting container and b) what packages to install.
There is a useful overview of building containers from our container training, linked on our training page.
In this guide we used one of the existing OSPool R containers. You can see the other versions of R that we support on our list of OSPool Supported Containers.
Another good option for a base container is one of the "rocker" Docker containers: Rocker on DockerHub
To use a different container as the base container, you just change the top of the definition file. So to use the rocker tidyverse container as my starting point, I would have a definition file header like this:
Bootstrap: docker
From: rocker/tidyverse:4.1.3
When using containers from DockerHub, it's a good idea to pick a version (look at the "Tags" tab for options). The container above provides version 4.1.3 of R.
The sample definition file from this tutorial installed one package. If you have multiple packages, you can change the "install.packages" command to install multiple packages:
%post
R -e "install.packages(c('cowsay','here'), dependencies=TRUE, repos='http://cran.rstudio.com/')"
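As the package list grows, writing the quoted c(...) vector by hand becomes error-prone. One possible approach is to generate the definition file from a plain list of names. The helper make_def below is hypothetical, not part of Apptainer or the OSPool tools, and the output file name cowsay-here.def is made up for this example. It only writes text in the same layout as cowsay.def.

```shell
# make_def BASE_IMAGE PKG [PKG ...]   (hypothetical helper, for illustration only)
# Prints an Apptainer definition file that installs the given CRAN packages.
make_def() {
    base="$1"; shift
    pkgs=""
    for p in "$@"; do
        # Build a comma-separated list of single-quoted names: 'a','b'
        pkgs="${pkgs:+$pkgs,}'$p'"
    done
    cat <<EOF
Bootstrap: docker
From: $base

%post
    R -e "install.packages(c($pkgs), dependencies=TRUE, repos='http://cran.rstudio.com/')"
EOF
}

# Example: write a definition file for two packages
make_def opensciencegrid/osgvo-r:3.5.0 cowsay here > cowsay-here.def
```

The generated file can then be passed to apptainer build in the same way as cowsay.def.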
If your base container is one of the "rocker" containers, you can use a different tool to install packages that looks like this:
%post
install2.r cowsay
or for multiple packages:
%post
install2.r cowsay here
Remember, you only need to install packages that aren't already in the container. If you start with the tidyverse container, you don't need to install ggplot2 or dplyr. Those are already in the container, and you would be adding packages on top.