R is available on Wynton HPC via a contributed environment module.
To load the R module available in the CBI software stack, do:
[alice@dev1 ~]$ module load CBI
[alice@dev1 ~]$ module load r
which provides access to a modern version of R:
[alice@dev1 ~]$ R
R version 4.4.0 (2024-04-24) -- "Puppy Cup"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> 1+2
[1] 3
> quit()
Save workspace image? [y/n/c]: n
[alice@dev1 ~]$
To use an older version of R, specify the version when you load R, e.g.
[alice@dev1 ~]$ module load CBI
[alice@dev1 ~]$ module load r/3.5.3
In order to run R in jobs, the above R environment module needs to be
loaded just as when you run it interactively on a development node.
For example, to run the my_script.R script, the job script should at
a minimum contain:
#! /usr/bin/env bash
#$ -S /bin/bash
#$ -cwd
module load CBI
module load r
Rscript my_script.R
R 4.4.0 was released on 2024-04-24 and Bioconductor 3.19 on 2024-05-01. As of 2024-05-03, there were 20,684 packages on CRAN and 3,578 packages on Bioconductor 3.19.
On 2024-05-03, we confirmed that 20,614 CRAN packages and 3,560 Bioconductor 3.19 packages install out of the box when following the instructions below. The packages that failed to install do so either because they depend on a system library that is not available on the cluster, or because they have bugs preventing them from being installed out of the box. If you need to install any of those, please reach out on one of the support channels.
The majority of R packages are available from CRAN (Comprehensive R Archive Network). Another dominant repository of R packages is Bioconductor, which provides R packages with a focus on bioinformatics. Packages available from Bioconductor are not available on CRAN, and vice versa. At times, you will find online instructions for installing R packages hosted on, for instance, GitHub and GitLab. Before installing an R package from such sources, we highly recommend installing the package from CRAN or Bioconductor, if it is available there, because packages hosted on the latter are stable releases and often better tested.
Before continuing, it is useful to understand where R looks for locally installed R packages. There are three locations that R considers:
1. Your personal R package library. This is located under ~/R/, e.g. ~/R/rocky8-x86_64-pc-linux-gnu-library/4.4-CBI-gcc13/
2. (optional) A site-wide R package library (not used on Wynton HPC)
3. The system-wide R package library that is part of the R installation, e.g. /wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib64/R/library
For instance, when we try to load an R package:
> library(datasets)
R will search the above folders in order for the R package ‘datasets’. When you start out fresh, the only R packages available to you are the ones installed in folder (3), the system-wide library. The ‘datasets’ package comes with the R installation, so with a fresh setup, it will be loaded from the third location. As we will see below, when you install your own packages, they will all be installed into folder (1), your personal library. The first time you install a package, the personal library folder does not exist, so R will ask you whether or not you want to create that folder. If asked, you should always accept (answer ‘Yes’). If you have already created this folder, R will install into it without asking.
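The search order described above can be mimicked with a small shell sketch. It is only an illustration of "first matching folder wins", not how R implements the lookup; the helper name `find_r_package` and the folders passed to it are hypothetical. Inside R, `.libPaths()` lists the actual library folders in the order they are searched.

```shell
#!/usr/bin/env bash
# Illustration only: report the first library folder that contains a
# given package. 'find_r_package' is a hypothetical helper.
find_r_package() {
  local pkg="$1"
  shift
  local lib
  for lib in "$@"; do
    # An installed package is a sub-folder holding a DESCRIPTION file
    if [ -f "$lib/$pkg/DESCRIPTION" ]; then
      printf '%s\n' "$lib"
      return 0
    fi
  done
  return 1   # not found in any of the given libraries
}
```

Called with the personal library first and the system-wide library last, the function reports the personal copy of a package whenever one exists, which is why a package you install yourself takes precedence over one shipped with R.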
Finally, R undergoes a main update once a year (in April). For
example, R 4.4.0 was released in April 2024. The next main release
will be R 4.5.0 a year later. Whenever the y component in the R x.y.z
version is increased, you will start out with an empty personal
package folder specific to R x.y (regardless of z). This means
that you will have to re-install all R packages you had installed
during the year before the new main release came out. Yes, this can
be tedious and can take quite some time, but it improves stability
while allowing the R developers to keep improving R. Of course, you
can still keep using an older version of R and all the packages you
have installed for that version; they will not be removed.
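To make the x.y rule concrete, here is a minimal shell sketch that reduces a full R version to the x.y series that appears in the personal library folder name (e.g. the 4.4 in 4.4-CBI-gcc13). The helper name `r_series` is hypothetical and not part of R or the CBI stack.

```shell
# Illustration only: derive the "x.y" series from an R version "x.y.z".
# 'r_series' is a hypothetical helper.
r_series() {
  # Strip the trailing ".z" component, e.g. 4.4.0 -> 4.4
  printf '%s\n' "${1%.*}"
}

r_series 4.4.0   # prints 4.4
r_series 4.4.1   # prints 4.4 (same personal library as R 4.4.0)
r_series 4.5.0   # prints 4.5 (a new, initially empty personal library)
```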
Packages available on CRAN can be installed using the
install.packages() function in R. The default behavior of R is to
always ask you which one of the many CRAN mirrors you want to install
from (their content is all identical). To avoid this question, tell R
to always use the first one:
> chooseCRANmirror(ind = 1)
>
Now, in order to install, for instance, the zoo package available on CRAN, call:
> install.packages("zoo")
Warning in install.packages("zoo") :
'lib = "/wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel)
We notice two things. First, there is a warning mentioning that a “lib” folder is “not writable”. This is because your personal library folder does not yet exist, so R tried to install to location (3) but failed because you do not have write permission there. Second, R asks whether you want to install to a personal library instead. Answer ‘yes’:
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
'~/R/rocky8-x86_64-pc-linux-gnu-library/4.4-CBI-gcc13'
to install packages into? (yes/No/cancel)
R wants to make sure you are aware of what is done, so it will, conservatively, also ask if you accept the default location. Answer ‘yes’ for this folder to be created. After this, the current and all future package installations in R will go into this folder without further questions asked. In this example, we will get:
Would you like to create a personal library
'~/R/rocky8-x86_64-pc-linux-gnu-library/4.4-CBI-gcc13'
to install packages into? (yes/No/cancel) yes
trying URL 'https://cloud.r-project.org/src/contrib/zoo_1.8-12.tar.gz'
Content type 'application/x-gzip' length 782344 bytes (764 KB)
==================================================
downloaded 764 KB
* installing *source* package ‘zoo’ ...
** package ‘zoo’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
using C compiler: ‘gcc (GCC) 13.1.1 20230614 (Red Hat 13.1.1-4)’
gcc -I"/wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib/R/include" -DNDEBUG -I../inst/include -I/usr/local/include -fpic -g -O2 -c coredata.c -o coredata.o
gcc -I"/wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib/R/include" -DNDEBUG -I../inst/include -I/usr/local/include -fpic -g -O2 -c init.c -o init.o
gcc -I"/wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib/R/include" -DNDEBUG -I../inst/include -I/usr/local/include -fpic -g -O2 -c lag.c -o lag.o
gcc -shared -L/wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib/R/lib -L/usr/local/lib -o zoo.so coredata.o init.o lag.o -L/wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib/R/lib -lR
installing to /wynton/home/boblab/alice/R/rocky8-x86_64-pc-linux-gnu-library/4.4-CBI-gcc13/00LOCK-zoo/00new/zoo/libs
** R
** demo
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (zoo)
The downloaded source packages are in
'/scratch/alice/RtmpVm3e6t/downloaded_packages'
>
If there is no mention of an “error” (a “warning” is ok in R, but
never an “error”), then the package was successfully installed.
Seeing * DONE (zoo) at the end confirms that the installation
succeeded. As with any other package in R, you can also
verify that it is indeed installed by loading it, i.e.
> library(zoo)
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
>
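If the installation output has been saved to a file, the "* DONE" check described above can be automated. The following is a minimal sketch under that assumption; the helper name `install_succeeded` and the log file are hypothetical.

```shell
# Illustration only: check a saved installation log for the
# "* DONE (<package>)" line. 'install_succeeded' is a hypothetical helper.
install_succeeded() {
  local pkg="$1" log="$2"
  # -F: match the text literally, -x: the whole line must match
  grep -qFx -- "* DONE ($pkg)" "$log"
}
```

For example, `install_succeeded zoo install.log && echo "zoo is installed"` prints the message only if the log contains the line * DONE (zoo).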
If a new version of one or more CRAN packages is released, they can be installed by calling:
> chooseCRANmirror(ind = 1)
> update.packages()
...
Per Bioconductor’s best practices, R packages from Bioconductor should
be installed using BiocManager::install(). This is to guarantee
maximum compatibility between all Bioconductor packages.
If you already have BiocManager installed, you can skip this
section. When you start out fresh, the package BiocManager is not
installed, meaning that calling BiocManager::install() will fail. We
need to start by installing it from CRAN (sic!):
> install.packages("BiocManager")
Installing package into '/wynton/home/boblab/alice/R/rocky8-x86_64-pc-linux-gnu-library/4.4-CBI-gcc13'
(as 'lib' is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/BiocManager_1.30.22.tar.gz'
Content type 'application/x-gzip' length 582690 bytes (569 KB)
==================================================
downloaded 569 KB
* installing *source* package ‘BiocManager’ ...
** package ‘BiocManager’ successfully unpacked and MD5 sums checked
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (BiocManager)
The downloaded source packages are in
'/scratch/alice/RtmpSRgaB4/downloaded_packages'
>
Comment: If this is the very first R package you install, see the CRAN instructions above for setting a default CRAN mirror and creating a personal library folder.
With BiocManager installed, we can now install any Bioconductor package. For instance, to install limma, and all of its dependencies, call:
> BiocManager::install("limma")
Bioconductor version 3.19 (BiocManager 1.30.22), R 4.4.0 (2024-04-24)
Installing package(s) 'BiocVersion'
trying URL 'https://bioconductor.org/packages/3.19/bioc/src/contrib/BiocVersion_3.19.1.tar.gz'
Content type 'application/x-gzip' length 987 bytes
==================================================
downloaded 987 bytes
* installing *source* package ‘limma’ ...
** using staged installation
** libs
using C compiler: ‘gcc (GCC) 10.3.1 20210422 (Red Hat 10.3.1-1)’
gcc -I"/wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib64/R/include" -DNDEBUG -I/usr/local/include -fpic -g -O2 -c init.c -o init.o
gcc -I"/wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib64/R/include" -DNDEBUG -I/usr/local/include -fpic -g -O2 -c normexp.c -o normexp.o
gcc -I"/wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib64/R/include" -DNDEBUG -I/usr/local/include -fpic -g -O2 -c weighted_lowess.c -o weighted_lowess.o
gcc -shared -L/wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib64/R/lib -L/usr/local/lib64 -o limma.so init.o normexp.o weighted_lowess.o -L/wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib64/R/lib -lR
installing to /wynton/home/boblab/alice/R/rocky8-x86_64-pc-linux-gnu-library/4.4-CBI-gcc13/00LOCK-limma/00new/limma/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (limma)
The downloaded source packages are in
‘/scratch/alice/Rtmp4dISqw/downloaded_packages’
>
There were no “error” messages, so the installation was successful. To verify that it worked, we can load the package in R as:
> library(limma)
>
To install Bioconductor updates, call BiocManager::install() without arguments:
> BiocManager::install()
Comment: This will actually also update any CRAN packages.
If you have an R script that sets up a number of parallel workers in
R, do not use ncores <- detectCores() of the parallel package,
because it will result in your job hijacking all cores on the compute
node regardless of how many cores the scheduler has given you. Taking
up all CPU resources without permission is really bad practice and a
common cause of problems. A much better solution is to use
availableCores() of the parallelly package, e.g.
ncores <- parallelly::availableCores(). This function is backward
compatible with detectCores() while respecting what the scheduler has
allocated for your job.
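The same principle applies at the shell level of a job script. The sketch below assumes an SGE-style scheduler (as the #$ directives in the job script above suggest) that exposes the number of granted slots in the NSLOTS environment variable, which is one of the settings parallelly::availableCores() consults. The helper name `granted_cores` is hypothetical, and falling back to a single core is a conservative choice made for this sketch.

```shell
# Illustration only: report the number of cores granted by the scheduler.
# Assumes an SGE-style scheduler that sets NSLOTS inside a job;
# 'granted_cores' is a hypothetical helper.
granted_cores() {
  # Fall back to a single core when NSLOTS is not set, e.g. when
  # running outside of a job
  printf '%s\n' "${NSLOTS:-1}"
}
```

A tool launched from the job script can then be given `$(granted_cores)` threads instead of a hard-coded number or the total core count of the machine.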
As of 2024-04-26, the “recommended” MASS and Matrix packages require R (>= 4.4.0). If you run an older version of R, you can install older versions of them that are compatible with R (< 4.4.0) using:
> install.packages("https://cran.r-project.org/src/contrib/Archive/MASS/MASS_7.3-60.0.1.tar.gz", type = "source")
> install.packages("https://cran.r-project.org/src/contrib/Archive/Matrix/Matrix_1.6-5.tar.gz", type = "source")
Some R packages rely on the Message Passing Interface (MPI),
e.g. Rmpi, pbdMPI and bigGP. To install these, and also to use
them, we need to load the built-in mpi module:
[alice@dev1 ~]$ module load mpi/openmpi-x86_64
[alice@dev1 ~]$ module list
Currently Loaded Modules:
1) CBI 2) scl-gcc-toolset/13 3) r/4.4.0 4) mpi/openmpi-x86_64
Importantly, make sure to specify the exact version of the mpi
module as well, so that your code will keep working also when a newer
version becomes the new default. Note that you will have to load the
same mpi module, and version(!), also whenever you run R code that
requires these MPI-dependent R packages.
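Because forgetting to load the module at run time is an easy mistake to make, a job script can check for it before starting R. The following is a minimal sketch, assuming the module system lists the loaded modules, separated by colons, in the LOADEDMODULES environment variable; the helper name `require_module` is hypothetical.

```shell
# Illustration only: fail early if a required environment module is not
# loaded. Assumes loaded modules are listed, colon-separated, in the
# LOADEDMODULES environment variable; 'require_module' is a
# hypothetical helper.
require_module() {
  case ":${LOADEDMODULES:-}:" in
    *":$1:"*) return 0 ;;
    *) echo "ERROR: module '$1' is not loaded" >&2
       return 1 ;;
  esac
}
```

In a job script, a line such as `require_module mpi/openmpi-x86_64 || exit 1` placed before the Rscript call stops the job with a clear message instead of a harder-to-read failure inside R.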
In addition to making OpenMPI available by loading the mpi module,
several MPI-based R packages require additional special care in order
to install. The sections below show how to install them.
After loading the mpi module, the Rmpi package installs
out-of-the-box like other R packages. After installing it, you can
verify that it works by running the following example commands:
[alice@dev1 ~]$ module load CBI r
[alice@dev1 ~]$ module load mpi/openmpi-x86_64
[alice@dev1 ~]$ R
...
> library(Rmpi)
[1684426121.677063] [c4-dev3:23125:0] sys.c:618 UCX ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
> mpi.spawn.Rslaves() ## launch one or more MPI parallel workers
1 slaves are spawned successfully. 0 failed.
[1684426140.976380] [c4-dev3:23125:0] sys.c:618 UCX ERROR shmget(size=2097152 flags=0xb80) for ucp_am_bufs failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
master (rank 0, comm 1) of size 2 is running on: dev1
slave1 (rank 1, comm 1) of size 2 is running on: dev1
> mpi.remote.exec(Sys.getpid()) ## get the process ID for one of them
out
1 189114
Contrary to Rmpi above, packages such as pbdMPI and bigGP
require more hand-holding to install. For example, after having
loaded the mpi module, we can install pbdMPI in R as:
> install.packages("pbdMPI", configure.args="--with-mpi-include=$MPI_INCLUDE --with-mpi-libpath=$MPI_LIB --with-mpi-type=OPENMPI")
Installing package into '/wynton/home/boblab/alice/R/rocky8-x86_64-pc-linux-gnu-library/4.4-CBI-gcc13'
(as 'lib' is unspecified)
* installing *source* package 'pbdMPI' ...
** package 'pbdMPI' successfully unpacked and MD5 sums checked
** using staged installation
setting mpi include path from MPI_INCLUDE
checking for sed... /usr/bin/sed
checking for mpicc... mpicc
checking for ompi_info... ompi_info
checking for pkg-config... /usr/bin/pkg-config
>> TMP_FOUND = Nothing found from mpicc --show & sed nor pkg-config ...
checking for openpty in -lutil... yes
checking for main in -lpthread... yes
******************* Results of pbdMPI package configure *****************
>> MPIRUN = /usr/lib64/openmpi/bin/mpirun
>> MPIEXEC = /usr/lib64/openmpi/bin/mpiexec
>> ORTERUN = /usr/lib64/openmpi/bin/orterun
>> TMP_INC =
>> TMP_LIB =
>> TMP_LIBNAME =
>> TMP_FOUND = Nothing found from mpicc --show & sed nor pkg-config ...
>> MPI_ROOT =
>> MPITYPE = OPENMPI
>> MPI_INCLUDE_PATH = /usr/include/openmpi-x86_64
>> MPI_LIBPATH = /usr/lib64/openmpi/lib
>> MPI_LIBNAME =
>> MPI_LIBS = -lutil -lpthread
>> MPI_DEFS = -DMPI2
>> MPI_INCL2 =
>> MPI_LDFLAGS =
>> PKG_CPPFLAGS = -I/usr/include/openmpi-x86_64 -DMPI2 -DOPENMPI
>> PKG_LIBS = -L/usr/lib64/openmpi/lib -lmpi -lutil -lpthread
>> PROF_LDFLAGS =
>> ENABLE_LD_LIBRARY_PATH = no
*************************************************************************
configure: creating ./config.status
config.status: creating src/Makevars
configure: creating ./config.status
config.status: creating src/Makevars
config.status: creating R/zzz.r
** libs
echo "MPIRUN = /usr/lib64/openmpi/bin/mpirun" > Makeconf
echo "MPIEXEC = /usr/lib64/openmpi/bin/mpiexec" >> Makeconf
echo "ORTERUN = /usr/lib64/openmpi/bin/orterun" >> Makeconf
echo "TMP_INC = " >> Makeconf
echo "TMP_LIB = " >> Makeconf
echo "TMP_LIBNAME = " >> Makeconf
echo "TMP_FOUND = Nothing found from mpicc --show & sed nor pkg-config ..." >> Makeconf
echo "MPI_ROOT = " >> Makeconf
echo "MPITYPE = OPENMPI" >> Makeconf
echo "MPI_INCLUDE_PATH = /usr/include/openmpi-x86_64" >> Makeconf
echo "MPI_LIBPATH = /usr/lib64/openmpi/lib" >> Makeconf
echo "MPI_LIBNAME = " >> Makeconf
echo "MPI_LIBS = -lutil -lpthread" >> Makeconf
echo "MPI_DEFS = -DMPI2" >> Makeconf
echo "MPI_INCL2 = " >> Makeconf
echo "MPI_LDFLAGS = " >> Makeconf
echo "PKG_CPPFLAGS = -I/usr/include/openmpi-x86_64 -DMPI2 -DOPENMPI" >> Makeconf
echo "PKG_LIBS = -L/usr/lib64/openmpi/lib -lmpi -lutil -lpthread" >> Makeconf
echo "PROF_LDFLAGS = " >> Makeconf
echo "ENABLE_LD_LIBRARY_PATH = no" >> Makeconf
gcc -I"/wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib64/R/include" -DNDEBUG -I/usr/include/openmpi-x86_64 -DMPI2 -DOPENMPI -I/usr/local/include -fpic -g -O2 -c comm_errors.c -o comm_errors.o
gcc -I"/wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib64/R/include" -DNDEBUG -I/usr/include/openmpi-x86_64 -DMPI2 -DOPENMPI -I/usr/local/include -fpic -g -O2 -c comm_sort_double.c -o comm_sort_double.o
...
gcc -I"/wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib64/R/include" -DNDEBUG -I/usr/include/openmpi-x86_64 -DMPI2 -DOPENMPI -I/usr/local/include -fpic -g -O2 -c zzz.c -o zzz.o
gcc -shared -L/wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib64/R/lib -L/usr/local/lib64 -o pbdMPI.so comm_errors.o comm_sort_double.o comm_sort_integer.o pkg_dl.o pkg_tools.o spmd.o spmd_allgather.o spmd_allgatherv.o spmd_allreduce.o spmd_alltoall.o spmd_alltoallv.o spmd_bcast.o spmd_communicator.o spmd_communicator_spawn.o spmd_gather.o spmd_gatherv.o spmd_info.o spmd_recv.o spmd_reduce.o spmd_scatter.o spmd_scatterv.o spmd_send.o spmd_sendrecv.o spmd_sendrecv_replace.o spmd_tool.o spmd_utility.o spmd_wait.o zzz.o -L/usr/lib64/openmpi/lib -lmpi -lutil -lpthread -L/wynton/home/cbi/shared/software/_rocky8/R-4.4.0-gcc13/lib64/R/lib -lR
installing via 'install.libs.R' to /wynton/home/boblab/alice/R/rocky8-x86_64-pc-linux-gnu-library/4.4-CBI-gcc13/00LOCK-pbdMPI/00new/pbdMPI
** R
** demo
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
[1684426347.259086] [c4-dev3:26986:0] sys.c:618 UCX ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[dev1.wynton.ucsf.edu:227206] pml_ucx.c:208 Error: Failed to create UCP worker
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
[1684426349.593601] [c4-dev3:27017:0] sys.c:618 UCX ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[dev1.wynton.ucsf.edu:227248] pml_ucx.c:208 Error: Failed to create UCP worker
** testing if installed package keeps a record of temporary installation path
* DONE (pbdMPI)
The downloaded source packages are in
'/scratch/alice/RtmpKNz5KF/downloaded_packages'
The bigGP package installs the same way.
If we try to install the rjags package, we’ll get the following installation error in R:
> install.packages("rjags")
...
* installing *source* package 'rjags' ...
** package 'rjags' successfully unpacked and MD5 sums checked
** using staged installation
checking for pkg-config... /usr/bin/pkg-config
configure: WARNING: pkg-config file for jags 4 unavailable
configure: WARNING: Consider adding the directory containing `jags.pc`
configure: WARNING: to the PKG_CONFIG_PATH environment variable
configure: Attempting legacy configuration of rjags
checking for jags... no
configure: error: "automatic detection of JAGS failed. Please use pkg-config to locate the JAGS library. See the INSTALL file for details."
ERROR: configuration failed for package 'rjags'
* removing '/wynton/home/boblab/alice/R/rocky8-x86_64-pc-linux-gnu-library/4.4-CBI-gcc13/rjags'
ERROR: dependency 'rjags' is not available for package 'infercnv'
* removing '/wynton/home/boblab/alice/R/rocky8-x86_64-pc-linux-gnu-library/4.4-CBI-gcc13/infercnv'
The error says that the “JAGS library” is missing. It’s available via the CBI software stack. Load it before starting R:
$ module load CBI jags
and you’ll find that install.packages("rjags") will complete
successfully.
Importantly, you need to load the jags CBI module any time you run R
where the rjags R package needs to be loaded.
If we try to install the jqr package, it fails to compile:
> install.packages("jqr")
...
* installing *source* package ‘jqr’ ...
** package ‘jqr’ successfully unpacked and MD5 sums checked
** using staged installation
Using PKG_CFLAGS=
Using PKG_LIBS=-ljq
--------------------------- [ANTICONF] --------------------------------
Configuration failed because libjq was not found. Try installing:
* deb: libjq-dev (Debian, Ubuntu).
* rpm: jq-devel (Fedora, EPEL)
* csw: libjq_dev (Solaris)
* brew: jq (OSX)
If is already installed set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
-------------------------- [ERROR MESSAGE] ---------------------------
<stdin>:1:10: fatal error: jq.h: No such file or directory
compilation terminated.
--------------------------------------------------------------------
ERROR: configuration failed for package ‘jqr’
To fix this, load the jq module from the CBI stack before launching R, i.e.
$ module load CBI r
$ module load CBI jq
$ R
After this, the jqr package will install out of the box.
Importantly, you need to load the jq CBI module any time you run R
where the jqr R package needs to be loaded.
The udunits2 package does not install out of the box. It seems
to be due to a problem with the package itself, and the suggested
instructions that the package gives on setting environment variable
UDUNITS2_INCLUDE do not work. A workaround to install the package
is to do:
install.packages("udunits2", configure.args="--with-udunits2-include=/usr/include/udunits2")