Learning to program in R

Barney Hill edited this page Aug 11, 2021 · 1 revision

R

The lab's preferred programming language is R. Please try to use R unless you really must use another language. If you need to use a Python function, consider using reticulate to call it from within R.
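As a rough illustration of calling Python from R, here is a minimal sketch using reticulate. It assumes the reticulate package and a working Python installation are available; the choice of Python's built-in `math` module and the variable name `py_sqrt` are only for illustration.

```r
# Minimal sketch: call a Python function from R via reticulate.
# Assumption: reticulate is installed and can find a Python installation.
py_sqrt <- NULL

if (requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::py_available(initialize = TRUE)) {
  # Import Python's standard-library math module as an R object
  math <- reticulate::import("math")

  # Call the Python function; the result is converted to an R numeric
  py_sqrt <- math$sqrt(16)
  print(py_sqrt)
}
```

If Python is not available, the block simply leaves `py_sqrt` as `NULL` rather than failing.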

Learning to program in R:

If you have never done any programming then I recommend starting with the Khan Academy courses. Make sure that you are familiar with the following concepts: if-statements, for-loops, variables, arrays and functions. I have not done any of these courses myself, so please let me know how you get on with them. If you find any better resources then please let me know.
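To give a feel for how those concepts look in R, here is a small sketch in base R. The data values, the threshold and the names (`expression_levels`, `classify_level`, `status`) are made up for illustration.

```r
# Variables: store a single value under a name
n_samples <- 5

# Arrays: in R the basic array type is a vector
expression_levels <- c(2.5, 0.0, 7.1, 3.3, 0.4)

# Functions: reusable pieces of code with inputs and a return value
classify_level <- function(x, threshold = 1) {
  # If-statement: choose a result depending on a condition
  if (x > threshold) {
    "expressed"
  } else {
    "not expressed"
  }
}

# For-loop: repeat the same step for each element
status <- character(n_samples)
for (i in seq_len(n_samples)) {
  status[i] <- classify_level(expression_levels[i])
}

print(status)
```

Experienced R users would often replace the loop with `sapply(expression_levels, classify_level)`, but the explicit loop makes each concept easier to see.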

Imperial provides tutorials for the command line, version control with Git, and Python. While I don't use Python much and would prefer that you learn R, the basic principles of programming are the same across languages.

Have a look to see if any workshops are being run by the Software Carpentries within the UK soon. They were given funding by CZI to help train people for bioinformatics so I assume the courses are good.

Here’s a tutorial written by a prominent R developer called Hadley Wickham. It’s a good intro to data visualisation using R. The ggplot2 package is probably the best data vis tool in any programming language. The tutorial does teach a style of code (‘tidyverse’) that I don’t use much, but it is popular.

For learning R, many people recommend the tutorials by SwirlStats.

This cheatsheet explains many basic functions using the two main styles of R.

Consider installing Sublime Text. A good text editor is always useful.

Memory issues in R

The default settings for R are not good for handling large datasets. Amend these as follows:

In bash:

cd ~
touch .Renviron
open .Renviron   # macOS command; on Linux open the file in a text editor such as nano

Then add:

R_MAX_VSIZE=700Gb

Save the file, then restart R so that the new setting takes effect.
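To confirm that R picked up the setting after the restart, you can read the environment variable back from within R. This is a minimal sketch; the variable name `max_vsize` is just for illustration.

```r
# Read the limit that R loaded from ~/.Renviron at startup.
# An empty string means the variable was not set.
max_vsize <- Sys.getenv("R_MAX_VSIZE", unset = "")

if (nzchar(max_vsize)) {
  message("R_MAX_VSIZE is set to ", max_vsize)
} else {
  message("R_MAX_VSIZE is not set; check ~/.Renviron and restart R")
}
```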

Parallel processing in R

Don't worry about this if you are new to programming.

If you'll be using R on the cluster you might want to get used to using the clustermq R package. This lets you submit jobs to the Imperial cluster from within R, in much the same way as you would write a normal loop. To enable it to run, create a .PBStemplate file in your home directory on the server containing the code below.

#PBS -N {{ job_name }}
#PBS -l select=1:ncpus={{ cores | 1 }}:mem=1gb
#PBS -l walltime={{ walltime | 0:05:00 }}

source activate monocle
ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'

#NOTPBS -o {{ log_file | /rds/general/user/nskene/home/logs/ }}
#NOTPBS -j oe
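With the template in place, submitting work from R might look roughly like the sketch below. It assumes the template was saved as `~/.PBStemplate` and that the PBS `qsub` command is on your PATH; the `square` function and `inputs` vector are toy examples. The cluster call is skipped when clustermq or `qsub` is not available, so the block also runs on a laptop.

```r
# Toy function to run once per input value
square <- function(x) x^2
inputs <- 1:3

# Local reference: an ordinary loop over the inputs
local_result <- lapply(inputs, square)

# Cluster version: only attempted when clustermq is installed and the
# PBS submit command (qsub) is found, i.e. when running on the cluster.
if (requireNamespace("clustermq", quietly = TRUE) &&
    nzchar(Sys.which("qsub"))) {
  options(
    clustermq.scheduler = "pbs",
    # Assumed location of the template file described above
    clustermq.template = path.expand("~/.PBStemplate")
  )

  # Q() calls square() once per element of x, each call running in a worker job
  cluster_result <- clustermq::Q(square, x = inputs, n_jobs = 1)
}
```

Values for the template's placeholders, such as `walltime` or `cores`, can be passed through the `template` argument of `Q()`, for example `template = list(walltime = "0:10:00")`.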