Learning to program in R
The lab's preferred programming language is R. Please use R unless you really must use another language. If you need to use a Python function, consider using reticulate to call it from within R.
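As a minimal sketch of the reticulate route, the snippet below imports Python's standard-library `math` module and calls one of its functions from R. It assumes the reticulate package is installed and can find a Python interpreter; the `result` variable is only there for illustration.

```r
# Sketch: call a Python function from R via reticulate.
# Assumes reticulate is installed and a Python interpreter can be found.
result <- NULL
if (requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::py_available(initialize = TRUE)) {
  py_math <- reticulate::import("math")  # Python's standard-library math module
  result <- py_math$sqrt(16)             # the Python float is converted to an R numeric
  print(result)
}
```

If no Python is available the block does nothing, so it is safe to paste into a fresh session.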
If you have never done any programming then I recommend starting with the Khan Academy courses. Make sure that you are familiar with the following concepts: if-statements, for-loops, variables, arrays and functions. I have not done any of these courses myself, so please let me know how you get on with them. If you find any better resources, please tell me about them.
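For orientation, here is a small sketch of how those five concepts look in base R. The names (`threshold`, `counts`, `label_count`) and the numbers are made up purely for illustration.

```r
# Variables: store a value under a name
threshold <- 10

# Vectors (R's basic array type): an ordered collection of values
counts <- c(3, 12, 7, 25)

# Functions: a reusable block of code with inputs and an output
label_count <- function(x, cutoff) {
  # If-statement: choose a branch depending on a condition
  if (x > cutoff) {
    "high"
  } else {
    "low"
  }
}

# For-loop: repeat the same step for each element of the vector
labels <- character(length(counts))
for (i in seq_along(counts)) {
  labels[i] <- label_count(counts[i], threshold)
}

print(labels)
```

If you can read this and predict what `labels` contains before running it, you have the basics needed for the resources below.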
Imperial provides tutorials for the command line, version control with Git, and Python. While I don't use Python much and would prefer that you learn R, the basic principles of programming are the same across languages.
Check whether the Software Carpentries are running any workshops in the UK soon. They were funded by CZI to help train people in bioinformatics, so I assume the courses are good.
Here’s a tutorial written by a prominent R developer called Hadley Wickham. It’s a good intro to data visualisation using R. The ggplot2 package is probably the best data vis tool in any programming language. The tutorial does teach a style of code (‘tidyverse’) that I don’t use much, but it is popular.
For learning R, many people recommend the tutorials by SwirlStats.
This cheatsheet explains many basic functions using the two main styles of R.
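To give a feel for the two styles, the sketch below computes the same summary both ways using the built-in `mtcars` dataset. It assumes R 4.1 or later for the native pipe (`|>`), and the tidyverse half only runs if the dplyr package is installed.

```r
# Base R style: mean miles-per-gallon for each number of cylinders
base_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(base_means)

# Tidyverse style: the same summary with dplyr and the native pipe (R >= 4.1)
tidy_means <- NULL
if (requireNamespace("dplyr", quietly = TRUE)) {
  tidy_means <- mtcars |>
    dplyr::group_by(cyl) |>
    dplyr::summarise(mpg = mean(mpg))
  print(tidy_means)
}
```

Both versions give one row per value of `cyl`; which one you write is largely a matter of taste and of what the surrounding code already uses.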
Consider installing Sublime Text. A good text editor is always useful.
R's default memory limits can be too low for handling large datasets. Raise the maximum vector heap size as follows:
In bash:
cd ~
touch .Renviron
open .Renviron
Then add:
R_MAX_VSIZE=700Gb
Save
Restart R
Don't worry about this if you are new to programming.
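After restarting, you can check from within R that the setting was picked up. This is a small sketch using only base R; `max_vsize` is just an illustrative variable name.

```r
# Read the limit that R picked up from ~/.Renviron at startup.
# An empty string means the variable is not set in this session.
max_vsize <- Sys.getenv("R_MAX_VSIZE")
if (nzchar(max_vsize)) {
  message("R_MAX_VSIZE is set to ", max_vsize)
} else {
  message("R_MAX_VSIZE is not set; check ~/.Renviron and restart R")
}
```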
If you'll be using R on the cluster, it is worth getting used to the clustermq R package. It lets you submit jobs to the Imperial cluster from within R, written much like a normal loop. To enable it, create a .PBStemplate file in your home directory on the server containing the code below.
#PBS -N {{ job_name }}
#PBS -l select=1:ncpus={{ cores | 1 }}:mem=1gb
#PBS -l walltime={{ walltime | 0:05:00 }}
source activate monocle
ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
#NOTPBS -o {{ log_file | /rds/general/user/nskene/home/logs/ }}
#NOTPBS -j oe
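With the template in place, a call might look like the sketch below. It assumes clustermq is installed, that the template was saved as `~/.PBStemplate`, and that `qsub` is on your path. The `square` function and the walltime value are hypothetical placeholders, not lab settings.

```r
# Sketch: run a function over several inputs as PBS jobs via clustermq.
# Assumes clustermq is installed and ~/.PBStemplate exists on the server.
options(
  clustermq.scheduler = "pbs",
  clustermq.template = path.expand("~/.PBStemplate")
)

# Hypothetical example function; replace with your own analysis step
square <- function(x) x^2

# Only submit when a PBS client (qsub) is actually available
if (requireNamespace("clustermq", quietly = TRUE) && nzchar(Sys.which("qsub"))) {
  results <- clustermq::Q(
    square,
    x = 1:10,                              # one function call per value of x
    n_jobs = 2,                            # number of worker jobs to submit
    template = list(walltime = "0:10:00")  # fills {{ walltime }} in the template
  )
  print(unlist(results))
}
```

`Q()` returns a list with one element per call, in the same order as the inputs, so the result can be handled much as you would the output of `lapply()`.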