RNAModR provides functions to map lists of genomic loci of RNA modifications to a reference mRNA transcriptome, and to perform exploratory functional analyses of sites across the transcriptome through visualisation and statistical analysis of the distribution of sites across transcriptome sections (5'UTR, CDS, 3'UTR).
RNAModR performs enrichment analyses to assess the statistical significance of the spatial and sequence-specific localisation of sites relative to null sites. Null sites can be generated automatically or may be supplied manually in the form of a list of genomic loci.
Note that enrichment analysis results may depend critically on the choice and validity of null sites. For example, when establishing a list of null sites for 5-methylcytidine (m5C) modifications, a simple position permutation approach is inappropriate: m5C can only occur at cytosines, and cytosines are not uniformly distributed across transcript sections, so uniformly permuted positions do not give a valid null distribution.
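To make the cytosine argument concrete, the base-R sketch below draws null sites only at unmodified cytosines, keeping the number of sites per transcript section equal to that of the observed sites. The toy transcript, the observed positions and the helper sample_null_c() are hypothetical, and this is an illustration of the idea, not necessarily how RNAModR generates null sites.

```r
set.seed(1)

# Hypothetical toy transcript assembled from three sections
utr5 <- "ACGUACGUAC"
cds  <- "AUGCCCGCCGCAGCCUGA"
utr3 <- "AACUUCAUCUUA"
tx_seq  <- strsplit(paste0(utr5, cds, utr3), "")[[1]]
section <- rep(c("5'UTR", "CDS", "3'UTR"), times = nchar(c(utr5, cds, utr3)))

# Hypothetical observed m5C sites (1-based positions, all at cytosines)
obs_pos <- c(6, 15, 21, 34)

# A uniform permutation such as sample.int(length(tx_seq), 4) could place
# null sites at A, G or U and ignores how cytosines are spread across
# sections. Instead, sample among unmodified cytosines within each section.
sample_null_c <- function(tx_seq, section, obs_pos) {
  is_cand <- tx_seq == "C" & !(seq_along(tx_seq) %in% obs_pos)
  n_per_section <- table(section[obs_pos])
  null_pos <- integer(0)
  for (s in names(n_per_section)) {
    cand <- which(is_cand & section == s)
    n <- as.integer(n_per_section[s])
    null_pos <- c(null_pos, cand[sample.int(length(cand), n)])
  }
  sort(null_pos)
}

null_pos <- sample_null_c(tx_seq, section, obs_pos)
```

Stratifying by section keeps the null sites comparable to the observed sites in the section-level analyses; matching additional features (e.g. sequence context) would be a further refinement.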
NOTE
This is development code! Use it at your own risk: documentation may be incomplete, and functions may return unexpected errors.
[Update September 2019]
Due to some substantial changes in functions/methods from some R/Bioconductor packages that RNAModR depends on, RNAModR's functionality is currently limited. Specifically,
- BuildTx (building a custom reference transcriptome) is broken (but you can still use the pre-generated transcriptomes from the links below); this seems to be related to major changes in GenomicRanges version 1.32.0 and changes in RMariaDB that completely break functionality of GenomicFeatures::makeTxDbFromUCSC (verified on MacOS Sierra). I am unsure how badly other OSes are affected. A workaround/fix for MacOS Sierra (that should also work for other flavours) has been posted as part of Issue #5.
- plotRelDistDistribution() and plotRelDistEnrichment() are broken; this is related to the afore-mentioned changes in GenomicRanges, which introduced a CompressedGRangesList class as a replacement for the GRangesList class.
I appreciate any and all testing; for issues, please open an official Issue on the GitHub project site.
[Update October 2019]
Functionality of most (all?) functions has been restored in a series of major code revisions. This has led to a new stable version 0.2.1. Due to substantial changes in R/Bioconductor packages that RNAModR depends on and in the code base of RNAModR itself, reference transcriptomes from older RNAModR versions will not work with the current stable version 0.2.1 of RNAModR. A full list of changes can be found in NEWS.
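Because transcriptome files built with older versions do not work with 0.2.1, it can help to confirm the installed RNAModR version before reusing existing data. The base-R sketch below uses a hypothetical helper, has_min_version(), built on utils::packageVersion(); it is not part of RNAModR.

```r
# Hypothetical helper: is the installed package at least 'min_version'?
# Returns NA if the package is not installed at all.
has_min_version <- function(pkg, min_version = "0.2.1") {
  if (!nzchar(system.file(package = pkg))) return(NA)
  # package_version objects compare correctly against version strings
  utils::packageVersion(pkg) >= min_version
}
```

For example, has_min_version("RNAModR") returns TRUE if version 0.2.1 or later is installed, and NA if RNAModR is not installed.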
I appreciate and encourage any and all testing; for issues, please open an official Issue on the GitHub project site.
The GitHub way (requires the devtools package)
1. Make sure you have the following R/Bioconductor packages installed:
- AnnotationDbi
- beanplot
- Biostrings
- GenomeInfoDb
- GenomicFeatures
- GenomicRanges
- gplots
- RSQLite
- rtracklayer
The recommended way to install R/Bioconductor packages is to use BiocManager::install:
BiocManager::install(c("AnnotationDbi", "beanplot", "Biostrings", "GenomeInfoDb", "GenomicFeatures", "GenomicRanges", "gplots", "RSQLite", "rtracklayer"))
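To avoid re-installing packages that are already present, the base-R sketch below checks the dependencies listed above and passes only the missing ones to BiocManager::install. The helper names missing_pkgs() and install_missing() are hypothetical; the check relies on system.file(package = ...) returning an empty string for packages that are not installed.

```r
# Dependencies listed above
deps <- c("AnnotationDbi", "beanplot", "Biostrings", "GenomeInfoDb",
          "GenomicFeatures", "GenomicRanges", "gplots", "RSQLite", "rtracklayer")

# Return the subset of 'pkgs' that is not installed
missing_pkgs <- function(pkgs) {
  pkgs[!vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))]
}

# Install only the missing packages; BiocManager itself comes from CRAN
install_missing <- function(pkgs) {
  miss <- missing_pkgs(pkgs)
  if (length(miss) > 0) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
    BiocManager::install(miss)
  }
  invisible(miss)
}
```

Running install_missing(deps) then installs whatever is still missing and returns the names of those packages invisibly.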
Additionally, RNAModR requires two organism-specific R packages to construct a custom transcriptome. Currently, RNAModR supports human and mouse data, based on the following reference genome versions:
- Human: hg38, hg19
- Mouse: mm10, mm9.
Please install the corresponding organism- and version-matching R/Bioconductor packages. For example, if genomic loci of RNA modifications are based on the human GRCh38/hg38 reference genome, RNAModR requires the following R/Bioconductor packages:
- BSgenome.Hsapiens.UCSC.hg38
- org.Hs.eg.db
which you can install in the usual way:
BiocManager::install(c("BSgenome.Hsapiens.UCSC.hg38", "org.Hs.eg.db"))
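The mapping from reference genome version to the two required packages can be written down as a small lookup. The helper genome_pkgs() below is hypothetical; only the hg38 pair is stated above, and the names for hg19, mm10 and mm9 assume the same Bioconductor naming pattern (BSgenome.<Species>.UCSC.<version>, plus org.Hs.eg.db or org.Mm.eg.db).

```r
# Hypothetical helper: organism- and version-matching packages for a genome
genome_pkgs <- function(genome = c("hg38", "hg19", "mm10", "mm9")) {
  genome  <- match.arg(genome)
  human   <- startsWith(genome, "hg")
  species <- if (human) "Hsapiens" else "Mmusculus"
  org     <- if (human) "org.Hs.eg.db" else "org.Mm.eg.db"
  c(paste0("BSgenome.", species, ".UCSC.", genome), org)
}
```

For mouse mm10 data, for instance, this would be used as BiocManager::install(genome_pkgs("mm10")).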
We also offer the possibility to download pre-constructed transcriptome data, see section Downloadable transcriptome data.
2. If all package dependencies are met, install RNAModR with devtools:
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("mevers/RNAModR", build_vignettes = FALSE)
You can force a re-install of RNAModR by adding force = TRUE:
devtools::install_github(..., force = TRUE)
The following lines of R code will load the RNAModR library, and plot the distribution of m6A sites [Linder et al., Nature Methods 12, 767 (2015)] across the 5'UTR, CDS and 3'UTR of the human hg38-based transcriptome.
Note: You can also download pre-constructed transcriptome data, see the next section for details.
# Load the library.
library(RNAModR)
library(magrittr)
# Build reference transcriptome.
# This might take a few minutes.
BuildTx("hg38")
# Load and map m6A sites to reference transcriptome.
posSites <- system.file("extdata", "miCLIP_m6A_Linder2015_hg38.bed", package = "RNAModR") %>%
ReadBED() %>%
SmartMap("m6A_Linder")
# Keep sites located in the 5'UTR, CDS and 3'UTR
posSites <- posSites %>%
FilterTxLoc(c("5'UTR", "CDS", "3'UTR"))
# Plot distribution across transcript sections
PlotSectionDistribution(posSites)
It is recommended to always custom-build transcriptome data using BuildTx().
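In the example, ReadBED() reads the m6A sites from a BED file before SmartMap() maps them. As a reminder of the coordinate convention involved, the base-R sketch below parses two made-up single-nucleotide sites, assuming a standard BED6 layout, and converts the 0-based BED start to a 1-based position. It illustrates the file format only and is not RNAModR's ReadBED() implementation.

```r
# Two made-up single-nucleotide sites in BED6 format
# (0-based start, end-exclusive intervals)
bed_txt <- paste("chr1\t14361\t14362\tsite1\t0\t-",
                 "chr2\t5000\t5001\tsite2\t0\t+", sep = "\n")

bed <- read.table(text = bed_txt, sep = "\t",
                  col.names = c("chrom", "start", "end", "name", "score", "strand"),
                  stringsAsFactors = FALSE)

# A single-nucleotide BED interval has end - start == 1;
# its 1-based genomic position is start + 1 (which equals end)
bed$pos1 <- bed$start + 1
bed
```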
To be added.
You may download pre-constructed transcriptome data files for the following reference genome versions:
- Homo sapiens
- Mus musculus
Older transcriptome data files are listed below. There is no guarantee that older transcriptome data will still work; major changes in GenomicRanges and other Bioconductor libraries may cause unpredictable behaviour and errors.
- Homo sapiens
- Mus musculus
In order to use the transcriptome data, you need to copy the RData file into your working directory. You can check that RNAModR correctly finds the transcriptome data by running, e.g.,
BuildTx("hg38")
Provided you have copied the file tx_hg38.RData into the working directory, this should produce the following message:
Found existing transcriptome data. Nothing to do.
To rebuild run with force = TRUE.
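A quick base-R check that a pre-constructed transcriptome file is in place is sketched below. The helpers tx_file(), has_tx_data() and rdata_objects() are hypothetical, and the assumption that other genome versions follow the same tx_<genome>.RData naming is based only on the tx_hg38.RData example. The objects stored inside the file are not documented here, so rdata_objects() simply lists them.

```r
# Hypothetical helper: expected file name, following the tx_hg38.RData pattern
tx_file <- function(genome) paste0("tx_", genome, ".RData")

# TRUE if the transcriptome file exists in 'dir' (default: working directory)
has_tx_data <- function(genome, dir = getwd()) {
  file.exists(file.path(dir, tx_file(genome)))
}

# List the objects stored in an RData file without touching the global environment
rdata_objects <- function(path) {
  env <- new.env()
  load(path, envir = env)
  ls(env)
}
```

For example, has_tx_data("hg38") should return TRUE once tx_hg38.RData has been copied into the working directory.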
The most current RNAModR manual can be downloaded here.
Please contact Maurits Evers in case of questions/suggestions. For bugs or feature requests, please open an issue on GitHub.
The RNAModR R package is open source, licensed under the GNU General Public License, version 3 (GPLv3).