This package contains useful tools for the analysis of single-cell gene expression data using the statistical software R. The package places an emphasis on tools for quality control, visualisation and pre-processing of data before further downstream analysis.
We hope that scater fills a useful niche between raw RNA-sequencing count data and more focused downstream modelling tools such as monocle, scLVM, SCDE, edgeR, limma and so on.
Briefly, scater enables the following:
- Automated computation of QC metrics
- Transcript quantification from read data with pseudo-alignment
- Data format standardisation
- Rich visualisations for exploratory analysis
- Seamless integration into the Bioconductor universe
- Simple normalisation methods
See below for information about installation, getting started and highlights of the package.
This package currently lives on GitHub, so I recommend using Hadley Wickham's devtools package to install scater directly from GitHub. If you don't have devtools installed, install it from CRAN (as shown below) and then run the call to install scater:
If you are using the development version of R (3.3):

```r
install.packages("devtools")
devtools::install_github("davismcc/scater", build_vignettes = TRUE)
```

If you are using the current release version of R (3.2.3):

```r
devtools::install_github("davismcc/scater", ref = "release-R-3.2", build_vignettes = TRUE)
```
I have recently submitted scater to Bioconductor, so development of the package is proceeding with the development version of R (version 3.3). As such, the master branch of this repository requires R >= 3.3. If you are using the release version of R, please install using the adjusted command above.
Using the most recent version of R is strongly recommended (R 3.2.3 at the time of writing). Effort has been made to ensure the package works with R > 3.0, but it has not been tested with R < 3.1.1.
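If you script your installation, the branch choice above can be made automatically. The sketch below is a hypothetical helper (`scater_ref()` is not part of scater); it assumes the two branch names used in the install commands above, "master" for R >= 3.3 and "release-R-3.2" otherwise.

```r
# Hypothetical helper: choose the scater branch matching an R version.
# Assumption: branch names come from the install commands above.
scater_ref <- function(r_version = getRversion()) {
  # numeric_version compares component-wise, so "3.10.0" counts as newer than "3.3.0"
  v <- numeric_version(as.character(r_version))
  if (v >= "3.3.0") "master" else "release-R-3.2"
}

ref <- scater_ref()
message("Install with: devtools::install_github(\"davismcc/scater\", ref = \"",
        ref, "\", build_vignettes = TRUE)")
```

Comparing versions as `numeric_version` objects rather than plain strings avoids the trap where `"3.10.0" < "3.3.0"` under character comparison.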
There are several other packages from CRAN and Bioconductor that scater uses, so you will need to have these packages installed as well. The CRAN packages should install automatically when scater is installed, but you will need to install the Bioconductor packages manually.
Not all of the following are strictly necessary, but they enhance the functionality of scater and are good packages in their own right. The commands below should help with package installations.
CRAN packages:

```r
install.packages(c("data.table", "ggplot2", "knitr", "matrixStats", "MASS",
                   "plyr", "reshape2", "rjson", "testthat", "viridis"))
```
Bioconductor packages:

```r
source("http://bioconductor.org/biocLite.R")
biocLite(c("Biobase", "biomaRt", "edgeR", "limma", "rhdf5"))
```
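Optional packages that are not strictly required but enhance the functionality of scater (the parallel package also enhances scater, but it ships with base R and should not be installed from CRAN):

```r
install.packages(c("cowplot", "cluster", "mvoutlier", "Rtsne"))
biocLite(c("destiny", "monocle"))
```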
You might also like to install dplyr for convenient data manipulation:

```r
install.packages("dplyr")
```
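To see which of the packages listed above you still need, a quick check like the following may help. It is a minimal sketch: `missing_packages()` is a hypothetical helper, and it relies on `system.file(package = ...)` returning an empty string for packages that are not installed, so nothing is loaded or attached.

```r
# Hypothetical helper: return the names of packages that are not installed.
# system.file(package = p) gives "" when package p cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

cran_pkgs <- c("data.table", "ggplot2", "knitr", "matrixStats", "MASS",
               "plyr", "reshape2", "rjson", "testthat", "viridis")
bioc_pkgs <- c("Biobase", "biomaRt", "edgeR", "limma", "rhdf5")
to_install <- missing_packages(c(cran_pkgs, bioc_pkgs))
```

The CRAN subset of `to_install` can be passed to `install.packages()` and the Bioconductor subset to `biocLite()`.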
The scater package has been submitted to Bioconductor and is currently under review.
The best place to start is the vignette. From inside an R session, load scater and then browse the vignettes:

```r
library(scater)
browseVignettes("scater")
```
There is a detailed HTML document available that introduces the main features and functionality of scater.
The diagram below provides an overview of the pre-processing and QC workflow possible in scater, listing the functions that can be used at various stages.
The scater package allows you to do some neat things relatively quickly. Some highlights are shown below with example code and screenshots.
- Automated computation of QC metrics
- Transcript quantification from read data with pseudo-alignment approaches
- Data format standardisation
- Rich visualisations for QC and exploratory analysis
- Seamless integration into the Bioconductor universe
- Simple normalisation methods
For details of how to use these functions, please consult the vignette and package documentation. The plots shown use the example data included with the package (for which there is no interesting structure) and as shown require only one or two lines of code to generate.
Use the calculateQCMetrics function to compute many metrics useful for gene/transcript-level and cell-level QC. Metrics computed include the number of genes expressed per cell, the percentage of expression from control genes (e.g. ERCC spike-ins) and many more.
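To make the idea behind these metrics concrete, here is a base-R sketch of a few cell-level and feature-level QC quantities of the kind described above. It is not scater's implementation: it assumes a features x cells matrix of raw counts and a logical vector flagging control features, and the column names and simulated data are purely illustrative.

```r
# Base-R sketch of QC metrics (not scater's code; names are illustrative).
# Assumptions: `counts` is features x cells; `is_control` flags spike-ins.
set.seed(1)
counts <- matrix(rpois(200 * 10, lambda = 2), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:10)))
is_control <- seq_len(nrow(counts)) > 190  # pretend the last 10 rows are spike-ins

cell_qc <- data.frame(
  total_counts       = colSums(counts),      # library size per cell
  total_features     = colSums(counts > 0),  # features detected per cell
  pct_counts_control = 100 * colSums(counts[is_control, , drop = FALSE]) /
                         colSums(counts)     # share of counts from controls
)

feature_qc <- data.frame(
  mean_count   = rowMeans(counts),           # average count per cell
  n_cells_expr = rowSums(counts > 0)         # cells with a non-zero count
)
head(cell_qc)
```

Cells with few detected features or a high control percentage are typical candidates for filtering.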
The runKallisto function provides a wrapper to the kallisto software for quantifying transcript abundance from FASTQ files using a pseudo-alignment approach, which is extremely fast compared with conventional read alignment. With readKallisto, transcript quantities can be read into a data object in R.
The default plot for an SCESet object shows the cumulative expression of the most-expressed features (genes or transcripts).
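The quantity behind such a plot can be sketched in a few lines of base R: for each cell, sort feature counts in decreasing order and track what fraction of the library the top n features account for. This is an assumption about what the plot summarises, not scater's own computation, and `cumulative_top_share()` is a hypothetical name; it also assumes every cell has a non-zero library size.

```r
# Hypothetical helper: per-cell cumulative share of counts in the top-n features.
# Assumes no cell has zero total counts (that would give NaN).
cumulative_top_share <- function(counts, n = 50) {
  apply(counts, 2, function(x) {
    sorted <- sort(x, decreasing = TRUE)
    cumsum(sorted)[seq_len(min(n, length(x)))] / sum(x)
  })
}

set.seed(2)
counts <- matrix(rnbinom(500 * 6, mu = 5, size = 0.5), nrow = 500)
top_share <- cumulative_top_share(counts, n = 50)  # 50 x 6 matrix, one column per cell
matplot(top_share, type = "l", lty = 1,
        xlab = "Number of top features", ylab = "Cumulative proportion of counts")
```

Cells whose curves rise much faster than the rest have libraries dominated by a few features, which is often worth investigating.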
The plotTSNE function produces a t-distributed stochastic neighbour embedding plot for the cells.
The plotPCA function produces a principal components analysis plot for the cells.
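As a rough illustration of what a PCA plot of cells involves, the base-R sketch below runs `prcomp()` on log-transformed counts with cells as rows. The log2(count + 1) transform and the restriction to the 100 most variable features are illustrative choices, not necessarily what plotPCA does by default, and the two groups of cells are simulated.

```r
# Base-R PCA sketch (illustrative choices, not plotPCA's defaults).
set.seed(3)
counts <- matrix(rpois(1000 * 20, lambda = 3), nrow = 1000)
counts[1:50, 11:20] <- counts[1:50, 11:20] + 10  # simulate two groups of cells
logcounts <- log2(counts + 1)

vars <- apply(logcounts, 1, var)
top <- order(vars, decreasing = TRUE)[1:100]     # 100 most variable features
pca <- prcomp(t(logcounts[top, ]), center = TRUE, scale. = FALSE)  # cells as rows

plot(pca$x[, 1:2], col = rep(c("steelblue", "firebrick"), each = 10), pch = 19,
     xlab = "PC1", ylab = "PC2")
```

Transposing the matrix matters: prcomp() treats rows as observations, and here the observations are cells.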
The plotDiffusionMap function produces a diffusion map plot for the cells.
The plotExpression function plots the expression values for a selection of features.
The plotQC function produces a variety of QC plots useful for diagnostics and for feature and cell filtering. It can be used to plot the most highly expressed genes (or features) in the data set or to create density plots that assess the relative importance of explanatory variables, as well as many other visualisations useful for QC.
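One way to quantify the "relative importance of explanatory variables" is the per-feature R-squared from regressing expression on each variable, then looking at the distribution across features. The sketch below assumes that approach for illustration; `variance_explained()` is a hypothetical helper, the batch variable and data are simulated, and scater's own computation may differ.

```r
# Hypothetical helper: per-feature R^2 of log-expression on one variable.
variance_explained <- function(logexprs, variable) {
  apply(logexprs, 1, function(y) {
    if (var(y) == 0) return(NA_real_)  # constant feature: R^2 undefined
    summary(lm(y ~ variable))$r.squared
  })
}

set.seed(4)
batch <- factor(rep(c("A", "B"), each = 15))
logexprs <- matrix(rnorm(200 * 30), nrow = 200)
logexprs[1:20, batch == "B"] <- logexprs[1:20, batch == "B"] + 3  # batch-affected features

r2 <- variance_explained(logexprs, batch)
plot(density(r2, na.rm = TRUE), main = "Variance explained by batch",
     xlab = "R-squared")
```

A variable whose density has a heavy right tail explains a lot of variation in many features and may need to be accounted for downstream.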
The plotPhenoData function plots two phenotype metadata variables (such as QC metrics) against each other.
See also plotFeatureData to plot feature (gene) metadata variables, including QC metrics.
Plus many, many more possibilities. Please consult the vignette and documentation for details.
The package leans heavily on previously published work and packages, namely edgeR and limma. The SCESet class is inspired by the CellDataSet class from monocle, and SCESet objects in scater can be easily converted to and from monocle's CellDataSet objects.
The package is currently in a beta state. Its major functionality is settled, but it is still under development and may change from time to time. Please do try it and contact me with bug reports, feedback, feature requests, questions and suggestions to improve the package.
Davis McCarthy, December 2015