Commit 3798470

Update readme and vignettes
1 parent 0127a35 commit 3798470

6 files changed: +175 -38 lines changed

README.md

+6-7
@@ -1,9 +1,7 @@
 # sctransform
-## R package for modeling single cell UMI expression data using regularized negative binomial regression
+## R package for normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression
 
-This packaged was developed by Christoph Hafemeister in [Rahul Satija's lab](https://satijalab.org/) at the New York Genome Center. A previous version of this work was used in the paper [Developmental diversification of cortical inhibitory interneurons, Nature 555, 2018](https://github.com/ChristophH/in-lineage). We are currently working on integrating the functionality of this package into [Seurat](https://satijalab.org/seurat/), an R package designed for QC, analysis, and exploration of single cell RNA-seq data.
-
-This package is in beta status, please sanity check any results, and notify me of any issues you find.
+This packaged was developed by Christoph Hafemeister in [Rahul Satija's lab](https://satijalab.org/) at the New York Genome Center. Core functionality of this package has been integrated into [Seurat](https://satijalab.org/seurat/), an R package designed for QC, analysis, and exploration of single cell RNA-seq data.
 
 ## Quick start
 `devtools::install_github(repo = 'ChristophH/sctransform')`
@@ -15,6 +13,7 @@ For usage examples see vignettes in inst/doc or use the built-in help after inst
 
 Available vignettes:
 [Variance stabilizing transformation](https://rawgit.com/ChristophH/sctransform/master/inst/doc/variance_stabilizing_transformation.html)
-[Differential expression](https://rawgit.com/ChristophH/sctransform/master/inst/doc/differential_expression.html)
-[Batch correction](https://rawgit.com/ChristophH/sctransform/master/inst/doc/batch_correction.html)
-[Denoising](https://rawgit.com/ChristophH/sctransform/master/inst/doc/denoising.html)
+[Using sctransform in Seurat](https://rawgit.com/ChristophH/sctransform/master/inst/doc/seurat.html)
+
+## Reference
+An early version of this work was used in the paper [Developmental diversification of cortical inhibitory interneurons, Nature 555, 2018](https://github.com/ChristophH/in-lineage).

inst/doc/denoising.R renamed to inst/doc/correcting.R

+4-4
@@ -34,7 +34,7 @@ maturation_score <- pricu$lambda/max(pricu$lambda)
 y_smooth <- sctransform::smooth_via_pca(vst_out$y, do_plot = TRUE)
 
 ## ------------------------------------------------------------------------
-cm_denoised <- sctransform::denoise(vst_out, data = y_smooth, show_progress = FALSE)
+cm_corrected <- sctransform::correct(vst_out, data = y_smooth, show_progress = FALSE)
 
 ## ---- fig.width=7, fig.height=7, out.width='100%'------------------------
 goi <- c('Nes', 'Ccnd2', 'Tuba1a')
@@ -46,10 +46,10 @@ df[[2]] <- melt(t(as.matrix(vst_out$y[goi, ])), varnames = c('cell', 'gene'), va
 df[[2]]$type <- 'Pearson residual'
 df[[2]]$maturation_rank <- rank(maturation_score)
 df[[3]] <- melt(t(as.matrix(y_smooth[goi, ])), varnames = c('cell', 'gene'), value.name = 'value')
-df[[3]]$type <- 'de-noised Pearson residual'
+df[[3]]$type <- 'corrected Pearson residual'
 df[[3]]$maturation_rank <- rank(maturation_score)
-df[[4]] <- melt(t(as.matrix(cm_denoised[goi, ])), varnames = c('cell', 'gene'), value.name = 'value')
-df[[4]]$type <- 'de-noised UMI'
+df[[4]] <- melt(t(as.matrix(cm_corrected[goi, ])), varnames = c('cell', 'gene'), value.name = 'value')
+df[[4]]$type <- 'corrected UMI'
 df[[4]]$maturation_rank <- rank(maturation_score)
 df <- do.call(rbind, df)
 df$gene <- factor(df$gene, ordered=TRUE, levels=unique(df$gene))

inst/doc/denoising.Rmd renamed to inst/doc/correcting.Rmd

+10-10
@@ -1,10 +1,10 @@
 ---
-title: "Denoising"
+title: "Correcting UMI counts"
 author: "Christoph Hafemeister"
 date: "`r Sys.Date()`"
 output: html_document
 vignette: >
-  %\VignetteIndexEntry{Denoising}
+  %\VignetteIndexEntry{Correcting UMI counts}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---
@@ -26,10 +26,10 @@ knitr::opts_chunk$set(
 old_theme <- theme_set(theme_classic(base_size=8))
 ```
 
-In this vignette we show how the regression model in the variance stabilizing transformation can be used to output de-noised data. Implicitly there are two levels of smoothing/de-noising when we apply the standard workflow using `vst`. First we specify latent variables that are used in the regression model - their contribution to the overall variance in the data will be removed. Second, we usually perform dimensionality reduction, which acts like a smoothing operation. Here we show how to reverse these two operations to obtain de-noised UMI counts.
+In this vignette we show how the regression model in the variance stabilizing transformation can be used to output corrected data. Implicitly there are two levels of smoothing/de-noising when we apply the standard workflow using `vst`. First we specify latent variables that are used in the regression model - their contribution to the overall variance in the data will be removed. Second, we usually perform dimensionality reduction, which acts like a smoothing operation. Here we show how to reverse these two operations to obtain corrected UMI counts.
 
 ## Load data and transform
-We use data from a recent publication: [Mayer, Hafemeister, Bandler et al., Nature 2018](https://dx.doi.org/10.1038/nature25999) [(free read-only version)](http://rdcu.be/JA5l). We load a subset of the cells, namely one of the CGE E12.5 dropseq samples with contaminating cell populations removed. These cells come from a developing continuum and provide a nice example for de-noising.
+We use data from a recent publication: [Mayer, Hafemeister, Bandler et al., Nature 2018](https://dx.doi.org/10.1038/nature25999) [(free read-only version)](http://rdcu.be/JA5l). We load a subset of the cells, namely one of the CGE E12.5 dropseq samples with contaminating cell populations removed. These cells come from a developing continuum and provide a nice example for de-noising and count correction.
 
 First load the data and run variance stabilizing transformation.
 ```{r}
@@ -58,13 +58,13 @@ We will now smooth the Pearson residual by PCA. Internally `smooth_via_pca` perf
 y_smooth <- sctransform::smooth_via_pca(vst_out$y, do_plot = TRUE)
 ```
 
-The data matrix `y_smooth` is in Pearson residual space. Based on these values we can reverse the negative binomial regression model to derive UMI counts per gene. To remove the variability from the latent factor (here `log_umi_per_gene` as a proxy of sequencing depth), we can use a fixed value for all cells. The next step uses the smoothed Pearson residual and the median of all latent factors to obtain de-noised UMI counts.
+The data matrix `y_smooth` is in Pearson residual space. Based on these values we can reverse the negative binomial regression model to derive UMI counts per gene. To remove the variability from the latent factor (here `log_umi_per_gene` as a proxy of sequencing depth), we can use a fixed value for all cells. The next step uses the smoothed Pearson residual and the median of all latent factors to obtain corrected UMI counts.
 
 ```{r}
-cm_denoised <- sctransform::denoise(vst_out, data = y_smooth, show_progress = FALSE)
+cm_corrected <- sctransform::correct(vst_out, data = y_smooth, show_progress = FALSE)
 ```
 
-To give a better idea of what the data really looks like we will plot UMI, Pearson residual, de-noised Pearson residual, and de-noised UMI counts for some key genes related to neuronal development.
+To give a better idea of what the data really looks like we will plot UMI, Pearson residual, corrected Pearson residual, and corrected UMI counts for some key genes related to neuronal development.
 
 ```{r, fig.width=7, fig.height=7, out.width='100%'}
 goi <- c('Nes', 'Ccnd2', 'Tuba1a')
@@ -76,10 +76,10 @@ df[[2]] <- melt(t(as.matrix(vst_out$y[goi, ])), varnames = c('cell', 'gene'), va
 df[[2]]$type <- 'Pearson residual'
 df[[2]]$maturation_rank <- rank(maturation_score)
 df[[3]] <- melt(t(as.matrix(y_smooth[goi, ])), varnames = c('cell', 'gene'), value.name = 'value')
-df[[3]]$type <- 'de-noised Pearson residual'
+df[[3]]$type <- 'corrected Pearson residual'
 df[[3]]$maturation_rank <- rank(maturation_score)
-df[[4]] <- melt(t(as.matrix(cm_denoised[goi, ])), varnames = c('cell', 'gene'), value.name = 'value')
-df[[4]]$type <- 'de-noised UMI'
+df[[4]] <- melt(t(as.matrix(cm_corrected[goi, ])), varnames = c('cell', 'gene'), value.name = 'value')
+df[[4]]$type <- 'corrected UMI'
 df[[4]]$maturation_rank <- rank(maturation_score)
 df <- do.call(rbind, df)
 df$gene <- factor(df$gene, ordered=TRUE, levels=unique(df$gene))
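
Note on the correcting.Rmd change above: the vignette text describes reversing the negative binomial regression model to turn Pearson residuals back into UMI counts at a fixed value of the latent factor. The following is a minimal sketch of that idea for a single gene, not the code behind `sctransform::correct`. It assumes a log-link model with one latent variable; the helper name `correct_counts_sketch`, its parameters (`b0`, `b1`, `theta`, `latent_fixed`), and the toy values are all hypothetical.

```r
# Hypothetical helper for illustration only; NOT the sctransform implementation.
# Assumed model for one gene: log(mu) = b0 + b1 * latent, with NB variance
# mu + mu^2 / theta, and Pearson residual r = (count - mu) / sqrt(variance).
correct_counts_sketch <- function(resid, b0, b1, theta, latent_fixed) {
  mu <- exp(b0 + b1 * latent_fixed)    # expected count at the fixed latent value
  sigma <- sqrt(mu + mu^2 / theta)     # negative binomial standard deviation
  # Invert the residual formula, round to whole counts, and clip at zero
  pmax(round(mu + resid * sigma), 0)
}

# Toy residuals for one gene across five cells (made-up numbers)
resid <- c(-3, -1.2, 0, 0.8, 2.5)
corrected <- correct_counts_sketch(resid, b0 = 0, b1 = 1, theta = 10,
                                   latent_fixed = log(4))
print(corrected)
```

Because every cell is evaluated at the same `latent_fixed`, differences in the output come only from the residuals, which is the sense in which the latent factor's contribution is removed.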

inst/doc/seurat.R

+44
@@ -0,0 +1,44 @@
+## ----setup, include = FALSE----------------------------------------------
+library('Matrix')
+library('ggplot2')
+library('reshape2')
+library('sctransform')
+library('knitr')
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>",
+  digits = 2,
+  fig.width=8, fig.height=5, dpi=100, out.width = '70%'
+)
+library('Seurat')
+#old_theme <- theme_set(theme_classic(base_size=8))
+
+## ----eval=FALSE----------------------------------------------------------
+# devtools::install_github(repo = 'ChristophH/sctransform', ref = 'develop')
+# devtools::install_github(repo = 'satijalab/seurat', ref = 'release/3.0')
+# library(Seurat)
+# library(sctransform)
+
+## ----load_data, warning=FALSE, message=FALSE, cache = T------------------
+pbmc_data <- Read10X(data.dir = "~/Downloads/pbmc3k_filtered_gene_bc_matrices/hg19/")
+pbmc <- CreateSeuratObject(counts = pbmc_data)
+
+## ----apply_sct, warning=FALSE, message=FALSE, cache = T------------------
+# Note that this single command replaces NormalizeData, ScaleData, and FindVariableFeatures.
+# Transformed data will be available in the SCT assay, which is set as the default after running sctransform
+pbmc <- SCTransform(object = pbmc, verbose = FALSE)
+
+## ----pca, fig.width=5, fig.height=5, cache = T---------------------------
+# These are now standard steps in the Seurat workflow for visualization and clustering
+pbmc <- RunPCA(object = pbmc, verbose = FALSE)
+pbmc <- RunUMAP(object = pbmc, dims = 1:20, verbose = FALSE)
+pbmc <- FindNeighbors(object = pbmc, dims = 1:20, verbose = FALSE)
+pbmc <- FindClusters(object = pbmc, verbose = FALSE)
+DimPlot(object = pbmc, label = TRUE) + NoLegend()
+
+## ----fplot, fig.width = 10, fig.height=10, cache = F---------------------
+# These are now standard steps in the Seurat workflow for visualization and clustering
+FeaturePlot(object = pbmc, features = c("CD8A","GZMK","CCL5","S100A4"), pt.size = 0.3)
+FeaturePlot(object = pbmc, features = c("S100A4","CCR7","CD4","ISG15"), pt.size = 0.3)
+FeaturePlot(object = pbmc, features = c("TCL1A","FCER2","XCL1","FCGR3A"), pt.size = 0.3)
+

inst/doc/seurat.Rmd

+68
@@ -0,0 +1,68 @@
+---
+title: "Using sctransform in Seurat"
+author: "Christoph Hafemeister & Rahul Satija"
+date: '`r Sys.Date()`'
+output:
+  html_document: default
+  pdf_document: default
+vignette: >
+  %\VignetteIndexEntry{Using sctransform in Seurat}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup, include = FALSE}
+library('Matrix')
+library('ggplot2')
+library('reshape2')
+library('sctransform')
+library('knitr')
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>",
+  digits = 2,
+  fig.width=8, fig.height=5, dpi=100, out.width = '70%'
+)
+library('Seurat')
+#old_theme <- theme_set(theme_classic(base_size=8))
+```
+
+This vignette shows how to use the sctransform wrapper in Seurat.
+Install sctransform and Seurat v3
+```{r eval=FALSE}
+devtools::install_github(repo = 'ChristophH/sctransform', ref = 'develop')
+devtools::install_github(repo = 'satijalab/seurat', ref = 'release/3.0')
+library(Seurat)
+library(sctransform)
+```
+Load data and create Seurat object
+```{r load_data, warning=FALSE, message=FALSE, cache = T}
+pbmc_data <- Read10X(data.dir = "~/Downloads/pbmc3k_filtered_gene_bc_matrices/hg19/")
+pbmc <- CreateSeuratObject(counts = pbmc_data)
+```
+Apply sctransform normalization
+```{r apply_sct, warning=FALSE, message=FALSE, cache = T}
+# Note that this single command replaces NormalizeData, ScaleData, and FindVariableFeatures.
+# Transformed data will be available in the SCT assay, which is set as the default after running sctransform
+pbmc <- SCTransform(object = pbmc, verbose = FALSE)
+```
+Perform dimensionality reduction by PCA and UMAP embedding
+```{r pca, fig.width=5, fig.height=5, cache = T}
+# These are now standard steps in the Seurat workflow for visualization and clustering
+pbmc <- RunPCA(object = pbmc, verbose = FALSE)
+pbmc <- RunUMAP(object = pbmc, dims = 1:20, verbose = FALSE)
+pbmc <- FindNeighbors(object = pbmc, dims = 1:20, verbose = FALSE)
+pbmc <- FindClusters(object = pbmc, verbose = FALSE)
+DimPlot(object = pbmc, label = TRUE) + NoLegend()
+```
+Users can individually annotate clusters based on canonical markers. However, the sctransform normalization reveals sharper biological distinctions compared to the [standard Seurat workflow](https://satijalab.org/seurat/pbmc3k_tutorial.html), in a few ways:
+* Clear separation of three CD8 T cell populations (naive, memory, effector), based on CD8A, GZMK, CCL5, GZMK expression
+* Clear separation of three CD4 T cell populations (naive, memory, IFN-activated) based on S100A4, CCR7, IL32, and ISG15
+* Additional developmental sub-structure in B cell cluster, based on TCL1A, FCER2
+* Additional separation of NK cells into CD56dim vs. bright clusters, based on XCL1 and FCGR3A
+```{r fplot, fig.width = 10, fig.height=10, cache = F}
+# These are now standard steps in the Seurat workflow for visualization and clustering
+FeaturePlot(object = pbmc, features = c("CD8A","GZMK","CCL5","S100A4"), pt.size = 0.3)
+FeaturePlot(object = pbmc, features = c("S100A4","CCR7","CD4","ISG15"), pt.size = 0.3)
+FeaturePlot(object = pbmc, features = c("TCL1A","FCER2","XCL1","FCGR3A"), pt.size = 0.3)
+```

vignettes/seurat.Rmd

+43-17
@@ -1,12 +1,14 @@
 ---
 title: "Using sctransform in Seurat"
-author: "Christoph Hafemeister"
+author: "Christoph Hafemeister & Rahul Satija"
 date: '`r Sys.Date()`'
 output:
   html_document: default
   pdf_document: default
-vignette: |
-  %\VignetteIndexEntry{Using sctransform in Seurat} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8}
+vignette: >
+  %\VignetteIndexEntry{Using sctransform in Seurat}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
 ---
 
 ```{r setup, include = FALSE}
@@ -26,27 +28,51 @@ library('Seurat')
 ```
 
 This vignette shows how to use the sctransform wrapper in Seurat.
+Install sctransform and Seurat v3
 
-Load data
-```{r load_data}
-pbmc_data <- readRDS(file = "~/Projects/data/pbmc3k_umi_counts.Rds")
-class(x = pbmc_data)
-dim(x = pbmc_data)
+```{r eval=FALSE}
+devtools::install_github(repo = 'ChristophH/sctransform', ref = 'develop')
+devtools::install_github(repo = 'satijalab/seurat', ref = 'release/3.0')
+library(Seurat)
+library(sctransform)
 ```
 
-Create Seurat object
-```{r create_s, warning=FALSE}
-s <- CreateSeuratObject(counts = pbmc_data)
+Load data and create Seurat object
+
+```{r load_data, warning=FALSE, message=FALSE, cache = T}
+pbmc_data <- Read10X(data.dir = "~/Downloads/pbmc3k_filtered_gene_bc_matrices/hg19/")
+pbmc <- CreateSeuratObject(counts = pbmc_data)
 ```
 
 Apply sctransform normalization
-```{r apply_sct}
-s <- SCTransform(object = s, verbose = FALSE)
+
+```{r apply_sct, warning=FALSE, message=FALSE, cache = T}
+# Note that this single command replaces NormalizeData, ScaleData, and FindVariableFeatures.
+# Transformed data will be available in the SCT assay, which is set as the default after running sctransform
+pbmc <- SCTransform(object = pbmc, verbose = FALSE)
 ```
 
 Perform dimensionality reduction by PCA and UMAP embedding
-```{r pca, fig.width=5, fig.height=5}
-s <- RunPCA(object = s, npcs = 20, verbose = FALSE)
-s <- RunUMAP(object = s, dims = 1:20, verbose = FALSE)
-DimPlot(object = s) + NoLegend()
+
+```{r pca, fig.width=5, fig.height=5, cache = T}
+# These are now standard steps in the Seurat workflow for visualization and clustering
+pbmc <- RunPCA(object = pbmc, verbose = FALSE)
+pbmc <- RunUMAP(object = pbmc, dims = 1:20, verbose = FALSE)
+pbmc <- FindNeighbors(object = pbmc, dims = 1:20, verbose = FALSE)
+pbmc <- FindClusters(object = pbmc, verbose = FALSE)
+DimPlot(object = pbmc, label = TRUE) + NoLegend()
+```
+
+Users can individually annotate clusters based on canonical markers. However, the sctransform normalization reveals sharper biological distinctions compared to the [standard Seurat workflow](https://satijalab.org/seurat/pbmc3k_tutorial.html), in a few ways:
+
+* Clear separation of three CD8 T cell populations (naive, memory, effector), based on CD8A, GZMK, CCL5, GZMK expression
+* Clear separation of three CD4 T cell populations (naive, memory, IFN-activated) based on S100A4, CCR7, IL32, and ISG15
+* Additional developmental sub-structure in B cell cluster, based on TCL1A, FCER2
+* Additional separation of NK cells into CD56dim vs. bright clusters, based on XCL1 and FCGR3A
+
+```{r fplot, fig.width = 10, fig.height=10, cache = F}
+# These are now standard steps in the Seurat workflow for visualization and clustering
+FeaturePlot(object = pbmc, features = c("CD8A","GZMK","CCL5","S100A4"), pt.size = 0.3)
+FeaturePlot(object = pbmc, features = c("S100A4","CCR7","CD4","ISG15"), pt.size = 0.3)
+FeaturePlot(object = pbmc, features = c("TCL1A","FCER2","XCL1","FCGR3A"), pt.size = 0.3)
 ```
