Update documentation
saketkc committed Jan 8, 2022
1 parent c3111d3 commit e4deca0
Showing 13 changed files with 133 additions and 45 deletions.
18 changes: 12 additions & 6 deletions DESCRIPTION
@@ -1,16 +1,22 @@
Package: sctransform
Type: Package
Title: Variance Stabilizing Transformations for Single Cell UMI Data
Version: 0.3.2.9009
Authors@R: person(given = 'Christoph', family = 'Hafemeister', email = 'christoph.hafemeister@nyu.edu', role = c('aut', 'cre'), comment = c(ORCID = '0000-0001-6365-8254'))
Version: 0.3.2.9010
Date: 2022-01-08
Authors@R: c(
person(given = "Christoph", family = "Hafemeister", email = "christoph.hafemeister@nyu.edu", role = c("aut", "cre"), comment = c(ORCID = "0000-0001-6365-8254")),
person(given = "Saket", family = "Choudhary", email = "schoudhary@nygenome.org", role = "ctb", comment = c(ORCID = "0000-0001-5202-7633")),
person(given = "Rahul", family = "Satija", email = "rsatija@nygenome.org", role = "ctb", comment = c(ORCID = "0000-0001-9448-8833"))
)
Description: A normalization method for single-cell UMI count data using a
variance stabilizing transformation. The transformation is based on a
negative binomial regression model with regularized parameters. As part of the
same regression framework, this package also provides functions for
batch correction, and data correction. See Hafemeister and Satija 2019
<doi:10.1186/s13059-019-1874-1> for more details.
URL: https://github.com/ChristophH/sctransform
BugReports: https://github.com/ChristophH/sctransform/issues
batch correction, and data correction. See Hafemeister and Satija (2019)
<doi:10.1186/s13059-019-1874-1>, and Choudhary and Satija (2021) <doi:10.1101/2021.07.07.451498>
for more details.
URL: https://github.com/satijalab/sctransform
BugReports: https://github.com/satijalab/sctransform/issues
License: GPL-3 | file LICENSE
Encoding: UTF-8
LazyData: true
1 change: 1 addition & 0 deletions NAMESPACE
@@ -62,6 +62,7 @@ importFrom(stats,predict)
importFrom(stats,t.test)
importFrom(stats,var)
importFrom(utils,capture.output)
importFrom(utils,packageVersion)
importFrom(utils,setTxtProgressBar)
importFrom(utils,txtProgressBar)
useDynLib(sctransform)
19 changes: 18 additions & 1 deletion NEWS.md
@@ -1,6 +1,23 @@
# News
All notable changes will be documented in this file.

## [0.3.3] - UNRELEASED

### Added
- `vst.flavor` argument to `vst()` to invoke the updated regularization (sctransform v2, proposed in [Choudhary and Satija, 2021](https://doi.org/10.1101/2021.07.07.451498)). See the paper for details.
- `scale_factor` to `correct()` to allow for a custom library size when correcting counts


## [0.3.2.9008] - 2021-07-28
### Added
- Add future.seed = TRUE to all `future_lapply()` calls

### Changed
- Wrap MASS::theta.ml() in suppressWarnings()

### Fixed
- Fix logical comparison of vectors of length one in `diff_mean_test()`

## [0.3.2.9003] - 2020-02-11
### Added
- `compare` argument to the nonparametric differential expression test `diff_mean_test()` to allow for multiple comparisons and various ways to specify which groups to compare
@@ -39,7 +56,7 @@ All notable changes will be documented in this file.
- Remove `poisson_fast` method (replaced by `qpoisson`)
- Use `matrixStats` package and remove `RcppEigen` dependency
- Use quasi poisson regression where possible
- Define cell detection event as counts >= 0.01 (instead of > 0) - this only matters to people playing around with fractional counts (see [issue #65](https://github.com/ChristophH/sctransform/issues/65))
- Define cell detection event as counts >= 0.01 (instead of > 0) - this only matters to people playing around with fractional counts (see [issue #65](https://github.com/satijalab/sctransform/issues/65))
- Internal code restructuring and improvements

### Fixed
2 changes: 2 additions & 0 deletions R/denoise.R
@@ -160,6 +160,8 @@ correct <- function(x, data = 'y', cell_attr = x$cell_attr, as_is = FALSE,
#' @param x A list that provides model parameters and optionally meta data; use output of vst function
#' @param umi The count matrix
#' @param cell_attr Provide cell meta data holding latent data info
#' @param scale_factor Replace all values of UMI in the regression model by this value. Default is NA
#' which uses median of total UMI as the latent factor.
#' @param verbosity An integer specifying whether to show only messages (1), messages and progress bars (2) or nothing (0) while the function is running; default is 2
#' @param verbose Deprecated; use verbosity instead
#' @param show_progress Deprecated; use verbosity instead
8 changes: 4 additions & 4 deletions R/utils.R
@@ -495,13 +495,13 @@ get_model_var <- function(vst_out, cell_attr = vst_out$cell_attr, use_nonreg = F

#' Get median of non zero UMIs from a count matrix using a subset of genes (slow)
#'
#' @param cm Count matrix
#' @param umi Count matrix
#' @param genes List of genes to calculate statistics. Default is NULL which returns the non-zero median using all genes
#'
#' @return A numeric value representing the median of non-zero entries from the UMI matrix
get_nz_median <- function(umi, genes = NULL){
cm.T <- Matrix::t(umi)
n_g <- dim(cm)[1]
n_g <- dim(umi)[1]
allnonzero <- c()
if (is.null(genes)) {
gene_index <- seq(1, nrow(umi))
@@ -517,10 +517,10 @@ get_nz_median <- function(umi, genes = NULL){

#' Get median of non zero UMIs from a count matrix
#'
#' @param cm Count matrix
#' @param umi Count matrix
#'
#' @return A numeric value representing the median of non-zero entries from the UMI matrix
get_nz_median2 <- function(umi, genes = NULL){
get_nz_median2 <- function(umi){
return (median(umi@x))
}
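
For reference, the value returned by `get_nz_median2()` can be reproduced on a toy matrix. This is a minimal sketch, not part of the commit: it assumes `umi` is a `dgCMatrix` from the Matrix package whose `@x` slot holds only the stored non-zero entries, and the toy matrix is made up for illustration.

```r
library(Matrix)

# Hypothetical 3 x 3 sparse count matrix with four non-zero entries
umi <- sparseMatrix(
  i = c(1, 2, 3, 1),
  j = c(1, 1, 2, 3),
  x = c(5, 1, 3, 7),
  dims = c(3, 3)
)

# Median over the stored values, the same expression used in get_nz_median2()
nz_median_sparse <- median(umi@x)

# The same quantity computed on the dense matrix, for comparison
dense <- as.matrix(umi)
nz_median_dense <- median(dense[dense > 0])

stopifnot(isTRUE(all.equal(nz_median_sparse, nz_median_dense)))
```

If a sparse matrix carries explicitly stored zeros, `@x` would include them, so the two computations above could differ in that case.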

13 changes: 8 additions & 5 deletions R/vst.R
@@ -34,10 +34,11 @@ NULL
#' @param gmean_eps Small value added when calculating geometric mean of a gene to avoid log(0); default is 1
#' @param theta_estimation_fun Character string indicating which method to use to estimate theta (when method = poisson); default is 'theta.ml', but 'theta.mm' seems to be a good and fast alternative
#' @param theta_given If method is set to nb_theta_given, this should be a named numeric vector of fixed theta values for the genes; if method is offset, this should be a single value; default is NULL
#' @param exclude_poisson Exclude poisson genes (i.e. mu < 0.001 or mu > variance) from regularization; default is FALSE
#' @param use_geometric_mean Use geometric mean instead of arithmetic mean for all calculations ; default is TRUE
#' @param use_geometric_mean_offset Use geoemtric mean insteaf of arithmetic mean in the offset model; default is FALSE
#' @param use_geometric_mean_offset Use geometric mean instead of arithmetic mean in the offset model; default is FALSE
#' @param fix_intercept Fix intercept as defined in the offset model; default is FALSE
#' @param fix_slope Fix slope to log(10) (eqivalent to using library size as an offset); default is FALSE
#' @param fix_slope Fix slope to log(10) (equivalent to using library size as an offset); default is FALSE
#' @param scale_factor Replace all values of UMI in the regression model by this value instead of the median UMI; default is NA
#' @param vst.flavor When set to `v2` sets method = glmGamPoi_offset, n_cells=2000, and exclude_poisson = TRUE which causes the model to learn theta and intercept only besides excluding poisson genes from learning and regularization; default is NULL which uses the original sctransform model
#' @param verbosity An integer specifying whether to show only messages (1), messages and progress bars (2) or nothing (0) while the function is running; default is 2
@@ -97,6 +98,7 @@ NULL
#' @importFrom stats glm glm.fit df.residual ksmooth model.matrix as.formula approx density poisson var bw.SJ
#' @importFrom utils txtProgressBar setTxtProgressBar capture.output
#' @importFrom methods as
#' @importFrom utils packageVersion
#'
#' @export
#'
@@ -206,7 +208,6 @@ vst <- function(umi,
umi <- umi[genes, ]
if (use_geometric_mean){
genes_log_gmean <- log10(row_gmean(umi, eps = gmean_eps))

} else {
genes_log_gmean <- log10(rowMeans(umi))
}
@@ -312,7 +313,8 @@ vst <- function(umi,
model_pars_fit <- reg_model_pars(model_pars, genes_log_gmean_step1, genes_log_gmean, cell_attr,
batch_var, cells_step1, genes_step1, umi, bw_adjust, gmean_eps,
theta_regularization, genes_amean, genes_var,
exclude_poisson, fix_intercept, fix_slope, use_geometric_mean_offset, verbosity)
exclude_poisson, fix_intercept, fix_slope,
use_geometric_mean, use_geometric_mean_offset, verbosity)
model_pars_outliers <- attr(model_pars_fit, 'outliers')
} else {
model_pars_fit <- model_pars
@@ -710,7 +712,8 @@ reg_model_pars <- function(model_pars, genes_log_gmean_step1, genes_log_gmean, c
batch_var, cells_step1, genes_step1, umi, bw_adjust, gmean_eps,
theta_regularization,
genes_amean = NULL, genes_var = NULL, exclude_poisson = FALSE,
fix_intercept = FALSE, fix_slope = FALSE, use_geometric_mean_offset = FALSE, verbosity = 0) {
fix_intercept = FALSE, fix_slope = FALSE, use_geometric_mean = TRUE,
use_geometric_mean_offset = FALSE, verbosity = 0) {
genes <- names(genes_log_gmean)
if (exclude_poisson | fix_slope | fix_intercept){
# exclude this from the fitting procedure entirely
53 changes: 35 additions & 18 deletions README.md
@@ -4,48 +4,65 @@
This package was developed by Christoph Hafemeister in [Rahul Satija's lab](https://satijalab.org/) at the New York Genome Center. Core functionality of this package has been integrated into [Seurat](https://satijalab.org/seurat/), an R package designed for QC, analysis, and exploration of single cell RNA-seq data.

## Quick start
`devtools::install_github(repo = 'ChristophH/sctransform')`
`normalized_data <- sctransform::vst(umi_count_matrix)$y`

(you can also install from CRAN: `install.packages('sctransform'))`)
```r
# Install sctransform from CRAN
# install.packages("sctransform")

# Or the development version from GitHub:
# install.packages("remotes")
remotes::install_github("satijalab/sctransform", ref="develop")

normalized_data <- sctransform::vst(umi_count_matrix)$y
```

To invoke the `v2` flavor:

```r
normalized_data <- sctransform::vst(umi_count_matrix, vst.flavor="v2")$y

# Using Seurat
seurat_object <- Seurat::SCTransform(seurat_object, vst.flavor="v2")
```
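
According to the `vst.flavor` documentation in `R/vst.R`, the `v2` flavor sets `exclude_poisson = TRUE`, and the `exclude_poisson` parameter describes Poisson genes as those with `mu < 0.001` or `mu > variance`. The following is a minimal base-R sketch of that criterion only. The toy matrix and the variable names are assumptions for illustration, and the package's internal implementation may differ.

```r
# Hypothetical genes-by-cells UMI count matrix (3 genes, 10 cells)
umi <- rbind(
  gene_rare  = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),   # mean below 0.001
  gene_under = c(1, 1, 1, 1, 2, 1, 1, 1, 1, 1),   # mean larger than variance
  gene_over  = c(0, 0, 10, 0, 0, 25, 0, 3, 0, 0)  # variance larger than mean
)

# Per-gene arithmetic mean and variance
genes_amean <- rowMeans(umi)
genes_var <- apply(umi, 1, var)

# Flag genes that meet the documented criterion: mu < 0.001 or mu > variance
poisson_genes <- genes_amean < 0.001 | genes_amean > genes_var

print(poisson_genes)
```

In this toy example the first two genes are flagged and the overdispersed gene is kept for regularization.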

## Help

For usage examples see vignettes in inst/doc or use the built-in help after installation
`?sctransform::vst`

Available vignettes:
[Variance stabilizing transformation](https://rawgit.com/ChristophH/sctransform/supp_html/supplement/variance_stabilizing_transformation.html)
[Using sctransform in Seurat](https://rawgit.com/ChristophH/sctransform/supp_html/supplement/seurat.html)

## Known Issues
- [Variance stabilizing transformation](https://rawgit.com/satijalab/sctransform/supp_html/supplement/variance_stabilizing_transformation.html)
- [Using sctransform in Seurat](https://rawgit.com/satijalab/sctransform/supp_html/supplement/seurat.html)

* `Error in is.nan` when a batch variable is used. Fixed in the develop branch. ([issue #88](https://github.com/ChristophH/sctransform/issues/88))
* `node stack overflow` error when Rfast package is loaded. The Rfast package does not play nicely with the future.apply package. Try to avoid loading the Rfast package. See discussions: https://github.com/RfastOfficial/Rfast/issues/5 https://github.com/ChristophH/sctransform/issues/108
## Known Issues

To install from the develop branch run `remotes::install_github("ChristophH/sctransform@develop")`
* `node stack overflow` error when Rfast package is loaded. The Rfast package does not play nicely with the future.apply package. Try to avoid loading the Rfast package. See discussions: https://github.com/RfastOfficial/Rfast/issues/5 https://github.com/satijalab/sctransform/issues/108

Please use [the issue tracker](https://github.com/ChristophH/sctransform/issues) if you encounter a problem
Please use [the issue tracker](https://github.com/satijalab/sctransform/issues) if you encounter a problem

## News
For a detailed change log have a look at the file [NEWS.md](https://github.com/ChristophH/sctransform/blob/master/NEWS.md)
For a detailed change log have a look at the file [NEWS.md](https://github.com/satijalab/sctransform/blob/master/NEWS.md)

### v0.3.2
This release improves the coefficient initialization in quasi poisson regression that sometimes led to errors. There are also some minor bug fixes and a new non-parametric differential expression test for sparse non-negative data (`diff_mean_test`, [this vignette](https://rawgit.com/ChristophH/sctransform/supp_html/supplement/np_diff_mean_test.html) gives some details).
This release improves the coefficient initialization in quasi poisson regression that sometimes led to errors. There are also some minor bug fixes and a new non-parametric differential expression test for sparse non-negative data (`diff_mean_test`, [this vignette](https://rawgit.com/satijalab/sctransform/supp_html/supplement/np_diff_mean_test.html) gives some details).

### v0.3.1
This release fixes a performance regression when `sctransform::vst` was called via `do.call`, as is the case in the Seurat wrapper.

Additionally, model fitting is significantly faster now, because we use a fast Rcpp quasi poisson regression implementation (based on `Rfast` package). This applies to methods `poisson`, `qpoisson` and `nb_fast`.

The `qpoisson` method is new and uses the dispersion parameter from the quasi poisson regression directly to estimate `theta` for the NB model. This can speed up the model fitting step considerably, while giving similar results to the other methods. [This vignette](https://rawgit.com/ChristophH/sctransform/supp_html/supplement/method_comparison.html) compares the methods.
The `qpoisson` method is new and uses the dispersion parameter from the quasi poisson regression directly to estimate `theta` for the NB model. This can speed up the model fitting step considerably, while giving similar results to the other methods. [This vignette](https://rawgit.com/satijalab/sctransform/supp_html/supplement/method_comparison.html) compares the methods.

### v0.3
The latest version of `sctransform` now supports the [glmGamPoi](https://github.com/const-ae/glmGamPoi) package to speed up the model fitting step. You can see more about the different methods supported and how they compare in terms of results and speed [in this new vignette](https://rawgit.com/ChristophH/sctransform/supp_html/supplement/method_comparison.html).
The latest version of `sctransform` now supports the [glmGamPoi](https://github.com/const-ae/glmGamPoi) package to speed up the model fitting step. You can see more about the different methods supported and how they compare in terms of results and speed [in this new vignette](https://rawgit.com/satijalab/sctransform/supp_html/supplement/method_comparison.html).

Also note that default theta regularization is now based on overdispersion factor (`1 + m / theta` where m is the geometric mean of the observed counts) not `log10(theta)`. The old behavior is still available via `theta_regularization` parameter. You can see how this changes (or doesn't change) the results [in this new vignette](https://rawgit.com/satijalab/sctransform/supp_html/supplement/theta_regularization.html).
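
As a worked illustration of the two regularization targets mentioned above, the sketch below computes the overdispersion factor `1 + m / theta` and `log10(theta)` for a single gene. The counts and the value of `theta` are made up, and the geometric mean formula `exp(mean(log(x + eps))) - eps` with `eps = 1` is an assumption based on the documented `gmean_eps` default, not a copy of the package's code.

```r
# Hypothetical UMI counts for one gene across five cells
counts <- c(0, 1, 3, 0, 8)

# Geometric mean with a small offset to avoid log(0) (assumed formula, eps = 1)
eps <- 1
m <- exp(mean(log(counts + eps))) - eps

# Hypothetical NB dispersion parameter for this gene
theta <- 10

# Current default target: overdispersion factor
od_factor <- 1 + m / theta

# Older target, still available via the theta_regularization parameter
log10_theta <- log10(theta)

print(c(gmean = m, od_factor = od_factor, log10_theta = log10_theta))
```

For a fixed `theta`, the overdispersion factor grows with the gene's mean expression, whereas `log10(theta)` does not depend on the mean at all.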


Also note that default theta regularization is now based on overdispersion factor (`1 + m / theta` where m is the geometric mean of the observed counts) not `log10(theta)`. The old behavior is still available via `theta_regularization` parameter. You can see how this changes (or doesn't change) the results [in this new vignette](https://rawgit.com/ChristophH/sctransform/supp_html/supplement/theta_regularization.html).
## References

- Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol 20, 296 (December 23, 2019). [https://doi.org/10.1186/s13059-019-1874-1](https://doi.org/10.1186/s13059-019-1874-1). An early version of this work was used in the paper [Developmental diversification of cortical inhibitory interneurons, Nature 555, 2018](https://github.com/ChristophH/in-lineage).

## Reference
Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol 20, 296 (December 23, 2019). [https://doi.org/10.1186/s13059-019-1874-1](https://doi.org/10.1186/s13059-019-1874-1)
- Choudhary, S. & Satija, R. Comparison and evaluation of statistical error models for scRNA-seq. bioRxiv (2021). [https://doi.org/10.1101/2021.07.07.451498](https://doi.org/10.1101/2021.07.07.451498)

An early version of this work was used in the paper [Developmental diversification of cortical inhibitory interneurons, Nature 555, 2018](https://github.com/ChristophH/in-lineage).
4 changes: 2 additions & 2 deletions man/correct.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 4 additions & 0 deletions man/correct_counts.Rd

19 changes: 19 additions & 0 deletions man/get_nz_median.Rd

17 changes: 17 additions & 0 deletions man/get_nz_median2.Rd
