R package expanding integrative analysis capabilities of Seurat by providing seamless access to popular integration methods. It also implements an integration benchmarking toolkit that gathers well-established performance metrics to help select the most appropriate integration.
Examples, documentation, memos, etc. are available on SeuratIntegrate's website.
SeuratIntegrate supports both R- and Python-based integration methods. The table below lists the compatible methods and the function that runs each of them:
Table 1: Integration methods supported by SeuratIntegrate

Language | Package | Method | Function |
---|---|---|---|
R | SeuratIntegrate | ComBat | CombatIntegration() |
R | SeuratIntegrate | Harmony | HarmonyIntegration() |
R | SeuratIntegrate | MNN | MNNIntegration() |
R | Seurat | CCA | CCAIntegration() |
R | Seurat | RPCA | RPCAIntegration() |
R | SeuratWrappers | FastMNN (batchelor) | FastMNNIntegration() |
Python | SeuratIntegrate | BBKNN | bbknnIntegration() |
Python | SeuratIntegrate | scVI | scVIIntegration() |
Python | SeuratIntegrate | scANVI | scANVIIntegration() |
Python | SeuratIntegrate | Scanorama | ScanoramaIntegration() |
Python | SeuratIntegrate | trVAE | trVAEIntegration() |
Install SeuratIntegrate directly from GitHub:
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
if (!require("remotes", quietly = TRUE))
  install.packages("remotes")
remotes::install_github("cbib/Seurat-Integrate", dependencies = NA, repos = BiocManager::repositories())
To use Python methods, run the following commands (once) to set up the necessary conda environments:
library(SeuratIntegrate)
# Create environments
UpdateEnvCache("bbknn")
UpdateEnvCache("scvi") # also scANVI
UpdateEnvCache("scanorama")
UpdateEnvCache("trvae")
# Show cached environments
getCache()
Environments are persistently stored in the cache, so the UpdateEnvCache() commands should not need to be executed again.
While these environments should work well in most cases, conda dependencies occasionally conflict, so manual adjustment might be needed. You may find helpful information in this vignette.
To integrate data with SeuratIntegrate, you need to preprocess your SeuratObject until you obtain at least a PCA. Importantly, the SeuratObject must have its layers split by batches.
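As a minimal sketch of this preprocessing, assuming Seurat v5: the simulated counts, the metadata column name batch, and the parameter values below are illustrative assumptions, not taken from the package documentation.

```r
library(Seurat)

set.seed(42)
n_genes <- 200
n_cells <- 120

# toy counts with gene-specific Poisson means (purely illustrative data)
gene_means <- runif(n_genes, min = 0.5, max = 5)
counts <- matrix(
  rpois(n_genes * n_cells, lambda = rep(gene_means, times = n_cells)),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

seu <- CreateSeuratObject(counts = Matrix::Matrix(counts, sparse = TRUE))
seu$batch <- rep(c("A", "B", "C"), each = 40)  # hypothetical batch column

# split the layers of the RNA assay by batch
seu[["RNA"]] <- split(seu[["RNA"]], f = seu$batch)

# standard workflow up to the PCA
seu <- NormalizeData(seu)
seu <- FindVariableFeatures(seu, nfeatures = 100)
seu <- ScaleData(seu)
seu <- RunPCA(seu, npcs = 10)

Layers(seu)      # list the layers (now split by batch)
Reductions(seu)  # list the available reductions
```

With real data, replace the toy object with your own and split on the metadata column that records the batch of each cell.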
Not familiar with Seurat?
Have a look at Seurat's website, especially the tutorials covering SCTransform and integrative analyses.
To fully benefit from the benchmarking toolkit, you'll need cell-type annotations of sufficient quality to be considered suitable as ground truth.
The benchmarking toolkit can benefit from additional dependencies:
# required to test for k-nearest neighbour batch effects
remotes::install_github('theislab/kBET')
# fast distance computation
install.packages('distances')
# faster Local Inverse Simpson’s Index computation
remotes::install_github('immunogenomics/lisi')
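To see which of these optional dependencies are already available, a short base R check can help. This is only a sketch; the package names are assumed to match the repositories named in the installation commands above.

```r
# optional packages named in the installation commands above
optional_pkgs <- c("kBET", "distances", "lisi")

# TRUE where the package can be loaded, FALSE otherwise
is_installed <- vapply(optional_pkgs, requireNamespace, logical(1),
                       quietly = TRUE)

# report anything that is still missing
missing_pkgs <- names(is_installed)[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Optional packages not installed: ",
          paste(missing_pkgs, collapse = ", "))
}
```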
When your SeuratObject is ready, you can launch multiple integrations (from Table 1) with a single command. DoIntegrate() provides a flexible interface to customise integration-specific parameters and to control the associated data and features.
seu <- DoIntegrate(seu,
  # ... integrations
  CombatIntegration(layers = "data"),
  HarmonyIntegration(orig = "pca", dims = 1:30),
  ScanoramaIntegration(ncores = 4L, layers = "data"),
  scVIIntegration(layers = "counts", features = Features(seu)),
  # ...
  use.hvg = TRUE,    # `VariableFeatures()`
  use.future = c(FALSE, FALSE, TRUE, TRUE)
)
In this example, all integration methods will use the variable features as input, with the exception of scVIIntegration(), which is set to use all features (features = Features(seu)). CombatIntegration() will correct the normalised counts (layers = "data"), while scVIIntegration() will train on raw counts (layers = "counts").
use.future must be TRUE for Python methods, and FALSE for R methods (see Table 1).
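Rather than writing the use.future vector by hand, it can be derived from the names of the methods. The sketch below uses base R only; use_future_for() is a hypothetical helper, not part of SeuratIntegrate, and the list of Python methods is copied from Table 1.

```r
# Python-based methods listed in Table 1
python_methods <- c("bbknnIntegration", "scVIIntegration", "scANVIIntegration",
                    "ScanoramaIntegration", "trVAEIntegration")

# hypothetical helper: TRUE for Python methods, FALSE for R methods
use_future_for <- function(methods) {
  methods %in% python_methods
}

methods <- c("CombatIntegration", "HarmonyIntegration",
             "ScanoramaIntegration", "scVIIntegration")
use_future_for(methods)
# [1] FALSE FALSE  TRUE  TRUE
```

The result matches the use.future vector of the DoIntegrate() example, provided the method names are given in the same order as the integration calls.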
Integration methods produce one or several outputs. Because they can be of different types, the following table indicates the post-processing steps to generate a UMAP.
Table 2: Output types and processing

Output type | Object name | Processing |
---|---|---|
Corrected counts | Assay | ScaleData() ➔ RunPCA() ➔ RunUMAP() |
Dimensional reduction | DimReduc | RunUMAP() |
KNN graph | Graph | RunUMAP(umap.method = "umap-learn") |
Output types are summarized for each method in the Memo vignette about integration methods.
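The three processing routes of Table 2 can be sketched as small wrappers around standard Seurat functions. The helper names and the naming scheme for the new reductions are hypothetical, and the names of the assays, reductions and graphs produced by each integration are not assumed here: pass whichever names your object actually contains.

```r
library(Seurat)

# Dimensional reduction (DimReduc): run UMAP directly on the reduction
umap_from_reduction <- function(seu, reduction, dims = 1:30) {
  RunUMAP(seu, reduction = reduction, dims = dims,
          reduction.name = paste0("umap.", reduction))
}

# Corrected counts (Assay): scale, run PCA, then run UMAP on that PCA.
# `features` may be needed if the corrected assay has no variable features set.
umap_from_assay <- function(seu, assay, npcs = 30, features = NULL) {
  pca_name <- paste0("pca.", assay)
  seu <- ScaleData(seu, assay = assay, features = features)
  seu <- RunPCA(seu, assay = assay, features = features, npcs = npcs,
                reduction.name = pca_name)
  RunUMAP(seu, reduction = pca_name, dims = seq_len(npcs),
          reduction.name = paste0("umap.", assay))
}

# KNN graph (Graph): needs the Python package umap-learn
umap_from_graph <- function(seu, graph) {
  RunUMAP(seu, graph = graph, umap.method = "umap-learn",
          reduction.name = paste0("umap.", graph))
}
```

Each helper returns the Seurat object with an additional UMAP reduction, so the results of several integrations can be kept side by side in the same object.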
SeuratIntegrate incorporates 11 scoring metrics: 6 quantify the degree of batch mixing (batch correction), while 5 assess the preservation of biological differences (bio-conservation) based on ground truth cell type labels.
To score your integrations, you must process their outputs as in the Processing column of Table 2. You'll also need to get a graph by running FindNeighbors(return.neighbor = TRUE) (this vignette provides further guidance).
Then, scores can be obtained using the Score[score_name]() functions, or saved directly in the Seurat object using the AddScore[score_name]() functions, as follows:
# save the score in a variable
rpca_score <- ScoreRegressPC(seu, reduction = "[dimension_reduction]") # e.g. "pca"
# or save the score in the Seurat object
seu <- AddScoreRegressPC(seu, integration = "[name_of_integration]", reduction = "[dimension_reduction]")
It is worth noting that the unintegrated version must also be scored to perform a complete comparative analysis. When scores have been computed, they can be used to compare the integration outputs. See this vignette for a complete overview of available scores.
The advantage of the AddScore functions over the Score functions is that they facilitate score scaling and plotting:
# scale
seu <- ScaleScores(seu)
# plot
PlotScores(seu)
Examples, documentation, memos, etc. are available on SeuratIntegrate's website.
If you encounter a bug, please create an issue on GitHub. Do the same if you have a specific comment or question not covered on the website.
If you find SeuratIntegrate useful, please consider citing:
Specque, F., Barré, A., Nikolski, M., & Chalopin, D. (2025). SeuratIntegrate: an R package to facilitate the use of integration methods with Seurat. Bioinformatics. doi: 10.1093/bioinformatics/btaf358