Description
Please make sure these conditions are met
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of scanpy.
- (optional) I have confirmed this bug exists on the main branch of scanpy.
What happened?
Is Scanpy's seurat_v3 highly variable gene selection really the same as Seurat's vst method?
Hello, thank you for developing a single-cell analysis package for Python. I am still new to Scanpy. Today, while preprocessing the same counts matrix with Scanpy and with Seurat in R, I ran into a question: does flavor='seurat_v3' in Scanpy really select the same highly variable genes as selection.method = "vst" in Seurat? I used the same normalization in both.
R:
seu_obj_corrected <- NormalizeData(seu_obj_corrected, normalization.method = "LogNormalize", scale.factor = 10000)
Python:
sc.pp.normalize_total(adata_Fy32_filtered, target_sum=1e4)
sc.pp.log1p(adata_Fy32_filtered)
After careful examination, the normalized values from the two tools are almost identical, differing only after many decimal places. I checked this first so that any difference in the highly variable genes cannot be attributed to a difference in normalization.
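The normalization comparison described above can be reproduced on a toy matrix. The sketch below is a minimal pure-Python illustration, assuming that Seurat's LogNormalize computes log1p(count / cell total * scale.factor) and that normalize_total followed by log1p performs the same per-cell operation. The function name `log_normalize` and the toy counts are hypothetical and not part of either package.

```python
import math


def log_normalize(counts, target_sum=1e4):
    """Scale each cell to target_sum total counts, then apply log1p.

    counts: list of rows, one row per cell, one column per gene.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell stays all zeros instead of dividing by zero
            normalized.append([0.0 for _ in cell])
            continue
        normalized.append([math.log1p(x / total * target_sum) for x in cell])
    return normalized


# Hypothetical toy data: 2 cells x 3 genes
toy_counts = [
    [1, 0, 3],
    [0, 2, 2],
]
result = log_normalize(toy_counts)

# Undoing the log shows that every cell sums to target_sum again
for row in result:
    print(round(sum(math.expm1(v) for v in row)))
```

Comparing such values element by element against the matrices exported from both tools is one way to confirm that they differ only by floating-point noise.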
I drew a Venn diagram of the highly variable genes selected in R and Python, using three different methods:
- Scanpy, flavor='seurat': sc.pp.highly_variable_genes(adata_Fy32_filtered, flavor='seurat', n_top_genes=2000)
- Scanpy, flavor='seurat_v3': sc.pp.highly_variable_genes(adata_Fy32_filtered, flavor='seurat_v3', n_top_genes=2000)
- Seurat, vst: seu_obj <- FindVariableFeatures(seu_obj, selection.method = "vst", nfeatures = 2000)
One more detail: out is a matrix that has been corrected by SoupX. Could the difference be related to this?
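The Venn comparison can also be expressed as numbers, which are easier to share in an issue than an image. The following is a minimal sketch using plain Python sets. The helper name `overlap_summary` and the short gene lists are hypothetical placeholders for the three 2000-gene selections.

```python
def overlap_summary(gene_sets):
    """Pairwise intersection size and Jaccard index for named gene sets."""
    names = list(gene_sets)
    summary = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            set_a, set_b = set(gene_sets[a]), set(gene_sets[b])
            shared = set_a & set_b
            union = set_a | set_b
            jaccard = len(shared) / len(union) if union else 0.0
            summary[(a, b)] = (len(shared), jaccard)
    return summary


# Hypothetical gene lists standing in for the real selections
hvgs = {
    "scanpy_seurat": ["GeneA", "GeneB", "GeneC"],
    "scanpy_seurat_v3": ["GeneB", "GeneC", "GeneD"],
    "seurat_vst": ["GeneC", "GeneD", "GeneE"],
}
for pair, (n_shared, jaccard) in overlap_summary(hvgs).items():
    print(pair, n_shared, round(jaccard, 2))
```

With real data, the Scanpy lists could be taken from adata.var_names[adata.var["highly_variable"]] and the Seurat list from VariableFeatures(seu_obj), exported to a text file.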
scanpy version = '1.11.0'
Thanks!
Minimal code sample
scanpy pipeline:
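# Store independent copies so that later in-place normalization of X does not alter the layers
adata_Fy32_filtered.layers["counts"] = adata_Fy32_filtered.X.copy()
adata_Fy32_filtered.layers["soupX_counts"] = out.T
adata_Fy32_filtered.X = adata_Fy32_filtered.layers["soupX_counts"].copy()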
sc.pp.normalize_total(adata_Fy32_filtered, target_sum=1e4)
sc.pp.log1p(adata_Fy32_filtered)
sc.pp.highly_variable_genes(adata_Fy32_filtered,flavor='seurat_v3',n_top_genes=2000)
sc.pp.scale(adata_Fy32_filtered, zero_center=True)
sc.tl.pca(adata_Fy32_filtered,use_highly_variable=True)
sc.pp.neighbors(adata_Fy32_filtered,n_pcs = 10)
sc.tl.umap(adata_Fy32_filtered)
sc.tl.leiden(adata_Fy32_filtered,resolution=0.6)
sc.pl.umap(adata_Fy32_filtered, color=["leiden"])
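One thing worth checking in the pipeline above: the Scanpy documentation states that `sc.pp.highly_variable_genes` expects logarithmized data except when `flavor='seurat_v3'`, in which case count data is expected. Here it is called after `normalize_total` and `log1p`. To my understanding, Seurat's `FindVariableFeatures` with `selection.method = "vst"` also works from the counts rather than from the normalized data. The following is a minimal sketch, not a confirmed fix, that points the `seurat_v3` flavor at an unnormalized layer through the `layer` argument. The helper name `select_hvgs_on_counts` is hypothetical, the layer name follows the sample above, and the sketch assumes that the layer was stored as an independent copy before normalization and that scikit-misc is installed, which this flavor requires.

```python
def select_hvgs_on_counts(adata, counts_layer="soupX_counts", n_top_genes=2000):
    """Run the seurat_v3 flavor on an unnormalized counts layer.

    Assumes adata.layers[counts_layer] holds counts that were stored
    with .copy() before normalize_total / log1p modified adata.X.
    """
    import scanpy as sc  # imported lazily so the helper can be defined without scanpy

    if counts_layer not in adata.layers:
        raise KeyError(f"layer {counts_layer!r} not found in adata.layers")

    # flavor='seurat_v3' expects count data; `layer` selects it instead of adata.X
    sc.pp.highly_variable_genes(
        adata,
        flavor="seurat_v3",
        n_top_genes=n_top_genes,
        layer=counts_layer,
    )
    # Names of the selected genes, for comparison with Seurat's VariableFeatures()
    return list(adata.var_names[adata.var["highly_variable"]])
```

If the SoupX-corrected matrix contains non-integer values, Scanpy may warn that this flavor expects raw count data, which could bear on the SoupX question raised earlier.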
Seurat pipeline:
seu_obj_corrected <- CreateSeuratObject(counts = out)
seu_obj_corrected <- NormalizeData(seu_obj_corrected, normalization.method = "LogNormalize", scale.factor = 10000)
seu_obj_corrected <- FindVariableFeatures(seu_obj_corrected, selection.method = "vst", nfeatures = 2000)
all.genes <- rownames(seu_obj_corrected)
seu_obj_corrected <- ScaleData(seu_obj_corrected, features = all.genes)
seu_obj_corrected <- RunPCA(seu_obj_corrected, features = VariableFeatures(object = seu_obj_corrected))
seu_obj_corrected <- FindNeighbors(seu_obj_corrected, dims = 1:10)
seu_obj_corrected <- FindClusters(seu_obj_corrected, resolution = 0.5)
seu_obj_corrected <- RunUMAP(seu_obj_corrected, dims = 1:10)
DimPlot(seu_obj_corrected, reduction = "umap")
Error output
Versions
| Package | Version |
| --------------- | ------- |
| session_info | 1.0.0 |
| rpy2 | 3.5.17 |
| matplotlib-venn | 1.1.2 |

| Dependency | Version |
| ------------------ | ---------------------- |
| overrides | 7.7.0 |
| jsonpointer | 3.0.0 |
| tqdm | 4.67.1 |
| igraph | 0.11.8 |
| tornado | 6.4.2 |
| rfc3339-validator | 0.1.4 |
| certifi | 2025.1.31 (2025.01.31) |
| pynndescent | 0.5.13 |
| pure_eval | 0.2.3 |
| babel | 2.17.0 |
| Send2Trash | 1.8.3 |
| isoduration | 20.11.0 |
| tzlocal | 5.3.1 |
| psutil | 7.0.0 |
| rfc3986-validator | 0.1.1 |
| websocket-client | 1.8.0 |
| json5 | 0.10.0 |
| natsort | 8.4.0 |
| h5py | 3.13.0 |
| asttokens | 3.0.0 |
| requests | 2.32.3 |
| pytz | 2025.1 |
| pillow | 11.1.0 |
| jedi | 0.19.2 |
| setuptools | 75.8.2 |
| python-dateutil | 2.9.0.post0 |
| cffi | 1.17.1 |
| pycparser | 2.22 |
| anyio | 4.9.0 |
| MarkupSafe | 3.0.2 |
| texttable | 1.7.0 |
| umap-learn | 0.5.7 |
| executing | 2.1.0 |
| python-json-logger | 3.3.0 |
| wcwidth | 0.2.13 |
| uri-template | 1.3.0 |
| parso | 0.8.4 |
| six | 1.17.0 |
| llvmlite | 0.44.0 |
| prometheus_client | 0.21.1 |
| fastjsonschema | 2.21.1 |
| torch | 2.6.0 (2.6.0+cu124) |
| ipython | 8.33.0 |
| matplotlib-inline | 0.1.7 |
| pickleshare | 0.7.5 |
| joblib | 1.4.2 |
| leidenalg | 0.10.2 |
| patsy | 1.0.1 |
| pexpect | 4.9.0 |
| ipywidgets | 8.1.5 |
| Cython | 3.0.12 |
| kiwisolver | 1.4.8 |
| numba | 0.61.0 |
| fqdn | 1.5.1 |
| defusedxml | 0.7.1 |
| cycler | 0.12.1 |
| stack_data | 0.6.3 |
| sniffio | 1.3.1 |
| PyYAML | 6.0.2 |
| debugpy | 1.8.12 |
| charset-normalizer | 3.4.1 |
| statsmodels | 0.14.4 |
| decorator | 5.2.1 |
| prompt_toolkit | 3.0.50 |

| Component | Info |
| --------- | ------------------------------------------------------------------------------ |
| Python | 3.10.16 \| packaged by conda-forge \| (main, Dec 5 2024, 14:16:10) [GCC 13.3.0] |
| OS | Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35 |
| CPU | 32 logical CPU cores, x86_64 |
| GPU | No GPU found |
| Updated | 2025-03-30 17:05 |