
Is the seurat_v3 highly variable gene selection flavor in Scanpy really the same as the vst method in Seurat? #3542

@AdotaLover

Description

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the main branch of scanpy.

What happened?

Hello, and thank you for developing a single-cell analysis package for Python. I am still new to Scanpy. Today I preprocessed the same count matrix with both Scanpy and Seurat (R) and ran into a question: is Scanpy's seurat_v3 highly variable gene selection really equivalent to Seurat's vst method? I used the same normalization method on both sides:

R:
seu_obj_corrected <- NormalizeData(seu_obj_corrected, normalization.method = "LogNormalize", scale.factor = 10000)

Python:
sc.pp.normalize_total(adata_Fy32_filtered, target_sum=1e4)
sc.pp.log1p(adata_Fy32_filtered)

After checking carefully, the two normalized matrices are almost identical, differing only many decimal places after the point. This rules out normalization as the cause of the difference in highly variable genes, so the comparison below isolates the selection method itself.
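Seurat's LogNormalize computes log1p(count / total counts of the cell * scale.factor), which is the same arithmetic as normalize_total(target_sum=1e4) followed by log1p in Scanpy. The minimal numpy sketch below (the log_normalize helper is hypothetical, not part of either package) runs that formula in double and in single precision; the two results differ only many decimal places down, which is one plausible explanation for the tiny discrepancy if the two pipelines happen to store the matrix at different precisions.

```python
import numpy as np


def log_normalize(counts: np.ndarray, scale_factor: float = 1e4, dtype=np.float64) -> np.ndarray:
    """LogNormalize-style transform of a cells x genes matrix:
    log1p(counts / per-cell total * scale_factor), computed in `dtype`."""
    x = np.asarray(counts).astype(dtype)
    totals = x.sum(axis=1, keepdims=True)  # library size of each cell
    return np.log1p(x / totals * dtype(scale_factor))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(2.0, size=(50, 100))  # toy cells x genes counts
    double = log_normalize(counts, dtype=np.float64)
    single = log_normalize(counts, dtype=np.float32)
    print("max abs difference:", np.abs(double - single).max())
```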

[Image: Venn diagram of the highly variable genes selected by the three methods listed below]

This is the Venn diagram I made of the highly variable genes selected in R and Python, using three different methods:
  • Scanpy, flavor='seurat': sc.pp.highly_variable_genes(adata_Fy32_filtered, flavor='seurat', n_top_genes=2000)
  • Scanpy, flavor='seurat_v3': sc.pp.highly_variable_genes(adata_Fy32_filtered, flavor='seurat_v3', n_top_genes=2000)
  • Seurat, vst: seu_obj <- FindVariableFeatures(seu_obj, selection.method = "vst", nfeatures = 2000)
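The overlaps in a Venn diagram like this one can be reproduced with plain set operations on the gene names each method selected (for example, the adata.var index where highly_variable is True on the Scanpy side, and VariableFeatures() exported from R). A small sketch, assuming a hypothetical pairwise_overlap helper and made-up gene names, that reports shared genes and the Jaccard index for every pair of methods:

```python
from itertools import combinations


def pairwise_overlap(gene_sets: dict[str, set[str]]) -> dict[tuple[str, str], tuple[int, float]]:
    """For every pair of named gene sets, return (number of shared genes, Jaccard index)."""
    result = {}
    for (name_a, a), (name_b, b) in combinations(gene_sets.items(), 2):
        shared = len(a & b)
        union = len(a | b)
        result[(name_a, name_b)] = (shared, shared / union if union else 0.0)
    return result


if __name__ == "__main__":
    # Made-up gene names; in practice each set would hold the 2,000 selected HVGs
    sets = {
        "scanpy_seurat": {"GENE1", "GENE2", "GENE3"},
        "scanpy_seurat_v3": {"GENE2", "GENE3", "GENE4"},
        "seurat_vst": {"GENE3", "GENE4", "GENE5"},
    }
    for pair, (shared, jaccard) in pairwise_overlap(sets).items():
        print(pair, shared, round(jaccard, 2))
```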

One more detail: out is a matrix that has already been corrected with SoupX. Could this be related?
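According to the Scanpy documentation, flavor='seurat_v3' expects count data rather than log-normalized values. A soup-corrected matrix may contain non-integer values, so counting the entries that are not whole numbers is a quick way to see whether the correction step changes the input the HVG methods receive. A minimal sketch; count_non_integer is a hypothetical helper, and for a scipy sparse matrix it assumes you pass only the stored values (X.data):

```python
import numpy as np


def count_non_integer(values) -> int:
    """Number of entries that are not whole numbers (e.g. after SoupX correction).

    For a scipy sparse matrix, pass X.data so only stored entries are checked.
    """
    arr = np.asarray(values, dtype=np.float64)
    return int(np.count_nonzero(arr != np.round(arr)))


if __name__ == "__main__":
    raw = np.array([[0, 3, 1], [2, 0, 5]])
    corrected = np.array([[0.0, 2.6, 1.0], [1.8, 0.0, 5.0]])  # toy "soup-corrected" values
    print(count_non_integer(raw), count_non_integer(corrected))  # prints 0 2
```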

scanpy version = '1.11.0'

Thanks!

Minimal code sample

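Scanpy pipeline:
import scanpy as sc

# Keep separate copies so in-place normalization cannot overwrite the stored counts
adata_Fy32_filtered.layers["counts"] = adata_Fy32_filtered.X.copy()
# out is genes x cells (as passed to CreateSeuratObject); AnnData expects cells x genes
adata_Fy32_filtered.layers["soupX_counts"] = out.T
adata_Fy32_filtered.X = adata_Fy32_filtered.layers["soupX_counts"].copy()

sc.pp.normalize_total(adata_Fy32_filtered, target_sum=1e4)
sc.pp.log1p(adata_Fy32_filtered)
# Note: flavor='seurat_v3' expects count data, but here it runs on the log-normalized X;
# layer="soupX_counts" would point it at the SoupX-corrected counts instead
sc.pp.highly_variable_genes(adata_Fy32_filtered, flavor='seurat_v3', n_top_genes=2000)
sc.pp.scale(adata_Fy32_filtered, zero_center=True)
sc.tl.pca(adata_Fy32_filtered, mask_var="highly_variable")
sc.pp.neighbors(adata_Fy32_filtered, n_pcs=10)
sc.tl.umap(adata_Fy32_filtered)
sc.tl.leiden(adata_Fy32_filtered, resolution=0.6)
sc.pl.umap(adata_Fy32_filtered, color=["leiden"])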
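For comparing the two implementations, a simplified sketch of what the vst / seurat_v3 procedure computes may help. It works on raw counts: per-gene mean and variance, a trend of log10(variance) against log10(mean), counts standardized by the trend's expected standard deviation and clipped at sqrt(number of cells), then genes ranked by the variance of those clipped values. This is not the Seurat or Scanpy code: vst_like_hvg is a hypothetical helper, and it swaps the loess fit (span 0.3 by default in both packages) for a quadratic polynomial fit, so its rankings will not match either package exactly. Because every step uses the counts directly, feeding it log-normalized values changes the mean-variance trend and therefore the ranking.

```python
import numpy as np


def vst_like_hvg(counts: np.ndarray, n_top: int, deg: int = 2):
    """Rank genes by a simplified vst-style standardized variance.

    counts: cells x genes matrix of raw (non-log) counts.
    Returns (indices of the n_top genes, standardized variance per gene).
    Hypothetical helper: uses a polynomial trend instead of loess.
    """
    counts = np.asarray(counts, dtype=np.float64)
    n_cells = counts.shape[0]
    mean = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)
    not_const = var > 0  # zero-variance genes keep a score of 0

    # Trend of log10(variance) on log10(mean); a stand-in for loess(span=0.3)
    log_mean = np.log10(mean[not_const])
    coef = np.polyfit(log_mean, np.log10(var[not_const]), deg)
    expected_sd = np.sqrt(10 ** np.polyval(coef, log_mean))

    # Standardize each gene by its expected SD and clip large values at sqrt(n_cells)
    z = (counts[:, not_const] - mean[not_const]) / expected_sd
    z = np.minimum(z, np.sqrt(n_cells))

    # Standardized variance: sum of squared clipped z-scores over (n - 1)
    std_var = np.zeros(counts.shape[1])
    std_var[not_const] = (z ** 2).sum(axis=0) / (n_cells - 1)

    top = np.argsort(-std_var, kind="stable")[:n_top]
    return top, std_var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_cells, n_genes = 500, 200
    means = 10 ** rng.uniform(-0.3, 1.3, size=n_genes)
    counts = rng.poisson(means, size=(n_cells, n_genes))
    # Genes 0-4 are overdispersed (negative binomial with mean 5)
    counts[:, :5] = rng.negative_binomial(0.5, 0.5 / 5.5, size=(n_cells, 5))
    top, std_var = vst_like_hvg(counts, n_top=5)
    print(sorted(top.tolist()))
```

Poisson-like genes end up with a standardized variance near 1, while overdispersed genes score well above it, which is what pushes them to the top of the ranking.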

Seurat pipeline:
library(Seurat)

seu_obj_corrected <- CreateSeuratObject(counts = out)
seu_obj_corrected <- NormalizeData(seu_obj_corrected, normalization.method = "LogNormalize", scale.factor = 10000)
seu_obj_corrected <- FindVariableFeatures(seu_obj_corrected, selection.method = "vst", nfeatures = 2000)
all.genes <- rownames(seu_obj_corrected)
seu_obj_corrected <- ScaleData(seu_obj_corrected, features = all.genes)
# Use the variable features of this object (not a different seu_obj)
seu_obj_corrected <- RunPCA(seu_obj_corrected, features = VariableFeatures(object = seu_obj_corrected))
seu_obj_corrected <- FindNeighbors(seu_obj_corrected, dims = 1:10)
seu_obj_corrected <- FindClusters(seu_obj_corrected, resolution = 0.5)
seu_obj_corrected <- RunUMAP(seu_obj_corrected, dims = 1:10)
DimPlot(seu_obj_corrected, reduction = "umap")

Error output

Versions

| Package         | Version |
| --------------- | ------- |
| session_info    | 1.0.0   |
| rpy2            | 3.5.17  |
| matplotlib-venn | 1.1.2   |

| Dependency         | Version                |
| ------------------ | ---------------------- |
| overrides          | 7.7.0                  |
| jsonpointer        | 3.0.0                  |
| tqdm               | 4.67.1                 |
| igraph             | 0.11.8                 |
| tornado            | 6.4.2                  |
| rfc3339-validator  | 0.1.4                  |
| certifi            | 2025.1.31 (2025.01.31) |
| pynndescent        | 0.5.13                 |
| pure_eval          | 0.2.3                  |
| babel              | 2.17.0                 |
| Send2Trash         | 1.8.3                  |
| isoduration        | 20.11.0                |
| tzlocal            | 5.3.1                  |
| psutil             | 7.0.0                  |
| rfc3986-validator  | 0.1.1                  |
| websocket-client   | 1.8.0                  |
| json5              | 0.10.0                 |
| natsort            | 8.4.0                  |
| h5py               | 3.13.0                 |
| asttokens          | 3.0.0                  |
| requests           | 2.32.3                 |
| pytz               | 2025.1                 |
| pillow             | 11.1.0                 |
| jedi               | 0.19.2                 |
| setuptools         | 75.8.2                 |
| python-dateutil    | 2.9.0.post0            |
| cffi               | 1.17.1                 |
| pycparser          | 2.22                   |
| anyio              | 4.9.0                  |
| MarkupSafe         | 3.0.2                  |
| texttable          | 1.7.0                  |
| umap-learn         | 0.5.7                  |
| executing          | 2.1.0                  |
| python-json-logger | 3.3.0                  |
| wcwidth            | 0.2.13                 |
| uri-template       | 1.3.0                  |
| parso              | 0.8.4                  |
| six                | 1.17.0                 |
| llvmlite           | 0.44.0                 |
| prometheus_client  | 0.21.1                 |
| fastjsonschema     | 2.21.1                 |
| torch              | 2.6.0 (2.6.0+cu124)    |
| ipython            | 8.33.0                 |
| matplotlib-inline  | 0.1.7                  |
| pickleshare        | 0.7.5                  |
| joblib             | 1.4.2                  |
| leidenalg          | 0.10.2                 |
| patsy              | 1.0.1                  |
| pexpect            | 4.9.0                  |
| ipywidgets         | 8.1.5                  |
| Cython             | 3.0.12                 |
| kiwisolver         | 1.4.8                  |
| numba              | 0.61.0                 |
| fqdn               | 1.5.1                  |
| defusedxml         | 0.7.1                  |
| cycler             | 0.12.1                 |
| stack_data         | 0.6.3                  |
| sniffio            | 1.3.1                  |
| PyYAML             | 6.0.2                  |
| debugpy            | 1.8.12                 |
| charset-normalizer | 3.4.1                  |
| statsmodels        | 0.14.4                 |
| decorator          | 5.2.1                  |
| prompt_toolkit     | 3.0.50                 |

| Component | Info                                                                           |
| --------- | ------------------------------------------------------------------------------ |
| Python    | 3.10.16 | packaged by conda-forge | (main, Dec  5 2024, 14:16:10) [GCC 13.3.0] |
| OS        | Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35                 |
| CPU       | 32 logical CPU cores, x86_64                                                   |
| GPU       | No GPU found                                                                   |
| Updated   | 2025-03-30 17:05                                                               |
