Skip to content

sc.pl.dotplot is surprisingly ineffective for subsets of data #3717

@VladimirShitov

Description

@VladimirShitov

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the main branch of scanpy.

What happened?

During data analysis, a very common operation is to check some markers in particular clusters of the data. Typically I do it by something like:

sc.pl.dotplot(
    adata[adata.obs["cell_type"].str.startswith("Macrophage")], 
    var_names=macrophage_markers,
    groupby="cell_type"
)

Surprisingly, this is a very computationally intensive step if a subset of cells is selected. It runs longer than on the full adata and sometimes even causes a kernel crash if I’m close to the RAM limit. I assume, part of adata is copied in this step? Would be great to fix this because this is a very frequent operation in the analysis, and I don’t see reasons why it is so expensive

Minimal code sample

cd_4_t_cell_markers = {
    "Central memory": ["IL7R", "TCF7", "LEF1", "CCR7", "SELL", "KLF2", "SOCS3"],
    "NR3C1": ["NR3C1", "NCOA1", "ZEB2", "AOAH", "PARP8", "CMIP", "CHST11", "RBPJ", "RUNX2", "ADAM19"],
    "Naive": ["BACH2", "KLF12", "FOXP1", "PRKCA", "PDE3B", "ABCC1", "SMCHD1"],
    "Treg": ["FOXP3", "IL2RA", "CTLA4", "TNFRSF4", "TNFRSF18", "PELI1", "TIGIT", "IKZF2"],
    "Effector memory": ["S100A4", "IL32", "LGALS1", "ALOX5AP", "CD52", "CCL5"],
    "Resident memory": ["ITGAE", "CD69"],
}

adata = sc.datasets.pbmc3k_processed()

# Uncomment in jupyter notebook
# %%time
sc.pl.dotplot(adata, var_names=cd_4_t_cell_markers, groupby="louvain")

# Run in a separate cell
# %%time
sc.pl.dotplot(
    adata[adata.obs["louvain"].str.contains(["T cells"]), 
    var_names=cd_4_t_cell_markers, groupby="louvain"
)

The second cell runs for longer even though it uses less data. For my dataset (200k cells), a plot for subset of the cells took twice as long – 11.1s vs 23s

Error output

Versions

-----
anndata     0.10.9
scanpy      1.10.3
-----
PIL                         10.1.0
adjustText                  1.3.0
anyio                       NA
appnope                     0.1.2
asttokens                   NA
attr                        23.1.0
attrs                       23.1.0
autograd                    NA
autograd_gamma              NA
babel                       2.11.0
backcall                    0.2.0
bottleneck                  1.4.2
brotli                      1.0.9
cachetools                  5.3.2
causallearn                 NA
cell2cell                   0.7.4
certifi                     2023.11.17
cffi                        1.16.0
chardet                     5.2.0
charset_normalizer          2.0.4
comm                        0.1.2
cryptography                41.0.7
cycler                      0.12.1
cython_runtime              NA
dateutil                    2.8.2
db_dtypes                   1.2.0
debugpy                     1.6.7
decorator                   5.1.1
decoupler                   1.8.0
defusedxml                  0.7.1
dill                        0.3.8
docrep                      0.3.2
dowhy                       0.11.1
ehrapy                      0.8.0
executing                   0.8.3
fastjsonschema              NA
fhiry                       3.2.2
filelock                    3.14.0
fontTools                   4.46.0
formulaic                   1.1.1
future                      1.0.0
google                      NA
grpc                        1.64.1
grpc_status                 NA
h5py                        3.9.0
idna                        3.4
igraph                      0.11.5
imblearn                    0.12.3
interface_meta              1.3.0
ipykernel                   6.25.0
ipywidgets                  8.0.4
jedi                        0.18.1
jinja2                      3.0.3
joblib                      1.3.2
json5                       NA
jsonschema                  4.19.2
jsonschema_specifications   NA
jupyter_events              0.8.0
jupyter_server              2.10.0
jupyterlab_server           2.25.1
kiwisolver                  1.4.5
kneed                       0.8.5
lamin_utils                 0.13.2
legacy_api_wrap             NA
leidenalg                   0.10.2
liana                       1.5.1
lifelines                   0.29.0
llvmlite                    0.43.0
markupsafe                  2.1.1
matplotlib                  3.8.2
matplotlib_inline           0.1.6
memory_profiler             0.61.0
missingno                   0.5.2
mizani                      0.14.2
mpl_toolkits                NA
mpmath                      1.3.0
mudata                      0.3.1
natsort                     7.1.1
nbformat                    5.9.2
networkx                    3.2.1
numba                       0.60.0
numexpr                     2.8.7
numpy                       1.26.2
overrides                   NA
packaging                   23.1
pandas                      2.2.3
parso                       0.8.3
patient_representation      0.1.29
patpy                       0.0.3
patsy                       0.5.4
pexpect                     4.8.0
pickleshare                 0.7.5
pkg_resources               NA
platformdirs                3.10.0
plotnine                    0.15.0
prometheus_client           NA
prompt_toolkit              3.0.36
psutil                      5.9.0
ptyprocess                  0.7.0
pure_eval                   0.2.2
pyarrow                     16.1.0
pycparser                   2.21
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.9.5
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pydot                       2.0.0
pygments                    2.15.1
pynndescent                 0.5.11
pyparsing                   3.1.1
pythonjsonlogger            NA
pytz                        2023.3.post1
rapidfuzz                   3.9.3
referencing                 NA
requests                    2.31.0
rfc3339_validator           0.1.4
rfc3986_validator           0.1.1
rich                        NA
rpds                        NA
scipy                       1.11.4
seaborn                     0.13.2
send2trash                  NA
session_info                1.0.0
setuptools                  68.0.0
six                         1.16.0
sklearn                     1.6.1
sniffio                     1.2.0
socks                       1.7.1
sparse                      0.15.4
stack_data                  0.2.0
statannotations             0.7.2
statsmodels                 0.14.0
sympy                       1.13.1
tableone                    0.8.0
tabulate                    0.9.0
tensorly                    0.8.1
texttable                   1.7.0
thefuzz                     0.22.1
threadpoolctl               3.2.0
torch                       2.5.0
torchgen                    NA
tornado                     6.3.3
tqdm                        4.66.5
traitlets                   5.7.1
typing_extensions           NA
umap                        0.5.5
urllib3                     1.26.18
vscode                      NA
wcwidth                     0.2.5
websocket                   0.58.0
wrapt                       1.16.0
yaml                        6.0.1
zmq                         25.1.0
-----
IPython             8.15.0
jupyter_client      8.6.0
jupyter_core        5.5.0
jupyterlab          4.0.8
notebook            7.0.6
-----
Python 3.11.5 (main, Sep 11 2023, 08:31:25) [Clang 14.0.6 ]
macOS-12.6-arm64-arm-64bit
-----
Session information updated at 2025-07-17 15:35
/Users/vladimir.shitov/miniconda3/envs/2023_12_COPD/lib/python3.11/site-packages/session_info/main.py:213: DeprecationWarning: Accessing jsonschema.__version__ is deprecated and will be removed in a future release. Use importlib.metadata directly to query for jsonschema's version.
/Users/vladimir.shitov/miniconda3/envs/2023_12_COPD/lib/python3.11/site-packages/session_info/main.py:213: DeprecationWarning: Accessing attrs.__version__ is deprecated and will be removed in a future release. Use importlib.metadata directly to query for attrs's packaging metadata.
/Users/vladimir.shitov/miniconda3/envs/2023_12_COPD/lib/python3.11/site-packages/session_info/main.py:213: DeprecationWarning: Accessing attr.__version__ is deprecated and will be removed in a future release. Use importlib.metadata directly to query for attrs's packaging metadata.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions