Open
Description
Please make sure these conditions are met
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of scanpy.
- (optional) I have confirmed this bug exists on the main branch of scanpy.
What happened?
During data analysis, a very common operation is to check some markers in particular clusters of the data. I typically do this with something like:
sc.pl.dotplot(
adata[adata.obs["cell_type"].str.startswith("Macrophage")],
var_names=macrophage_markers,
groupby="cell_type"
)
Surprisingly, this step is very computationally intensive when a subset of cells is selected. It runs longer than on the full adata and sometimes even crashes the kernel when I am close to the RAM limit. I assume part of adata is copied in this step? It would be great to fix this, because it is a very frequent operation during analysis and I don't see a reason why it should be so expensive.
Minimal code sample
import scanpy as sc

cd_4_t_cell_markers = {
"Central memory": ["IL7R", "TCF7", "LEF1", "CCR7", "SELL", "KLF2", "SOCS3"],
"NR3C1": ["NR3C1", "NCOA1", "ZEB2", "AOAH", "PARP8", "CMIP", "CHST11", "RBPJ", "RUNX2", "ADAM19"],
"Naive": ["BACH2", "KLF12", "FOXP1", "PRKCA", "PDE3B", "ABCC1", "SMCHD1"],
"Treg": ["FOXP3", "IL2RA", "CTLA4", "TNFRSF4", "TNFRSF18", "PELI1", "TIGIT", "IKZF2"],
"Effector memory": ["S100A4", "IL32", "LGALS1", "ALOX5AP", "CD52", "CCL5"],
"Resident memory": ["ITGAE", "CD69"],
}
adata = sc.datasets.pbmc3k_processed()
# Uncomment in jupyter notebook
# %%time
sc.pl.dotplot(adata, var_names=cd_4_t_cell_markers, groupby="louvain")
# Run in a separate cell
# %%time
sc.pl.dotplot(
adata[adata.obs["louvain"].str.contains("T cells")],
var_names=cd_4_t_cell_markers, groupby="louvain"
)
The second cell runs longer even though it uses less data. On my dataset (200k cells), plotting a subset of the cells took about twice as long as plotting the full data: 23 s vs 11.1 s.
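To reproduce the comparison outside a notebook, where `%%time` is not available, something like the following could be used. This is only a sketch: `measure` and `allocate` are hypothetical helper names, not part of scanpy. The stand-in workload lets the snippet run without scanpy installed. `tracemalloc` only sees allocations made through Python's allocators (NumPy reports its buffers there) and slows execution down, so the numbers are rough.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds, peak_traced_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        # Record timing and peak memory even if func raises.
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def allocate(n):
    """Stand-in workload (hypothetical) so the sketch runs without scanpy."""
    return [0] * n


_, seconds, peak = measure(allocate, 1_000_000)
print(f"{seconds:.3f} s, peak {peak / 1e6:.1f} MB")

# With scanpy available, the two calls from this report could be compared as:
# measure(sc.pl.dotplot, adata, var_names=cd_4_t_cell_markers,
#         groupby="louvain", show=False)
# subset = adata[adata.obs["louvain"].str.contains("T cells")]
# measure(sc.pl.dotplot, subset, var_names=cd_4_t_cell_markers,
#         groupby="louvain", show=False)
```

A clearly higher peak for the subset call than for the full call would fit the guess that data is copied when plotting from a subset.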
Error output
Versions
-----
anndata 0.10.9
scanpy 1.10.3
-----
PIL 10.1.0
adjustText 1.3.0
anyio NA
appnope 0.1.2
asttokens NA
attr 23.1.0
attrs 23.1.0
autograd NA
autograd_gamma NA
babel 2.11.0
backcall 0.2.0
bottleneck 1.4.2
brotli 1.0.9
cachetools 5.3.2
causallearn NA
cell2cell 0.7.4
certifi 2023.11.17
cffi 1.16.0
chardet 5.2.0
charset_normalizer 2.0.4
comm 0.1.2
cryptography 41.0.7
cycler 0.12.1
cython_runtime NA
dateutil 2.8.2
db_dtypes 1.2.0
debugpy 1.6.7
decorator 5.1.1
decoupler 1.8.0
defusedxml 0.7.1
dill 0.3.8
docrep 0.3.2
dowhy 0.11.1
ehrapy 0.8.0
executing 0.8.3
fastjsonschema NA
fhiry 3.2.2
filelock 3.14.0
fontTools 4.46.0
formulaic 1.1.1
future 1.0.0
google NA
grpc 1.64.1
grpc_status NA
h5py 3.9.0
idna 3.4
igraph 0.11.5
imblearn 0.12.3
interface_meta 1.3.0
ipykernel 6.25.0
ipywidgets 8.0.4
jedi 0.18.1
jinja2 3.0.3
joblib 1.3.2
json5 NA
jsonschema 4.19.2
jsonschema_specifications NA
jupyter_events 0.8.0
jupyter_server 2.10.0
jupyterlab_server 2.25.1
kiwisolver 1.4.5
kneed 0.8.5
lamin_utils 0.13.2
legacy_api_wrap NA
leidenalg 0.10.2
liana 1.5.1
lifelines 0.29.0
llvmlite 0.43.0
markupsafe 2.1.1
matplotlib 3.8.2
matplotlib_inline 0.1.6
memory_profiler 0.61.0
missingno 0.5.2
mizani 0.14.2
mpl_toolkits NA
mpmath 1.3.0
mudata 0.3.1
natsort 7.1.1
nbformat 5.9.2
networkx 3.2.1
numba 0.60.0
numexpr 2.8.7
numpy 1.26.2
overrides NA
packaging 23.1
pandas 2.2.3
parso 0.8.3
patient_representation 0.1.29
patpy 0.0.3
patsy 0.5.4
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
platformdirs 3.10.0
plotnine 0.15.0
prometheus_client NA
prompt_toolkit 3.0.36
psutil 5.9.0
ptyprocess 0.7.0
pure_eval 0.2.2
pyarrow 16.1.0
pycparser 2.21
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pydot 2.0.0
pygments 2.15.1
pynndescent 0.5.11
pyparsing 3.1.1
pythonjsonlogger NA
pytz 2023.3.post1
rapidfuzz 3.9.3
referencing NA
requests 2.31.0
rfc3339_validator 0.1.4
rfc3986_validator 0.1.1
rich NA
rpds NA
scipy 1.11.4
seaborn 0.13.2
send2trash NA
session_info 1.0.0
setuptools 68.0.0
six 1.16.0
sklearn 1.6.1
sniffio 1.2.0
socks 1.7.1
sparse 0.15.4
stack_data 0.2.0
statannotations 0.7.2
statsmodels 0.14.0
sympy 1.13.1
tableone 0.8.0
tabulate 0.9.0
tensorly 0.8.1
texttable 1.7.0
thefuzz 0.22.1
threadpoolctl 3.2.0
torch 2.5.0
torchgen NA
tornado 6.3.3
tqdm 4.66.5
traitlets 5.7.1
typing_extensions NA
umap 0.5.5
urllib3 1.26.18
vscode NA
wcwidth 0.2.5
websocket 0.58.0
wrapt 1.16.0
yaml 6.0.1
zmq 25.1.0
-----
IPython 8.15.0
jupyter_client 8.6.0
jupyter_core 5.5.0
jupyterlab 4.0.8
notebook 7.0.6
-----
Python 3.11.5 (main, Sep 11 2023, 08:31:25) [Clang 14.0.6 ]
macOS-12.6-arm64-arm-64bit
-----
Session information updated at 2025-07-17 15:35
/Users/vladimir.shitov/miniconda3/envs/2023_12_COPD/lib/python3.11/site-packages/session_info/main.py:213: DeprecationWarning: Accessing jsonschema.__version__ is deprecated and will be removed in a future release. Use importlib.metadata directly to query for jsonschema's version.
/Users/vladimir.shitov/miniconda3/envs/2023_12_COPD/lib/python3.11/site-packages/session_info/main.py:213: DeprecationWarning: Accessing attrs.__version__ is deprecated and will be removed in a future release. Use importlib.metadata directly to query for attrs's packaging metadata.
/Users/vladimir.shitov/miniconda3/envs/2023_12_COPD/lib/python3.11/site-packages/session_info/main.py:213: DeprecationWarning: Accessing attr.__version__ is deprecated and will be removed in a future release. Use importlib.metadata directly to query for attrs's packaging metadata.