Open
Labels: Area - Plotting 🌺, Area – Performance 🐌, good first issue (easy first issue to get started in OSS community contribution!)
Description
What kind of feature would you like to request?
Additional function parameters / changed functionality / changed defaults?
Please describe your wishes
See #3717 for what prompted me to look at this code.
Currently, `BasePlot` creates an in-memory copy, as a dataframe, of the main data of interest (`obsm`, `X`, layers, etc.):
scanpy/src/scanpy/plotting/_baseplot_class.py
Lines 148 to 157 in 0b82c93
    self.categories, self.obs_tidy = _prepare_dataframe(
        adata,
        self.var_names,
        groupby,
        use_raw=use_raw,
        log=log,
        num_categories=num_categories,
        layer=layer,
        gene_symbols=gene_symbols,
    )
I believe this to be unnecessary as this dataframe is only ever used for groupby operations, for which we have a zero-copy solution in https://scanpy.readthedocs.io/en/latest/generated/scanpy.get.aggregate.html
Thus we should
- Refactor `BasePlot` not to create a copy
- Use https://scanpy.readthedocs.io/en/latest/generated/scanpy.get.aggregate.html for aggregation
- Ensure this doesn't affect performance
- Integrate (feat): `sc.get.aggregate` via `dask` #3700
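
To illustrate why a dense dataframe copy is not needed for group-wise statistics, here is a minimal sketch of computing per-group means directly on a sparse matrix. It is not scanpy's implementation and not the proposed `BasePlot` change: `group_means` is a hypothetical helper, the toy matrix and labels are made up, and the indicator-matrix approach is just one assumed way to aggregate without densifying the input. In scanpy itself the documented entry point would be `sc.get.aggregate`.

```python
import numpy as np
from scipy import sparse


def group_means(X, labels):
    """Per-group column means of X, computed without densifying X.

    Hypothetical helper for illustration only; not part of scanpy.
    """
    # Map each observation's label to an integer group code.
    groups, codes = np.unique(labels, return_inverse=True)
    n_obs = X.shape[0]
    # Indicator matrix of shape (n_groups, n_obs):
    # entry (g, i) is 1 when observation i belongs to group g.
    indicator = sparse.csr_matrix(
        (np.ones(n_obs), (codes, np.arange(n_obs))),
        shape=(len(groups), n_obs),
    )
    # Group sums via a matrix product; the result is only n_groups x n_vars.
    sums = indicator @ X
    sums = sums.toarray() if sparse.issparse(sums) else np.asarray(sums)
    counts = np.bincount(codes, minlength=len(groups))
    return groups, sums / counts[:, None]


# Toy data: 4 observations, 2 variables, 2 groups.
X = sparse.csr_matrix(np.array([[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [0.0, 8.0]]))
labels = np.array(["a", "a", "b", "b"])
groups, means = group_means(X, labels)
```

Only the small `n_groups x n_vars` result is materialized densely here, which is the property the plotting code would need from an aggregation step.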