Description
What kind of feature would you like to request?
Additional function parameters / changed functionality / changed defaults?
Please describe your wishes
Meta issue tracking scanpy functions without dask support
Related #921
First we should test things more generically:
- add global `array_type` fixture to use in all tests for features below
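A minimal sketch of what such a fixture could look like, assuming it lives in a shared `conftest.py`. The names `ARRAY_TYPES` and `to_dense` are hypothetical helpers for illustration, not existing scanpy test utilities, and dask is treated as an optional dependency here:

```python
import numpy as np
import pytest
from scipy import sparse

# Hypothetical registry: each entry converts a dense NumPy array
# into one of the array types the tests should cover.
ARRAY_TYPES = {
    "numpy": np.asarray,
    "scipy_csr": sparse.csr_matrix,
}

try:  # assumption: dask is optional, so only register it when installed
    import dask.array as da
except ImportError:
    da = None
else:
    ARRAY_TYPES["dask_dense"] = lambda x: da.from_array(x, chunks=(2, -1))


@pytest.fixture(params=list(ARRAY_TYPES.values()), ids=list(ARRAY_TYPES))
def array_type(request):
    """Return a callable that converts a dense array to the type under test."""
    return request.param


def to_dense(x):
    """Bring any supported array type back to NumPy for comparison."""
    if da is not None and isinstance(x, da.Array):
        x = x.compute()
    if sparse.issparse(x):
        x = x.toarray()
    return np.asarray(x)


def test_roundtrip(array_type):
    # Runs once per registered array type.
    dense = np.arange(12, dtype=float).reshape(4, 3)
    np.testing.assert_array_equal(to_dense(array_type(dense)), dense)
```

Any test that takes `array_type` as an argument would then run once per registered type, so adding a new array type to the registry extends coverage without touching individual tests.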
@ivirshup came up with places where we should prioritize dask support:
- `normalize_total`
  - seems to work: Fix normalize_total for dask #2466
  - has tests: Allow dask arrays to propagate through _normalize_data() #1663
- `log1p`, `normalize_per_cell`, `filter_cells`/`*_genes`:
  - accept Dask arrays: Support running on distributed compute engines like Dask, and Spark via Zap #283
  - Test that they are kept as Dask arrays throughout: Test full Dask support for `log1p`, `normalize_per_cell`, `filter_cells`/`filter_genes` #2814
- `sc.pp.pca`: Dask PCA support #2563, Implement sparse `covariance_eigh` PCA using Dask #3263
- `sc.pp.calculate_qc_metrics`: (feat): `calculate_qc_metrics` with `dask` #3307
- `sc.pp.highly_variable_genes`:
- `sc.tl.rank_genes_groups`: Handle Dask arrays in some utilities #2621
- `sc.pp.subsample`/`sc.pp.sample`: so simple it has always been supported
- Aggregate: https://flox.readthedocs.io/en/latest/ , `sc.get.aggregate` with dask #3659
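For the "kept as Dask arrays throughout" checks in the list above, a test could look roughly like the sketch below. `assert_stays_dask` is a hypothetical helper, and `np.log1p` is only a stand-in for the scanpy function under test (NumPy ufuncs applied to a Dask array return a Dask array):

```python
import dask.array as da
import numpy as np


def assert_stays_dask(func, x):
    """Apply `func` to a Dask array and check the result is still a Dask array."""
    result = func(x)
    assert isinstance(result, da.Array), f"got {type(result).__name__}"
    return result


x = da.from_array(np.arange(6, dtype=float).reshape(3, 2), chunks=(1, 2))

# Stand-in for the function under test; a real test would call scanpy here.
lazy = assert_stays_dask(np.log1p, x)

# Compute only at the very end, to compare against the in-memory result.
np.testing.assert_allclose(lazy.compute(), np.log1p(x.compute()))
```

A function that silently materializes its input (for example by calling `.compute()` internally and returning a NumPy array) would fail the `isinstance` check, which is the regression these tests are meant to catch.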
Later:
- scale: updates sparse scale #2942
- nearest-neighbors: Nearest Neighbors dask/dask-ml#982, may need to implement ourselves...
- umap: implement ourselves?
- metrics: dependent on squidpy decisions but should be dask-ifiable
- scrublet?