Description
Is your feature request related to a problem? Please describe.
This may be a tall ask, but it would be great to have GPU acceleration for single-cell modeling. The current standard for highly accurate modeling on large, complex human datasets is the MAST program (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0844-5), or simply pseudobulking. Wilcoxon tests, t-tests, and others have significant statistical flaws that undermine the accuracy of their results when applied to biological questions (such as disease vs. healthy comparisons).
Even at sub-million cell counts, MAST was slow. At 1+ million cells, it becomes unbearably slow. Being able to run MAST-like analysis on a Dask array-based AnnData would truly unlock complex statistical analysis of large-scale scRNAseq data.
Describe the solution you'd like
Dask-array based statistical modeling of scRNAseq, based on the known principles/variables that have been figured out by the MAST authors.
Dask-array based linear modeling has been implemented here:
https://ml.dask.org/modules/generated/dask_ml.linear_model.LinearRegression.html
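To illustrate why linear modeling maps well onto chunked arrays, here is a minimal sketch in plain NumPy (not Dask, and not MAST itself). It fits an ordinary least-squares model by accumulating the small matrices X^T X and X^T y one block of cells at a time, so the full cell-by-covariate matrix never has to be processed at once. The function name `fit_ols_chunked`, the simulated data, and the single `condition` covariate are all hypothetical. A real MAST-like model would also need the detection (logistic) component and further covariates.

```python
import numpy as np


def fit_ols_chunked(chunks):
    """Fit OLS from an iterable of (X_block, y_block) pairs.

    Only the p x p matrix X^T X and the length-p vector X^T y are kept
    in memory, so each block of cells can be processed independently.
    """
    xtx = None
    xty = None
    for X_block, y_block in chunks:
        if xtx is None:
            p = X_block.shape[1]
            xtx = np.zeros((p, p))
            xty = np.zeros(p)
        xtx += X_block.T @ X_block
        xty += X_block.T @ y_block
    # Solve the normal equations (X^T X) beta = X^T y.
    return np.linalg.solve(xtx, xty)


# Simulated data for one gene: intercept plus a binary condition covariate
# (hypothetical stand-in for e.g. disease vs. healthy).
rng = np.random.default_rng(0)
n_cells = 10_000
condition = rng.integers(0, 2, size=n_cells)
X = np.column_stack([np.ones(n_cells), condition])
true_beta = np.array([1.0, 0.5])
y = X @ true_beta + rng.normal(scale=0.1, size=n_cells)

# Split the cells into blocks of 1,000, mimicking array chunks.
chunk_size = 1_000
chunks = [
    (X[i:i + chunk_size], y[i:i + chunk_size])
    for i in range(0, n_cells, chunk_size)
]

beta_chunked = fit_ols_chunked(chunks)

# Reference fit on the full matrix for comparison.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The per-block sums are independent of each other, which is the property that would let a chunked or GPU-backed array library compute them in parallel. Solving the normal equations directly is the simplest option but can be numerically fragile for ill-conditioned designs, so a QR-based approach may be preferable in practice.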
Is there a CPU-based implementation?
A link to an implementation or paper with the suggested functionality