[FEA] Dask-array based statistics on single cell data #412

@MPebworthEpana

Description

Is your feature request related to a problem? Please describe.
This may be a tall ask, but it would be great to have GPU acceleration for single-cell statistical modeling. The current standard for highly accurate modeling on large, complex human datasets is MAST (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0844-5), or simply pseudobulking. Wilcoxon tests, t-tests, and similar methods have significant statistical flaws that undermine the accuracy of their results when applied to biological questions (such as disease vs. healthy comparisons).

Even at sub-million cell counts, MAST is slow. At 1+ million cells, it becomes unbearably slow. Being able to run a MAST-like analysis on a Dask array-backed AnnData would truly unlock complex statistical analysis of large-scale scRNA-seq data.

Describe the solution you'd like
Dask array-based statistical modeling of scRNA-seq data, built on the principles and variables established by the MAST authors.

Dask-array based linear modeling has been implemented here:
https://ml.dask.org/modules/generated/dask_ml.linear_model.LinearRegression.html
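
To make the request more concrete, here is a rough sketch of the kind of per-gene quantities a MAST-like hurdle model is built on, computed lazily over a chunked Dask array. It is not MAST itself and does no inference: it only computes, per group, the detection rate (the discrete part) and the mean expression among detecting cells (the continuous part). The function name `hurdle_summaries`, the `is_case` argument, and the assumption that the matrix holds non-negative log-normalized values with 0 meaning "not detected" are all illustrative choices, not part of any existing API.

```python
import numpy as np
import dask
import dask.array as da


def hurdle_summaries(X, is_case):
    """Per-gene hurdle-style summaries for a two-group comparison.

    X       : dask array, cells x genes, non-negative log-normalized
              expression where 0 means "not detected" (assumption).
    is_case : 1-D boolean array-like, True for cells in the case group.
    """
    is_case = np.asarray(is_case, dtype=bool)
    detected = X > 0  # lazy boolean array, same chunking as X

    lazy = {}
    for label, mask in (("case", is_case), ("control", ~is_case)):
        w = mask[:, None]  # column vector, broadcasts across genes
        n_cells = int(mask.sum())
        n_detected = (detected & w).sum(axis=0)
        total = (X * w).sum(axis=0)  # zeros add nothing to the sum
        lazy[label + "_detection_rate"] = n_detected / n_cells
        # NaN for genes never detected in this group (0 / 0).
        lazy[label + "_mean_if_detected"] = total / n_detected

    # Evaluate everything in one pass so shared chunks are read once.
    keys = list(lazy)
    values = dask.compute(*[lazy[k] for k in keys])
    return dict(zip(keys, values))


# Tiny toy matrix: 4 cells x 2 genes, split into 2 x 2 chunks.
toy = da.from_array(
    np.array([[0.0, 2.0], [1.0, 0.0], [3.0, 4.0], [0.0, 0.0]]),
    chunks=(2, 2),
)
summary = hurdle_summaries(toy, [True, True, False, False])
```

For a design with only an intercept and a binary group indicator, the difference between the two `*_mean_if_detected` vectors equals the ordinary least squares coefficient of the continuous part; covariates, shrinkage, and testing would need a real model fit on top of this.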

Is there a CPU based implementation
A link to an implementation or paper with the suggested functionality

Metadata

    Labels

    enhancement (New feature or request)
