Support running on distributed compute engines like Dask, and Spark via Zap #283


Merged (1 commit, Oct 5, 2018)

Conversation

@tomwhite (Contributor) commented Oct 5, 2018

These changes enable Scanpy's preprocessing functions to run on distributed compute engines, including Dask and Spark. The Spark integration relies on Zap for a distributed version of NumPy.

The main change is the `materialize_as_ndarray` function, which is called at certain points in the computation to materialize intermediate results (not the full matrix). It is a no-op in the non-distributed case.
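A minimal sketch of what such a helper might look like (the dispatch logic here is an assumption, not the PR's actual implementation): plain NumPy arrays pass through untouched, while lazy/distributed arrays exposing a Dask-style `.compute()` method are evaluated and converted.

```python
import numpy as np

def materialize_as_ndarray(a):
    """Return `a` as an in-memory NumPy ndarray.

    Sketch only: a no-op for plain ndarrays; assumes distributed
    arrays (e.g. dask.array.Array) expose a `.compute()` method.
    """
    if isinstance(a, np.ndarray):
        return a  # non-distributed case: nothing to do
    if hasattr(a, "compute"):
        return np.asarray(a.compute())  # evaluate the lazy graph
    return np.asarray(a)  # fall back to a plain conversion

# Typical use: materialize a small intermediate (e.g. per-gene
# means), never the full expression matrix.
means = materialize_as_ndarray(np.arange(6).reshape(2, 3).mean(axis=0))
```

The point of routing intermediates through one function is that the distributed backends only need to override this single hook, while the rest of the preprocessing code stays backend-agnostic.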

@falexwolf (Member) commented:

This is awesome! Already very elegant and a very good start! 😄

@falexwolf falexwolf merged commit 0a6de90 into scverse:master Oct 5, 2018
@tomwhite (Contributor, Author) commented Oct 8, 2018

Thanks @falexwolf!

@flying-sheep flying-sheep mentioned this pull request Aug 4, 2023