Support running on distributed compute engines like Dask, and Spark via Zap #283


Merged (1 commit, Oct 5, 2018)

Conversation

@tomwhite (Contributor) commented Oct 5, 2018

These changes enable Scanpy's preprocessing functions to run on distributed compute engines, including Dask and Spark. The Spark integration relies on Zap for a distributed version of NumPy.

The main change is the `materialize_as_ndarray` function, which is called at certain points in the computation to materialize intermediate results (not the full matrix). It is a no-op in the non-distributed case.
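A minimal sketch of what such a helper might look like (the dispatch logic here is an assumption, not the PR's actual implementation): plain NumPy arrays pass through untouched, while lazy/distributed arrays exposing a Dask-style `.compute()` method are evaluated and converted.

```python
import numpy as np

def materialize_as_ndarray(a):
    """Return `a` as an in-memory NumPy ndarray.

    Sketch only: a no-op for plain ndarrays; assumes distributed
    arrays (e.g. dask.array.Array) expose a `.compute()` method.
    """
    if isinstance(a, np.ndarray):
        return a  # non-distributed case: nothing to do
    if hasattr(a, "compute"):
        return np.asarray(a.compute())  # evaluate the lazy graph
    return np.asarray(a)  # fall back to a plain conversion

# Typical use: materialize a small intermediate (e.g. per-gene
# means), never the full expression matrix.
means = materialize_as_ndarray(np.arange(6).reshape(2, 3).mean(axis=0))
```

The point of routing intermediates through one function is that the distributed backends only need to override this single hook, while the rest of the preprocessing code stays backend-agnostic.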

@falexwolf (Member) commented:

This is awesome! Already very elegant and a very good start! 😄

@falexwolf falexwolf merged commit 0a6de90 into scverse:master Oct 5, 2018
@tomwhite (Contributor, Author) commented Oct 8, 2018

Thanks @falexwolf!

@flying-sheep flying-sheep mentioned this pull request Aug 4, 2023