Dask integration with napari plugins #178

Open
scharlottej13 opened this issue Jan 8, 2022 · 3 comments
Labels
meta Issues about the dask community, not individual questions

Comments

@scharlottej13
Contributor

Background

Napari, an open-source tool for browsing, annotating, and analyzing large multi-dimensional images, is already well integrated with Dask and was one of the projects @GenevieveBuckley focused on during her year as a Dask life science fellow (more background on that here). One area for improvement Genevieve noted was better integration with Dask early in project development, which was the impetus for my joining CZI’s napari Plugin Accelerator Kickoff in December 2021. Here are a few projects I think could be relevant for integrating with Dask.

Juan Nunez-Iglesias @ Monash University

  1. zarpaint: manually edit larger-than-RAM segmentations directly on disk. It solves the problem that zarr lacked fancy indexing (which is how napari paints to an array). It could integrate well with Dask because Dask is good at solving larger-than-memory problems. It seems relatively far along in development, and there is proof of it already being used "in the wild". Genevieve is a contributor to the repo and could possibly do an intro.
  2. skan: automated skeleton generation and analysis, first used to compare images of the cytoskeleton from malaria-infected vs. healthy red blood cells. Extra exciting because it has also been used in nuclear materials research and can potentially be used for other problems (e.g. roads, rivers, cracks in materials, etc.). One of the project's goals is explicitly to add support for Dask arrays.
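
For readers unfamiliar with the term, "fancy indexing" in the zarpaint item means indexing an array with arrays of coordinates. A minimal NumPy sketch of what a paint operation looks like in that style follows; the brush coordinates and label value are made up for illustration, and this is not napari's or zarpaint's actual code.

```python
import numpy as np

# A small in-memory label image standing in for a segmentation.
labels = np.zeros((6, 6), dtype=np.uint8)

# Hypothetical brush stroke: pixel coordinates along a short diagonal.
rows = np.array([1, 2, 3, 4])
cols = np.array([1, 2, 3, 4])

# Fancy (integer-array) indexing writes to each (row, col) pair in one step.
labels[rows, cols] = 7

# Count how many pixels now carry the painted label.
painted = int((labels == 7).sum())
```

Only the listed pixels change, which is why an array type without this kind of indexing is awkward to paint into directly.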

Chris Havlin @ University of Illinois

  1. yt-napari plugin: yt is used for analysis and visualization of volumetric datasets (mostly for astrophysical simulation). The plugin hopes to not only support interactive visualization, but also handle validation and ingestion of complex datasets. A prototype exists, with a stable release targeted for June 1st, 2022 (more details here).

Virginie Uhlmann @ European Bioinformatics Institute

  1. splineit: allows interactive spline shaping in napari (think moving a squiggly line in paint), "allows curating segmentation results and prepare training sets in a 'vector graphics' manner" (more details here). I think there is potential for integration with Dask because one current issue they're facing is scalability: with more than 200 layers, things get very slow and a better data structure is needed for interactive layers.
@GenevieveBuckley

GenevieveBuckley commented Jan 9, 2022

Brief comments:

  • zarpaint: you might find that there isn't much extra work to do for Dask integration here. zarpaint directly modifies the zarr file on disk, and Dask already has a from_zarr function, so there might not be much else needed. @jni and @AbigailMcGovern know more about this than I do.
  • skan: yes, better Dask integration would be great. @jni and I did some initial work on this, but there's lots more scope for improvement. It could be quite fruitful. Resources:
  • splineit: if the slowness is caused by having a large number of napari layers, it might be a limitation of the current napari viewer and not something Dask can change. Worth investigating further to find out, though.

Note: the napari plugin accelerator grant program runs for 6 months, which is why many projects have release dates targeted for June 2022. I guess it's possible CZI might do some no-cost grant extensions, but interacting with these groups in the first half of this year would be most productive.

@jni

jni commented Jan 10, 2022

Hi @scharlottej13, thanks for this writeup and pleased to meet you! I agree with @GenevieveBuckley's comments; zarpaint in particular relies on writing to bigger-than-RAM arrays, so it's not well suited to dask at the moment. @AbigailMcGovern is working on using the painted arrays to train pytorch networks, and that could work very well with dask.

Re skan, a lot of the work that @GenevieveBuckley and I did might be obsolete soon thanks to a brand new, NumPy-only way to create graphs, which is in skan's main branch and which is copied wholesale from this PR to scikit-image. I haven't yet tested whether dask works out of the box with this approach, but it seems to me like it should be easier than the numba-based approach from earlier.

@GenevieveBuckley

> @AbigailMcGovern is working on using the painted arrays to train pytorch networks, and that could work very well with dask.

Ooh yeah, this would be a great project for Dask engagement.

@ian-r-rose ian-r-rose added the meta Issues about the dask community, not individual questions label Jan 10, 2022