Distributed labeling #94
This is now working on this tiny example:
If you read this into a dask array with chunks (3, 3), you get the following awesome graph:

import numpy as np
from scipy import ndimage as ndi
import dask.array as da
from dask_image.ndmeasure import label
selem = ndi.generate_binary_structure(2, 1)
labeled_array = np.load('labels.npy')
dalabels = da.from_array(labeled_array, chunks=(3, 3))
labeled = label(dalabels, selem)
print(labeled.compute())
labeled.visualize()
By specifying the chunking of the result from `map_blocks`, we are able to work with an older version of Dask.
As we can concatenate along a different axis, which serves our purpose just as well, go ahead and change the code accordingly to avoid a transpose.
Older versions of NumPy that we support and test against don't include `isin`, so switch to `in1d`, which has been around longer. Since the array in question is already 1-D, there is no need to reshape the result after calling `in1d`, so this is sufficient for our use case.
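As a rough illustration of why `in1d` is enough for 1-D input, here is a minimal sketch. The arrays `labels` and `wanted` are made up for the example, and the `getattr` fallback to `isin` is an assumption added only so the snippet also runs on NumPy releases that no longer ship `in1d`.

```python
import numpy as np

# Assumption: fall back to np.isin where np.in1d is unavailable.
in1d = getattr(np, "in1d", np.isin)

# Hypothetical 1-D inputs, not taken from the PR.
labels = np.array([1, 2, 3, 4, 5])
wanted = np.array([2, 5])

# For 1-D input the boolean mask already has the input's shape,
# so no reshape is needed afterwards.
mask = in1d(labels, wanted)
print(mask.shape, labels[mask])
```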
When there is a singleton chunk, there are no shared faces between chunks to construct an adjacency graph from. In this case, ensure there is at least an empty array to start with. This doesn't alter the cases where multiple chunks share faces, but it does avoid branching when there is a single chunk with no shared faces.
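The idea can be sketched in plain NumPy. The helper name `collect_face_pairs` and the `(n, 2)` layout of label pairs are assumptions for illustration, not the PR's actual code. Seeding the list with an empty array means `np.concatenate` always has at least one input, so the single-chunk case needs no special branch.

```python
import numpy as np

def collect_face_pairs(pairs_per_boundary):
    """Stack label pairs from every shared face into one (n, 2) array.

    Hypothetical helper: `pairs_per_boundary` is a list of (k, 2) integer
    arrays, one per shared face; it is empty when there is a single chunk.
    """
    # Start from an empty (0, 2) array so concatenate never sees an empty list.
    pieces = [np.empty((0, 2), dtype=np.int32)]
    pieces.extend(pairs_per_boundary)
    return np.concatenate(pieces, axis=0)

single_chunk = collect_face_pairs([])
two_chunks = collect_face_pairs([np.array([[1, 4], [2, 4]], dtype=np.int32)])
print(single_chunk.shape, two_chunks.shape)
```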
Previously we were incorrectly determining the connected components dtype. This fixes it by inspecting the result on a trivial case to see what the dtype is, then using that to set the delayed type when converting to a Dask Array.
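One way to probe the dtype, sketched under the assumption that the function in question is `scipy.sparse.csgraph.connected_components`. The helper name `connected_components_dtype` and the 1x1 input are illustrative choices, not necessarily what the commit uses.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

def connected_components_dtype():
    """Return the dtype SciPy uses for component labels (hypothetical helper)."""
    # A single isolated node is the smallest graph we can ask about.
    trivial = np.zeros((1, 1), dtype=int)
    _, labels = connected_components(trivial)
    return labels.dtype

cc_dtype = connected_components_dtype()
print(cc_dtype)
```

The returned dtype could then be passed wherever a delayed result is converted to a Dask Array, rather than hard-coding a guess.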
As we are already calling a delayed wrapped copy of `label`, there is no need to use `partial` to bind arguments first. So go ahead and drop `partial` and pass the arguments directly.
As we are already calling a delayed wrapped copy of `connected_components`, there is no need to use `partial` to bind arguments first. So go ahead and drop `partial` and pass the arguments directly.
As we are now making use of the `blocks` accessor of Dask Arrays and this requires a newer version of Dask, bump the minimum version of Dask to 0.18.2.
Ensure that `total` is a `LABEL_DTYPE` scalar. This is needed by `where`, which checks the `dtype` of the arguments it is provided.
Make sure that `0` in `da.where` is also of `LABEL_DTYPE`. This way we can ensure that the array generated by `where` has the expected type and thus avoid using `astype` to copy and cast the array.
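A small sketch of the dtype point in the two commits above, using `np.where` as a stand-in for `da.where` so it runs without Dask. The value of `LABEL_DTYPE` (`np.int32` here) and the sample `block` are assumptions for illustration.

```python
import numpy as np

# Assumption: the label dtype is a 32-bit integer; the PR only names LABEL_DTYPE.
LABEL_DTYPE = np.int32

block = np.array([[0, 1], [2, 0]], dtype=LABEL_DTYPE)
total = LABEL_DTYPE(3)  # running label offset, as a typed scalar

# Every argument already has LABEL_DTYPE, so the result does too
# and no astype copy is needed afterwards.
shifted = np.where(block > 0, block + total, LABEL_DTYPE(0))
print(shifted.dtype)
print(shifted)
```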
Go ahead and make `n` an array in `block_ndi_label_delayed`. This ensures it matches what we expect later. Plus it avoids some boilerplate in `label` that makes things less clear.
Make sure to exercise the test case where a labeled region crosses the chunk boundary in two locations instead of just one. This is done to ensure that the multiple chunk implementation is able to resolve this down to a single labeled region.
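To make the scenario concrete, here is a hedged sketch of such an input, not the test added in the commit. A U-shaped region crosses the boundary between two column blocks in two rows, so labeling the left block on its own splits it into two pieces.

```python
import numpy as np
from scipy import ndimage as ndi

# Hypothetical input: one U-shaped region, with the block boundary
# running between columns 1 and 2.
image = np.array([
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
], dtype=bool)

_, n_whole = ndi.label(image)         # labeling the full array
_, n_left = ndi.label(image[:, :2])   # left block on its own
_, n_right = ndi.label(image[:, 2:])  # right block on its own

print(n_whole, n_left, n_right)
```

The two left-block labels both touch the single right-block label across the boundary, which is the situation the multiple-chunk implementation has to merge back into one region.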
Changes the `_utils` module into the `_utils` package. This should make it easier to add more specific collections of utility functions and group them accordingly.
Moves some utility functions from `dask_image.ndmeasure`'s `__init__` used to perform `label` over multiple chunks to `_utils._label`.
This reverts commit 2b2a84e.
By viewing the array as a structured array that merely groups together the other dimensions within the structure, `_unique_axis_0` is able to call NumPy's `unique` function on the array keeping the type information unchanged. Thus if `unique` is able to handle the specific type more efficiently, we benefit from that speed up still. Note: More work would be needed if we wanted to handle F-contiguous arrays, but that is not needed for our use case.
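The structured-view trick can be sketched as follows. This is an illustrative reimplementation under the name `unique_rows` (hypothetical), assuming a 2-D C-contiguous input; the PR's `_unique_axis_0` may differ in detail.

```python
import numpy as np

def unique_rows(a):
    """Return the unique rows of a 2-D array via a structured view (sketch)."""
    a = np.ascontiguousarray(a)
    ncols = a.shape[1]
    # One field per column, each keeping the original element dtype.
    row_dtype = np.dtype([("f%d" % i, a.dtype) for i in range(ncols)])
    # Each row becomes a single structured element: shape (n, 1).
    rows = a.view(row_dtype)
    unique = np.unique(rows)
    # View back as the plain dtype and restore the column dimension.
    return unique.view(a.dtype).reshape(-1, ncols)

pairs = np.array([[1, 2], [3, 4], [1, 2]])
print(unique_rows(pairs))
```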
Thanks so much for working on this with me, @jni! 😄
Have done some tidying as discussed. This looks ready to merge to me. Do you want to give this another look before we merge?
Allow an arbitrary `axis` to be specified in `_unique_axis`, but have it default to `0` if not specified. This keeps the previous behavior while making the function more generally useful.
@jakirkham I fixed a minor indentation issue in a doctest, but otherwise feel free to pull the trigger when the builds pass! 🎉
Thanks @jni! Merging 😄
None of the below guidelines are met, but that's why there's a WIP in the title. =D

Currently `dask_image.label` creates a giant array to do the labeling. This is suboptimal. This presents initial work to do it in a distributed fashion using a graph to relabel independently-labeled blocks.

PR template:
your new functionality into a function with a docstring, and add the
feature to the list in README.rst.
https://travis-ci.org/dask/dask-image/pull_requests
and make sure that the tests pass for all supported Python versions.
Fixes #29
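
For readers following along, the approach from the PR description (label blocks independently, then use a graph to relabel them) can be sketched on two blocks with plain NumPy and SciPy. This is an illustration under stated assumptions, not the code in this PR: the input `image`, the two-block split, and all variable names are made up, and the real implementation works on Dask chunks.

```python
import numpy as np
from scipy import ndimage as ndi
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

# Hypothetical input, split into two column blocks between columns 1 and 2.
image = np.array([
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
], dtype=bool)

# 1. Label each block on its own.
left, n_left = ndi.label(image[:, :2])
right, n_right = ndi.label(image[:, 2:])

# 2. Offset the right block so labels are unique across blocks.
right = np.where(right > 0, right + n_left, 0)
total = n_left + n_right

# 3. Find pairs of labels that touch across the shared face.
face_left, face_right = left[:, -1], right[:, 0]
touching = (face_left > 0) & (face_right > 0)
pairs = np.unique(
    np.stack([face_left[touching], face_right[touching]], axis=1), axis=0
)

# 4. Build a graph with one node per label (node 0 is the background)
#    and one edge per touching pair, then find its connected components.
graph = coo_matrix(
    (np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])),
    shape=(total + 1, total + 1),
)
n_components, component = connected_components(graph, directed=False)

# 5. Relabel: every block label maps to the id of its graph component.
result = component[np.concatenate([left, right], axis=1)]
print(result)
```

The background node has no edges, so it forms a component of its own and the number of foreground regions is `n_components - 1`.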