Commit
* Initial work on proper distributed labeling
* Use vindex (blech)
* Initial untested implementation
* `_label_adj_graph` working
* Almost working except relabeling
* Add missing imports
* Fix indexing
* Fix imports and total count
* Specify the chunking in `_relabel_components`

  By specifying the chunking of the result from `map_blocks`, we are able to work with an older version of Dask.

* Fix some flake8 errors
* Drop transpose of `all_mappings`

  As we can concatenate along a different axis, which serves our purpose just as well, change the code accordingly to avoid a transpose.

* Use NumPy's `in1d` instead of `isin`

  As older versions of NumPy that we support and test against don't include `isin`, switch to `in1d`, which has been around longer. Since the array in question is already 1-D, there is no need to reshape the result after calling `in1d`, so this is sufficient for our use case.

* Handle empty adjacency graph for singleton chunk

  When there is a singleton chunk, there are no shared faces between chunks from which to construct an adjacency graph. In this case, ensure there is at least an empty array to start with. This doesn't alter the cases where multiple chunks do share faces, but it avoids branching when there is a single chunk with no shared faces.

* Drop unused `i` from `numblocks` `for`-loop
* Fix connected components dtype

  Previously we were incorrectly determining the connected components dtype. Fix this by inspecting the result on a trivial case and seeing what the dtype is, then using that to set the delayed type when converting to a Dask Array.
* Fix non-zero increment's type to be `LABEL_DTYPE`
* Make sure relabeling array matches label type
* Get right index for connected component array
* Fix incorrect labeling between multiply-matched labels
* Fix test to check equivalent labeling, not identical labeling
* Major cleanup of the new label code
* Drop `partial` from `block_ndi_label_delayed`

  As we are already calling a delayed-wrapped copy of `label`, there is no need to use `partial` to bind arguments first. Drop `partial` and pass the arguments directly.

* Drop `partial` from `connected_components_delayed`

  As we are already calling a delayed-wrapped copy of `connected_components`, there is no need to use `partial` to bind arguments first. Drop `partial` and pass the arguments directly.

* Bump Dask requirement to 0.18.2

  As we now make use of the `blocks` accessor of Dask Arrays, which requires a newer version of Dask, bump the minimum version of Dask to 0.18.2.

* Force `total` to a `LABEL_DTYPE` scalar

  Ensure that `total` is a `LABEL_DTYPE` scalar. This is needed by `where`, which checks the `dtype` of the arguments it is provided.

* Force `0` in `da.where` to `LABEL_DTYPE`

  Make sure that the `0` in `da.where` is also of `LABEL_DTYPE`. This way the array generated by `where` has the expected type, which avoids using `astype` to copy and cast the array.

* Make `n` an array in `block_ndi_label_delayed`

  Making `n` an array in `block_ndi_label_delayed` ensures it matches what we expect later. It also avoids some boilerplate in `label` that makes things less clear.

* Ensure `n` is a `LABEL_DTYPE` scalar

  To make sure that `n` matches the type we want, add a delayed cast to the desired type. Normally `n` would be of type `int`, but we need it to be `LABEL_DTYPE` so we can add it to the label images without causing a type conversion.
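The `LABEL_DTYPE` forcing around `where` can be illustrated with a small sketch. The concrete value `np.int32` is an assumption for the example; the package defines its own label dtype.

```python
import numpy as np
import dask.array as da

LABEL_DTYPE = np.int32  # assumption for illustration only

labels = da.from_array(
    np.array([[0, 1], [2, 0]], dtype=LABEL_DTYPE), chunks=1
)
total = LABEL_DTYPE(2)  # `total` forced to a LABEL_DTYPE scalar

# Both branches of `where` are already LABEL_DTYPE (the array plus a
# LABEL_DTYPE scalar, and a LABEL_DTYPE zero), so the result comes out
# as LABEL_DTYPE directly and no `astype` copy is needed afterwards.
shifted = da.where(labels > 0, labels + total, LABEL_DTYPE(0))

assert shifted.dtype == LABEL_DTYPE
```

Keeping every operand at the label dtype avoids a silent promotion to a wider integer type and the extra pass over the data that a trailing `astype` would cost.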
* Update tests of `label` for multiple chunks

  As `label` now supports multiple chunks, drop all but one of the single chunk test cases. Add a few multiple chunk cases with 2-D and 3-D data, trying a few different structuring elements for these cases. Also make sure there is still one test for the singleton chunk case. Freeze the NumPy random seed and the threshold used for constructing the masks based on what seems to work well for different label cases (e.g. labels within a single chunk and labels crossing one or more chunk boundaries).

* Test `label` with the "U" case

  Exercise the test case where a labeled region crosses the chunk boundary in two locations instead of just one. This ensures that the multiple chunk implementation is able to resolve this down to a single labeled region.

* Convert `_utils` module to `_utils` package

  Changing the `_utils` module into the `_utils` package should make it easier to add more specific collections of utility functions and group them accordingly.

* Create module stub for label utility functions
* Refactor label utility functions

  Move some utility functions used to perform `label` over multiple chunks from `dask_image.ndmeasure`'s `__init__` to `_utils._label`.

* Drop underscores from externally used functions
* Mark some internal utility functions as private
* Drop import alias
* Use `slices_from_chunks` in `label`
* Revert "Bump Dask requirement to 0.18.2"

  This reverts commit 2b2a84e.

* Tweak docstring initial line
* Drop unused import
* Implement `_unique_axis_0` with a structured view

  By viewing the array as a structured array that merely groups the other dimensions together within the structure, `_unique_axis_0` is able to call NumPy's `unique` function on the array while keeping the type information unchanged. Thus, if `unique` handles the specific type more efficiently, we still benefit from that speed-up.
  Note: more work would be needed to handle F-contiguous arrays, but that is not needed for our use case.

* Use our `_unique_axis_0` implementation
* Adjust whitespace in docstrings
* Generalize `_unique_axis`

  Allow an arbitrary `axis` to be specified in `_unique_axis`, defaulting to `0` if not specified. This keeps the previous behavior while making the function more generally useful.

* Minor indentation fix in docstring