@cjfuller
Collaborator

In a previous commit, I found a small speedup for data that fits in memory by using numpy arrays in that case rather than dask ones. Interestingly, it's faster (TBD under what full range of conditions and why) to then convert the result to a dask array before writing out to zarr.

It turns out we can speed up this conversion to a dask array by multiple seconds by providing a `name=` parameter to the `from_array` function. If you don't provide a name, dask creates one by hashing the data, which for giant stitched images can take several seconds.
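As a minimal sketch of the idea (not the code from this PR), the snippet below passes an explicit `name=` to `dask.array.from_array` so that dask does not need to hash the array contents. The array contents, chunk sizes, and the `stitched-` name prefix are assumptions made for illustration.

```python
import uuid

import dask.array as da
import numpy as np

# Hypothetical stand-in for a stitched image that already fits in memory.
rng = np.random.default_rng(0)
stitched = rng.integers(0, 2**16, size=(2048, 2048), dtype=np.uint16)

# Explicit name: dask uses this string for the graph keys instead of
# hashing the data. The uuid suffix keeps the name unique per array.
name = f"stitched-{uuid.uuid4().hex}"
named = da.from_array(stitched, chunks=(512, 512), name=name)

# Default (name=None): dask hashes the array contents to build a name,
# which is the slow step for very large arrays.
hashed = da.from_array(stitched, chunks=(512, 512))
```

Names should be unique per distinct array, since dask treats arrays with the same name as the same data. Passing `name=False` is another way to skip hashing; dask then generates a random name itself.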

After this change, the 8x8fov test case I've been using is down to ~12s (from ~16s before this change and the previous one).

Tested by:

  • ./dev/autofix_lint.sh
  • ./dev/format.sh
  • ./dev/type_check.sh
  • ./dev/run_tests.sh

@cjfuller cjfuller merged commit 4bdc823 into main Jan 16, 2025
@cjfuller cjfuller deleted the colin/named_array branch January 16, 2025 23:54