
zarr.array from an existing zarr.Array #2622


Merged
merged 71 commits into zarr-developers:main from creation-from-other-zarr
Apr 10, 2025


brokkoli71
Member

@brokkoli71 brokkoli71 commented Jan 2, 2025

Adds concurrent streaming of the source array into the new array.
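The core idea can be sketched under simplifying assumptions: plain dicts mapping chunk coordinates to chunk data stand in for the source and destination stores, both arrays share the same chunk grid, and `copy_chunk` / `stream_copy` are hypothetical names, not zarr-python API. Each chunk copy is an independent task, and a semaphore bounds how many run at once.

```python
import asyncio


async def copy_chunk(src: dict, dst: dict, key: tuple) -> None:
    # Stand-in for an async chunk read followed by an async chunk write.
    data = src[key]
    await asyncio.sleep(0)  # yield to the event loop, as real store I/O would
    dst[key] = list(data)


async def stream_copy(src: dict, dst: dict, max_concurrency: int = 4) -> None:
    # With identical chunk grids every source chunk maps to exactly one
    # destination chunk, so the copies are independent and can run concurrently.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(key: tuple) -> None:
        async with sem:  # cap the number of in-flight chunk copies
            await copy_chunk(src, dst, key)

    await asyncio.gather(*(bounded(key) for key in src))


src = {(0,): [1, 2], (1,): [3, 4], (2,): [5]}
dst: dict = {}
asyncio.run(stream_copy(src, dst))
```

When the chunk grids differ, a source chunk may feed several destination chunks (or vice versa), which is exactly the complication raised in the comments that follow.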

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@brokkoli71 brokkoli71 marked this pull request as draft January 2, 2025 16:54
@brokkoli71
Member Author

Do we also want concurrency for different chunk sizes?

@normanrz
Member

normanrz commented Jan 8, 2025

Do we also want concurrency for different chunk sizes?

That would be nice, if the chunk sizes are somewhat compatible, i.e. one is a multiple of the other.
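One reading of "one is a multiple of the other" is a per-axis divisibility check that must hold in the same direction on every axis. The sketch below assumes that interpretation; `chunks_compatible` is a hypothetical helper, not part of zarr-python.

```python
def chunks_compatible(a: tuple[int, ...], b: tuple[int, ...]) -> bool:
    """True if chunk shape `a` divides `b` on every axis, or `b` divides `a` on every axis."""
    if len(a) != len(b):
        return False
    a_divides_b = all(y % x == 0 for x, y in zip(a, b))
    b_divides_a = all(x % y == 0 for x, y in zip(a, b))
    return a_divides_b or b_divides_a


print(chunks_compatible((10, 10), (20, 40)))  # True: each output chunk is 2 x 4 input chunks
print(chunks_compatible((10, 20), (20, 10)))  # False: the direction flips between axes
```

Requiring one direction on all axes means either every output chunk is a union of whole input chunks, or every input chunk splits into whole output chunks, so work can be grouped per chunk of the larger grid without partial overlaps.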

@d-v-b
Contributor

d-v-b commented Jan 8, 2025

  • (Is there some measure to prevent this that I am not aware of?)

If you are trying to write K input chunks into M output chunks, you can partition your K chunks into sets, where within each set elements can be written independently of all the other elements. Then you write each set one after another. In the worst case there will be one set per chunk, but you are guaranteed to avoid write collisions this way.
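A minimal sketch of that partitioning, assuming regular chunk grids on both arrays and describing each input chunk by its `(start, stop)` corner tuples. Input chunks are assigned greedily to the first set whose already-claimed output chunks they do not overlap. All function names here are hypothetical, for illustration only.

```python
from itertools import product


def chunk_regions(shape, chunks):
    """Yield (start, stop) for every chunk of a regular grid, clipped at the array edge."""
    per_axis = [[(s, min(s + c, n)) for s in range(0, n, c)] for n, c in zip(shape, chunks)]
    for combo in product(*per_axis):
        yield tuple(s for s, _ in combo), tuple(e for _, e in combo)


def touched_output_chunks(region, out_chunks):
    """Grid coordinates of every output chunk that the region overlaps."""
    start, stop = region
    ranges = [range(s // c, (e - 1) // c + 1) for s, e, c in zip(start, stop, out_chunks)]
    return set(product(*ranges))


def partition_writes(shape, in_chunks, out_chunks):
    """Group input chunks so that no two chunks in one group touch the same output chunk."""
    groups = []  # each entry: (regions in the group, output chunks claimed by the group)
    for region in chunk_regions(shape, in_chunks):
        touched = touched_output_chunks(region, out_chunks)
        for members, claimed in groups:
            if claimed.isdisjoint(touched):
                members.append(region)
                claimed |= touched
                break
        else:
            groups.append(([region], set(touched)))
    return [members for members, _ in groups]


# Input chunks of 2 feeding output chunks of 4: two conflict-free groups.
for group in partition_writes((8,), (2,), (4,)):
    print(group)
```

Each group can be written fully concurrently, and the groups are written one after another. Greedy assignment does not guarantee the minimum number of groups (that is a graph-coloring problem), but it never puts two writers to the same output chunk in one group. The commit log below ("partition concurrency along new_array chunks") suggests the merged PR organised the work around the new array's chunks instead.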

@dstansby dstansby added the needs release notes label Jan 9, 2025
@d-v-b
Contributor

d-v-b commented Jan 14, 2025

One question to answer here is what "auto" means for chunks if the user passes in a chunked array, but they want to use zarr-python's auto-chunking instead of the chunks that came with the array.

We might want to use a separate value that means "copy the chunks this object already has", which is distinct from "generate some chunks using the chunking heuristics". Maybe something like ChunksLike: Literal['auto'] | Literal['keep'] | ShapeLike?
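A sketch of how the proposed three-way value could be resolved. `guess_chunks` is a toy stand-in for zarr-python's real chunking heuristic, `resolve_chunks` is a hypothetical helper, and the fallback from "keep" to "auto" for sources without chunks is an assumption, not confirmed behavior.

```python
import math
from typing import Literal, Optional, Union

# Hypothetical alias following the suggestion above.
ChunksLike = Union[Literal["auto", "keep"], tuple[int, ...]]


def guess_chunks(shape: tuple[int, ...], max_elems: int = 1_000_000) -> tuple[int, ...]:
    """Toy heuristic (not zarr-python's): halve the largest axis until the chunk is small enough."""
    chunks = list(shape)
    while math.prod(chunks) > max_elems:
        i = chunks.index(max(chunks))
        chunks[i] = (chunks[i] + 1) // 2
    return tuple(chunks)


def resolve_chunks(
    chunks: ChunksLike, source_chunks: Optional[tuple[int, ...]], shape: tuple[int, ...]
) -> tuple[int, ...]:
    if chunks == "keep":
        # Assumption: a source without chunks (e.g. an in-memory array) falls back to "auto".
        return source_chunks if source_chunks is not None else guess_chunks(shape)
    if chunks == "auto":
        return guess_chunks(shape)
    return tuple(chunks)  # an explicit chunk shape is used as given


print(resolve_chunks("keep", (10, 10), (4000, 4000)))  # (10, 10)
print(resolve_chunks("auto", (10, 10), (4000, 4000)))  # (1000, 1000)
```

Keeping "keep" as its own literal lets "auto" always mean "run the heuristic", even when the input already carries chunks. The commit log below indicates the same keep/auto split was later applied to other `from_array` arguments such as shards.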

@brokkoli71
Member Author

brokkoli71 commented Jan 15, 2025

One question to answer here is what "auto" means for chunks if the user passes in a chunked array, but they want to use zarr-python's auto-chunking instead of the chunks that came with the array.

Good point! I like the idea of distinguishing between keep and auto.

@github-actions github-actions bot removed the needs release notes label Jan 15, 2025
@brokkoli71 brokkoli71 requested a review from d-v-b February 9, 2025 13:41
@brokkoli71 brokkoli71 requested a review from normanrz April 7, 2025 15:39
Member

@normanrz normanrz left a comment


I think this is ready. @d-v-b what do you think?

Contributor

@d-v-b d-v-b left a comment


Looks good, thanks @brokkoli71

@normanrz normanrz enabled auto-merge (squash) April 10, 2025 13:31
@normanrz normanrz merged commit 018f61d into zarr-developers:main Apr 10, 2025
29 of 30 checks passed
@brokkoli71 brokkoli71 deleted the creation-from-other-zarr branch April 11, 2025 13:12
d-v-b pushed a commit to d-v-b/zarr-python that referenced this pull request Apr 20, 2025
* add creation from other zarr

* remove duplicated tests

* improve test

* test_iter_grid for non-squares

* concurrent streaming for equal chunk sizes

* fix merge

* fix mypy

* fix mypy

* fix test_iter_grid

* extract to zarr.from_array

* fix mypy

* fix mypy

* format

* fix test_creation_from_other_zarr_format

* distinguish between keep and auto for from_array arguments

* partition concurrency along new_array chunks

* fix mypy

* improve test_creation_from_other_zarr_format

* add typing in test

* Update src/zarr/core/array.py

Co-authored-by: Norman Rzepka <code@normanrz.com>

* add from_array with npt.ArrayLike

* add write_data argument

* improve tests

* improve docstrings and add examples

* fix mypy and readthedocs

* fix mypy and readthedocs

* fix mypy and readthedocs

* fix mypy and readthedocs

* fix readthedocs ERROR: Unexpected indentation

* add release notes

* format docstring examples

* add write_data attr to synchronous.create_array

* `create_array` calls `from_array` calls `init_array`

* document changes

* fix serializer from_array v2 to v3

* fix mypy

* improve codecov

* fix mypy

* from_array: copy zarr format on default

* in ``from_array`` make all arguments except ``store`` keyword-only, to match ``create_array``

* in ``from_array`` default shards="keep"

* redundant ``ChunkKeyEncoding | ChunkKeyEncodingLike``

* fix argument order in calls of `from_array`

* fix numpydoc-validation

* add docstring to store2 pytest fixture

* extract `_parse_keep_array_attr` from `from_array`

* extract `_parse_keep_array_attr` from `from_array`

* correct _parse_keep_array_attr

* fix merge

* fix merge

---------

Co-authored-by: Norman Rzepka <code@normanrz.com>
Development

Successfully merging this pull request may close these issues.

[v3] zarr.array from an existing zarr.Array
4 participants