-
-
Notifications
You must be signed in to change notification settings - Fork 331
implement .chunks on v3 arrays #1929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -343,6 +343,16 @@ def ndim(self) -> int: | |||
def shape(self) -> ChunkCoords: | |||
return self.metadata.shape | |||
|
|||
@property | |||
def chunks(self) -> ChunkCoords: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a deprecation warning here? I think users should instead look at the chunk grid directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the chunk grid object is guaranteed to have a consistent structure, beyond {name: str, configuration: dict[str, JSON]}
? We might instead want users to look at a normalized (explicit) representation of the chunks, i.e. tuple[tuple[int, ...], ...]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My opinion is that having a backwards compatible .chunks
attribute is very important. Deprecating this or changing its behavior will have far-reaching consequences for libraries that depend on Zarr. Given that we only support regular chunk grids so far, this solution feels very reasonable to me. Once other chunk grids are developed this will raise an attribute error, and the user code can handle it some other way (i.e. looking at the chunk grid directly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rabernat I think your reasoning makes sense, given that irregular chunks are prospective at this point.
Thinking about future changes to support irregular chunks, do you think it would make sense to transform the type of chunks
to tuple[tuple[int, ...], ...]
(matching the chunks
attribute of a dask array), assuming we give a long deprecation warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My opinion is that having a backwards compatible
.chunks
attribute is very important. Deprecating this or changing its behavior will have far-reaching consequences for libraries that depend on Zarr. Given that we only support regular chunk grids so far, this solution feels very reasonable to me. Once other chunk grids are developed this will raise an attribute error, and the user code can handle it some other way (i.e. looking at the chunk grid directly).
Fair enough, we can revisit this once we have other chunk grids.
tuple[tuple[int, ...], ...]
How would that scale if you have billions of chunks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would that scale if you have billions of chunks?
In that case, we would probably need to switch to a generator of tuple of ints. But I think someone with billions of chunks will have lots of other problems they run into first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice backward compatible change. We probably want to move users toward something like Array.get_chunk_grid() -> Generator[tuple[int, ...], ...]
but for now, this is good for supporting existing applications.
One small suggested change.
src/zarr/array.py
Outdated
chunk_shape = getattr(self.metadata.chunk_grid, "chunk_shape") # noqa: B009 | ||
return chunk_shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chunk_shape = getattr(self.metadata.chunk_grid, "chunk_shape") # noqa: B009 | |
return chunk_shape | |
return self.metadata.chunk_grid.chunk_shape |
B009 is actually useful here.
B009: Do not call getattr(x, 'attr'), instead use normal property access: x.attr. Missing a default to getattr will cause an AttributeError to be raised for non-existent properties. There is no additional safety in using getattr if you know the attribute name ahead of time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok done.
This is my first PR to the V3 branch. How are we handling docs? Is there anything to update? |
Docs are a (big) tbd, for the moment code changes alone are fine. |
src/zarr/array.py
Outdated
@property | ||
def chunks(self) -> ChunkCoords: | ||
try: | ||
return self.metadata.chunk_grid.chunk_shape # type: ignore[attr-defined] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fighting with mypy here. My local pre-commit env does not like
# type: ignore[attr-defined, no-any-return]
(says src/zarr/array.py:349: error: Unused "type: ignore[no-any-return]" comment [unused-ignore]
)
but that seems to be what the CI-based pre-commit linter wants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a swing at fixing this @rabernat. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Joe! I like it!
…nto v3-implement-chunks-attr
* implement .chunks on v3 arrays * remove noqa: B009 * make mypy happy * only return chunks for regular chunk grids --------- Co-authored-by: Davis Bennett <davis.v.bennett@gmail.com> Co-authored-by: Joseph Hamman <joe@earthmover.io>
Closes #1925
TODO:
Add docstrings and API docs for any new/modified user-facing classes and functionsNew/modified features documented in docs/tutorial.rstChanges documented in docs/release.rst