Skip to content

implement .chunks on v3 arrays #1929

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 3, 2024

Conversation

rabernat
Copy link
Contributor

@rabernat rabernat commented May 29, 2024

Closes #1925

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@@ -343,6 +343,16 @@ def ndim(self) -> int:
def shape(self) -> ChunkCoords:
return self.metadata.shape

@property
def chunks(self) -> ChunkCoords:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a deprecation warning here? I think users should instead look at the chunk grid directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the chunk grid object is guaranteed to have a consistent structure, beyond {name: str, configuration: dict[str, JSON]}? We might instead want users to look at a normalized (explicit) representation of the chunks, i.e. tuple[tuple[int, ...], ...]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is that having a backwards compatible .chunks attribute is very important. Deprecating this or changing its behavior will have far-reaching consequences for libraries that depend on Zarr. Given that we only support regular chunk grids so far, this solution feels very reasonable to me. Once other chunk grids are developed this will raise an attribute error, and the user code can handle it some other way (i.e. looking at the chunk grid directly).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rabernat I think your reasoning makes sense, given that irregular chunks are prospective at this point.

Thinking about future changes to support irregular chunks, do you think it would make sense to transform the type of chunks to tuple[tuple[int, ...], ...] (matching the chunks attribute of a dask array), assuming we give a long deprecation warning?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is that having a backwards compatible .chunks attribute is very important. Deprecating this or changing its behavior will have far-reaching consequences for libraries that depend on Zarr. Given that we only support regular chunk grids so far, this solution feels very reasonable to me. Once other chunk grids are developed this will raise an attribute error, and the user code can handle it some other way (i.e. looking at the chunk grid directly).

Fair enough, we can revisit this once we have other chunk grids.

tuple[tuple[int, ...], ...]

How would that scale if you have billions of chunks?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would that scale if you have billions of chunks?

In that case, we would probably need to switch to a generator of tuple of ints. But I think someone with billions of chunks will have lots of other problems they run into first.

Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice backward compatible change. We probably want to move users toward something like Array.get_chunk_grid() -> Generator[tuple[int, ...], ...] but for now, this is good for supporting existing applications.

One small suggested change.

Comment on lines 349 to 350
chunk_shape = getattr(self.metadata.chunk_grid, "chunk_shape") # noqa: B009
return chunk_shape
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
chunk_shape = getattr(self.metadata.chunk_grid, "chunk_shape") # noqa: B009
return chunk_shape
return self.metadata.chunk_grid.chunk_shape

B009 is actually useful here.

B009: Do not call getattr(x, 'attr'), instead use normal property access: x.attr. Missing a default to getattr will cause an AttributeError to be raised for non-existent properties. There is no additional safety in using getattr if you know the attribute name ahead of time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok done.

@rabernat
Copy link
Contributor Author

rabernat commented Jun 2, 2024

This is my first PR to the V3 branch.

How are we handling docs? Is there anything to update?

@d-v-b
Copy link
Contributor

d-v-b commented Jun 2, 2024

This is my first PR to the V3 branch.

How are we handling docs? Is there anything to update?

Docs are a (big) tbd, for the moment code changes alone are fine.

@property
def chunks(self) -> ChunkCoords:
try:
return self.metadata.chunk_grid.chunk_shape # type: ignore[attr-defined]
Copy link
Contributor Author

@rabernat rabernat Jun 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fighting with mypy here. My local pre-commit env does not like

# type: ignore[attr-defined, no-any-return]

(says src/zarr/array.py:349: error: Unused "type: ignore[no-any-return]" comment [unused-ignore])

but that seems to be what the CI-based pre-commit linter wants

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a swing at fixing this @rabernat. Let me know what you think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Joe! I like it!

@jhamman jhamman merged commit da9885c into zarr-developers:v3 Jun 3, 2024
18 checks passed
d-v-b added a commit to d-v-b/zarr-python that referenced this pull request Jun 4, 2024
* implement .chunks on v3 arrays

* remove noqa: B009

* make mypy happy

* only return chunks for regular chunk grids

---------

Co-authored-by: Davis Bennett <davis.v.bennett@gmail.com>
Co-authored-by: Joseph Hamman <joe@earthmover.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

v3 arrays have no .chunks attribute
4 participants