Skip to content

implement .chunks on v3 arrays #1929

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 3, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions src/zarr/array.py
Original file line number Diff line number Diff line change
Expand Up @@ -366,6 +366,15 @@ def ndim(self) -> int:
def shape(self) -> ChunkCoords:
return self.metadata.shape

@property
def chunks(self) -> ChunkCoords:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a deprecation warning here? I think users should instead look at the chunk grid directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the chunk grid object is guaranteed to have a consistent structure, beyond {name: str, configuration: dict[str, JSON]}? We might instead want users to look at a normalized (explicit) representation of the chunks, i.e. tuple[tuple[int, ...], ...]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is that having a backwards compatible .chunks attribute is very important. Deprecating this or changing its behavior will have far-reaching consequences for libraries that depend on Zarr. Given that we only support regular chunk grids so far, this solution feels very reasonable to me. Once other chunk grids are developed this will raise an attribute error, and the user code can handle it some other way (i.e. looking at the chunk grid directly).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rabernat I think your reasoning makes sense, given that irregular chunks are prospective at this point.

Thinking about future changes to support irregular chunks, do you think it would make sense to transform the type of chunks to tuple[tuple[int, ...], ...] (matching the chunks attribute of a dask array), assuming we give a long deprecation warning?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is that having a backwards compatible .chunks attribute is very important. Deprecating this or changing its behavior will have far-reaching consequences for libraries that depend on Zarr. Given that we only support regular chunk grids so far, this solution feels very reasonable to me. Once other chunk grids are developed this will raise an attribute error, and the user code can handle it some other way (i.e. looking at the chunk grid directly).

Fair enough, we can revisit this once we have other chunk grids.

tuple[tuple[int, ...], ...]

How would that scale if you have billions of chunks?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would that scale if you have billions of chunks?

In that case, we would probably need to switch to a generator of tuple of ints. But I think someone with billions of chunks will have lots of other problems they run into first.

if isinstance(self.metadata.chunk_grid, RegularChunkGrid):
return self.metadata.chunk_grid.chunk_shape
else:
raise ValueError(
f"chunk attribute is only available for RegularChunkGrid, this array has a {self.metadata.chunk_grid}"
)

@property
def size(self) -> int:
return np.prod(self.metadata.shape).item()
Expand Down Expand Up @@ -641,6 +650,10 @@ def ndim(self) -> int:
def shape(self) -> ChunkCoords:
return self._async_array.shape

@property
def chunks(self) -> ChunkCoords:
return self._async_array.chunks

@property
def size(self) -> int:
return self._async_array.size
Expand Down
2 changes: 1 addition & 1 deletion tests/v3/test_group.py
Original file line number Diff line number Diff line change
Expand Up @@ -79,7 +79,7 @@ def test_group(store: MemoryStore | LocalStore) -> None:
assert arr.dtype == data.dtype

# TODO: update this once the array api settles down
# assert arr.chunk_shape == (2, 2)
assert arr.chunks == (2, 2)

bar2 = foo["bar"]
assert dict(bar2.attrs) == {"baz": "qux"}
Expand Down