
Fix race condition in from_array for arrays with shards #3217


Open · bojidar-bg wants to merge 2 commits into main

Conversation


@bojidar-bg commented Jul 8, 2025

Fixes #3169.

When data is passed via create_array(data=...), from_array splits it into per-chunk regions and writes them. On current main, when shards are used, from_array ends up splitting the data by sub-chunk, because AsyncArray.chunks returns Metadata.chunks, which is the size of the chunks inside a shard rather than the size of the "physical" chunk files. Writes for multiple sub-chunks can then target the same physical chunk file concurrently, creating a race condition.
This PR changes from_array to split the data by shard instead whenever shards are present.
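
For context, a minimal reproduction sketch of the setup described above, assuming the zarr-python 3.x create_array call with chunks and shards keywords (the array shape, dtype, and in-memory store are illustrative, and exact keyword arguments may differ between versions):

```python
import numpy as np
import zarr

# Each (32, 32) shard spans a 2x2 block of (16, 16) sub-chunks, so one shard
# file receives data from four different write regions.
store = zarr.storage.MemoryStore()
data = np.arange(64 * 64, dtype="uint16").reshape(64, 64)

arr = zarr.create_array(
    store,
    data=data,        # handed to from_array internally (per the description above)
    chunks=(16, 16),  # sub-chunks stored inside each shard
    shards=(32, 32),  # one "physical" chunk file per shard
)
# Before this PR, the data was split into (16, 16) sub-chunk regions, so up to
# four concurrent writes could target the same (32, 32) shard file and race.
```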

(There probably needs to be a lock somewhere in AsyncArray or ShardCodec that prevents multiple writes to the same shard, or to the same physical chunk in general, from executing at the same time.)
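
As a rough illustration of that parenthetical, here is a sketch of per-shard serialization; it is a generic asyncio pattern, not part of the zarr API, and the names write_to_shard and do_write are hypothetical:

```python
import asyncio
from collections import defaultdict

# Hypothetical sketch (not zarr API): key an asyncio.Lock on the shard's grid
# coordinates so writers hitting the same shard file are serialized while
# writers to different shards still run concurrently.
_shard_locks: defaultdict[tuple[int, ...], asyncio.Lock] = defaultdict(asyncio.Lock)

async def write_to_shard(shard_coords: tuple[int, ...], do_write) -> None:
    # do_write is a hypothetical coroutine factory performing the actual write.
    async with _shard_locks[shard_coords]:
        await do_write()
```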

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)


codecov bot commented Jul 8, 2025

Codecov Report

Attention: Patch coverage is 85.71429% with 1 line in your changes missing coverage. Please review.

Project coverage is 94.76%. Comparing base (9d97b24) to head (3d79b1c).

Files with missing lines    Patch %    Lines
src/zarr/core/array.py      85.71%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3217      +/-   ##
==========================================
- Coverage   94.76%   94.76%   -0.01%     
==========================================
  Files          78       78              
  Lines        8642     8648       +6     
==========================================
+ Hits         8190     8195       +5     
- Misses        452      453       +1     
Files with missing lines    Coverage Δ
src/zarr/core/array.py      98.35% <85.71%> (-0.12%) ⬇️

@@ -1265,13 +1287,19 @@ def _iter_chunk_keys(
yield self.metadata.encode_chunk_key(k)

def _iter_chunk_regions(
Contributor

I feel like _iter_chunk_regions should only iterate over the regions spanned by each chunk; otherwise the name doesn't fit. So adding a flag to this function that makes it do something different (iterate over the regions spanned by each shard) seems worse than implementing a new _iter_shard_regions method that does exactly what its name suggests.
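
As a rough illustration of that suggestion, a standalone sketch of what an _iter_shard_regions helper could look like (the free-function form and parameter names are illustrative, not the actual zarr internals, where this would live as a method on the array class):

```python
from collections.abc import Iterator
from itertools import product

# Yield one tuple of slices per shard, stepping by the shard shape instead of
# the sub-chunk shape, and clamping the last region to the array bounds.
def _iter_shard_regions(
    shape: tuple[int, ...], shard_shape: tuple[int, ...]
) -> Iterator[tuple[slice, ...]]:
    grid = [range(0, size, step) for size, step in zip(shape, shard_shape)]
    for origin in product(*grid):
        yield tuple(
            slice(start, min(start + step, size))
            for start, step, size in zip(origin, shard_shape, shape)
        )
```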

Contributor

My general POV is that several well-defined functions are better than a smaller number of functions that try to do a lot. Since this is private API, adding functions is cheap, so let's create new functions instead of adding functionality to existing ones in this case.

Author

Hmm.. "private API" might be the magic word there 😂 There are no other users of _iter_chunk_regions (ignoring a few unit tests); so, may I directly rename the function to _iter_shard_regions in this case? 😁

Successfully merging this pull request may close these issues.

"Stored and computed checksum do not match" in case when shards large than chunks