
Fix race condition in from_array for arrays with shards #3217


Open · bojidar-bg wants to merge 2 commits into main

Conversation


@bojidar-bg commented Jul 8, 2025

Fixes #3169.

When data is passed via create_array(data=...), from_array splits it into per-chunk regions and writes them. On current main, when shards are used, from_array ends up splitting the data by sub-chunk, because AsyncArray.chunks returns Metadata.chunks, which is the size of the chunks inside a shard rather than the size of the "physical" chunk files. Writes for multiple sub-chunks can then target the same physical chunk file concurrently, creating a race condition.
This PR changes from_array to split the data by shard instead whenever shards are present.
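
For context, a minimal reproduction sketch of the setup described above, assuming the zarr-python 3.x create_array call with chunks and shards keywords (the array shape, dtype, and in-memory store are illustrative, and exact keyword arguments may differ between versions):

```python
import numpy as np
import zarr

# Each (32, 32) shard spans a 2x2 block of (16, 16) sub-chunks, so one shard
# file receives data from four different write regions.
store = zarr.storage.MemoryStore()
data = np.arange(64 * 64, dtype="uint16").reshape(64, 64)

arr = zarr.create_array(
    store,
    data=data,        # handed to from_array internally (per the description above)
    chunks=(16, 16),  # sub-chunks stored inside each shard
    shards=(32, 32),  # one "physical" chunk file per shard
)
# Before this PR, the data was split into (16, 16) sub-chunk regions, so up to
# four concurrent writes could target the same (32, 32) shard file and race.
```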

(There probably needs to be a lock somewhere in AsyncArray or ShardCodec that prevents multiple writes to the same shard, or to the same physical chunk in general, from executing at the same time.)
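
As a rough illustration of that parenthetical, here is a sketch of per-shard serialization; it is a generic asyncio pattern, not part of the zarr API, and the names write_to_shard and do_write are hypothetical:

```python
import asyncio
from collections import defaultdict

# Hypothetical sketch (not zarr API): key an asyncio.Lock on the shard's grid
# coordinates so writers hitting the same shard file are serialized while
# writers to different shards still run concurrently.
_shard_locks: defaultdict[tuple[int, ...], asyncio.Lock] = defaultdict(asyncio.Lock)

async def write_to_shard(shard_coords: tuple[int, ...], do_write) -> None:
    # do_write is a hypothetical coroutine factory performing the actual write.
    async with _shard_locks[shard_coords]:
        await do_write()
```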

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)


codecov bot commented Jul 8, 2025

Codecov Report

Attention: Patch coverage is 85.71429% with 1 line in your changes missing coverage. Please review.

Project coverage is 94.76%. Comparing base (9d97b24) to head (3d79b1c).

Files with missing lines    Patch %    Lines
src/zarr/core/array.py      85.71%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3217      +/-   ##
==========================================
- Coverage   94.76%   94.76%   -0.01%     
==========================================
  Files          78       78              
  Lines        8642     8648       +6     
==========================================
+ Hits         8190     8195       +5     
- Misses        452      453       +1     
Files with missing lines    Coverage Δ
src/zarr/core/array.py      98.35% <85.71%> (-0.12%) ⬇️

@@ -1265,13 +1287,19 @@ def _iter_chunk_keys(
yield self.metadata.encode_chunk_key(k)

def _iter_chunk_regions(
Contributor

I feel like _iter_chunk_regions should only iterate over the regions spanned by each chunk; otherwise the name doesn't fit. So adding a flag to this function that makes it do something different (iterate over the regions spanned by each shard) seems worse than implementing a new _iter_shard_regions method that does exactly what its name suggests.
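
As a rough illustration of that suggestion, a standalone sketch of what an _iter_shard_regions helper could look like (the free-function form and parameter names are illustrative, not the actual zarr internals, where this would live as a method on the array class):

```python
from collections.abc import Iterator
from itertools import product

# Yield one tuple of slices per shard, stepping by the shard shape instead of
# the sub-chunk shape, and clamping the last region to the array bounds.
def _iter_shard_regions(
    shape: tuple[int, ...], shard_shape: tuple[int, ...]
) -> Iterator[tuple[slice, ...]]:
    grid = [range(0, size, step) for size, step in zip(shape, shard_shape)]
    for origin in product(*grid):
        yield tuple(
            slice(start, min(start + step, size))
            for start, step, size in zip(origin, shard_shape, shape)
        )
```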

Contributor

My general POV is that several well-defined functions are better than a smaller number of functions that try to do a lot. Since this is private API, adding functions is cheap, so let's create new functions instead of adding functionality to existing ones in this case.

Author

Hmm.. "private API" might be the magic word there 😂 There are no other users of _iter_chunk_regions (ignoring a few unit tests); so, may I directly rename the function to _iter_shard_regions in this case? 😁

Successfully merging this pull request may close these issues.

"Stored and computed checksum do not match" in case when shards large than chunks