Description
I just wanted to leave a record of my experience converting Zarr2 to Zarr3 (including shards).
Basically I wanted to try the new Zarr version, mainly for the sharding feature.
The idea was to grab a Zarr2 array that I have and convert it to Zarr3 with x2 / x4 sharding to test the performance.
Well, it took a while, because the first thing I did was simply:
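import zarr
# Open the existing Zarr2 group and take its first array
store = zarr.storage.LocalStore(single_zarr_path)
group = zarr.open_group(store)
arrays_from_group = list(group.arrays())
data = arrays_from_group[0][1]  # group.arrays() yields (name, array) pairs
# Write the same data out as Zarr3
store_3 = zarr.storage.LocalStore(output_zarr3_path)
zarr_from_array = zarr.from_array(store_3, data=data, zarr_format=3)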
It worked and I got a working Zarr3 array. So easy.
Then I saw that there is a shards parameter, so I thought I would just fill it in (with x4 shards) and get a sharded Zarr3 array:
store_shards = zarr.storage.LocalStore(output_zarr3sharded_path)
zarr_from_array = zarr.from_array(store_shards, data=data, shards=(1, 1, 1, 4096, 4096), zarr_format=3)
It did not work, although it looked like it did. When I called info_complete(), the reported compression ratio had gone from 1.4 to 5, which was suspicious. It turned out that most of the files were just empty. Then I tried to compute sum() over all pixels, and it failed with an error about encoding / compression, so that route was a dead end.
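One sanity check that can be run before writing anything is whether the requested shard shape is a whole multiple of the chunk shape in every dimension, since the Zarr v3 sharding codec requires that. The sketch below is only that check: `chunks_per_shard` is a hypothetical helper, not part of zarr, and it is not a diagnosis of the failure described above.

```python
def chunks_per_shard(chunks, shards):
    """Return how many chunks fit in one shard.

    Raises ValueError if the shapes have different lengths or if a shard
    dimension is not a whole multiple of the matching chunk dimension.
    """
    if len(chunks) != len(shards):
        raise ValueError("chunks and shards must have the same number of dimensions")
    count = 1
    for axis, (chunk, shard) in enumerate(zip(chunks, shards)):
        if shard % chunk != 0:
            raise ValueError(
                f"axis {axis}: shard size {shard} is not a multiple of chunk size {chunk}"
            )
        count *= shard // chunk
    return count


# 4 chunks along each of the last two axes, so 16 chunks per shard
print(chunks_per_shard((1, 1, 1, 1024, 1024), (1, 1, 1, 4096, 4096)))  # → 16
```

For a real array, the chunk shape to pass in would be the one the output array ends up with, which may or may not match the source array's chunks depending on how it is created.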
Finally I concluded that the correct way (is it?) is to create a new empty array and copy the data into it:
zarr_with_sharding = zarr.create_array(store_shards, shape=data.shape, dtype=data.dtype, chunks=(1, 1, 1, 1024, 1024), shards=(1, 1, 1, 4096, 4096), zarr_format=3, overwrite=True)
zarr_with_sharding[:] = data[:]  # assuming the data is small enough to fit in memory
Is there a guideline on how to convert existing Zarr2 datasets to Zarr3 with sharding?
I did not find anything on that. The only thing I found is the (legacy?) https://github.com/ome/ome2024-ngff-challenge