Confused about Zarr2 to Zarr3 conversion #3024

Open
@Fafa87

I just wanted to leave a note about my experience converting Zarr2 to Zarr3 (including shards).

Basically, I wanted to try the new Zarr version, mainly for the sharding feature.
The idea was to take a Zarr2 array that I already have and convert it to Zarr3 with x2 / x4 sharding to test the performance.

Well, it took a while. The first thing I did was simply:

import zarr

# Open the existing Zarr2 group and take its first array
store = zarr.storage.LocalStore(single_zarr_path)
group = zarr.open_group(store)
arrays_from_group = list(group.arrays())  # (name, array) pairs
data = arrays_from_group[0][1]

# Write the same data out as Zarr3
store_3 = zarr.storage.LocalStore(output_zarr3_path)
zarr_from_array = zarr.from_array(store_3, data=data, zarr_format=3)

It worked and I got a working Zarr3 array. So easy.

Then I saw there is a sharding parameter, so I thought I would just fill it in (with x4 shards) and get a sharded Zarr3:

store_shards = zarr.storage.LocalStore(output_zarr3sharded_path)
zarr_from_array = zarr.from_array(store_shards, data=data, shards=(1, 1, 1, 4096, 4096), zarr_format=3)

It did not work, although it looked like it did. When I printed info_complete(), the reported compression factor had gone from 1.4 to 5, which was suspicious. It turned out that most of the files were just empty. Then I tried to compute sum() over all pixels and it failed with some error about encoding / compression, so I went down that dead end.
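For anyone wanting to check for the same symptom, here is a minimal sketch that counts zero-byte files under a local store directory. The helper name `summarize_store_files` is hypothetical and not part of Zarr. It only inspects file sizes on disk, so it would not catch files that are non-empty but hold wrong data.

```python
from pathlib import Path


def summarize_store_files(store_path):
    """Count files, zero-byte files, and total bytes under a local directory.

    Hypothetical helper (not a Zarr API): it only looks at file sizes on disk.
    """
    sizes = [p.stat().st_size for p in Path(store_path).rglob("*") if p.is_file()]
    return {
        "files": len(sizes),
        "empty": sum(1 for size in sizes if size == 0),
        "bytes": sum(sizes),
    }


# Example call, using the output path from the snippet above:
# print(summarize_store_files(output_zarr3sharded_path))
```

A large share of empty files, or a total size far below that of the unsharded output, would be a hint that the conversion did not actually write the data.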

Finally I figured out that the correct (is it?) way to do it is to create a new empty array and copy the data:

zarr_with_sharding = zarr.create_array(store_shards, shape=data.shape, dtype=data.dtype, chunks=(1, 1, 1, 1024, 1024), shards=(1, 1, 1, 4096, 4096), zarr_format=3, overwrite=True)
zarr_with_sharding[:] = data[:]  # assuming that data is small
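If the data is not small, the same copy could be done one shard-sized region at a time instead of loading everything with `data[:]`. The sketch below is an assumption about how one might do that, not an official Zarr recipe. The helpers `iter_shard_slices` and `copy_by_shards` are hypothetical names, and they rely only on the source and target supporting `.shape` and slice indexing, which Zarr arrays do.

```python
import itertools
import math


def iter_shard_slices(shape, shard_shape):
    """Yield tuples of slices that tile `shape` in shard-aligned blocks."""
    if len(shape) != len(shard_shape):
        raise ValueError("shape and shard_shape must have the same number of dimensions")
    # Number of shards along each axis (the last one may be partial)
    counts = [math.ceil(dim / step) for dim, step in zip(shape, shard_shape)]
    for index in itertools.product(*(range(n) for n in counts)):
        yield tuple(
            slice(i * step, min((i + 1) * step, dim))
            for i, step, dim in zip(index, shard_shape, shape)
        )


def copy_by_shards(source, target, shard_shape):
    """Copy `source` into `target`, reading and writing one shard-sized region at a time."""
    for region in iter_shard_slices(source.shape, shard_shape):
        target[region] = source[region]


# Example call, using the names from the snippet above:
# copy_by_shards(data, zarr_with_sharding, shard_shape=(1, 1, 1, 4096, 4096))
```

Aligning each write to a whole shard means every shard file is written once, rather than being rewritten for each inner chunk.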

Is there a guide for how to convert Zarr2 datasets to the new Zarr3 with sharding?
I did not find anything on that; the only thing is the legacy (?) https://github.com/ome/ome2024-ngff-challenge
