I'm really interested in moving (i.e. regularly updated) datasets backed by an object store (S3 in my case). S3 is eventually consistent, so there is an issue whenever you change more than one object at (approximately) the same time: on read, you don't know what combination of versions you'll get.
This could be an issue if I update .zarray to grow my array and also update .zattrs with some metadata to reflect this change. On read I could get the new metadata and the old shape, or vice versa, both of which would be bad.
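To make the failure mode concrete, here is a minimal sketch of a torn read. `LaggyStore` is a toy in-memory stand-in invented for this illustration (it is not S3 and not a Zarr store class), the `.zarray` documents are trimmed to two fields, and the `n_valid` attribute is hypothetical.

```python
import json


class LaggyStore:
    """Toy key-value store: each key's newest write becomes visible on its own schedule."""

    def __init__(self):
        self._versions = {}  # key -> list of JSON payloads, oldest first
        self._visible = {}   # key -> index of the version readers currently see

    def put(self, key, doc):
        # The first write to a key is visible at once; later writes stay hidden
        # until propagate() is called, mimicking eventual consistency on overwrite.
        self._versions.setdefault(key, []).append(json.dumps(doc))
        self._visible.setdefault(key, 0)

    def propagate(self, key):
        # Make the newest version of this one key visible to readers.
        self._visible[key] = len(self._versions[key]) - 1

    def get(self, key):
        return json.loads(self._versions[key][self._visible[key]])


store = LaggyStore()
store.put(".zarray", {"shape": [10], "chunks": [10]})
store.put(".zattrs", {"n_valid": 10})

# Grow the array and update the attribute that describes it.
store.put(".zarray", {"shape": [20], "chunks": [10]})
store.put(".zattrs", {"n_valid": 20})

# Only the attribute update has become visible so far.
store.propagate(".zattrs")
torn = (store.get(".zarray")["shape"], store.get(".zattrs")["n_valid"])

# Once the array metadata catches up, the two objects agree again.
store.propagate(".zarray")
settled = (store.get(".zarray")["shape"], store.get(".zattrs")["n_valid"])
```

Here `torn` pairs the old shape with the new attribute value, which is exactly the mixed state described above; swapping which key propagates first gives the opposite mix.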
This becomes more pronounced when working with complex datasets with coordinates etc. When saved as Zarr by Xarray, these end up in different arrays (each with its own .zarray) in the same group. Nothing ties together the versions of the objects you get, so an update followed by a read could result in all kinds of corruption.
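As a rough illustration of what a reader could do today, the sketch below compares dimension lengths across the arrays of one group and reports any that disagree. It assumes a plain dict of JSON strings as the store, trimmed `.zarray` documents, and the `_ARRAY_DIMENSIONS` attribute that Xarray writes into `.zattrs`; the function name `dimension_mismatches` and the array names are made up for the example. It can only detect some torn reads after the fact; it does not prevent them.

```python
import json


def dimension_mismatches(store, array_names):
    """Return {dim: {array: length}} for every dimension whose length differs between arrays."""
    seen = {}
    for name in array_names:
        shape = json.loads(store[f"{name}/.zarray"])["shape"]
        dims = json.loads(store[f"{name}/.zattrs"])["_ARRAY_DIMENSIONS"]
        for dim, length in zip(dims, shape):
            seen.setdefault(dim, {})[name] = length
    # Keep only dimensions that were seen with more than one distinct length.
    return {dim: sizes for dim, sizes in seen.items() if len(set(sizes.values())) > 1}


# A group read mid-update: the coordinate has grown, the data variable has not.
store = {
    "time/.zarray": json.dumps({"shape": [20]}),
    "time/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["time"]}),
    "temperature/.zarray": json.dumps({"shape": [10, 4]}),
    "temperature/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["time", "x"]}),
}

mismatches = dimension_mismatches(store, ["time", "temperature"])
```

A non-empty result tells the reader to retry or fail loudly rather than silently pairing a 20-element coordinate with 10 rows of data.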
Some of this needs to be resolved higher up in the tooling (xarray, etc.), but I think Zarr development needs to be aware of the challenge and support it.
I also wonder if dropping .zattrs and making it a property of .zarray or .zgroup would help this somewhat and maybe have other advantages (and disadvantages).
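A minimal sketch of that idea, assuming a dict-like store and a hypothetical `attributes` key inside the `.zarray` document (this layout is an illustration only, not something the current format defines). The helper names `write_combined` and `read_combined` are likewise invented.

```python
import json


def write_combined(store, path, shape, chunks, attributes):
    """Write array metadata and user attributes as ONE object, so they change together."""
    doc = {
        "shape": list(shape),
        "chunks": list(chunks),
        "attributes": attributes,  # hypothetical key, replaces the separate .zattrs object
    }
    store[f"{path}/.zarray"] = json.dumps(doc)


def read_combined(store, path):
    """Read shape and attributes from the same object, so they come from the same version."""
    doc = json.loads(store[f"{path}/.zarray"])
    return doc["shape"], doc["attributes"]


store = {}
write_combined(store, "temperature", shape=(10,), chunks=(10,), attributes={"n_valid": 10})
write_combined(store, "temperature", shape=(20,), chunks=(10,), attributes={"n_valid": 20})
shape, attrs = read_combined(store, "temperature")
```

With one object per array, a reader sees either the old pair or the new pair, never a mix. The likely costs are that every attribute change rewrites the array metadata, and that this does nothing for consistency between different arrays in a group.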
I've written a blog post about this, "How to (and not to) handle metadata in high momentum datasets", so for a more thorough dive please read that; the short version is the description at the top of this issue.