Transform Directory storage to Zip storage. #756
Comments
@eddiecong I've also tried to zip the store and open it, and I couldn't figure it out. I guess the docs are outdated or something. Nevertheless, you should be able to transform the existing store to the Zip format using something like rechunker |
The issue is zip was likely compressed whereas |
@jakirkham is right. ZipStore does indeed default to Line 1450 in 7dc8b05
I guess that we need to use
import numpy as np
import pandas as pd
import xarray as xr
import zarr
# create a dataset
lon = np.arange(-180, 180)
lat = np.arange(-90, 91)
timestamps = pd.date_range("2001-01-01", "2001-12-31", name="time", freq="D")
ds = xr.Dataset(
    data_vars=dict(
        aaa=(
            ["lon", "lat", "time"],
            np.random.randint(0, 101, (len(lon), len(lat), len(timestamps))),
        )
    ),
    coords=dict(
        lon=lon,
        lat=lat,
        time=timestamps,
    ),
)
# store the dataset as zarr
ds.to_zarr("foo.zarr")
Now convert to a zip archive using:
And try to open the archive:
ds = xr.open_zarr(zarr.ZipStore("foo.zarr.zip"))
but this throws a
|
Just got back, thanks so much. I believe it is because ZipStorage will compress by default, as @jakirkham said. Here is the suggestion for reference. @pmav99, I wonder whether a relative path problem is what causes the GroupNotFoundError. [https://stackoverflow.com/questions/67635491/transform-zarr-directory-storage-to-zip-storage/67675357#67675357] |
@eddiecong Not really. I just tried with absolute paths and it still throws the same error. Could someone try to run the snippet from my previous post, just to confirm that the issue exists? |
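When debugging an error like this, it can help to look at the entry names inside the archive before involving zarr at all. The following is a minimal sketch using only the Python standard library. `root_metadata_entries` is a hypothetical helper name, and the sketch assumes the zarr v2 on-disk layout, in which a group is marked by a `.zgroup` key (and an array by a `.zarray` key) at the root of the store.

```python
import zipfile

# Metadata keys that zarr v2 stores at the root of a group or array.
ZARR_V2_METADATA_KEYS = {".zgroup", ".zarray", ".zattrs", ".zmetadata"}


def root_metadata_entries(zip_path):
    """Return the zarr v2 metadata keys found at the root of the zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
    # Only exact matches count: "foo.zarr/.zgroup" is nested, not at the root.
    return sorted(names & ZARR_V2_METADATA_KEYS)
```

An empty result suggests that the metadata sits under a leading directory name inside the archive rather than at its root.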
@pmav99, I see the same error with your code before. If I change:
to
it works for me. |
@joshmoore thank you. I confirm that your proposal does indeed work. To make this clearer: if the zip archive contains the outer directory, then the
So this fails:
while this works:
AFAIK there is no way to create a suitable zip archive using
|
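For reference, one way to build an archive whose entries sit at the archive root (that is, without the outer directory) is Python's standard `zipfile` module. This is only a sketch under stated assumptions, not the approach used in the thread: `zip_store_contents` is a hypothetical helper name, `ZIP_STORED` (no zip-level compression) is chosen on the assumption that the chunks are already compressed by zarr, and the output path is assumed to lie outside the store directory.

```python
import os
import zipfile


def zip_store_contents(store_dir, zip_path):
    """Zip the *contents* of store_dir so that entries sit at the archive root.

    Assumption: zip_path is not located inside store_dir.
    """
    # ZIP_STORED writes entries without zip-level compression (an assumption
    # made here because zarr chunks are usually compressed already).
    with zipfile.ZipFile(zip_path, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for dirpath, _dirnames, filenames in os.walk(store_dir):
            for filename in filenames:
                full_path = os.path.join(dirpath, filename)
                # Name each entry relative to the store root, so the outer
                # directory (e.g. "foo.zarr/") is not part of the entry name.
                arcname = os.path.relpath(full_path, store_dir)
                # Zip entry names use forward slashes on every platform.
                zf.write(full_path, arcname.replace(os.sep, "/"))
```

With the dataset from the earlier snippet, calling `zip_store_contents("foo.zarr", "foo.zarr.zip")` should produce entries such as `.zgroup` rather than `foo.zarr/.zgroup`, which matches the layout reported to work above.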
@pmav99 Thanks for the summary. In the end, we decided to use the LMDB storage format, which supports both reads and writes with multiprocessing. That way, we did not have to run the additional command to zip the directory. |
Problem description
Hi, sorry to bother you. I found this statement about ZipStore in the official Zarr documentation:
Alternatively, use a DirectoryStore when writing the data, then manually Zip the directory and use the Zip file for subsequent reads.
I am trying to transform a Zarr dataset in DirectoryStore format into a ZipStore. I used the zip command available on Linux:
zip -r test.zip test.zarr
Here test.zarr is a directory storage dataset containing three groups. However, when I try to use the code above to open it, I get the error below:
I wonder whether my compression method is wrong, and whether there are workarounds to transform directory storage into zip storage or some other DB format, because as the number of groups grows, the directory storage contains so many nodes that it is inconvenient to transport. Thanks in advance.
Version and installation information
zarr.__version__: 2.8.1
numcodecs.__version__: 0.7.3