best way to write a zarr store with a single chunk #6697
-
I'm working with some small files and I would like each array to be stored as a single chunk in its zarr store. The motivation is to reduce the overhead of GET and PUT calls. I'm also working on a large EC2 machine, so I can afford to read more data at once. I like the zarr format because it makes IO in the cloud easy. The default behavior produces many chunk files, e.g. 0.0.0, 0.1.0, 1.0.0, 1.1.0, 2.0.0, 2.1.0, 3.0.0, 3.1.0.
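To make the GET/PUT overhead concrete: a zarr store holds one object per chunk, and the chunk count per array is the product over dimensions of ceil(size / chunk). A quick sketch (the shapes and chunk sizes below are hypothetical, loosely modeled on the `air_temperature` tutorial dataset):

```python
import math

def n_chunk_files(shape, chunks):
    # Each dimension is split into ceil(size / chunk) pieces;
    # the store holds one object per combination of pieces.
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))

# Hypothetical (time, lat, lon) array split 4 x 2 x 1 ways:
# eight objects, matching a listing like 0.0.0 ... 3.1.0
print(n_chunk_files((2920, 25, 53), (730, 13, 53)))   # 8

# The same array stored as a single chunk: one GET per read
print(n_chunk_files((2920, 25, 53), (2920, 25, 53)))  # 1
```

So collapsing to one chunk per array turns a cloud read of this variable into a single GET instead of eight.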
I currently create a zarr store with a single chunk using
I can't think of a better alternative, nor how I could specify this better in
Replies: 2 comments 2 replies
-
@raybellwaves, does the following accomplish what you are looking for?

```python
In [22]: chunks = dict.fromkeys(ds.dims, -1)

In [23]: chunks
Out[23]: {'lat': -1, 'time': -1, 'lon': -1}

In [25]: ds.chunk(chunks).to_zarr("/tmp/test2.zarr", consolidated=True)

In [31]: !tree /tmp/test2.zarr
/tmp/test2.zarr
├── air
│   └── 0.0.0
├── lat
│   └── 0
├── lon
│   └── 0
└── time
    └── 0
```
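The `dict.fromkeys(ds.dims, -1)` trick can be tried without xarray installed; here is a minimal sketch where a hypothetical tuple stands in for `ds.dims`:

```python
# Hypothetical dims, standing in for ds.dims of the air_temperature dataset.
dims = ("lat", "time", "lon")

# A chunk size of -1 asks ds.chunk() to put the whole dimension in one chunk,
# so this mapping requests exactly one chunk per array.
chunks = dict.fromkeys(dims, -1)
print(chunks)  # {'lat': -1, 'time': -1, 'lon': -1}
```

`dict.fromkeys` just maps every key to the same value, which is why it reads so cleanly here.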
-
I would definitely go with @andersy005's answer above! However, in the scenario where you want to avoid Dask entirely, you can also use:

```python
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature")
for var in ds.variables:
    ds[var].encoding['chunks'] = ds[var].shape
ds.to_zarr("test2.zarr", consolidated=True)
!tree test2.zarr
```
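The effect of that loop can be sketched without xarray at all. Below, a plain dict stands in for the dataset's variables (the names and shapes are hypothetical, mirroring the `air_temperature` tutorial dataset), showing the encoding the loop produces:

```python
# Stand-in for ds.variables: variable name -> array shape.
# Names and shapes are hypothetical, mirroring the air_temperature dataset.
var_shapes = {
    "air": (2920, 25, 53),
    "lat": (25,),
    "lon": (53,),
    "time": (2920,),
}

# Setting each variable's chunk encoding to its full shape means
# to_zarr() writes that variable as a single chunk file.
encoding = {var: {"chunks": shape} for var, shape in var_shapes.items()}
print(encoding["air"])  # {'chunks': (2920, 25, 53)}
```

This mirrors the `tree` output in the accepted answer: one `0.0.0` file for the 3-D variable and one `0` file per coordinate.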