This issue, but for icechunk: zarr-developers/VirtualiZarr#132

I was originally planning to virtualize this [C]Worthy dataset and save the references using the kerchunk parquet format, but the timelines have now changed such that both icechunk and the [C]Worthy OAE atlas are planned to release on the same day (Oct 15th 2024)! So I could use icechunk's format instead (or just write both)...

I think it's pretty unlikely that virtualizing with icechunk happens by then (I have enough work to do just releasing the un-virtualized version of the dataset), but I need to do all of this by December anyway, because I submitted it as a talk to AGU 🙃 Regardless of the timing, this dataset is a good real-world test case for icechunk; as I said in zarr-developers/VirtualiZarr#132:
If we can virtualize this we should be able to virtualize most things 💪
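For context, the kerchunk parquet route mentioned above is already expressible via VirtualiZarr's accessor. A minimal sketch (the input and output paths are hypothetical placeholders, and `format="parquet"` assumes the accessor's parquet writer):

```python
# Minimal sketch: open one file's metadata as a virtual dataset and
# serialize its chunk references in kerchunk's parquet format.
# Paths are hypothetical placeholders.
from virtualizarr import open_virtual_dataset

vds = open_virtual_dataset("/data/cworthy/member_0001.nc", indexes={})
vds.virtualize.to_kerchunk("cworthy_refs.parquet", format="parquet")
```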
Wishlist:

- Writing virtual references efficiently in bulk (I have ~10 arrays of ~500k virtual chunks each to write, and another ~30 arrays with fewer chunks. No nested groups; everything lives in a single group.)
- Doing the big metadata extraction at scale over all ~500k files. There are clever ways we could do this as a parallel tree-reduction (dask.bag, cubed; see the sketch below), but getting it done at all is the prerequisite.
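A minimal sketch of that tree-reduction idea with dask.bag, assuming the files concatenate along a "time" dimension; the glob pattern, partition count, and `split_every` are all placeholder choices:

```python
# Sketch: extract references from many files in parallel, then combine
# them with a tree reduction rather than one giant serial concat.
import glob

import dask.bag as db
import xarray as xr
from virtualizarr import open_virtual_dataset

paths = sorted(glob.glob("/data/cworthy/*.nc"))  # hypothetical paths

# One virtual dataset (references + metadata only, no chunk data) per file
bag = db.from_sequence(paths, npartitions=100)
virtual = bag.map(lambda p: open_virtual_dataset(p, indexes={}))

# fold with split_every gives a tree reduction: datasets are concatenated
# pairwise up a tree instead of all at once on a single worker
combined = virtual.fold(
    binop=lambda a, b: xr.concat(
        [a, b], dim="time", coords="minimal", compat="override"
    ),
    split_every=8,
).compute()
```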
Should also work with virtual data. Typical CF datasets use an int as the raw array dtype plus attributes like units: "days since X", which Xarray / cftime decode into Python datetimes; there is no native datetime type in netCDF.
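A toy sketch of that decoding step, using made-up values:

```python
# Minimal sketch of CF time decoding: a raw int array plus a
# "days since ..." units attribute becomes datetimes on decode.
import numpy as np
import xarray as xr

raw = xr.Dataset(
    coords={
        "time": (
            "time",
            np.arange(3, dtype=np.int64),
            {"units": "days since 2000-01-01"},
        )
    }
)
decoded = xr.decode_cf(raw)
print(decoded.time.values)
# np.datetime64 values for the standard calendar; non-standard
# calendars (e.g. "noleap") decode to cftime objects instead
```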