Description
Originally posted by @aladinor in #9511 (comment)
Hi everyone,
I've been working with hierarchical structures to store weather radar data. We're leveraging xradar and datatree to manage these datasets efficiently. Currently, we use the standard WMO CfRadial2.1/FM301 format to build a datatree model with xradar. The data is then stored in Zarr format.
This data model stores historical weather radar datasets in Zarr format while supporting real-time updates, since radar networks operate continuously. It relies on a Zarr-append pattern to integrate new data as it arrives.
I think our data model works, at least in this beta stage; however, as the dataset grows, we've noticed longer load times when opening/reading the Zarr store with open_datatree. As the following timings show, the time to open the dataset grows with its size:
For ~15 GB, open_datatree takes around 5.73 seconds
For ~80 GB, open_datatree takes around 11.6 seconds

I've worked with larger datasets, which take more time to open/read.
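To put the two timings above in perspective, the short calculation below compares how much the store grew with how much the open time grew. It uses only the two approximate numbers quoted above, so it is a rough illustration and not a scaling fit.

```python
# Approximate measurements quoted above: (store size in GB, open time in seconds)
small = (15.0, 5.73)
large = (80.0, 11.6)

size_ratio = large[0] / small[0]  # how many times larger the store is
time_ratio = large[1] / small[1]  # how many times longer the open takes

print(f"size grew {size_ratio:.2f}x, open time grew {time_ratio:.2f}x")
print(f"seconds per GB: {small[1] / small[0]:.3f} -> {large[1] / large[0]:.3f}")
```

With these two points, the store is roughly 5.3 times larger while the open time roughly doubles, so the cost does not look proportional to the number of bytes stored. Two measurements are not enough to say what it does scale with.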
The datatree structure contains 11 nodes, each representing a point where live-updating data is appended. This is a minimal reproducible example, in case you want to look at it.
import s3fs
import xarray as xr
from time import time


def main():
    print(xr.__version__)
    st = time()
    # S3 bucket connection (anonymous access)
    URL = 'https://js2.jetstream-cloud.org:8001/'
    path = 'pythia/radar/erad2024'
    fs = s3fs.S3FileSystem(anon=True,
                           client_kwargs=dict(endpoint_url=URL))
    file = s3fs.S3Map(f"{path}/zarr_radar/Guaviare_test.zarr", s3=fs)
    # Open the datatree stored in Zarr
    dtree = xr.backends.api.open_datatree(
        file,
        engine='zarr',
        consolidated=True,
        chunks={}
    )
    print(f"total time: {time() - st}")


if __name__ == "__main__":
    main()
and the output is:
2024.9.1.dev23+g52f13d44
total time: 5.198976516723633
For more information about the data model, you can check the raw2zarr GitHub repo and the poster we presented at the SciPy conference.