Description
When working with large hierarchical datasets, users often need only a subset of groups. Currently open_datatree(group=...) accepts a single literal path to re-root the tree. This proposal extends group to accept glob patterns (e.g., */sweep_0), filtering which groups are opened without loading the entire tree first.
Use cases
Radar data (NEXRAD): Volume scan files contain dozens of sweep groups per VCP. To analyze only the lowest elevation scan across all volumes:

    dt = xr.open_datatree("radar.nc", group="*/sweep_0")

Climate model output (CMIP): Multi-model archives store data in deeply nested hierarchies like /{model}/{experiment}/{variable}. To load only temperature from all models under a specific experiment:

    dt = xr.open_datatree("cmip.zarr", group="*/historical/tas")

Or to compare two specific variables across all models:

    dt = xr.open_datatree("cmip.zarr", group="*/historical/ta[su]")

Proposed API
When group contains glob metacharacters (*, ?, [), it switches from root-selection mode to filter mode. Matching uses the same engine as DataTree.match() (PurePosixPath.match). Root (/) and all ancestors of matched nodes are always included to form a valid tree.
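The filter-mode rule above can be sketched with the standard library alone. This is only an illustration of the matching and ancestor-retention logic, not the proposed implementation: `filter_group_paths` is a hypothetical helper, and it assumes the backend can cheaply list all group paths as absolute POSIX-style strings before any data is loaded.

```python
from pathlib import PurePosixPath


def filter_group_paths(all_paths, pattern):
    """Return the group paths to open for a glob ``pattern``.

    Hypothetical helper: keeps every path matched by
    ``PurePosixPath.match`` plus the root and all ancestors of each
    match, so the result always describes a valid tree.
    """
    keep = {"/"}  # the root is always included
    for path in all_paths:
        p = PurePosixPath(path)
        if p.match(pattern):
            keep.add(str(p))
            # Ancestors are needed to connect the match to the root.
            keep.update(str(parent) for parent in p.parents)
    return sorted(keep)


# Example group listing (made-up names in the style of the radar use case).
paths = [
    "/",
    "/VCP-34",
    "/VCP-34/sweep_0",
    "/VCP-34/sweep_1",
    "/VCP-212",
    "/VCP-212/sweep_0",
]
selected = filter_group_paths(paths, "*/sweep_0")
```

Note that `PurePosixPath.match` anchors relative patterns at the right-hand end of the path, so `*/sweep_0` would also match a `sweep_0` group nested more deeply than two levels. Whether that is desirable in filter mode is a design question for the proposal.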
Behavior summary
| group value | Behavior |
|---|---|
| None | Load all groups (unchanged) |
| "VCP-34" (no glob chars) | Root selection (unchanged) |
| "*/sweep_0" (glob chars) | Filter mode: only matched groups + ancestors |
| Pattern matches nothing | Root-only tree |
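As a rough sketch of how the mode switch in the table might be decided, the function below classifies a `group` argument by checking for the glob metacharacters listed in the proposal. The function name and the returned mode labels are hypothetical and chosen only for illustration.

```python
GLOB_CHARS = frozenset("*?[")  # metacharacters named in the proposal


def classify_group(group):
    """Map a ``group`` argument to one of the modes in the table.

    Hypothetical helper; the labels "all", "root" and "filter" are
    illustrative names, not part of any existing API.
    """
    if group is None:
        return "all"  # load all groups (unchanged behavior)
    if any(ch in GLOB_CHARS for ch in group):
        return "filter"  # only matched groups plus their ancestors
    return "root"  # literal path re-roots the tree (unchanged behavior)
```

One consequence of this rule is that a literal group name containing `*`, `?` or `[` could no longer be selected as a root without some escaping mechanism, which the proposal would need to address.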
Reference
PR #10742 (async DataTree open) provides the base for this work.