Support glob patterns in open_datatree(group=...) for selective group loading #11196

@aladinor

Description

When working with large hierarchical datasets, users often need only a subset of groups. Currently open_datatree(group=...) accepts a single literal path to re-root the tree. This proposal extends group to accept glob patterns (e.g., */sweep_0), filtering which groups are opened without loading the entire tree first.

Use cases

Radar data (NEXRAD): Volume scan files contain dozens of sweep groups per VCP. To analyze only the lowest elevation scan across all volumes:

dt = xr.open_datatree("radar.nc", group="*/sweep_0")

Climate model output (CMIP): Multi-model archives store data in deeply nested hierarchies like /{model}/{experiment}/{variable}. To load only temperature from all models under a specific experiment:

dt = xr.open_datatree("cmip.zarr", group="*/historical/tas")

Or to compare two specific variables across all models:

dt = xr.open_datatree("cmip.zarr", group="*/historical/ta[su]")

Proposed API

When group contains a glob metacharacter (*, ?, or [), open_datatree switches from root-selection mode to filter mode. Matching uses the same engine as DataTree.match() (PurePosixPath.match). The root (/) and all ancestors of matched nodes are always included so that the result is a valid tree.
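To make the matching and ancestor rule concrete, here is a minimal sketch that works on plain group-path strings. It is not the proposed implementation: filter_group_paths is a hypothetical helper, and the list of paths stands in for whatever group listing a backend would provide. Only PurePosixPath.match comes from the description above.

```python
from pathlib import PurePosixPath


def filter_group_paths(all_paths, pattern):
    """Hypothetical helper: keep paths matching `pattern`, plus their ancestors."""
    keep = {"/"}  # the root is always part of the resulting tree
    for path in all_paths:
        node = PurePosixPath(path)
        if node.match(pattern):
            keep.add(str(node))
            # Every ancestor up to the root is needed to form a valid tree.
            keep.update(str(parent) for parent in node.parents)
    return sorted(keep)


# Illustrative group listing (made-up names, not read from a real file).
paths = [
    "/",
    "/VCP-34",
    "/VCP-34/sweep_0",
    "/VCP-34/sweep_1",
    "/VCP-12",
    "/VCP-12/sweep_0",
]
selected = filter_group_paths(paths, "*/sweep_0")
nothing = filter_group_paths(paths, "*/sweep_9")
```

One property worth noting: PurePosixPath.match anchors relative patterns at the right-hand end of the path, so under this sketch */sweep_0 would also match a deeper group such as /a/b/sweep_0.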

Behavior summary

group value                 Behavior
None                        Load all groups (unchanged)
"VCP-34" (no glob chars)    Root selection (unchanged)
"*/sweep_0" (glob chars)    Filter mode: only matched groups and their ancestors
Pattern matches nothing     Root-only tree
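The mode selection summarized above could be sketched as follows. The names GLOB_CHARS and group_mode are hypothetical and exist only for illustration. The sketch assumes that the presence of any of the three metacharacters is enough to trigger filter mode, as the proposal states.

```python
# Metacharacters named in the proposal; any one of them triggers filter mode.
GLOB_CHARS = frozenset("*?[")


def group_mode(group):
    """Hypothetical helper: classify a `group` argument into one of three modes."""
    if group is None:
        return "all"  # load all groups (unchanged behavior)
    if any(ch in GLOB_CHARS for ch in group):
        return "filter"  # matched groups plus their ancestors
    return "root"  # literal path: re-root the tree (unchanged behavior)


modes = [group_mode(g) for g in (None, "VCP-34", "*/sweep_0", "*/historical/ta[su]")]
```

A consequence of this rule is that a group whose literal name contains *, ?, or [ could no longer be selected by its plain path; whether that needs an escape mechanism is left open here.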

Reference

PR #10742 (async DataTree open) provides the base for this work.
