
Move station coordinates (lat/lon/alt) to root node for DataTree inheritance #331

@aladinor


Summary

When xradar creates a DataTree via open_*_datatree(), the station coordinates latitude, longitude, and altitude are attached to every sweep as scalar coordinates. Since these values are identical across all sweeps (they represent the fixed radar station location), they should instead be placed as coordinates on the root node and inherited by the child sweep nodes via xarray's DataTree coordinate inheritance.

Current behavior

In the file-level backends (e.g., nexrad_level2.py), each sweep store creates its own copy of the station coordinates:

# nexrad_level2.py:1519-1532 — open_store_coordinates()
coords = {
    "azimuth": Variable((dim,), azimuth, get_azimuth_attrs(), encoding),
    "elevation": Variable((dim,), elevation, get_elevation_attrs(), encoding),
    "time": Variable((dim,), rtime, rtime_attrs, encoding),
    "range": Variable(("range",), rng, range_attrs),
    "longitude": Variable((), lon, get_longitude_attrs()),   # ← duplicated per sweep
    "latitude": Variable((), lat, get_latitude_attrs()),     # ← duplicated per sweep
    "altitude": Variable((), alt, get_altitude_attrs()),     # ← duplicated per sweep
    ...
}

Then in common.py:91-146, _assign_root() copies them to the root as data_vars (via reset_coords()):

root = root.assign({
    ...
    "latitude": sweeps[1]["latitude"],
    "longitude": sweeps[1]["longitude"],
    "altitude": sweeps[1]["altitude"],
}).reset_coords()

Result:

dtree = xd.io.open_nexradlevel2_datatree(filepath)

# lat/lon/alt are scalar coords on EVERY sweep
print("latitude" in dtree["sweep_0"].ds.coords)  # True
print("latitude" in dtree["sweep_1"].ds.coords)  # True
# (identical values)

# Root has them as data_vars, NOT coords
print("latitude" in dtree.ds.data_vars)  # True
print("latitude" in dtree.ds.coords)     # False

Proposed behavior

The change should happen at the file backend level — where the sweep stores and root node are constructed:

  1. Sweep backends (NexradLevel2BackendEntrypoint, IrisBackendEntrypoint, etc.): Stop creating latitude, longitude, altitude as sweep-level coordinates in open_store_coordinates()
  2. _assign_root() in common.py: Place latitude, longitude, altitude as coordinates on the root node (use set_coords() instead of reset_coords())
  3. Sweep nodes: Access lat/lon/alt via DataTree coordinate inheritance from the root, with no duplication

Result:

dtree = xd.io.open_nexradlevel2_datatree(filepath)

# Root has lat/lon/alt as COORDINATES
print("latitude" in dtree.ds.coords)       # True

# Sweeps inherit from root — no local copy
sweep_ds = dtree["sweep_0"].to_dataset(inherit=False)
print("latitude" in sweep_ds.coords)       # False

# But access still works via inheritance
print(dtree["sweep_0"].latitude)            # resolves from root

Affected files

The changes are localized to xradar's I/O layer:

  • xradar/io/backends/nexrad_level2.py: remove lat/lon/alt from open_store_coordinates()
  • xradar/io/backends/iris.py: same change as nexrad_level2.py
  • xradar/io/backends/odim.py: same change as nexrad_level2.py
  • xradar/io/backends/cfradial1.py: same change as nexrad_level2.py
  • xradar/io/backends/common.py: in _assign_root(), use set_coords(["latitude", "longitude", "altitude"]) instead of reset_coords()

Motivation

1. Storage efficiency for Zarr/Icechunk stores

When writing DataTrees to Zarr, each coordinate becomes a separate array on disk. For a NEXRAD store with 7 VCPs and ~104 sweeps:

  • Current: 3 scalars × 104 sweeps = 312 redundant arrays
  • Proposed: 3 scalars on root = 3 arrays

2. Open performance

Each redundant array requires metadata reads when opening the store. In a benchmark on a 6-month KVNX Icechunk store (126 groups, 1868 arrays), removing these redundant arrays reduced metadata reads by ~17%.

3. xarray DataTree coordinate inheritance is designed for this

xarray's DataTree coordinate inheritance (pydata/xarray#9077) was implemented specifically for this pattern — parent-level coordinates that apply to all children. Radar station metadata is a textbook case.

4. CfRadial 2.0 compliance

CfRadial 2.0 (Section 4.4) defines latitude, longitude, altitude as root-level variables representing the instrument location. Placing them as root coordinates aligns with the spec's intent.

Workaround

Downstream libraries can post-process the xradar output:

import xradar as xd

dtree = xd.io.open_nexradlevel2_datatree(filepath)

# Promote root data_vars → coords
root_ds = dtree.to_dataset(inherit=False)
station_vars = {"latitude", "longitude", "altitude"}
promote = station_vars & set(root_ds.data_vars)
if promote:
    dtree.ds = root_ds.set_coords(list(promote))

# Drop from sweeps (now inherited)
for node in dtree.subtree:
    if node is dtree:
        continue
    ds = node.to_dataset(inherit=False)
    to_drop = station_vars & set(ds.coords)
    if to_drop:
        # node is part of dtree, so assigning .ds updates the tree in place
        node.ds = ds.drop_vars(list(to_drop))

It would be cleaner and more efficient if xradar did this natively at the backend level.
