Description
Summary
When xradar creates a DataTree via open_*_datatree(), the station coordinates latitude, longitude, and altitude are placed on every sweep as scalar coordinates. Since these values are identical across all sweeps (they represent the fixed radar station location), they should instead be placed as coordinates on the root node and inherited by child sweep nodes via xarray's DataTree coordinate inheritance.
Current behavior
In the file-level backends (e.g., nexrad_level2.py), each sweep store creates its own copy of the station coordinates:
```python
# nexrad_level2.py:1519-1532, open_store_coordinates()
coords = {
    "azimuth": Variable((dim,), azimuth, get_azimuth_attrs(), encoding),
    "elevation": Variable((dim,), elevation, get_elevation_attrs(), encoding),
    "time": Variable((dim,), rtime, rtime_attrs, encoding),
    "range": Variable(("range",), rng, range_attrs),
    "longitude": Variable((), lon, get_longitude_attrs()),  # ← duplicated per sweep
    "latitude": Variable((), lat, get_latitude_attrs()),  # ← duplicated per sweep
    "altitude": Variable((), alt, get_altitude_attrs()),  # ← duplicated per sweep
    ...
}
```

Then in common.py:91-146, _assign_root() copies them to the root as data_vars (via reset_coords()):
```python
root = root.assign({
    ...
    "latitude": sweeps[1]["latitude"],
    "longitude": sweeps[1]["longitude"],
    "altitude": sweeps[1]["altitude"],
}).reset_coords()
```

Result:
```python
import xradar as xd

# filepath: path to a NEXRAD Level 2 file
dtree = xd.io.open_nexradlevel2_datatree(filepath)

# lat/lon/alt are scalar coords on EVERY sweep
print("latitude" in dtree["sweep_0"].ds.coords)  # True
print("latitude" in dtree["sweep_1"].ds.coords)  # True
# (identical values)

# Root has them as data_vars, NOT coords
print("latitude" in dtree.ds.data_vars)  # True
print("latitude" in dtree.ds.coords)  # False
```

Proposed behavior
The change should happen at the file backend level — where the sweep stores and root node are constructed:
- Sweep backends (`NexradLevel2BackendEntrypoint`, `IrisBackendEntrypoint`, etc.): Stop creating `latitude`, `longitude`, `altitude` as sweep-level coordinates in `open_store_coordinates()`
- `_assign_root()` in `common.py`: Place `latitude`, `longitude`, `altitude` as coordinates on the root node (use `set_coords()` instead of `reset_coords()`)
- Sweep nodes: Access lat/lon/alt via DataTree coordinate inheritance from root, with no duplication
```python
dtree = xd.io.open_nexradlevel2_datatree(filepath)

# Root has lat/lon/alt as COORDINATES
print("latitude" in dtree.ds.coords)  # True

# Sweeps inherit from root, so there is no local copy
sweep_ds = dtree["sweep_0"].to_dataset(inherit=False)
print("latitude" in sweep_ds.coords)  # False

# But access still works via inheritance
print(dtree["sweep_0"].latitude)  # resolves from root
```

Affected files
The changes are localized to xradar's I/O layer:
| File | Change |
|---|---|
| `xradar/io/backends/nexrad_level2.py` | Remove lat/lon/alt from `open_store_coordinates()` |
| `xradar/io/backends/iris.py` | Same |
| `xradar/io/backends/odim.py` | Same |
| `xradar/io/backends/cfradial1.py` | Same |
| `xradar/io/backends/common.py` | In `_assign_root()`, use `set_coords(["latitude", "longitude", "altitude"])` instead of `reset_coords()` |
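
The difference between the two calls named in the table can be seen on a bare xarray Dataset. This is a minimal sketch with made-up station values; it is not xradar's actual `_assign_root()` code, and the `station` dict is a hypothetical stand-in for the values taken from the first sweep.

```python
import xarray as xr

# Hypothetical station values; real ones come from the radar file.
station = {
    "latitude": xr.Variable((), 36.74),
    "longitude": xr.Variable((), -98.13),
    "altitude": xr.Variable((), 369.0),
}

# Current pattern: assign, then reset_coords() leaves them as data variables.
as_data_vars = xr.Dataset().assign(station).reset_coords()

# Proposed pattern: assign, then set_coords() promotes them to coordinates.
as_coords = xr.Dataset().assign(station).set_coords(list(station))

print(sorted(as_data_vars.data_vars), sorted(as_data_vars.coords))
print(sorted(as_coords.data_vars), sorted(as_coords.coords))
```

Only the final call differs between the two patterns, which is why the change in `common.py` can stay small.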
Motivation
1. Storage efficiency for Zarr/Icechunk stores
When writing DataTrees to Zarr, each coordinate becomes a separate array on disk. For a NEXRAD store with 7 VCPs and ~104 sweeps:
- Current: 3 scalars × 104 sweeps = 312 redundant arrays
- Proposed: 3 scalars on root = 3 arrays
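
The same count can be reproduced for other store sizes. The helper below is a hypothetical illustration of the arithmetic above, not part of xradar; it assumes one on-disk array per scalar coordinate per node.

```python
STATION_COORDS = ("latitude", "longitude", "altitude")


def station_array_count(n_sweeps: int, on_root: bool) -> int:
    """Number of station-coordinate arrays written to the store."""
    if on_root:
        # One copy of each scalar, stored once on the root group.
        return len(STATION_COORDS)
    # One copy of each scalar in every sweep group.
    return len(STATION_COORDS) * n_sweeps


current = station_array_count(104, on_root=False)
proposed = station_array_count(104, on_root=True)
print(current, proposed, current - proposed)
```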
2. Open performance
Each redundant array requires metadata reads when opening the store. In a benchmark on a KVNX 6-month Icechunk store (126 groups, 1868 arrays), removing these redundant arrays reduced metadata reads by ~17%.
3. xarray DataTree coordinate inheritance is designed for this
xarray's DataTree coordinate inheritance (pydata/xarray#9077) was implemented specifically for this pattern — parent-level coordinates that apply to all children. Radar station metadata is a textbook case.
4. CfRadial 2.0 compliance
CfRadial 2.0 (Section 4.4) defines latitude, longitude, altitude as root-level variables representing the instrument location. Placing them as root coordinates aligns with the spec's intent.
Workaround
Downstream libraries can post-process the xradar output:
```python
import xradar as xd

dtree = xd.io.open_nexradlevel2_datatree(filepath)

# Promote root data_vars → coords
root_ds = dtree.to_dataset(inherit=False)
station_vars = {"latitude", "longitude", "altitude"}
promote = station_vars & set(root_ds.data_vars)
if promote:
    dtree.ds = root_ds.set_coords(list(promote))

# Drop from sweeps (now inherited)
for node in dtree.subtree:
    if node is dtree:
        continue  # skip the root itself
    ds = node.to_dataset(inherit=False)
    to_drop = station_vars & set(ds.coords)
    if to_drop:
        dtree[node.path].ds = ds.drop_vars(to_drop)
```

It would be cleaner and more efficient if xradar did this natively at the backend level.
Related
- pydata/xarray#9077 — Coordinate inheritance for DataTree (implemented)
- pydata/xarray#9640 — Slow open_datatree for zarr stores with many coordinate variables