-
Notifications
You must be signed in to change notification settings - Fork 14
Fix incorrect dimension used for temporal weights generation #749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes an issue in the temporal group average API where weights were calculated using an incorrect dimension. The changes introduce label-based indexing for calculating temporal weights and add a utility function to identify the correct bounds dimension.
- In xcdat/temporal.py, the subtraction-based weights computation is replaced with a diff along the dynamically determined bounds dimension.
- In xcdat/bounds.py, a new function get_bounds_dim is added to identify valid bounds dimensions.
- Updates in init.py and tests ensure the new functionality is correctly exposed and validated.
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
xcdat/temporal.py | Replaced fixed indexing with a diff on the correct bounds dimension. |
xcdat/bounds.py | Added get_bounds_dim utility to determine the correct bounds dimension. |
xcdat/init.py | Updated import to include get_bounds_dim. |
tests/test_bounds.py | Added tests for get_bounds_dim functionality. |
Files not reviewed (1)
- docs/api.rst: Language not supported
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #749 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 16 16
Lines 1681 1687 +6
=========================================
+ Hits 1681 1687 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @pochedls, this PR is ready for review. I re-ran your example and it works now. Unit tests have also been added.
Example
# %%
import xcdat as xc
import xarray as xr
# tutorial data
ds = xc.tutorial.open_dataset("tas_amon_access")
dsc = xr.concat((ds, ds, ds), dim="member")
dsc_avg = dsc.temporal.group_average("tas", freq="year")
# %%
# other data
fn = "/p/user_pub/climate_work/pochedley1/extreme2324/tlt90_Amon_E3SM-2-0_historical-ssp370_ensemble_gr_187001-202412.nc"
ds = xc.open_dataset(fn)
ds_avg = ds.temporal.group_average("tlt90", freq="year")
Output
print(ds_avg.time[0:10])
<xarray.DataArray 'time' (time: 10)> Size: 80B
array([cftime.Datetime360Day(1870, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.Datetime360Day(1871, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.Datetime360Day(1872, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.Datetime360Day(1873, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.Datetime360Day(1874, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.Datetime360Day(1875, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.Datetime360Day(1876, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.Datetime360Day(1877, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.Datetime360Day(1878, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.Datetime360Day(1879, 1, 1, 0, 0, 0, 0, has_year_zero=True)],
dtype=object)
Coordinates:
* time (time) object 80B 1870-01-01 00:00:00 ... 1879-01-01 00:00:00
Attributes:
bounds: time_bnds
axis: T
realtopology: linear
print(ds_avg.tlt90[0:10])
<xarray.DataArray 'tlt90' (member: 10, time: 155)> Size: 12kB
array([[-0.11043303, -0.16092832, -0.08829556, ..., 0.79177042,
1.16704312, 0.86311752],
[ 0.11128599, -0.05549273, 0.04180344, ..., 0.93199587,
1.02952424, 1.03610563],
[-0.11125527, -0.06464632, 0.08757703, ..., 0.83790361,
1.01119917, 1.19474977],
...,
[ 0.05384331, 0.04045577, -0.04728429, ..., 0.9439962 ,
1.02731631, 0.76063299],
[ 0.25416844, 0.07389999, 0.23717786, ..., 0.83700803,
1.08495403, 0.85116314],
[ 0.02509733, -0.0937595 , -0.06834454, ..., 1.04069364,
1.14190511, 0.99018879]])
Coordinates:
* member (member) <U9 360B 'r1i1p1f1' 'r2i1p1f1' ... 'r9i1p1f1' 'r10i1p1f1'
* time (time) object 1kB 1870-01-01 00:00:00 ... 2024-01-01 00:00:00
Attributes:
comment: Temperature of the lower boundary of the atmosphere
long_name: tlt Equivalent MSU Brightness Temperature
standard_name: surface_temperature
cell_methods: area: time: mean
cell_measures: area: areacella
units: K
operation: temporal_avg
mode: group_average
freq: year
weighted: True
@pochedls I wonder if assuming dimensional order and length affects the spatial averager/spatial bounds. Is there ever a case where lat/lon bounds have extra dimensions? (e.g., "member" with time bounds). We assume fixed order of dimensions using positional indexing in Line 424 in c198620
Line 496 in c198620
Lines 587 to 601 in c198620
|
@tomvothecoder – I did not realize that was the issue (that The spatial averager does assume that bounds are 2D |
Got it, I'll open another ticket for us to handle this issue with the spatial averager API. UPDATE: #750 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks good to me.
- The previous method would take the difference of the time bounds using indexing, which assumed the bounds dimension was in a specific order. It did not take into account more than two dimensions (e.g., member, time, bounds)
c198620
to
58d9791
Compare
Description
This PR addresses an issue in the temporal group average API where weights were calculated using positional indexes to subtract time bounds. This approach mistakenly assumes a fixed order and number of dimensions.
The Issue
For example, in issue #748, the time bounds have dimensions arranged as (member, time, bound). The current logic incorrectly uses the "time" dimension for calculating weights instead of the "bound" dimension. This error ultimately leads to the following downstream issue:
The Solution
To resolve this, the calculation now uses label-based indexing via the new
get_bounds_dim()
function. This change ensures that the correct dimension ("bound") is identified and used, regardless of the order or number of dimensions in the time bounds array.Checklist
If applicable: